UCSC BME 100 Fall 2001

Intro to Bioinformatics

(Last Update: 10:43 PST 30 November 2001 )
This is a required course for bioinformatics B.S. majors and is highly recommended for new graduate students (before taking CMPS243 or CMPS244).

For catalog copy and pre-requisites, see the main page for BME100.

Who, When, and Where:

Instructor: Kevin Karplus (karplus@soe.ucsc.edu) http://www.soe.ucsc.edu/~karplus
Office hours: Th 11--12 315B Baskin Engineering
TA: Rachel Karchin (rachelk@soe.ucsc.edu) http://www.soe.ucsc.edu/~rachelk/
Office hours: in lab sections

Lectures: MWF 3:30-4:40 in Kresge 323.

One lab section a week is required:
T 2-3:30 Crown Computer Lab (Crown 201) OR
W 2-3:10 Crown Computer Lab (Crown 201)
You must register for the lab and the course together---neither can be taken without the other. WARNING: these are not the times and locations in the Registrar's time schedule---labs were moved to accommodate demands on Baskin Engineering 105 and to avoid TA schedule conflicts.

Texts

There will be two required texts, plus additional readings that will be distributed either on paper or via the Web:
Programming Perl, 3rd Edition
Larry Wall, Tom Christiansen & Jon Orwant
3rd Edition July 2000
O'Reilly and Associates
Considered the best single book on PERL. You may use other PERL tutorials or references, but I expect you to have easy access to this one.
Developing Bioinformatics Computer Skills
Cynthia Gibas & Per Jambeck
O'Reilly and Associates
A brand new book (April 2001) that has a promising table of contents. I've read about half of it so far, and it looks like a fairly good user's view of bioinformatics tools. We'll have to supplement fairly heavily to get a tool-builder's view. Be sure to check the errata page: http://www.oreilly.com/catalog/bioskills/errata/

Evaluation

There will be two types of assignments for the course and two types for the lab. The course will have reading assignments and pencil-and-paper exercises; the lab will have programming exercises to learn PERL and bioinformatics exercises using real data. The course will have a midterm and a final exam, covering the reading and the pencil-and-paper exercises. The same grade (and evaluation) will be given for both the course and the lab.

Mid-quarter modification

With the unanimous consent of the class, the midterm and final exams have been cancelled. It turns out to be very difficult to make up small enough problems for examination---almost all the homework exercises are much larger problems than could reasonably be given on a timed exam.

The assignments will be distributed on the web (see http://www.soe.ucsc.edu/~karplus/bme100/f01/homework.html).

The relative weights of the exams and the different types of assignment in the evaluation have not been determined yet---they should be roughly proportional to how much time the different assignments take to do well. We will try to assign points to each assignment as it is given, but the total number of points won't be known until we've created all the assignments.

Academic Integrity

Anyone caught cheating in the class will be reported to their college provost (see UCSC policy on academic integrity) and may fail the class. Cheating includes any attempt to claim someone else's work as your own. Plagiarism in any form (including close paraphrasing) will be considered cheating. Use of any source without proper citation will be considered cheating.

Collaboration without explicit written acknowledgement will be considered cheating. Collaboration on lab assignments with explicit written acknowledgement is encouraged---guidelines for the extent of reasonable collaboration will be given in class.

Rough list of topics we'll probably cover (not necessarily in order)

  1. Quick review of the central dogma of molecular biology: DNA->RNA->protein, bases, codons, amino acids
    Sept 21,24.
  2. Stochastic models, Bayes Rule, pseudocount example
    Sept 26.
  3. Converting arbitrary scores to stochastic models: P-value and E-value. Interpreting classification results: true/false positives, specificity, sensitivity, entropy, relative entropy. Sept 28.
  4. Guest Lecture: Rachel Karchin. Introduction to Hidden Markov models Oct 1. Download the slides from the lecture (Powerpoint format).
  5. Guest Lectures: Todd Lowe (RNA genes, DNA microarrays) Oct 3,5. The lecture notes are on-line at http://www.soe.ucsc.edu/~lowe/lectures/.
  6. Aligning sequences to HMMs, dynamic programming Oct 8,10.
  7. Substitution matrices and sequence alignment scores. Oct 12.
  8. Gap costs---how they are related to HMM transition probabilities. What global alignment looks like for sequence alignment after all the simplifications. Oct 15.
  9. Local alignment (Smith-Waterman) for sequences and HMMs. Oct 17.
  10. Sequence Logos. Oct 19.
  11. Library databases (training session by library staff). New access method for PUBMED/MEDLINE, possibly covering BIOSIS and INSPEC. Oct 22.
  12. Dynamic programming once more---the simplest version of Smith-Waterman and how it relates to the more efficient version previously covered. (Note: should rearrange presentation to do simple version BEFORE efficient version.) Oct 24.
  13. Multiple alignment techniques: overview and progressive alignment, Oct 26; T-Coffee, Oct 29. Handed out paper on T-Coffee:
    Notredame C, Higgins DG, Heringa J.
    J Mol Biol 2000 Sep 8;302(1):205-17
  14. Training HMMs (a rather muddled and confusing presentation) Oct 31.
  15. An introduction to protein structure prediction: ab initio, fold-recognition, and homology modeling. Why I think that ab initio methods have done so poorly. Notion of contact order. Handed out book chapter on SAM-T2K. Nov 2.
  16. SAM-T2K fold-recognition method using HMMs, presented with the transparencies from a talk given at combio2001. Handed out paper on contact order:
    Contact order, transition state placement and the refolding rates of single domain proteins.
    Plaxco KW, Simons KT, Baker D.
    J Mol Biol 1998 Apr 10;277(4):985-94
    Nov 5.
  17. Guest Lecture: Alexander Schliep Nov 7

    Using Transitivity and Strongly Connected Components to Detect Remote Homologues

    Abstract: More specific methods for database searching are required because of the exponential growth in sequence databases, which causes an increase in noise levels. The detection of remote homologues is becoming increasingly problematic.

    We have developed a novel graph-based clustering algorithm which uses transitivity of homology, that is, inducing homology of proteins A and C from homology of proteins A and B as well as B and C. Also, a surprisingly simple modeling approach yields a high level of robustness with respect to multi-domain proteins which are a relevant source of problems.

    We have evaluated the method on SCOP releases 1.37 and 1.53, as well as Swissprot Rel. 39. Our method compares favorably with PSI-Blast. It is also robust with respect to increases in database size and, unlike methods using sequence score statistics dependent on the database, does not require re-computation as sequences are added. Except for the computation of the pair-wise alignment scores, which is expensive but conveniently a perfect idle-task, the clustering method is a linear time algorithm.

    This is joint work with Sebastian Schneckener, Eva Bolten, Peter Pipenbacher, Rainer Schrader, and Dietmar Schomburg.

  18. Nov 9. Was going to finish SAM-T2K talk, but got side-tracked into covering secondary structure (DSSP and STRIDE), mutual information, and entropy, in order to explain second track of 2-track HMM.
  19. Nov 14. More on contact order and folding rate. Started discussing secondary structure prediction.
  20. Nov 16. Protein secondary structure prediction.
  21. Nov 19. Fast methods for searching (BLAST and BLAT) Guest lecture by Jim Kent.
  22. Nov 21. Phylogeny: neighbor-joining
  23. Nov 26. Phylogeny: parsimony. Also some mention of proteomics.
  24. Nov 28. Instructor evaluation forms. Introduction to hierarchy of grammars.
  25. Nov 30. Guest lecture: Todd Lowe. Computational and Functional Genomics in the Lowe lab.
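
Illustration for topic 17: the abstract describes inducing homology of proteins A and C from homology of A and B as well as B and C. The following is only a minimal sketch of that transitivity idea, assuming homology is given as a list of undirected pairs that have already passed some score threshold. It is not the speakers' algorithm (which, per the talk title, works with strongly connected components); it simply merges proteins with a union-find structure. The function name `homology_clusters` and the protein labels are hypothetical.

```python
from collections import defaultdict


def homology_clusters(pairs):
    """Group proteins into clusters by treating pairwise homology as transitive.

    pairs: iterable of (protein_a, protein_b) tuples judged homologous
           (assumed to be pre-filtered by some alignment-score threshold).
    Returns a sorted list of sorted clusters.
    """
    parent = {}

    def find(x):
        # Locate the representative of x's cluster, shortening paths as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Each homologous pair merges the two clusters its members belong to.
    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Collect proteins under their final representative.
    clusters = defaultdict(set)
    for protein in list(parent):
        clusters[find(protein)].add(protein)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    # A~B and B~C imply A and C land in the same cluster; D~E stays separate.
    print(homology_clusters([("A", "B"), ("B", "C"), ("D", "E")]))
```

Merging on every pair is the crudest possible use of transitivity: a single spurious pair, or a multi-domain protein that links two unrelated families, will fuse clusters that should stay apart. The abstract names multi-domain proteins as a relevant source of problems.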

Rough list of topics we didn't have enough time to do more than briefly mention:

Other resources on the web

Handouts for Rune Lygsoe's summer 2001 course on bioinformatics




Questions about page content should be directed to

Kevin Karplus
Computer Engineering
University of California, Santa Cruz
Santa Cruz, CA 95064
USA
karplus@soe.ucsc.edu
1-831-459-4250