The seminar will be a journal club, in which students take turns presenting papers from the literature. Everyone is expected to read all the papers, and to present one or two (depending on how many students take the course). We may also have presentations of original research, both by UCSC researchers and by visitors.
I will post a list of papers that we might want to read here, as I think of them. I welcome suggestions for other papers to read!
This quarter I plan to concentrate on proteins, particularly structure prediction, though students with a strong interest in other areas of bioinformatics can suggest other papers for us to read. Many of the more chemistry-related papers on this list have been taken from Lydia Gregoret's reading list for Chem 200A.
A nice article on structure prediction in the twilight zone. The author revisits the HSSP analysis to see whether the 20% homology rule still holds with today's larger databases, and explores some factors that reflect the reliability of twilight-zone structure prediction, such as the "more similar than identical" rule. The article is very dense, but has some interesting take-home messages.
VOLUNTEER= MELISSA CLINE.
Pfam: A Comprehensive Database of Protein Families Based on Seed Alignments
Sonnhammer, E.L.L., Eddy, S.R., and Durbin, R.
Proteins
28:405-420, 1997.
Pfam, which has gone through several releases now, is the most respected collection of protein multiple alignments that are based on sequence data.
VOLUNTEER=MARK DIEKHANS.
The latest PFAM paper is available from:
ftp://ftp.sanger.ac.uk/pub/databases/Pfam/NAR_1999_paper.pdf
Mark says "I put copies of the original PFAM paper in AS 215. They are on the table to the right as you come in from the outside door. Both online papers are in ~markd/pub/pfam/*, however these are small follow-on papers probably not of much use without the context of the original paper."
Until quite recently, Burkhard Rost's PHD program was the best secondary-structure predictor around. The ones that do better now (PSIPRED and our own predict-2nd) use much the same technology, but larger training sets and better multiple alignments.
VOLUNTEER= SPENCER TU
Conservation and prediction of solvent accessibility in protein families
Burkhard Rost and Chris Sander.
Proteins: Structure, Function, and Genetics
20(3):216--226, Nov 1994.
Prediction of solvent accessibility, using neural nets like those used for secondary-structure prediction.
VOLUNTEER= SPENCER TU
Protein Secondary Structure Prediction Using Local Alignments
Asaf A. Salamov, Victor V. Solovyev
Journal of Molecular Biology, v 268, n 1, April 25, 1997, 31-36.
Uses local alignments and multiple alignments with a variant of the nearest-neighbor algorithm to get a claimed accuracy higher than PHD's, but the exact details of the test are not clear, and the accuracy measure used is very sensitive to small details in the definition of "helix" and "strand".
VOLUNTEER= SPENCER TU
This book explains what is in a PDB file. We probably don't have time to cover the whole book, but Chapters 2 and 8 may be particularly relevant.
VOLUNTEER=NGUYET MANH
Sippl has been one of the more successful practitioners of threading as a fold-prediction technique (see CASP2 and CASP3 results). I'm not sure which of his many papers has the best presentation of his techniques: the 1990 JMB papers, the 1993 Journal of Computer-Aided Molecular Design paper, or this one.
If a student is interested in threading as a technique, it may be worth reading several of Sippl's papers (as well as some of his competitors' papers) and selecting the best of the group to read.
Knowledge-based potentials--back to the roots.
Koppensteiner, WA and Sippl, Manfred.
Biochemistry (Mosc) 1998 Mar;63(3):247-52.
A more recent review paper. We may read just this one, if it contains the most useful information of the previous papers.
VOLUNTEER=DAVID KULP
Note: we actually ended up with two different papers by Sippl:
Manfred J. Sippl, "Calculation of Conformational Ensembles from Potentials of Mean Force: an Approach to the Knowledge-based Prediction of Local Structures in Globular Proteins", J. Mol. Biol. (1990) 213, 859-883.
Presents scoring functions for structure-sequence alignment based on statistics of distance and amino acid pairs. Takes a rather physical view of the numbers, and normalizes by using pseudocounts.
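For students who have not seen knowledge-based potentials before, here is a minimal sketch of the general idea: turn observed frequencies of residue-pair distances into an energy through the inverse Boltzmann relation E = -kT ln(f_pair / f_ref). The function name, the binning, and the simple additive pseudocount are my own assumptions for illustration; the paper's own sparse-data correction is more careful and should be taken from the paper itself.

```python
import math


def mean_force_potential(pair_counts, reference_counts, pseudocount=1.0, kT=1.0):
    """Per-bin energies -kT * ln(f_pair / f_ref) for one residue-pair type.

    pair_counts[b]      -- times this residue pair was seen in distance bin b
    reference_counts[b] -- times any residue pair was seen in distance bin b
    pseudocount         -- added to every bin (an illustrative assumption,
                           not the normalization used in the paper)
    """
    n_bins = len(pair_counts)
    pair_total = sum(pair_counts) + pseudocount * n_bins
    ref_total = sum(reference_counts) + pseudocount * n_bins
    energies = []
    for pair, ref in zip(pair_counts, reference_counts):
        f_pair = (pair + pseudocount) / pair_total
        f_ref = (ref + pseudocount) / ref_total
        # Bins where the pair is seen more often than the reference
        # get negative (favorable) energy.
        energies.append(-kT * math.log(f_pair / f_ref))
    return energies


if __name__ == "__main__":
    # Toy counts: this pair is enriched in the first distance bin.
    print(mean_force_potential([8, 1, 1], [10, 10, 10]))
```

A bin where the pair occurs more often than the reference distribution predicts gets a negative energy, and a depleted bin gets a positive one.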
M. J. Sippl and J. Markus, "Predictive Power of Mean Force Pair Potentials" in Protein Structure by Distance Analysis, Copenhagen, 1993.
Expands on the method from JMB, '90, v213. In addition, includes z-score significance and various parameter tuning results.
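The z-score idea can be sketched in a few lines: compare the energy of the native sequence-structure pairing against the energies of alternative (decoy) pairings, in units of the decoys' standard deviation. This is only the generic definition of a z-score; the function name and the choice of decoy set are assumptions for illustration, not the procedure from the paper.

```python
import statistics


def z_score(native_energy, decoy_energies):
    """Number of standard deviations the native energy lies from the decoy mean."""
    mean = statistics.mean(decoy_energies)
    spread = statistics.pstdev(decoy_energies)  # population standard deviation
    if spread == 0:
        raise ValueError("decoy energies have zero spread")
    return (native_energy - mean) / spread


if __name__ == "__main__":
    # A strongly negative z-score means the native pairing scores much
    # better (lower energy) than typical decoys.
    print(z_score(-10.0, [0.0, 2.0, 4.0]))
```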
This paper gives all the math for Dirichlet mixtures in a fairly tutorial form. Dirichlet mixtures are essential to extracting maximum information from a multiple alignment.
VOLUNTEER=CHRISTIAN BARRETT
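As a preview of the math, here is a minimal sketch of how a Dirichlet mixture turns the observed counts in one alignment column into estimated probabilities. Each component j has a mixture weight q_j and a parameter vector alpha_j; the estimate is the posterior-weighted average of (n_i + alpha_j,i) / (|n| + |alpha_j|) over components. The function names and the toy two-letter alphabet are my own assumptions; real mixtures use 20 amino acids and trained components.

```python
import math


def log_beta(values):
    """Log of the multivariate Beta function: sum(lgamma(v)) - lgamma(sum(v))."""
    return sum(math.lgamma(v) for v in values) - math.lgamma(sum(values))


def dirichlet_mixture_estimate(counts, mixture_weights, components):
    """Posterior mean probabilities for one column under a Dirichlet mixture."""
    # Log of q_j * B(n + alpha_j) / B(alpha_j). The multinomial coefficient
    # is the same for every component, so it cancels on normalization.
    log_terms = []
    for q, alpha in zip(mixture_weights, components):
        combined = [n + a for n, a in zip(counts, alpha)]
        log_terms.append(math.log(q) + log_beta(combined) - log_beta(alpha))
    largest = max(log_terms)  # subtract the max for numerical stability
    unnormalized = [math.exp(t - largest) for t in log_terms]
    total = sum(unnormalized)
    posteriors = [u / total for u in unnormalized]

    n_total = sum(counts)
    estimate = [0.0] * len(counts)
    for post, alpha in zip(posteriors, components):
        alpha_total = sum(alpha)
        for i, (n, a) in enumerate(zip(counts, alpha)):
            estimate[i] += post * (n + a) / (n_total + alpha_total)
    return estimate


if __name__ == "__main__":
    # Toy alphabet of two letters, two components with opposite preferences.
    print(dirichlet_mixture_estimate([5, 0], [0.5, 0.5], [[10.0, 1.0], [1.0, 10.0]]))
```

With no observations the estimate falls back to the prior mixture, and with many observations it approaches the raw frequencies, which is why the method helps most for alignments with few sequences.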
"Comprehensive assessment of automatic structural alignment against a manual standard, the scop classification of proteins" by Gerstein and Levitt, Protein Science 1998, vol 7, 445-456.
This method of structure-structure alignment directly matches the backbones of two structures, by using repeated cycles of Needleman-Wunsch type dynamic programming and least-squares fitting, to determine an alignment minimizing coordinate difference.
VOLUNTEER= SUGATO BASU
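The dynamic-programming half of that cycle can be sketched as follows, assuming we are handed a similarity matrix between the positions of the two structures (for example, one derived from inter-atomic distances after a superposition). The least-squares fitting step is omitted, and the linear gap penalty and function name are illustrative assumptions rather than the paper's exact scoring scheme.

```python
def needleman_wunsch(similarity, gap_penalty=1.0):
    """Global alignment from a similarity matrix.

    similarity[i][j] scores matching position i of the first structure with
    position j of the second. Returns (score, pairs), where pairs lists the
    matched (i, j) index pairs in order.
    """
    n = len(similarity)
    m = len(similarity[0]) if n else 0
    # best[i][j]: best score aligning the first i rows with the first j columns
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = best[i - 1][0] - gap_penalty
    for j in range(1, m + 1):
        best[0][j] = best[0][j - 1] - gap_penalty
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(
                best[i - 1][j - 1] + similarity[i - 1][j - 1],  # match i with j
                best[i - 1][j] - gap_penalty,                   # skip row i
                best[i][j - 1] - gap_penalty,                   # skip column j
            )
    # Trace back from the corner to recover the matched pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if best[i][j] == best[i - 1][j - 1] + similarity[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif best[i][j] == best[i - 1][j] - gap_penalty:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best[n][m], pairs


if __name__ == "__main__":
    print(needleman_wunsch([[5, 0, 0], [0, 0, 5]]))
```

In an iterative scheme of the kind described, the matched pairs would feed a least-squares superposition, the new coordinates would give a new similarity matrix, and the cycle would repeat until the alignment stops changing.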
Overview of efficient structure-structure aligners
VOLUNTEER= MELISSA CLINE
DALI: "Protein Structure Comparison by Alignment of Distance Matrices" by L. Holm and C. Sander, Journal of Molecular Biology 1993, vol 233, 123--138. http://www2.ebi.ac.uk/dali/dali_jmb.html
Describes the DALI method for optimal pairwise alignment of protein structures, using an elastic similarity score between contact patterns in distance matrices and a Monte Carlo optimization for assembly of the alignments.
Two recent DALI papers from the EMBL group are available online:
VOLUNTEER= SUGATO BASU
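To make the distance-matrix idea concrete, here is a minimal sketch of scoring a given set of matched residues by comparing intra-structure distances. The elastic term rewards pairs whose distances differ by less than a relative threshold and down-weights distant pairs with a Gaussian envelope. The default threshold (0.2) and envelope (20 Angstroms) are what I recall from the paper and should be treated as assumptions to check against it; the function names are hypothetical, and the Monte Carlo search for the best matching is not shown.

```python
import math


def distance_matrix(coords):
    """All-against-all distances between 3-D points (e.g. C-alpha atoms)."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def elastic_score(dist_a, dist_b, theta=0.2, envelope=20.0):
    """Sum of elastic similarity terms over all pairs of matched residues.

    dist_a[i][j] and dist_b[i][j] are the distances between matched residues
    i and j in structures A and B. theta and envelope are assumed defaults.
    """
    total = 0.0
    size = len(dist_a)
    for i in range(size):
        for j in range(size):
            mean = 0.5 * (dist_a[i][j] + dist_b[i][j])
            if i == j or mean == 0.0:
                total += theta  # a residue matched with itself scores theta
                continue
            deviation = abs(dist_a[i][j] - dist_b[i][j]) / mean
            total += (theta - deviation) * math.exp(-((mean / envelope) ** 2))
    return total


if __name__ == "__main__":
    a = distance_matrix([(0, 0, 0), (3, 0, 0), (0, 4, 0)])
    b = distance_matrix([(0, 0, 0), (6, 0, 0), (0, 4, 0)])
    print(elastic_score(a, a), elastic_score(a, b))
```

Because only internal distances are compared, the score does not depend on how the two structures are rotated or translated relative to each other.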
VOLUNTEERS= DAVID KULP, MARK DIEKHANS (kernel method)
VOLUNTEER= CHRISTIAN BARRETT
VOLUNTEER= Michael Brown
A brief, somewhat dated overview of protein structure prediction, describing 1-D (alignment and secondary-structure prediction), 2-D (contact maps), and 3-D approaches. No math, but 110 citations, some with good annotation.
An early paper on the dense packing of protein cores.
This paper is an early one on using rotamer libraries to do fold prediction.
This paper makes a strong argument that burial of hydrophobics is the main driving force for protein folding.
This paper makes an argument for hydrogen bonding being as important as hydrophobicity in stabilizing proteins.
A very brief overview of available phylogeny software. (Available on-line from http://www.scidirect.com, but easier to access through http://bob.ucsc.edu/library/science/ej.html)