Computational Biology or Bioinformatics references

Resources needed for protein-prediction contest:

Home page for the official repository of publicly released 3D structures (mostly protein, some nucleic acid).

SCOP: Structural Classification of Proteins

The SCOP database provides a hierarchical classification of proteins of known structure into class-fold-superfamily-family-subfamily. I sometimes find it easier to access this database through the PDB or PDBSUM databases.

master site in Cambridge

CATH: Protein Structure Classification

CATH is the main competitor for SCOP in classifying protein structures. It is more automatic and so more current than SCOP, but many researchers regard the SCOP classifications as more carefully done.

Basic Local Alignment Search Tool

BLAST is a sequence similarity search tool designed to support analysis of all available nucleotide and protein databases (PDB, Swissprot, ...). The BLAST programs have been designed for speed, with some sacrifice of sensitivity to distant sequence relationships. The scores assigned in a BLAST search have a well-defined statistical interpretation, making real matches fairly easy to distinguish from random background hits. This site has detailed information about the BLAST tool, including a reference manual. See also BLAST
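
The statistical interpretation mentioned above is usually expressed through Karlin-Altschul statistics: the expected number of chance hits with raw score at least S is E = K * m * n * exp(-lambda * S), where m and n are the query and database lengths. Here is a minimal Python sketch of that formula. The function names are made up for illustration, and the lambda and K values are placeholders for the example only; the real values depend on the scoring matrix and gap penalties, and BLAST itself also applies effective-length corrections that are not modeled here.

```python
import math

def bit_score(raw_score, lam, k):
    """Convert a raw alignment score to a normalized bit score."""
    return (lam * raw_score - math.log(k)) / math.log(2)

def expect_value(raw_score, query_len, db_len, lam, k):
    """Expected number of chance hits scoring at least raw_score."""
    return k * query_len * db_len * math.exp(-lam * raw_score)

# Illustrative parameters only (assumed, not taken from any particular run).
LAMBDA, K = 0.267, 0.041
e = expect_value(raw_score=100, query_len=250, db_len=1_000_000, lam=LAMBDA, k=K)
bits = bit_score(100, LAMBDA, K)
print(f"bit score = {bits:.1f}, E-value = {e:.2e}")
```

The bit score folds lambda and K into the score, so the same E-value can be recovered as m * n * 2^(-bits) without knowing the scoring system.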

Entrez Browser

A search method for accessing many of the databases, somewhat easier to use than SRS, and with prettier formatting, but without the direct access to HSSP files. I frequently use protein searches. Also available is Batch Entrez for retrieving multiple records. http://ligand-expo.rcsb.org/ld-search.html provides ligand search capability (formerly called Ligand Depot, now Ligand Expo). You can search by chemical name or structure of the ligand.

http://pqs.ebi.ac.uk/

PQS Protein Quaternary Structure Query Form at the EBI. Has a somewhat different view from PDB of what the biological units of a protein structure are. [OBSOLETE: being replaced by PISA]

http://www.ebi.ac.uk/msd-srv/prot_int/pistart.html

PISA (Protein Interfaces, Surfaces and Assemblies) gives a more accurate view (according to Dunbrack) of multimeric proteins and their interfaces than either PQS or PDB. [Xu Q, Canutescu AA, Wang G, Shapovalov M, Obradovic Z, Dunbrack RL. Statistical Analysis of Interface Similarity in Crystals of Homologous Proteins. J Mol Biol. 2008;381(2):487-507.]

http://www.proteopedia.org/wiki/index.php/Main_Page

Protein-structure wiki, populated with pages for all of PDB, visualization by Jmol.

VAST structure-structure aligner

I've found this useful for getting back editable alignments of structures. Here are some related pages:

VAST FAQ
VAST Help

Cn3D Home Page

The VAST results are returned in Cn3D format. See Cn3D Tutorial for more information.

Prediction Servers

SignalP server: Signal peptide cleavage site predictor.
TargetP server: Predicts subcellular location.
ProtFun server: Predicts functional class of a protein.
TMHMM server: Transmembrane helix prediction server, with documentation by Anders Krogh. TMHMM does a good job of predicting transmembrane helices, but it gets inside/outside wrong often enough for me to believe that it does no better than random on that task.

Phobius

Phobius is a combined transmembrane topology and signal peptide predictor. It seems to do OK with transmembrane helices, but it seems to think that any transmembrane helix near the beginning of the sequence is a signal peptide, even if there are later transmembrane helices.

TMBpro

TMBpro is a transmembrane beta-barrel predictor (predicting strands, beta contacts, and tertiary structure).

http://www.procksi.net

ProCKSI is a meta-server for structure comparisons, using Universal Similarity Metric (USM), Maximum Contact Map Overlap, DaliLite, Combinatorial Extension (CE), TM-align, and FAST. This does not do a comparison with all of PDB, nor does it have precomputed results---instead it allows you to specify up to 250 PDB files (from RCSB or your own files) to work on in a particular session.

http://www.came.sbg.ac.at/prosup.html

ProSup structure comparison server. Provides SCOP classification info for structural hits.

EMBL-EBI Dali: server for 3-D protein structure database searches

http://www.bioinfo.biocenter.helsinki.fi:8080/dali/index.html

Precomputed alignments by DALI. Appears to replace the old FSSP database.

ExPASy Molecular Biology Server

A series of databases, provided by the Swiss Institute of Bioinformatics (SIB), dedicated to the exploration of protein sequence-structure relationships, as well as tools for two-dimensional polyacrylamide gel electrophoresis (2D-PAGE). ExPASy offers database searches focused on protein identification and characterization, homology searches, transcription processes, structure analysis and prediction, plus documentation and educational services.

This is the US mirror for the Swiss site http://www.expasy.org.

http://supfam.mrc-lmb.cam.ac.uk/SUPERFAMILY/

The SUPERFAMILY database uses the SAM-T99 technology with seeds from the SCOP domain library, to provide reasonable sequence-based domain identification. The alignments are not as good as SAM-T02 ones.

Rasmol

Rasmol is probably the best of the free viewers for proteins. The creators are now pushing Protein Explorer, but since that only exists on Windows and Mac, I'm not sure it's worth the bother. Go to the OpenRasmol site to get Rasmol 2.7.1.1 (or newer).

http://zhang.bioinformatics.ku.edu/I-TASSER/

Zhang's server was the "winner" overall at CASP7 and CASP8. Excellent predictions as measured by GDT and Hbonds in CASP8.

http://toolkit.tuebingen.mpg.de/hhpred

HHpred uses HMM-HMM scoring and alignment to make structure predictions. It is fast (because it uses psi-blast multiple sequence alignments) and accurate, but it does not handle as distant relationships as SAM-T08 or I-TASSER.

http://phylogenomics.berkeley.edu/phylofacts

PhyloFacts is Kimmen Sjölander's protein family analysis tool. It includes pointers to multiple aligners, tree builders, HMM search, ... .

ClustalW Multiple Sequence Alignment Interface

One of the standard multiple-sequence alignment methods. README for Clustal W version 1.6 March 1996

http://www.ch.embnet.org/software/TCoffee.html

T-coffee is one of the best multiple aligners around, but its run time is proportional to the cube of the number of sequences being run, so most SAM-T02 alignments would need to be drastically pruned before being realigned with T-coffee.
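
To get a feel for how much pruning cubic scaling demands, here is a rough back-of-the-envelope sketch in Python. It assumes run time is exactly proportional to the cube of the number of sequences, with no lower-order terms, which is a simplification; the function names are hypothetical.

```python
def relative_cost(n_seqs, baseline_seqs):
    """Run time relative to a baseline, assuming time grows as n**3."""
    return (n_seqs / baseline_seqs) ** 3

def max_seqs_for_budget(baseline_seqs, budget_factor):
    """Largest sequence count whose assumed cost stays within budget_factor
    times the baseline cost (the inverse of the cubic model)."""
    # The tiny epsilon guards against floating-point cube roots like 1.9999999.
    return int(baseline_seqs * budget_factor ** (1 / 3) + 1e-9)

# Ten times as many sequences costs a thousand times the run time.
print(relative_cost(1000, 100))
# An eightfold time budget only doubles the number of sequences you can align.
print(max_seqs_for_budget(100, 8))
```

Under this model, cutting an alignment from 1000 sequences to 100 reduces the run time by a factor of about 1000, which is why drastic pruning pays off.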

http://www.drive5.com/muscle

Muscle is supposedly a better multiple aligner than T-coffee, and much faster, fast enough to do 1000s of sequences in 10s of minutes.

http://dunbrack.fccc.edu/PISCES.php

The Pisces server provides culled (low-redundancy) lists of PDB files. It provides the best low-redundancy lists I've found.

http://prophecy.lundberg.gu.se/

Prophecy (PROfiling of PHEnotypic Characteristics in Yeast) looks for phenotypic changes (3 growth measurements: adaptation time, growth rate, and efficiency) in knockout strains of S. cerevisiae.

PDBsum

PDBsum provides access to many different databases that summarize information derived from PDB files. This is a handy place to go when you have a particular protein or PDB file that you want to get all the information about.

http://www.protonet.cs.huji.ac.il Protonet

Protonet claims that it "provides a very rich source for protein classification and annotations as well as a system for identifying families, subfamilies and superfamilies."

SRSWWW at EMBL, Heidelberg

A browser for searching protein sequence and structure databases. The older server may still be available at SRS WWW-TopPage.

EMBL-EBI: Solvation Preference Evaluation

EMBL 3-D modeling unit data bases:

DSSP

Definition of Secondary Structure of Proteins

FSSP

Families of Structurally Similar Proteins. The FSSP library is a collection of fold "families" based on structure-structure comparison. This now appears to be obsolete---see DALI above.

FSSP documentation

FSSP table 1 The tree of fssp files that covers the space of known folds. (local copy, not public)

Also, look at local copy of Table 2 which has a cross index of all the structural alignments in the FSSP database.

HSSP

Homology-derived Secondary Structure of Proteins. The HSSP database is a database of homology-derived secondary structure of proteins. It was designed by Chris Sander and Reinhard Schneider. For every PDB file that contains at least one protein there also exists an HSSP file. An HSSP file holds a multiple sequence alignment of all SwissProt files that are significantly homologous to a PDB file (based on the MAXHOM program).

You can look at the abstract of the paper describing HSSP or get FTP access to the database itself.

PDBSELECT

Unique subset of 3-D protein structures. This makes a convenient way to select all the proteins whose structures are known without oversampling proteins that are very similar in sequence. The culled PDB sets by Dunbrack's group are probably more useful in general. The most recent release of the PDBSelect subsets of the PDB sequences.

PDBFINDER

Your entry into the PDB database

PDBREPORT

Structure verification reports for PDB structures

EMBL: Sander group home page

Eddy group, Wash U. Dept. of Genetics

The documentation for the HMMer tool, a competing tool that allows the use of local alignment and uses different training methods, but is not willing to show alignments that it thinks are bad. Sean Eddy's home page

GPCRDB

G protein-coupled receptor (GPCR) data

Modules in extracellular proteins

Pfam home page

http://dunbrack.fccc.edu/SCWRL3.php

SCWRL is one of the best programs for replacing sidechains and assigning rotamers in homology modeling. SCWRL3 is particularly fast.

http://mt.cs.haifa.ac.il/seqsimstrdiff/seqsimstrdiff_local.htm

Database of Sequence-Similar, Structure-Dissimilar Protein Pairs in the PDB.

PDBeMotif allows searching for close interactions between atom types in all of PDB.

UCLA-DOE STRUCTURE PREDICTION SERVER

Ligand search

http://ligand-expo.rcsb.org/ld-search.html: provides ligand search capability (formerly called Ligand Depot, now Ligand Expo). You can search by chemical name or structure of the ligand. I've not used this feature---the previous incarnation did not work for structure search as of 27 June 2007, but it may be fixed now.
http://www.ebi.ac.uk/msd-srv/msdmotif/chem/: Allows fairly easy specification of structures of ligands, and can turn off the single/double-bond check (an essential feature), but it takes a lot of clicks to get through to the PDB files.
http://www.ebi.ac.uk/msd-srv/msdchem: Name-based search, only useful for organic chemists.
http://ligand-depot.rutgers.edu/sketch.html: This is the ChemAxon tool that RCSB tried to supply, but as of June 2007 it worked here and not at the RCSB site. That may have changed since.
http://relibase.ccdc.cam.ac.uk: I was pointed at this ligand tool as an alternative for ChemAxon and msdmotif, but I couldn't figure out the drawing tool (I had no trouble with either ChemAxon or msdmotif).
http://hetpdbnavi.nagahama-i-bio.ac.jp: I haven't tried this "Het-PDB Navi" database yet.
http://bioinfo-pharma.u-strasbg.fr/scPDB/: sc-PDB claims to have ChemAxon searches for druggable binding sites. Caused firefox to crash 4 July 2008.
http://metallo.scripps.edu: MDB, metalloprotein database. The "Advanced search" seems easier to use than the standard search. This is (so far) the best site I've found for looking for metal-binding sites.
http://tanna.bch.ed.ac.uk/: MESPEUS metal-binding site database. (Link provided by Dietlind, not checked yet).

Secondary structure assignment and prediction

DSSP

Definition of Secondary Structure of Proteins---the most commonly used tool for assigning secondary structure. http://swift.cmbi.ru.nl/gv/dssp/dssp.pdf has a scanned pdf of the original Kabsch and Sander paper giving the definitions of the DSSP codes.

STRIDE

The second most commonly used tool for assigning secondary structure. STRIDE uses a bit more information than DSSP, and seems to do a slightly better job of assigning structure. The main differences are in determining where helices end. DSSP and STRIDE differ on about 5% of the assignments even after reducing to a 3-state (strand, helix, other) assignment.
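
As a rough illustration of the comparison described above, the Python sketch below collapses per-residue assignment codes to three states and measures how often two assignments disagree. The particular 8-to-3 mapping is an assumption (it is one common choice; other reductions treat G or B as "other"), the function names are hypothetical, and the two example strings are made up rather than taken from real DSSP or STRIDE output.

```python
# One common 8-to-3 state reduction (an assumption; other reductions exist).
REDUCTION = {"H": "H", "G": "H", "I": "H",   # helix types
             "E": "E", "B": "E"}             # strands and isolated bridges

def to_three_state(assignment):
    """Map assignment codes to H (helix), E (strand), or C (everything else)."""
    return "".join(REDUCTION.get(code, "C") for code in assignment)

def disagreement(a, b):
    """Fraction of positions where two equal-length assignments differ."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty assignments of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

# Made-up assignments for a 20-residue fragment, differing at one helix end.
dssp_like   = "  HHHHHHHHTT  EEEEE "
stride_like = "  HHHHHHHHHT  EEEEE "
print(disagreement(to_three_state(dssp_like), to_three_state(stride_like)))
```

In this toy case the only difference is where the helix ends, which is the kind of disagreement the paragraph above describes.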

PSIPRED Protein Structure Prediction Server

PSIPRED was the best secondary-structure predictor at CASP3. (Ours was second best.) In larger, more careful evaluations such as EVA, PSIPRED still seems to be the best, but the differences between PSIPRED, our SAM-Txx methods, and PhD are insignificant.

Links for secondary structure prediction, with commentary.

PredictProtein

PredictProtein's old European address (moved April 1999)

EMBL's PredictProtein server provides automated searches of protein databases to predict aspects of protein secondary structure. PP performs a search of the SWISS-PROT database and performs iterative profile-based multiple alignments of the search results using "a standard dynamic programming method." The PROSITE and ProDom databases are used to identify biologically significant sites, patterns, and homology, plus results from a system of programs (PHDsec) that predicts the protein's secondary structure, solvent accessibility, transmembrane helices, and coiled-coil regions (using COILS). You can also request fold recognition by prediction-based threading using a method similar to that used by the UCLA-DOE structure prediction server.

PhD (part of the suite) was the best secondary-structure predictor available (until CASP3, when PSIPRED beat it handily, though newer versions of PhD are comparable to PSIPRED). Here is a paper on PhD: PHD: 1D protein structure prediction (Rost)

Description of secondary structure prediction accuracy evaluation

An evaluation method for secondary-structure predictions. The segment-overlap score is supposedly a better measure of the usefulness of a predictor than the standard "Q3" measure, which is just the percent correct on a 3-letter prediction (H, E, other).
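
For concreteness, the Q3 measure can be sketched in a few lines of Python. The function name and the toy strings are made up for illustration; C stands in for "other". The segment-overlap score is more involved and is not implemented here.

```python
def q3(predicted, observed):
    """Percent of residues whose 3-state label (H, E, or other) matches."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty strings of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)

# Toy example: one long helix predicted as two shorter fragments.
observed  = "CCHHHHHHHHHHCC"
predicted = "CCHHHHCCHHHHCC"
print(f"Q3 = {q3(predicted, observed):.1f}%")
```

The toy prediction scores well per residue even though it breaks a single helix in two, which hints at why a segment-based score might say more about a predictor's usefulness.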

DSC Form

King and Sternberg's secondary structure predictor given a single sequence (makes a multiple alignment). If you already have a multiple alignment, use the DSC Form given an alignment instead.

Protein Secondary Structure Prediction servers

The University of Birmingham MBUG group collected pointers to four secondary structure predictors, and provided some annotation. They make the important warning that "The accuracy figures quoted are 'best case' values and so the reliability of the predictions applied to your sequence may be significantly reduced."

The MBUG home page contains pointers to resources intended for people with interests in DNA and protein sequence analysis, particularly for beginners who already have some experience of searching on the Web.

Secondary Structure Prediction methods and links

RUNNING PREDATOR

Mail Server

Quadratic Logistic Protein Secondary Structure Prediction Using Homologues

At UCSC:

UCSC Bioinformatics: The University of California, Santa Cruz, bioinformatics (computational biology) research group models the primary (sequence) and secondary structures of DNA, RNA, and protein sequences, using hidden Markov Models (SAM=Sequence Alignment and Modeling System) and stochastic context-free grammars. UCSC also pioneered the use of Dirichlet mixtures for regularizing distributions of amino acids. Several servers are provided for using SAM, particularly for remote-homology and fold-recognition of proteins.
http://genome.ucsc.edu: The main site for human, mouse, and rat genomic information. (Now has the SARS virus information also.)
UCSC Bioinformatics Seminars
Kevin Karplus research overview
MS/PhD programs: UCSC's Bioinformatics MS and PhD programs
Frequently asked questions about the Bioinformatics grad degree programs at UCSC.
BS program: UCSC's Bioinformatics BS program
Frequently asked questions about the Bioinformatics undergrad degree program at UCSC.
SAM Documentation: The main tool at UCSC for building and using hidden Markov models---there is also the general SAM info page.
Local Sam Parameter descriptions chapter This is one chapter of the local manual for SAM---it is only accessible from UCSC and may include features not in the released version.
Lots of people need explanations of the "a2m" alignment format (a variant of the FASTA sequence format). We have that pulled out into a separate page at http://www.soe.ucsc.edu/research/compbio/a2m-desc.html
CMP243 (bioinformatics class)
Chem 200A -- Fall 1996 Chemistry 200A: Protein Biophysics (Fall-1998): A protein-structure class with a fair amount of World-Wide Web usage.
Dirichlet regularizers
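
Since the a2m format comes up so often, here is a small Python sketch of how its insert positions can be stripped. It relies on my reading of the usual a2m convention: uppercase letters and "-" occupy alignment (match) columns, while lowercase letters are insertions and "." is padding in insert columns. The function names and the two example sequences are made up, and this is not the SAM tool itself; the a2m description page linked above is the authority on the format.

```python
def strip_inserts(a2m_sequence):
    """Keep only alignment columns: uppercase residues and '-' deletions.

    Lowercase residues (insertions) and '.' (padding in insert columns)
    are dropped.
    """
    return "".join(c for c in a2m_sequence if c.isupper() or c == "-")

def read_a2m(text):
    """Parse FASTA-style text into a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

example = """>seq1
MKV..LA-GT
>seq2
MRVgsLAEGT
"""
for name, seq in read_a2m(example):
    print(name, strip_inserts(seq))
```

After stripping, every sequence has the same number of columns, which is what tools that expect a plain aligned FASTA file need.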

Evaluation of protein-structure prediction methods

LiveBench

LiveBench is an on-going evaluation of fold-recognition and ab-initio servers. Our SAM-T02 server (listed there as ST02) is doing pretty well in the current round (round 6).

EVA: Evaluation of automatic structure prediction servers

EVA evaluates mainly secondary-structure prediction, but also evaluates comparative modeling, "threading" (by which they mean fold recognition), and contacts.

PDB-CAFASP

Evaluation of predictions using pre-release PDB structures as the targets.

http://www.bioinformatics.buffalo.edu/TM-score

TM score is a tool for measuring the similarity between two models of the same protein. Supposedly there are length corrections so that a reasonable threshold between random and non-random predictions is around 0.17 independent of protein length. The measure (like GDT) depends on sampling different rigid superpositions, so is subject to the same limitations when there are misplaced or moved domains.
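
The length correction can be seen in the published TM-score formula: for a given superposition, the score is the sum over aligned residue pairs of 1 / (1 + (d_i / d0)^2), divided by the target length L, with d0 = 1.24 * (L - 15)^(1/3) - 1.8. The Python sketch below evaluates that sum for ONE fixed superposition only; it does not search over superpositions, which the real tool does. The function name is hypothetical, and the 0.5 floor on d0 and the refusal of very short targets are guards added for this sketch.

```python
def tm_score(distances, target_length):
    """TM-score for one fixed superposition (no superposition search).

    distances: distances in Angstroms between aligned residue pairs
    target_length: number of residues in the target (native) structure
    """
    if target_length <= 15:
        raise ValueError("this sketch only handles targets longer than 15 residues")
    # Length-dependent distance scale; the 0.5 floor is a guard for short chains.
    d0 = max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length

# A perfect model of a 100-residue target, and one covering only half of it.
print(tm_score([0.0] * 100, 100))
print(tm_score([0.0] * 50, 100))
```

Because each residue pair contributes at most 1 and distant pairs contribute almost nothing, a misplaced domain lowers the score roughly in proportion to its size under any single superposition.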

CASP and CAFASP

CASP (the critical assessment of structure prediction) is a biennial world-wide assessment of the state of protein structure prediction. Our group at UCSC has always done well in it (starting with our first attempt in CASP2). The CAFASP experiments are parallel experiments using the same target proteins, but only allowing fully automatic predictions by servers.

http://predictioncenter.org/casp6/Casp6.html

The official site for the CASP6 experiment. For official assessment results, see http://predictioncenter.org/casp6/meeting/presentations/talks.html. In those talks we were referred to as either SAM-T04-hand or group 166. For the CM (comparative modeling) targets, we did not come into the top 5 or 6 groups, and the rest were not ranked, though the SAM-T02 server (group 164) was ranked 7th for servers. For the FR/H (fold-recognition/homologous) targets we were ranked 15th. For the FR/A (fold-recognition/analogous) targets we were ranked 2nd. For the NF (new fold) targets, there were several different rankings, in which we came out 3rd, 9th, 2nd, 7th, 1st, 7th, 1st, 5th, 4th, 12th, 4th, 12th---generally doing better when only the first model was considered, and not the best of 5 (we did not usually attempt to diversify our predictions, so rarely had 5 very different models). Combining the various rankings (including the best of 5 rankings) put us in 4th.

http://bioinformatics.buffalo.edu/casp6

Unofficial rankings of servers and some human predictors using TM score for CASP6. In these rankings, the SAM-T04-hand group ranked as follows (as of 16 Dec 2004):

targets/domains	rank (using 1st model)	rank (best of 5 models)
All 87	8	6
25 CM easy	21	4
18 CM hard	9	6
19 FR/H	10	12
15 FR/A	7	7
10 NF	2	6

Our old SAM-T02 server did not do as well:

targets/domains	rank (using 1st model)	rank (best of 5 models)
All 87	69	56
25 CM easy	59	62
18 CM hard	48	43
19 FR/H	64	59
15 FR/A	113	96
10 NF	122	93

The ancient (and now obsolete) SAM-T99 server did even worse:

targets/domains	rank (using 1st model)	rank (best of 5 models)
All 87	102	104
25 CM easy	88	80
18 CM hard	72	74
19 FR/H	120	121
15 FR/A	155	153
10 NF	144	139

http://www.cs.bgu.ac.il/~dfischer/CAFASP4/

CAFASP4 was not as important in this round, as servers were evaluated as part of the main CASP evaluation, without special consideration. Only a few servers chose to participate in CAFASP and not CASP (the servers of the CAFASP organizers). In the CAFASP4 evaluations (as of 22 Nov 2004), the old SAM-T02 server ranked 48/85 on the easier HM targets and 46/83 on the harder FR targets.

All the sam-t04-hand group's work, including notes (in README files), alignments, models, ... Some of the assessment results are in assessment.html

informative news article about CASP (after CASP5)

CASP5 Home page

CAFASP3

CASP4 Home page

CAFASP2 summaries

CAFASP2 Fold Recognition Evaluation Results

CASP pages - CAME

evaluations by Sippl's group for the first 4 CASP experiments.

CASP3 Home page

CAFASP1 Critical Assessment of Fully Automated Structure Prediction

CAFASP 1

Directory of /pub/CAFASP1/comparisons

crude assessment of automatic structure predictors on the web.

CASP2 meeting. Look particularly at the evaluation of the fold-recognition part of CASP2. Our group is labelled "Karplus", which is a bit misleading, since Kimmen Sjölander had at least as much to do with the decisions for CASP2.

Collections of tools

http://simgene.com/: SimGene hosts a collection of free tools. They don't have anything very exciting, currently just CGView, Primer3, ClustalW2, TCoffee, OligoCalc, and ReadSeq.

Conferences and journals (roughly newest first, not very current):

Pointers to conferences run by the International Society for Computational Biology (includes ISMB and PSB).

MASAMB-XI 2000 - Registration Form

DOE Contractor and Grantee Workshop Web Site

Intelligent Systems for Molecular Biology (ISMB)

ISMB 2000
ISMB 99
ISMB98
ISMB97
ISMB96
ISMB95

PSB 98 On-Line Proceedings

RECOMB 98

Bioinformatics Online

May 1996 Workshop at EBI, Cambridge

Meeting on Interconnection of Molecular Biology Databases

ACM Workshop on Information Retrieval and Genomics, May 2-4, 1994

RNA Structure Symposium, Santa Cruz, CA, June 25-29, 1997

Online courses and texts

The on-line course information was moved to the ISCB web site, where it will be maintained from now on. Here are some pointers that are more specific to particular sections within the more general tutorials, or to sites that may not be easily found on the ISCB page.

http://www.ncbi.nlm.nih.gov/Education/

NIH education home page, points to documentation, teaching tools, tutorials, ...

http://www.openhelix.com/cgi/freeTutorials.cgi

Free tutorials on useful resources (like PDB, UCSC genome browser, Wormbase, ...)

User's Guide to the Human Genome (in Nature Genetics).

Scoring Matrices (part of a tutorial on bioinformatics)

Bioinformatics and Computational Molecular Biology at Washington University

MIT Biology HyperTextBook

The Massachusetts Institute of Technology (MIT) Introductory Molecular Biology course textbook covers chemical bonds and bonding, cell biology, proteins and enzymes, nucleic acids, and lipids, with review problems at the end of each chapter. The interface allows links and searches for key words.

Free search sites for using Paracel's Genematcher:

http://www.cbr.nrc.ca/newdocs/tutorials/index.html?cbrlang=eng+navbar=/newdocs/tutorials/
http://www.dna.affrc.go.jp/htdocs/SWsrch/
http://www.ch.embnet.org/software/GMFDF_form.html

DNA Repair pointers (will eventually move to own page)

Notice 00-02
DNA repair
DNA Repair at the Lawrence Livermore National Laboratory
Cancer and DNA repair research
Ataxia: DNA Repair Defects
Mutation, Mutagens, and DNA Repair
DRIG - What is DNA repair?
The Institute of Medical Radiobiology in Zurich
Mitochondrial DNA repair pathways
Scientific History of Kendric C. Smith
DNA Repair Lectures, Part (a)
DNA repair

Collections of pointers

Bioinformatik
CLUE: A collection of useful links to bio-informatics URLs, notably web resources for physical and genetic mapping, gene structure prediction and mutation analysis, protein classification and 3D structure prediction, protein sequence motif search, homology search, etc.
Genamics Home
Software from the Barton group at Oxford: Includes STAMP, which produces multiple alignments from structures.
Computational Molecular Biology at NIH: This site provides some useful information including pointers to several molecular biology databases (for nucleic acids, proteins, enzymes, and others), documentation related to sequence analysis, journal references, and links to other biology-related sites. A fairly good place to start for a newcomer.
European Bioinformatics Institute: EBI maintains several important databases and provides several different search services.
Software index at www.bioinformatik.de: A searchable collection of tool information at www.bioinformatik.de
microarrays.org - your public source for microarraying information, tools, and protocols

Distributed computing

Queue - GNU Project - Free Software Foundation (FSF)
Beowulf Project at CESDIS
Welcome to DIPC!
Welcome to DIPC!: Other Distributed Processing Pages You May Find Of Interest
Hans MULLER's SCI page at nicewww.cern.ch
MOSIX
Beowulf Discussion on Bioperl
Overview of the Condor High Throughput Computing System
Condor Manuals
Condor Project Homepage

Other pages (not classified yet):

PDBsum highlights page: Table of extreme points in the PDB database (oldest, largest, newest, ...). Generally somewhat out of date, but a useful starting point for finding unusual PDB files.
http://genes.mit.edu/burgelab/topten.htm A list of the "top ten" problems in bioinformatics, according to Ewan Birney, Chris Burge, and Jim Fickett.
WebLogo Sequence Logo Generation Main Form: A Web service for producing sequence logos, handy for viewing a multiple alignment to see which parts are conserved. Warning: this page does not understand our .a2m format---all insert positions must be removed. One way to remove them is with the command "prettyalign foo.a2m -m0 -f". To download the sequence-logo programs for local use, try the page for Sequence Logo Programs. We now produce sequence logos with our own program (makelogo), which understands the SAM HMM format and so makes the logos directly from the distributions that the HMMs are looking for; this site is therefore no longer of much use to people with the SAM software package.
The Sanger Centre : EMBOSS
MVIEW
Blocks Multiple Alignment Processor
Bioinformatics & Pattern Discovery @ IBM / The Home of TEIRESIAS, MUSCA and DELPHI
BioInfo.PL Meta Server
ftp://ftp.rcsb.org/pub/pdb/data/status/obsolete.dat
Recurrent domains in protein structures
ETH: CBRG - Darwin 2.0 Available
Entrez-Genome
Bio Netbook
IMB Jena Image Library of Biological Macromolecules
www.doubletwist.com
Documentation and Bioinformatics Courses
Protein Topology home page
ScienceDaily Magazine -- Genome Scientists Muster Computer Software Tools For Handling The Flood Of Raw Data From The Human Genome Project And Related Efforts
HMMTOP
Davor Juretic Workgroup
Similarity Searching Guides and Documentation
SA National Bioinformatics Institute
NIH Guide: PREDOCTORAL TRAINING IN BIOINFORMATICS AND COMPUTATIONAL BIOLOGY
BAliBASE benchmark alignments
The Whitaker Foundation: Special Opportunity Awards in Biomedical Engineering
Animal Bioinformatics & Research Menu Hubsite: elegansNet
LINKS -Biocomputing Service Group
Program Evaluation using BAliBASE benchmarks
HOBACGEN
The proWeb Project
CMBI, Center for Molecular and Biomolecular Informatics
ONLINE ANALYSIS TOOLS
Profile
Codon Usage Database
REPRO CGI
RepeatMasker Web Server
Eric Beitz: BioTeX
GOLD: Genomes OnLine Database Homepage
DOTLET
The "MAIN" crystallography: model building, density modification, refinement, structure analysis
Protein Information Resource Home Page - Protein Sequence Database - Alignment Databases - NRL3D - RESID - FAMBASE - PATCHX - PROCLASS - GeneFIND
Chemical Information Site Index
WIT Top Page: WIT (What Is There). The WIT system helps users create metabolic reconstructions, which is made possible by the recent abundance of complete bacterial genomic sequences. Such reconstructions will for the first time set the stage for meaningful simulations of the basic behaviour of microbes, and may thus significantly advance microbial biology.
http://cyrah.med.harvard.edu/assess_final.html
BIOINFORMATICS<-->structure Abstract:Bryant
Cytokines Web
The NCGR Microbial Genome Site: Pyrobaculum aerophilum
IBCP WWW server form
The BioInformer -- EBI's bioinformatics newsletter
NiceProt View of SWISS-PROT: P01008
ACE program
The Pair Database
Ramachandran revisited
Ramachandran server
Lifting the Curtain: Using Topology to Probe the Hidden Action of Enzymes
Taxonomy Browser
The Genetic Codes
The Sanger Centre : The WORMPEP database
I-sites Library Home
I-sites : clusters
RDB: a Relational Database Management System
DINAMO: A Sequence Alignment Editor/Molecular Graphics Tool
ftp://ncbi.nlm.nih.gov/mmdb/mmdbapi/mmdbapi.doc
Index of /download_area/ASSESSMENT_PAPERS/
CASP2 Evaluation table
GLASS Home page
Archive of obsolete PDB entries
Announcement concerning SWISS-PROT - Detailed Description
CASP3 - model list viewer
Structure
Abstract: How a protein prepares for B12 binding: structure and dynamics of the B12-binding subunit of glutamate mutase from Clostridium tetanomorphum
NCBI Structure Group/MMDB Home Page
Overview of molecular forces
AMBER
Home Page of Axel Brunger's Research Group
The Karplus Group at Harvard
Protein Engineering: Internet Guide: Software
PDB-REPRDB table
CINEMA: This web site contains applets for v2.02, and the new v2.1, of the Color INteractive Editor for Multiple Alignments. Provides a client Java applet to assist visualization of alignments among different sequences. It is capable of loading sequences/alignments from a database, shifting of sections, and loading/saving of the resulting alignment files. Obsolete pointer to earlier version of CINEMA
Sisyphus and protein structure prediction
Bayesian Formulation of X-Ray Crystallography
Links for multiple alignment
Misha Gelfand's home page: A very long, text-only bibliography (functional analysis of nucleotide sequences reference list) is available under his home page.
MAGPIE GENOME SEQUENCING PROJECT LIST
Computers dive into gene pool: A San Jose Mercury News article about the wonders of bioinformatics.
GROMOS home page: GROMOS™ is a general-purpose molecular dynamics computer simulation package for the study of biomolecular systems. Its purpose is threefold: (1) simulation of arbitrary molecules in solution or crystalline state by the method of molecular dynamics (MD), stochastic dynamics (SD), or the path-integral method; (2) energy minimisation of arbitrary molecules; (3) analysis of conformations obtained by experiment or by computer simulation.
Multiple sequence alignment programs
Genome Sequencing Center - GSC Home Page
BioMagResBank
C. elegans BLAST server: Good for finding genomic data from C. elegans.
BioInformatics (Weizmann Institute, Israel)
The BioCatalog
DDBJ/EMBL/GenBank Feature Table Definition (in UK)
DDBJ/EMBL/GenBank Feature Table Definition (Canadian echo)
Human Genome Project Information: A good starting point for those unfamiliar with the Human Genome Project is To Know Ourselves. It is available in PDF format (for use with Acrobat).
There is also considerable information about the project on the DOE website.
Information Theory and Molecular Recognition
CLEVER (an interface for ENTREZ)
WWW Entrez: A Hypertext Retrieval Tool for Molecular Biology
Electronic Protein Science Home Page
The World-Wide Web Virtual Library: Biosciences
GSDB: Home Page
Rnadraw Homepage
NCBI : BLAST Notebook
The National Center for Biotechnology Information
BCM Search Launcher
Molecular Computation of Solutions to Combinatorial Problems
Survey of Molecular Biology Databases and Servers
BIOSCI/bionet Electronic Newsgroup Network for Biology
The Tree of Life Home Page: This tree of life is a large phylogenetic tree, which now has some rudimentary search capability. They don't include viruses and their coverage of archaea seems feeble. It is not clear how they generate and maintain the tree.
NetBiochem Welcome Page
Directory of /pub/nmr/mirror.bruker.de/MOLMOL
MDL Home Page (Chemical drawing software)
FASEB Information Services (Federation of American Societies for Experimental Biology)
Informax,Inc. (Vector NTI viewer)
Hot Property: Biologists Who Compute
A list of some of the WWW servers for the PIR database.
ProMod: Users Guide
NCSA BIOLOGY WORKBENCH
Biology Workbench v1.4
Protein Science 5(10):1973-1983. Protein secondary structure and codon usage
David J. States
GenomeNet WWW server
DBGET/LinkDB Integrated Database Retrieval System: The LinkDB database system contains AAINDEX, a collection of over 400 numeric indices for amino acids.
G-Protein Coupled Receptor Database
Biotech Validation Suite for Protein Structures
The Biotechnology STAR Project (BioSTAR)
The Sanger Genomic analysis CGG Home page: genomic.sanger.ac.uk
TRANSFAC: Gene Regulation database
GRBase: The Growth Regulation Database: The Gene Regulation Database (GRBase) is a compendium of information on the structure and function of proteins involved in the control of gene expression in eukaryotes. These proteins are involved in the control of gene regulation and cell growth, including growth factor and hormone receptors, transcription factors, and other proteins involved in signal transduction. These include steroid receptors, proteins involved in apoptosis and wound healing, and transcription factors. Each entry details the name(s), protein size, DNA recognition sequence (if any), accession numbers, chromosome location, mRNA size and expression, key journal articles, and an overview of the structure and function of the protein and related proteins.
Birkbeck College Software Library: The Birkbeck College software library is an object-oriented toolkit for molecular modelling and sequence analysis, for use by molecular biology software developers. The ANSI C++ library features a graphic user interface for construction and manipulation of language-independent classes for molecular structure modelling and sequence analysis.
Online Tools
BIRCH HOME PAGE: A setup of freeware for genome analysis
WebBlast Software
PDB 1mmc
SYQUA HOME PAGE
Software (a rather useless title)
Nature Structural Biology: Survey
Genome Jobs
TMAP Documentation
Services at MBCR and throughout the world
Bioinformatics at Stanford
The Molecular Expressions Photo Gallery
PAPIA : PArallel Protein Information Analysis system
Phylodendron - Phylogenetic tree printer
PHYLOGENY
Swiss-PdbViewer: Download
IUBio Archive
Drawseq - biosequence drawing tool
SA National Bioinformatics Institute
Molecular Structure Generator - MICE
IUBio Archive
PAL: Phylogenetic Algorithms Library
TipDate: Estimating Substitution Rates
Molecular Applications Group Validates Panther Technology
CMBI - Home Page: This is where Gert Vriend is moving to from EMBL.
BIOcomputing unit, 3D modelling, members.
Entelechon GmbH
National Biotechnology Information Facility
AboutSRPDB
SRPDB
Ian Sillitoe's Home Page
Fokker-Planck Theory and Knot topology as a Method for Computing the Folding of Proteins by Lawrence B. Crowell: A rather simplistic model by a physicist for how proteins fold with lots of formulas from statistical mechanics. The model is not validated with any data whatsoever.
Outline of a proposed mechanism of protein folding. It uses Feynman ratchets to model the mechanics of polypeptide formation, Fokker-Planck equations to explain ATP hydrolysis, and the Ising spin lattice model to analyse thermodynamic stability. The mathematical modelling uses the Yang-Baxter relation in knot topology. A neural-network-based algorithm is described to predict the 3D structure of a protein sequence in time O(n^2 + n log n), but it is not implemented or tested.
Index of /~czhu: C. Zhu co-authored a paper with me on antizyme in S. pombe, but not S. cerevisiae
Research activities of SCAI bioinformatics group: Research activities of Lengauer's group at SCAI, highlighting projects on homology-based protein modelling and development of fast algorithms for bio-molecular docking. Available tools are FlexX - the fastest protein-ligand docking program, TardId - identification of target proteins in genome data, Protal - for protein sequence homology and structure study, ToPLign - a Toolbox for pairwise and multiple sequence alignment (align), parametric alignment (paral), threading and fold recognition (123D), etc.
Homology or Similarity?: A complaint about the sloppy use of the term "homology" in the biology and bioinformatics literature. It makes some good points, but neglects the problem that homology is always a relative term (in some not very useful sense, ALL sequences are derived from the same ancestor).
PubCrawler Home Page: PubCrawler helps some researchers find out about new publications in their field.
Quality of Macromolecular Models
Links2Go: Bioinformatics
BIODATABASES.COM - White Paper
UC Santa Cruz Puts Human Genome Online/Programming wizard does job in 4 weeks
BLAST information
Biochemist June 2000
CMBI, 3D modelling services
The BioInformer -- Current issue
BioExchange.com Tools
BioWurld - Where all Bioinformatics related URLs live.
aligntest
The Bioinformatics Program at the University of Michigan
Draft Human Genome Browser
GCG (Genetics Computer Group)
ProtoMap - search
BiBTeX Bibliography and LaTeX Style Formats for Molecular Biologists
GPCRDB home page
IMB Jena Image Library: Access to Site Database
Eric Beitz: TeXshade
CS4995 Computational Genomics - Projects
Parts List (yale aligner)
Free Molecular Visualization Programs for Unix and/or Linux: Chime, Chimera, Cn3D, DeepView (Swiss-PdbViewer), DINO, Flex, Garlic, gdpc, gOpenMol, Jmol, Kinemage, mdxvu, MOIL-View, MOLMOL, MolScript, Moviemol, NAMOT, QMOL, PREPI, Protein Explorer, RasMol, Raster3D, Viewmol, VMD, WinMGM, XMakemol, XMol, Ymol
Biology Starter for Computer Scientists
The WHAT IF Web Interface
Software at UCSF Computer Graphics Lab
LED Lipase Engineering Database
Kernel Machines: About support vector machines and other kernel-based methods.
HIC-Up (hetero compound information center)
Molecular systematics and evolution lecture notes
Welcome to the Molecular Systematics homepage.
NCBI DART
biol.net - Internet Links for the Biomedical Research Community
WIT Home Page
BISR Home page
www.molbiol.net
About paa
COMBOSA3D
proteomicsSURF |:| A Pointer to Proteomics Resources |:|
All About DNA - Bases and Nucleotides
The S* Life Science Informatics Alliance
S-Star.org
Entrez-PubMed
Welcome to GenomeWeb -- New Media for the New Biology
Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids
Bioweb
BioInfo.PL Meta Server Job hus1
Genomes
Correlating residue numbering in the PDB
ModView.
tacg3 Central
The PAL Project
Amino Acid Information
Bioinformatics.org: Bioinformatics FAQ
IUBio Archive
Signal Peptide Prediction Server (Human), Version 1.64
Computational Biology Course
rose: random model of sequence evolution
Gale Rhodes's Home Page
Phylogeny Programs
Mavric: a python toolkit for phylogenetics
Linux4Chemistry - Linux software for chemistry: molecular modeling, visualization, graphic, quantum mechanic, dynamic, kinetic, simulation
EVA: Evaluation of automatic structure prediction servers
DIMACS Workshop on Protein Structure and Structural Genomics: Prediction, Determination, Technology and Algorithms
Protein sequence analysis
Structural Biology Software Database
MSD Home
Bioinformatics: Instructions to Authors
Molecular Mechanics
Primer on Molecular Genetics (Department of Energy)
The GIMP Homepage (Gnu image manipulation)
Protein Structure Prediction
GP homepage
EMBOSS Homepage
EMBOSS: The Interface Projects
RECOMB 2001 - Schedule
ISMB 2001
:karplus
yfleung's functional genomics home
CMBI, Center for Molecular and Biomolecular Informatics
SwissPdbViewer Tutorial
BioInfo.PL Meta Server Job List
Bay Area Bioinformatics Home Page
GP homepage
Molecular Linux
WEIGHBOR Homepage
Index of /billb/weighbor/paper
A list of courses from around the world, books, on-line tutorials ... for training in bioinformatics.
http://www.queensu.ca/micr/faculty/kropinski/online.html: A list of useful on-line resources for analyzing sequences. It reflects not my tastes but those of Dr. Andrew Kropinski (Department of Microbiology & Immunology, Queen's University, Kingston, Ontario).

Questions about page content should be directed to Kevin Karplus
Biomolecular Engineering
University of California, Santa Cruz
Santa Cruz, CA 95064
USA
karplus@soe.ucsc.edu
1-831-459-4250
318 Physical Sciences Building