Dynamic Programming Exercises. CMPS 243 Homework #1

Winter 2002
Due 11 Feb 2002
(Last Update: 10:34 PST 3 February 2002 )

WARNING to BME100 students: this assignment is similar to, but NOT identical to, the assignment for BME100 last quarter. You will need to redo the hand computations and modify your program.

There is a nice review of dynamic programming on the web at http://www.blc.arizona.edu/courses/bioinformatics/salign.html . Also at that site is a copy of the BLOSUM62 matrix, though you might prefer the output from the program that created blast, as it is machine-readable and makes it clear that 1/2 bit units were used.

There is a nice tutorial on dynamic programming at http://www.sbc.su.se/~per/molbioinfo2001/seqali-dyn.html with pointers to an example at http://www.sbc.su.se/~per/molbioinfo2001/dynprog/dynamic.html. It uses the more expensive alignment algorithm that allows arbitrary gap costs for different length gaps, though only an affine gap cost is used. If you want a textbook presentation, the best I know is in Chapter 2 of Durbin, Eddy, Krogh, and Mitchison's book Biological Sequence Analysis.

As a reminder, the version of Needleman-Wunsch we presented in class used only the nearest neighbors, and needed 3 matrices:

Mr, c = max { Dr - 1, c - 1 + subst(Ar, Bc)
Mr - 1, c - 1 + subst(Ar, Bc)
Ir - 1, c - 1 + subst(Ar, Bc)

Dr, c = max { Dr , c - 1 - extend
Mr , c - 1 - open
Ir , c - 1 - reopen

Ir, c = max { Mr-1 , c - open
Ir-1 , c - extend
Dr-1 , c - reopen

For the above recurrence, you have to be careful about your boundary conditions. You want to start with a score of 0 at (0,0) [assuming your sequence numbering starts at 1] and with -infinity for all points with negative coordinates. The recurrence relations should then define the rest of the values.

For Smith-Waterman, use the same recurrence relations, but allow 0+subst(Ar, Bc) as an alternative in the Mr,c scoring. The boundary conditions change to -infinity for all points with non-positive coordinates.

I like to display the M, D, and I matrices on the same grid:
I1,1
D1,1 M1,1
I1,2
D1,2 M1,2
I2,1
D2,1 M2,1
I2,2
D2,2 M2,2

Q1:
Using the BLOSUM62 matrix, determine which residue is most likely to be conserved? Using what you know about its sidechain, try to explain why substitutions for it are rare.

Q2:
In the BLOSUM62 matrix, there are several off-diagonal elements greater than 0. These are called conservative substitutions because they conserve some property. The positive number means that they occur more frequently than one would expect by chance. Give a table of such amino-acid pairs (I counted 20, but may have miscounted), with a one-line explanation for each why that particular substitution may be more acceptable than a random substitution. (You may find that the amino acid order in the copy of the table at http://www.blc.arizona.edu/courses/bioinformatics/blosum.html#anchor130738 useful for clustering the conservative substitutions.)

Expand a little on the reasons for the two most common substitutions.

Q3:
Compute the score for the following global alignment using the BLOSUM62 scoring matrix with a gap-opening cost of 11, a gap-extension cost of 1, and a gap-reopen cost of 7 (all are in 1/2 bit units to be compatible with the BLOSUM62 matrix). Show what you added up to get the score, since the final number alone could easily be incorrect due to a transcription or addition error.
    ----LADEIAQ--LASEV--AQ
    AAAAVAEDVARSTIA-ELSR--
    

Q4:
Use the Needleman-Wunsch (global) alignment algorithm to find the optimal alignment between LADEIAQWE and AAVAEDRWCI, using BLOSUM62 and gap costs open=11, extend=1, and reopen=7. Provide an alignment matrix showing the match, delete, and insert scores for the best path up to that point in the matrix. Connect the points for the best path and provide the alignment in the same format as in Question 3. Report the score for the best path.

Q5:
Repeat Question 4, but use the Smith-Waterman algorithm to find the optimal local alignment. Again, show the matrix, the path, the alignment, and the score.

Q6:
Write a short program (in any programming language) that takes two protein sequences and aligns them with the Smith-Waterman algorithm. You can hardwire the BLOSUM62 matrix into your code---this is intended to be a check that you understand the algorithm thoroughly, not a full-featured, user-friendly program. Please do Question 5 by hand before writing the program---that will give you an almost independent check of your understanding.

Turn in source code and the output for running the program on the sequences of Questions 3, 4, and 5. It might be a good idea to use some longer sequences as test cases also. You could pick up a sequence from a database and run BLAST to get a sequence to align with it.



slug icon to go to Scool of Engineering home page
SoE home
sketch of Kevin Karplus by Abe
Kevin Karplus's home page
BME-slug-icon
Biomolecular Engineering Department
Karplus's lab page UCSC Bioinformatics research

Questions about page content should be directed to

Kevin Karplus
Biomolecular Engineering
University of California, Santa Cruz
Santa Cruz, CA 95064
USA
karplus@soe.ucsc.edu
1-831-459-4250
318 Physical Sciences Building
Locations of visitors to pages with this footer (started 3 Nov 2008)