CMPS 243 Homework 2, Winter 2002

(Last Update: 00:43 PST, 15 February 2002)

Due: Monday 25 Feb 2002

Estimating distributions of amino acids

This exercise tests your understanding of how to estimate discrete probability distributions from a small sample. Apply each method to the following seven count vectors:

  1. 1 isoleucine (rest zero)
  2. 1 isoleucine, 1 valine (rest zero)
  3. 1 isoleucine, 1 phenylalanine (rest zero)
  4. 1 isoleucine, 1 histidine (rest zero)
  5. 1 isoleucine, 1 aspartic acid (rest zero)
  6. 2 isoleucines (rest zero)
  7. 3 isoleucines (rest zero)
In addition to reporting the seven probability vectors estimated from the counts, also report the entropy (in bits) for each.
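As a rough illustration of the entropy report requested above, the following Python sketch builds a count vector, normalizes it, and computes the entropy in bits. The helper names (count_vector, normalize, entropy_bits) and the choice of residue ordering are assumptions made for this example, not part of the assignment.

```python
import math

# Residue order assumed here: A C D E F G H I K L M N P Q R S T V W Y
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def count_vector(**counts):
    """Build a 20-element count vector from one-letter codes, e.g. I=1, V=1."""
    return [float(counts.get(aa, 0)) for aa in AMINO_ACIDS]

def normalize(vec):
    """Scale a non-negative vector so that it sums to 1."""
    total = sum(vec)
    return [v / total for v in vec]

def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability terms are skipped."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

# Example: count vector 2 (1 isoleucine, 1 valine), normalized directly.
probs = normalize(count_vector(I=1, V=1))
print(entropy_bits(probs))
```

Skipping the zero-probability terms follows the usual convention that 0 log 0 is taken as 0, which matters for the sparse count vectors listed above.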
  1. Maximum-likelihood method. Normalize each count vector to probabilities and compute the entropy. You may want to write a short program to compute the entropy, since the summation is tedious to do on a hand calculator.
  2. Pseudocounts (Dirichlet prior). Use the following three pseudocount vectors and compute the mean posterior estimate and the entropy for each. You may want to write a short program.
    1. all pseudocounts 1 (the uninformed prior)
    2. all pseudocounts 0.05 (a prior that favors conserved residues)
    3. the following residue-specific pseudocounts, listed in the order A C D E F G H I K L M N P Q R S T V W Y:
      0.0943893 0.0193798 0.0643931 0.0721827 0.0430317 0.0759875 0.0257852 0.0654119 0.0713842 0.0900674 0.0263642 0.0551132 0.0458893 0.047656 0.0544527 0.0767514 0.0693522 0.0798881 0.0136764 0.0385562
      (This is available in /projects/compbio/lib/rev4.1comp)
  3. Dirichlet mixture. Write a program (in any programming language that provides lgamma) to compute the mean posterior estimate from a count vector and a Dirichlet mixture. Use the Dirichlet mixture recode3.20comp (also available in /projects/compbio/lib/recode3.20comp). Note that the "DirichletReg" format provides 21 numbers in the "Alpha" vector. The first number is the sum of the 20 pseudocounts and can be ignored.
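The pseudocount and Dirichlet-mixture estimates described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a reference solution: it uses the standard formulation in which each component's posterior weight is proportional to its mixture coefficient times B(n + alpha) / B(alpha), with log B computed through lgamma. The function names and the two-component toy mixture are hypothetical. Reading recode3.20comp is left out because the file layout is not described here beyond the "Alpha" vector; each alpha passed in is assumed to hold only the 20 pseudocounts, with the leading sum already dropped.

```python
from math import lgamma, log, exp

def posterior_mean(counts, alpha):
    """Mean posterior estimate for a single Dirichlet prior (pseudocounts)."""
    total = sum(counts) + sum(alpha)
    return [(n + a) / total for n, a in zip(counts, alpha)]

def log_beta(vec):
    """Log of the multivariate Beta function, computed with lgamma."""
    return sum(lgamma(v) for v in vec) - lgamma(sum(vec))

def mixture_posterior_mean(counts, weights, alphas):
    """Mean posterior estimate under a Dirichlet mixture.

    weights: mixture coefficients, one per component
    alphas:  one 20-element pseudocount vector per component
    """
    # Unnormalized log posterior weight of each component given the counts.
    log_w = [
        log(q) + log_beta([n + a for n, a in zip(counts, alpha)]) - log_beta(alpha)
        for q, alpha in zip(weights, alphas)
    ]
    # Subtract the maximum before exponentiating, for numerical stability.
    top = max(log_w)
    unnorm = [exp(lw - top) for lw in log_w]
    z = sum(unnorm)
    post = [u / z for u in unnorm]
    # Weighted average of the per-component posterior means.
    means = [posterior_mean(counts, alpha) for alpha in alphas]
    return [sum(p * m[i] for p, m in zip(post, means)) for i in range(len(counts))]

# Count vector 1 (1 isoleucine); I is index 7 in the order A C D E F G H I ...
counts = [0.0] * 20
counts[7] = 1.0

# Pseudocount method with all pseudocounts equal to 1.
uniform_estimate = posterior_mean(counts, [1.0] * 20)

# Hypothetical two-component mixture, used only to exercise the code.
toy_weights = [0.6, 0.4]
toy_alphas = [[1.0] * 20, [0.05] * 20]
mixture_estimate = mixture_posterior_mean(counts, toy_weights, toy_alphas)
print(uniform_estimate[7], mixture_estimate[7])
```

Working in log space is the reason the assignment asks for a language with lgamma: the Beta-function ratios can underflow if the Gamma functions are evaluated directly.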



Questions about page content should be directed to

Kevin Karplus
Biomolecular Engineering
University of California, Santa Cruz
Santa Cruz, CA 95064
USA
karplus@soe.ucsc.edu
1-831-459-4250
318 Physical Sciences Building