
Results and Discussion

While several reports have shown the general effectiveness of HMMs [Krogh et al., 1994a, Brown et al., 1993, Baldi et al., 1994, Eddy et al., 1995, Eddy, 1995], this section takes a close look at the effectiveness of each extension to the basic method.

We chose the globin family for the first of these illustrative experiments because of our previous familiarity with it. Starting from a set of 624 globins, we removed close homologues using a maximum entropy weighting scheme [Krogh & Mitchison, 1995]: all sequences with a very small weight ( tex2html_wrap_inline3807 ) were discarded, which left 167 globins. For the experiments, these 167 sequences were randomly divided into a training set of 50 sequences and a test set of 117 sequences, except in the experiments on training set size.
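The filtering and splitting step described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `filter_and_split`, the `min_weight` parameter, and the toy weights are hypothetical, and the actual weight threshold used in the experiments is not recoverable from this text. The maximum entropy weights themselves are taken as given input rather than computed.

```python
import random

def filter_and_split(weights, min_weight, n_train, seed=0):
    """Drop low-weight sequences, then split the rest at random.

    weights    : dict mapping sequence name -> precomputed weight
                 (assumed to come from a separate weighting step)
    min_weight : hypothetical cutoff; the real value is not given here
    n_train    : number of sequences to place in the training set
    seed       : random seed, so the split is reproducible
    """
    # Keep only sequences whose weight is at or above the cutoff.
    kept = [name for name, w in weights.items() if w >= min_weight]
    if n_train > len(kept):
        raise ValueError("n_train exceeds the number of retained sequences")

    # Shuffle with a private generator, then cut into train and test.
    rng = random.Random(seed)
    rng.shuffle(kept)
    return kept[:n_train], kept[n_train:]
```

With 167 retained sequences and `n_train=50`, the remaining 117 sequences form the test set, matching the sizes quoted above.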

The statistical goodness of an HMM is measured by the probability that the final model assigns to the test set. SAM reports this as a negative-log-likelihood ( $-\log P(\mathrm{sequences} \mid \mathrm{model})$ ), or NLL, score, so a lower score indicates a better fit. This section considers the effects of each of the more important extensions on NLL scores. Ideally, we would like small NLL scores whose distribution, over multiple runs using different random seeds, is sharply peaked. Such a peaked distribution implies that far fewer than the thousands of runs performed in these experiments are required to generate a good model.





Rey Rivera
Thu Aug 29 15:28:54 PDT 1996