next up previous
Next: Cross-training Up: Training a hidden Markov Previous: Training a hidden Markov

Tuning and pruning

The basic mechanism for training an HMM to a set of sequences is to repeatedly run the following tuning algorithm. First, run the Viterbi algorithm to determine the best path through the model for each sequence, and count the number of times each edge is used and the number of times each character occurs in each state. More sophisticated training algorithms, such as the Baum-Welch method [11], could be used to get expected counts, but this simple method works fairly well.

After all the sequences in the training set have been counted, the counts can be converted to probabilities for each set of characters in a state and for each set of edges out of a state. As with the counts in the simple Markov model, some care has to be taken with zero counts. Since the counts in the HMM tend to be much larger than in the simple Markov models, the handling does not have to be as careful, and the program just adds the old probability estimate plus a small fixed zero-offset to all the counts.

After the probabilities have been recomputed and converted to costs, the whole tuning step is repeated, either for a fixed number of iterations or until the change in cost per character is less than a user-specified threshold. Generally four iterations are enough to get convergence to better than 0.01 bits/base.

After tuning an HMM, some edges or states may never have been used. A simple pruning step after tuning removes these unused states and edges from the HMM.

Useless states and edges occur fairly commonly on the automatically constructed HMMs because parallel sequences are often constructed. For example, in Section 2.5  gif the states for tGg and cGg matched exactly the same characters. In training, one of the two states will get a higher count (hence lower cost) and the other will become unused.

The smaller models in Table 3  gif on page gif (hmm125, small, and cross) were constructed from simple Markov models by the methods of Section 2.5  gif and tuned on the seed sequences (REP99-gxn). Unused edges and states were removed, and the models were made slightly more general by flattening the probabilities--re-estimating the probabilities from the counts using a larger zero-offset than used in the tuning. (These models were flattened with a zero-offset of 0.1.)


next up previous
Next: Cross-training Up: Training a hidden Markov Previous: Training a hidden Markov

Rey Rivera
Thu Aug 22 14:04:06 PDT 1996