A serious problem with any hill-climbing optimization technique is that it often ends up in a local maximum. The same is true for the forward-backward procedure used to estimate HMMs by maximizing the likelihood (or the a posteriori model probability). In fact, it will almost always end up in a local maximum.
To deal with this problem, we start the algorithm several times from
different initial models. The resulting models then represent
different local maxima, and we pick the one with the highest
likelihood. The different initial models are obtained by taking the
normalized regularizer and adding noise to all the parameters. The
noise is added to a model in the following way. A number R of
sequences are generated from the regularizer model, stepping from
state to state and generating characters according to the
regularizer's transition and match distributions. Given a noise level
, each of these R
paths through the model is added to the counts with a weight of
, before MAP reestimation. By
default, R is set to 50 random sequences.
One can also add noise to the models during their
estimation and decrease the noise level gradually in a technique similar
to the general optimization method called simulated annealing
[Kirkpatrick et al., 1983].
The initial level of the noise in this annealing process is called
as above. During the estimation process the annealing noise is
decreased by a speed determined by r. We have tested two
method for decreasing the noise:
An alternative and very elegant way of using simulated annealing is described in [Eddy, 1995].