The automatically constructed models often contain self-loops (edges
from a state back to itself). These self-loops represent a subsequence
of one or more characters drawn from the distribution in the state, with the
length of the subsequence modeled as a geometric distribution (the discrete
analogue of an exponential distribution).
In many cases, the subsequence can be better modeled by using two
states. One of the two loop-unrolling transformations shown in Figure 4
is applied to every self-loop (except those on the start
and stop states). These transformations do not change the cost for
any sequences, but retraining the HMM can capture more detailed
information about the first or last character of the subsequence or
match the length distribution for the subsequence a little better.

Figure 4: Two possible transformations for unrolling self-loops.
Replacing the self-loop in the middle with the pair of states on the left
allows the last character of the subsequence modeled by
the self-loop to be better modeled, and replacing it with the pair on
the right allows the first
character to be better modeled.
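
As a rough check of the claim that unrolling leaves sequence costs unchanged, the Python sketch below compares the length distribution of a single self-looping state with two unrolled variants. The exact topologies of Figure 4 are not reproduced here, so the transition layouts (`unrolled_first`, `unrolled_last`), the state names, the `EXIT` marker, and the self-loop probability `p` are assumptions chosen for illustration. Emission distributions are ignored, on the further assumption that both new states start as copies of the original state's distribution.

```python
import math

EXIT = "exit"  # hypothetical marker for leaving the unrolled region


def length_distribution(trans, entry, max_len):
    """Return [P(length = 1), ..., P(length = max_len)].

    trans maps state -> {next_state: probability}; entry maps state ->
    probability of emitting the first character from that state.
    """
    probs = []
    current = dict(entry)  # mass in each state while emitting character k
    for _ in range(max_len):
        exit_mass = 0.0
        following = {}
        for state, mass in current.items():
            for target, pr in trans[state].items():
                if target == EXIT:
                    exit_mass += mass * pr
                else:
                    following[target] = following.get(target, 0.0) + mass * pr
        probs.append(exit_mass)
        current = following
    return probs


def single_loop(p):
    # Original state: loop with probability p, leave with 1 - p.
    return {"S": {"S": p, EXIT: 1 - p}}, {"S": 1.0}


def unrolled_first(p):
    # Assumed layout: a non-looping state for the first character,
    # followed by an optional looping state for the rest.
    return (
        {"first": {"rest": p, EXIT: 1 - p}, "rest": {"rest": p, EXIT: 1 - p}},
        {"first": 1.0},
    )


def unrolled_last(p):
    # Assumed layout: an optional looping state for the leading characters,
    # followed by a non-looping state for the last character.
    return (
        {"body": {"body": p, "last": 1 - p}, "last": {EXIT: 1.0}},
        {"body": p, "last": 1 - p},
    )


def same_lengths(p, max_len=10):
    """True if all three models agree on every length up to max_len."""
    base = length_distribution(*single_loop(p), max_len)
    first = length_distribution(*unrolled_first(p), max_len)
    last = length_distribution(*unrolled_last(p), max_len)
    return all(
        math.isclose(a, b) and math.isclose(a, c)
        for a, b, c in zip(base, first, last)
    )
```

Under these assumptions, `same_lengths(0.7)` compares the three models length by length. Any difference between them would appear only after retraining, when the two states of a pair are free to learn different emission and transition probabilities.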