The AddUseful operations adds edges and states to the HMM by considering one sequence at a time, and adding one useful edge and one useful state (with a pair of edges) to the HMM for that sequence.
Dynamic programming is used to find the cost of the lowest cost path
from the start state to each state for each position in a sequence
(let's call the cost
for the path to state s in position
i).
A similar dynamic programming algorithm finds the lowest-cost path
from each state to the stop state (let's call it
).
Note that the lowest cost path from the start state to the stop state
for the sequence w as a whole is
,
independent of i.
When looking for a new edge to add, consider the state
that
minimizes
and the state
that minimizes
.
If we added an edge from
to
with cost
c, then there
would be a path through the model with total cost
If this cost is lower than the
current best path, then adding the edge would lower the cost.
Unfortunately, we can't freely add edges with arbitrary costs, since
the cost is a negative log likelihood, and adding an edge must
``steal'' probability from the other out-edges of state
.
To minimize the error in estimating how much a new edge will save, and to make the most useful change to the model, the program looks for the position i that minimizes
for which there isn't already an edge from to . An edge is added with a cost that is as large as possible while still offering some savings over the current best path. More accurately, an edge is added only if this maximum cost exceeds some user-supplied threshold, generally around 5 or 10 bits.
States are added in a similar way, by considering and , and choosing the position that minimizes
If there is no state with an edge from
and an edge to
,
then we can add a new state s with
, an
edge from
with cost c and an edge to

with cost 0.
States are added only if the max cost we can use for c and still
have a lower cost path is above a user-specified threshold, generally
around 10 bits.
The AddUseful operation is particularly valuable when modifying models constructed from a single sequence, but often adds edges or states which are idiosyncratic (useful only for a single sequence). The scripts that use AddUseful generally follow it by retuning the model and pruning out edges and states that are used infrequently.