In Section 1.1
, I mentioned the main assumption that the
costmap-based search relied on: the cost of any subsequence can
be computed as the sum of costs of individual positions.
Although the Viterbi algorithm gives us a way to assign costs to
individual positions, the true cost of a subsequence may not be the
same as the sum of the costs on the best path for the whole sequence.
There is also a difference between the true compression cost of a subsequence and the cost from the map for simple Markov models, but in those models, the only difference in cost is in the first k positions, since there is no compression for the first k positions of a sequence. In the simple Markov model, we can compensate for the startup error by extending the beginnings of the subsequences found by k positions.
For hidden Markov models, a change anywhere in the sequence can change which path is the best one, and so change the cost for positions arbitrarily far away. It is not difficult to construct pathological examples in which this would completely invalidate the use of cost maps.
By limiting the types of HMMs considered, I have managed to use them quite successfully with costmap search. First, all my models contain a junk loop, which models the uninteresting sequences that occur between the interesting ones. The paths in the HMM for the interesting sequences begin and end at the junk loop. If there are multiple interesting sequences within a contiguous part of the database, then the best path should use the junk loop for the uninteresting parts and paths through the rest of the HMM for the interesting parts. The costs in the junk loop are very close to the null-model costs, and the costs in the rest of the model are very much lower, so the basic assumption of the cost map--that interesting positions, and only interesting positions, have much lower costs than the null model costs--is still satisfied.