next up previous
Next: Neighbor blurring Up: Simple Markov models Previous: Simple Markov models

Zero-offset

The simplest way to handle zero counts is to add a zero-offset z to all counts, so that the estimated probability of character b in a given context C is tex2html_wrap_inline775 . If all the counts in a context are zero (the usual case in uninteresting sequences), this method assigns a probability of 0.25 (a cost of 2 bits) to all four possible characters, independent of the choice of z. The second most common situation is to have a count of one for one character and zero for the other characters in the context. This method assigns a probability of (1+z)/(1+4z) to the character that was seen, and z/(1+4z) to the other characters.

If the seed is large enough to contain multiple instances of interesting sequences, we can also choose z to minimize the encoding cost of doing adaptive compression on the seed using the model. The minimization is currently done by using Newton's method to set the derivative of the encoding cost to zero, but almost any standard optimization technique should work.


next up previous
Next: Neighbor blurring Up: Simple Markov models Previous: Simple Markov models

Rey Rivera
Thu Aug 22 14:04:06 PDT 1996