Performance is measured using a three-way cross-validated experiment. The gene expression vectors are randomly divided into three groups. Classifiers are then trained on two of the groups and tested on the third, so that each group can serve as the test set in turn.
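The splitting procedure can be sketched as follows. This is a minimal illustration, not the code used in the experiments: the function name `three_way_splits`, the `seed` parameter, and the round-robin assignment of shuffled indices to groups are all assumptions, as is the rotation of the held-out group, which follows the usual reading of three-way cross-validation.

```python
import random


def three_way_splits(n_items, seed=0):
    """Yield (train_indices, test_indices) for each of three held-out groups.

    Hypothetical helper: shuffles item indices, deals them into three
    groups of near-equal size, and rotates which group is held out.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)

    # Deal the shuffled indices into three groups (sizes differ by at most 1).
    groups = [indices[k::3] for k in range(3)]

    for held_out in range(3):
        test = groups[held_out]
        # Training set is the union of the two groups not held out.
        train = [i for k, group in enumerate(groups) if k != held_out
                 for i in group]
        yield train, test
```

Each gene expression vector appears in exactly one test group, so every gene is scored once by a classifier that did not see it during training.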
The performance of each classifier is measured by examining how well
the classifier identifies the positive and negative examples in the
test set. Most of the classification methods return a rank ordering
of the test set. Given this ordering and a classification threshold,
each gene in the test set can be labeled in one of four ways: false
positives are genes that the classifier places within the given class,
but MYGD classifies as non-members; false negatives are genes that the
classifier places outside the class, but MYGD classifies as members;
true positives are class members according to both the classifier and
MYGD; and true negatives are non-members according to both. For each
method, we find the classification threshold that minimizes the cost
function
$$\mathrm{cost} = \mathrm{fp} + 2 \cdot \mathrm{fn},$$
where fp is the number of false
positives, and fn is the number of false negatives. The false
negatives are weighted more heavily than the false positives because,
for these data, the number of positive examples is small compared to
the number of negatives. Results are reported in terms of the false
positive and false negative error rates as well as the cost at the
minimal classification threshold.
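The threshold search described above can be sketched as a single pass down the rank ordering. This is an illustrative sketch under stated assumptions rather than the original implementation: the function name `best_threshold`, the convention that a higher score means stronger predicted membership, and the `fn_weight` parameter (set to 2 here as an assumed weight on false negatives) are all hypothetical. Tied scores are not treated specially.

```python
def best_threshold(scores, labels, fn_weight=2.0):
    """Return (cost, k, fp, fn) for the cut-off that minimizes the cost.

    scores: classifier outputs, higher meaning more likely a class member
            (an assumed convention).
    labels: True where the reference annotation marks a class member.
    k:      number of top-ranked genes labeled positive at the best cut-off.
    """
    # Rank genes from most to least confidently positive.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    # With k = 0 nothing is labeled positive: every true member is missed.
    fp = 0
    fn = sum(1 for y in labels if y)
    best = (fp + fn_weight * fn, 0, fp, fn)

    # Lower the threshold one gene at a time and update the error counts.
    for k, i in enumerate(order, start=1):
        if labels[i]:
            fn -= 1  # a member moves above the threshold
        else:
            fp += 1  # a non-member moves above the threshold
        cost = fp + fn_weight * fn
        if cost < best[0]:
            best = (cost, k, fp, fn)
    return best
```

For example, with scores `[0.9, 0.8, 0.7, 0.6, 0.5, 0.4]` and members at the first and third positions, the minimum cost is reached by labeling the top three genes positive, which gives one false positive and no false negatives. The corresponding error rates can then be obtained by dividing `fp` and `fn` by the number of negative and positive test examples, respectively.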
Note that the two decision tree methods do not produce a rank ordering of test set points, making it impossible to vary the classification threshold. Therefore, for the decision tree methods we use the default threshold, rather than the one found by minimizing the cost function.