The kernel function acts as a similarity metric between examples in
the training set. A simple form of similarity metric is the dot
product between two vectors. Previous work [Eisen et al., 1998] has
employed a normalized dot product as a similarity metric. Let $X_i$ be the logarithm of the gene expression ratio for gene $X$ in
experimental condition i as defined in
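Section 2. Let the normalized feature vector, $\vec{X}$,
be defined as

$$\vec{X}_i = \frac{X_i}{\sqrt{\sum_{j} X_j^2}} \qquad (1)$$

where the sum runs over all experimental conditions.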
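As an illustration of the normalized dot product, the following minimal Python sketch scales each expression vector to unit Euclidean length before taking the dot product. The function names and the example values are hypothetical and serve only to show the computation; unit-length scaling is the normalization assumed here.

```python
import math

def normalize(x):
    """Scale a vector of log expression ratios to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero expression vector")
    return [v / norm for v in x]

def normalized_dot(x, y):
    """Dot product of the two normalized vectors (cosine similarity)."""
    return sum(a * b for a, b in zip(normalize(x), normalize(y)))

# Hypothetical log-ratio profiles for two genes across four conditions.
gene_a = [0.5, -1.0, 2.0, 0.0]
gene_b = [1.0, -2.0, 4.0, 0.0]  # same profile shape, twice the magnitude
similarity = normalized_dot(gene_a, gene_b)
```

Because both profiles have the same shape, the normalization removes the difference in magnitude and the two genes are treated as maximally similar.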
Intuitively, one drawback to support vector machine classification is
that the classifier is by definition based upon a planar division of
the feature space. One can easily imagine a space in which a more
complex separating surface more successfully divides family members
from non-family members. Through the use of an appropriate kernel
function, an SVM can be constructed that produces a separating
hyperplane in the feature space that corresponds to a polynomial
surface in the input space. This is accomplished by raising the dot
product kernel to a positive integer power. Squaring the kernel
yields a convex surface in the input space. Raising the kernel to
higher powers yields polynomial surfaces of higher degrees. The kernel
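of degree $d$ on the normalized feature vectors $\vec{X}$ and $\vec{Y}$ is defined by

$$K(X, Y) = (\vec{X} \cdot \vec{Y} + 1)^d .$$

In the feature space of this kernel, for any gene $X$ there are features for all $d$-fold interactions between mRNA
measurements, represented by terms of the form
$X_{i_1} X_{i_2} \cdots X_{i_d}$,
where each index $i_1, \ldots, i_d$ ranges over the experimental conditions.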
We experiment here with these kernels for degrees $d = 1$, 2, and 3, denoted below as Dot-product-1, Dot-product-2,
and Dot-product-3, respectively. The degree-one kernel is essentially the
normalized dot product kernel, and we also refer to it by that name.
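The three dot-product kernels can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the experiments: it assumes the inputs are already normalized to unit length and assumes the common polynomial form in which 1 is added to the dot product before raising it to the power $d$. The function name and example vectors are hypothetical.

```python
def poly_kernel(x, y, d=1):
    """Polynomial kernel (x . y + 1)^d on already-normalized vectors.

    The +1 offset is an assumption of this sketch.
    """
    if d < 1 or int(d) != d:
        raise ValueError("d must be a positive integer")
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + 1.0) ** d

# Two hypothetical unit-length feature vectors.
x = [0.6, 0.8]
y = [0.8, 0.6]

k1 = poly_kernel(x, y, d=1)  # Dot-product-1
k2 = poly_kernel(x, y, d=2)  # Dot-product-2
k3 = poly_kernel(x, y, d=3)  # Dot-product-3
```

Raising the degree leaves the ordering of similarities unchanged but stretches the differences between them, which is what allows the separating surface in the input space to curve.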
In a space in which the positive members of a class form one or more
clusters, an accurate classifier might place a Gaussian around each
cluster, thereby separating the clusters from the remaining space of
non-class members. This effect can be accomplished by placing a small
Gaussian over each support vector in the training set. If the width
of the Gaussians is chosen well, then the sum of the support vector
Gaussians will yield an accurate classifier. This is the technique
used by most radial basis function classifiers
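[Schölkopf et al., 1997]. The formula for the Gaussian, or radial
basis function, SVM kernel is

$$K(X, Y) = \exp\left(-\frac{\|\vec{X} - \vec{Y}\|^2}{2\sigma^2}\right),$$

where $\sigma$ is the width of the Gaussian.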
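A Gaussian kernel of this kind can be sketched as follows, assuming the standard form $\exp(-\|x - y\|^2 / 2\sigma^2)$ with width parameter `sigma`. The function name and the example points are hypothetical, and the sketch does not address how the width should be chosen.

```python
import math

def rbf_kernel(x, y, sigma):
    """Gaussian (radial basis function) kernel with width sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

# Hypothetical points at Euclidean distance 5 from each other.
p = [0.0, 0.0]
q = [3.0, 4.0]

wide = rbf_kernel(p, q, sigma=5.0)    # broad Gaussian: points still look similar
narrow = rbf_kernel(p, q, sigma=1.0)  # narrow Gaussian: similarity is nearly zero
```

The comparison of `wide` and `narrow` shows why the width matters: a Gaussian that is too narrow treats every pair of distinct examples as dissimilar, while one that is too wide blurs the clusters together.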
The functional classes of genes examined here contain very few members relative to the total number of genes in the data set. This imbalance in the number of positive and negative training examples will cause difficulties for any classification method. For SVMs, the benefit gained by including a few class members on the correct side of the hyperplane may be exceeded by the cost associated with that hyperplane due to incorrectly labeled or inaccurately measured negative examples that also appear on the positive side of the hyperplane. In such a situation, when the magnitude of the noise in the negative examples outweighs the total number of positive examples, the optimal hyperplane located by the SVM will be uninformative, classifying all members of the training set as negative examples.
We combat this problem by modifying the matrix of kernel values
computed during SVM optimization, as mentioned previously in
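Section 3. Let $\mathbf{K}$
be the matrix defined by the
kernel function $K$ on the training set; i.e.,
$\mathbf{K}_{ij} = K(X^{(i)}, X^{(j)})$, where $X^{(i)}$ denotes the $i$-th training example.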
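For concreteness, the kernel matrix can be built by evaluating the kernel function on every pair of training examples. The sketch below is a minimal illustration with hypothetical names and toy data; it uses a plain dot product as the kernel and exploits the symmetry of the matrix.

```python
def kernel_matrix(examples, kernel):
    """Return the symmetric matrix of kernel values on all pairs of examples."""
    n = len(examples)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # A kernel is symmetric, so each pair is evaluated only once.
            K[i][j] = K[j][i] = kernel(examples[i], examples[j])
    return K

def dot(x, y):
    """Plain dot product, used here as the simplest kernel."""
    return sum(a * b for a, b in zip(x, y))

# Toy training set of three two-dimensional examples.
train = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = kernel_matrix(train, dot)
```

The diagonal entries of `K` are the kernel values of each example with itself.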
By adding to the diagonal of the kernel matrix a
constant whose magnitude depends upon the class of the training
example, one can control the fraction of misclassified points in the
two classes. This technique ensures that the positive points are not
regarded as noisy labels. For positive examples, the diagonal element
is given by