Description: Perform kernel k-nearest-neighbor cross-validation on data with multiple classes.
Usage: kknn [options] -examples <filename> -classes <filename>
Input:
- -examples <filename> - an RDB file of examples. The first column contains labels, and the remaining columns contain real-valued features.
- -classes <filename> - a multi-column RDB file of class labels. This file must contain exactly the same number of lines as the example data file. The first column contains row names, which must appear in the same order as in the examples file. The remaining columns contain integer-valued classifications. A classification of '0' indicates an unclassified example, which will be classified by kknn, but which will not affect the classification of other examples.
kknn performs classification on one class (column) at a time. The results for each class are aggregated, and a p-value is computed for that class.
Output: A six-column RDB file. The first column contains the class name. The second through fifth columns contain the number of true positives (TP), false negatives (FN), true negatives (TN), and false positives (FP), respectively. The sixth column contains a p-value computed for the class based on the number of errors and the class size. The file contains one row for each class in the classes file.
Options:
Eight of the options below, -zeromean through -widthfactor, modify the base kernel function. These operations are applied in the order in which they are listed.
- -verbose 1|2|3|4|5 - Set the verbosity level of the output to stderr. The default level is 2.
- -K <value> - The number of nearest neighboring points used to classify a given point. By default, K = 1, which is ordinary nearest-neighbor classification. For K > 1, the classes of the K nearest points are tallied, and the majority class is the prediction. In the event of a tie, K is iteratively reduced until the tie is broken.
- -matrix - The '-examples' file contains a kernel matrix rather than training set examples. The matrix is an (n+1)-by-(n+1) RDB matrix, where n is the number of examples. The first row and column contain data labels, and the entry in row x, column y contains the kernel value K(x,y).
- -zeromean - Subtract the row mean from each element of the input data, so that every row has a mean of zero.
- -adddiag <value> - Add the given value to the diagonal of the kernel matrix.
- -normalize - Normalize the kernel matrix by dividing K(x,y) by sqrt(K(x,x) * K(y,y)).
- -constant <value> - Add a given constant to the kernel. The default constant is 1.
- -coefficient <value> - Multiply the kernel by a given coefficient. The default coefficient is 1.
- -power <value> - Raise the kernel to a given power. The default power is 1.
- -radial - Convert the kernel to a radial basis function. If K is the base kernel, this option creates a kernel of the form exp[-D(x,y)^2 / (2 w^2)], where w is the width of the kernel (see below) and D(x,y) is the distance between x and y in the feature space induced by K, defined as D(x,y) = sqrt[K(x,x) - 2 K(x,y) + K(y,y)].
- -widthfactor <value> - The width w of the radial basis kernel is set using a heuristic: it is the median, over all positive training points, of the distance from each positive point to its nearest negative training point. This option specifies a multiplicative factor applied to that width.
- -noformatline - Usually, RDB formatted files contain column width information on the second line of the file. With this option, the program does not expect a format line in the input files and does not produce a format line in the output file.
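The kernel-modifying options can be sketched in pure Python to show how the steps compose. This is a hypothetical illustration, not kknn's implementation: it assumes the base kernel is a plain dot product, applies the steps in the order the options are listed (with the -constant, -coefficient and -power defaults taken from the descriptions above), and uses made-up function names (`zero_mean_rows`, `linear_kernel`, `transform_kernel`, `kernel_distance`, `width_heuristic`, `radial`). The positive and negative index lists passed to `width_heuristic` stand in for whichever training points kknn actually uses for the current class.

```python
import math
from statistics import median

Matrix = list[list[float]]


def zero_mean_rows(data: Matrix) -> Matrix:
    """-zeromean: subtract each row's mean so every row has mean zero."""
    return [[v - sum(row) / len(row) for v in row] for row in data]


def linear_kernel(data: Matrix) -> Matrix:
    """ASSUMED base kernel for this sketch: the dot product K(x, y) = x . y."""
    return [[sum(a * b for a, b in zip(x, y)) for y in data] for x in data]


def transform_kernel(k: Matrix, adddiag: float = 0.0, normalize: bool = False,
                     constant: float = 1.0, coefficient: float = 1.0,
                     power: float = 1.0) -> Matrix:
    """Apply -adddiag, -normalize, -constant, -coefficient, -power in that order."""
    n = len(k)
    k = [row[:] for row in k]  # work on a copy
    for i in range(n):
        k[i][i] += adddiag
    if normalize:
        diag = [k[i][i] for i in range(n)]
        k = [[k[i][j] / math.sqrt(diag[i] * diag[j]) for j in range(n)]
             for i in range(n)]
    # Add the constant, multiply by the coefficient, then raise to the power.
    # Note: a non-integer power of a negative value gives a complex number in Python.
    return [[(coefficient * (v + constant)) ** power for v in row] for row in k]


def kernel_distance(k: Matrix, i: int, j: int) -> float:
    """D(x, y) = sqrt(K(x,x) - 2 K(x,y) + K(y,y)), clamped at 0 against rounding."""
    return math.sqrt(max(k[i][i] - 2 * k[i][j] + k[j][j], 0.0))


def width_heuristic(k: Matrix, positives: list[int], negatives: list[int],
                    widthfactor: float = 1.0) -> float:
    """Median distance from each positive point to its nearest negative point,
    scaled by -widthfactor."""
    nearest = [min(kernel_distance(k, p, q) for q in negatives) for p in positives]
    return widthfactor * median(nearest)


def radial(k: Matrix, width: float) -> Matrix:
    """-radial: exp(-D(x,y)^2 / (2 w^2)), with D computed from the incoming kernel."""
    n = len(k)
    return [[math.exp(-kernel_distance(k, i, j) ** 2 / (2 * width ** 2))
             for j in range(n)] for i in range(n)]
```

A typical chain would be `radial(k, width_heuristic(k, pos, neg, widthfactor))` applied to `k = transform_kernel(linear_kernel(data), ...)`. The radial step reads distances from the kernel it receives, so it must come last, which matches the listed order.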
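The -K voting rule can be sketched as follows. This is an assumed reading of the description, not kknn's code: neighbors are ranked by the kernel-induced distance sqrt(K(x,x) - 2K(x,y) + K(y,y)), the point being classified is left out of its own neighbor list (leave-one-out), unclassified (0) examples are excluded so they cannot influence other points, and on a tie K is lowered by one before re-voting. The names `ranked_neighbor_labels` and `knn_vote` are hypothetical.

```python
import math
from collections import Counter


def ranked_neighbor_labels(k: list[list[float]], i: int, labels: list[int]) -> list[int]:
    """Labels of the other labelled points, nearest first (leave-one-out).

    ASSUMPTION: ranking uses D(x, y) = sqrt(K(x,x) - 2 K(x,y) + K(y,y));
    points labelled 0 (unclassified) are skipped.
    """
    def dist(j: int) -> float:
        return math.sqrt(max(k[i][i] - 2 * k[i][j] + k[j][j], 0.0))

    others = [j for j in range(len(k)) if j != i and labels[j] != 0]
    others.sort(key=dist)  # stable sort: equal distances keep index order
    return [labels[j] for j in others]


def knn_vote(ranked: list[int], K: int = 1) -> int:
    """Majority class among the K nearest labels; on a tie, lower K and re-vote."""
    if not ranked:
        raise ValueError("no labelled neighbors")
    K = min(K, len(ranked))
    while K > 1:
        counts = Counter(ranked[:K]).most_common(2)
        if len(counts) == 1 or counts[0][1] > counts[1][1]:
            return counts[0][0]
        K -= 1  # top classes tied: drop the farthest neighbor and try again
    return ranked[0]  # K == 1: plain nearest neighbor, which cannot tie
```

For instance, with the four nearest labels `[1, -1, -1, 1]` and K = 4, the 2-2 tie reduces K to 3, and the vote then returns -1.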