kmeans File Reference

Detailed Description

Performs k-means clustering.

Performs k-means clustering on the continuous attributes of a data set, ignoring any discrete attributes.

The learner reads its input and writes its output in C4.5 format. It expects to find the files <stem>.names and <stem>.data, and writes the learned centers to a file called <stem>.centers.
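As an illustration of what "continuous attributes only" means for C4.5-style input, here is a minimal Python sketch that reads a .names description and keeps just the continuous columns of the matching .data rows. It is not the learner's own code. The function names (parse_names, continuous_rows) and the sample data are hypothetical, and the sketch assumes the classic C4.5 layout: the first .names entry lists the classes, each later entry is "name: continuous." or "name: v1, v2.", and '|' starts a comment. It also assumes there are no missing values.

```python
def parse_names(text):
    """Return a list of (attribute_name, is_continuous) from .names text."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("|", 1)[0].strip()  # '|' starts a comment
        if line:
            entries.append(line.rstrip("."))
    attributes = []
    # entries[0] lists the class values; the rest describe attributes.
    for entry in entries[1:]:
        name, _, kind = entry.partition(":")
        attributes.append((name.strip(), kind.strip() == "continuous"))
    return attributes


def continuous_rows(names_text, data_text):
    """Keep only the continuous attribute values of each .data row."""
    attributes = parse_names(names_text)
    keep = [i for i, (_, is_cont) in enumerate(attributes) if is_cont]
    rows = []
    for raw in data_text.splitlines():
        line = raw.split("|", 1)[0].strip().rstrip(".")
        if not line:
            continue
        fields = [field.strip() for field in line.split(",")]
        rows.append([float(fields[i]) for i in keep])
    return rows


# Hypothetical sample input: two continuous attributes, one discrete.
NAMES = """a, b.
width: continuous.
color: red, blue.
height: continuous.
"""
DATA = """1.5, red, 2.0, a
3.0, blue, 4.5, b
"""

rows = continuous_rows(NAMES, DATA)
```

With the sample above, the discrete color column and the trailing class label are dropped, leaving two numeric values per row.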

The learned centers are evaluated by comparing them to the centers found in <stem>.test, as follows. Learned centers are greedily matched to the closest test centers until every center has a match; the evaluation score is then the sum of the squared distances between each test center and its matched learned center.
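The evaluation described above can be sketched in Python as follows. This is one plausible reading of the greedy matching, not the learner's actual code: it assumes that learned centers are processed in order, that each one takes the closest test center not yet matched, and that both sets contain the same number of centers. The names squared_distance and evaluate_centers are hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def evaluate_centers(learned, test):
    """Greedily match learned centers to test centers and sum the
    squared distances of the matched pairs (lower is better)."""
    if len(learned) != len(test):
        raise ValueError("sketch assumes equally many learned and test centers")
    unmatched = list(range(len(test)))  # indices of test centers still free
    total = 0.0
    for center in learned:
        # Assumption: each learned center takes the closest free test center.
        best = min(unmatched, key=lambda j: squared_distance(center, test[j]))
        total += squared_distance(center, test[best])
        unmatched.remove(best)
    return total


learned = [[0.0, 0.0], [10.0, 10.0]]
test = [[9.0, 10.0], [1.0, 0.0]]
score = evaluate_centers(learned, test)  # 1.0 + 1.0 = 2.0
```

Because the matching is greedy, the score can depend on the order in which centers are processed; an optimal assignment could give a lower total.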

A more full-featured k-means clustering algorithm is available by running vfkm with the -batch argument (for example, it lets you set initial centroid locations).

Wish List:
- Modify this learner to work with discrete attributes.
- Move the features from vfkm into this learner, because this learner will be much easier for new users to modify than vfkm.

Arguments

-f <filestem>
- Set the stem name (default DF)
-source <dir>
- Set the directory that contains the dataset (default '.')
-clusters <n>
- Set the number of clusters to look for. This argument is required.
-threshold <threshold>
- Iterate until every centroid moves less than this threshold.
-u
- Test by comparing to the centroids in <stem>.test
-v
- Can be used multiple times to increase the debugging output
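To show how the -clusters and -threshold arguments interact, here is a minimal Python sketch of a k-means loop that stops once every centroid moves less than the threshold. It is a sketch under stated assumptions, not the learner's implementation: the function name kmeans is hypothetical, initial centers are taken from the first points of the data set (the tool's actual initialization is not documented here), and centroid movement is measured as Euclidean distance.

```python
import math


def kmeans(points, num_clusters, threshold):
    """Iterate until every centroid moves less than `threshold`."""
    if threshold <= 0:
        raise ValueError("threshold must be positive for the loop to stop")
    # Assumption: seed the centers with the first num_clusters points.
    centers = [list(p) for p in points[:num_clusters]]
    dim = len(points[0])
    while True:
        sums = [[0.0] * dim for _ in centers]
        counts = [0] * len(centers)
        # Assignment step: add each point to its nearest center's totals.
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda j: math.dist(p, centers[j]))
            counts[nearest] += 1
            for d, value in enumerate(p):
                sums[nearest][d] += value
        # Update step: move each center to the mean of its points;
        # a center with no points stays where it is.
        new_centers = [
            [s / counts[j] for s in sums[j]] if counts[j] else centers[j]
            for j in range(len(centers))
        ]
        largest_move = max(math.dist(old, new)
                           for old, new in zip(centers, new_centers))
        centers = new_centers
        if largest_move < threshold:
            return centers


points = [[0.0], [1.0], [10.0], [11.0]]
centers = kmeans(points, 2, 0.001)  # converges to [[0.5], [10.5]]
```

A smaller threshold means more iterations before stopping; since the assignments eventually stop changing, the largest movement reaches zero and any positive threshold terminates the loop.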
