Performs k-means clustering on the continuous attributes in a data set (ignoring any discrete attributes).
The learner reads its input and writes its output in C4.5 format. It expects to find the files <stem>.names and <stem>.data, and it writes the learned centers to a file called <stem>.centers.
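The clustering step can be pictured with a small sketch. This is not the learner's actual code: it assumes plain batch (Lloyd-style) k-means, seeds the centers with the first k points, and takes the attribute types as a hypothetical list of flags (is_continuous) rather than parsing <stem>.names. The function name kmeans and its parameters are illustrative only.

```python
def kmeans(rows, is_continuous, k, iterations=20):
    """Batch k-means over the continuous columns of rows (illustrative sketch)."""
    # Keep only the continuous columns; discrete attributes are ignored.
    points = [
        [float(v) for v, keep in zip(row, is_continuous) if keep]
        for row in rows
    ]
    # Assumption: seed with the first k points. The real learner may differ.
    centers = [list(p) for p in points[:k]]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its members.
        # A center with no members stays where it is.
        new_centers = [
            [sum(col) / len(members) for col in zip(*members)] if members else center
            for members, center in zip(clusters, centers)
        ]
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return centers
```

For example, with rows whose middle attribute is discrete, kmeans(rows, [True, False, True], k=2) clusters on the first and third columns only.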
Evaluates the learned centers by comparing them to the centers found in <stem>.test as follows. Learned centers are greedily matched to the closest of the test centers until each center has a match, and then the evaluation is the sum of the squared distances between each test center and its matched learned center.
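The evaluation above can be sketched as follows. This is one plausible reading of the description, not the learner's actual code: it assumes the matching is one-to-one and that each greedy step pairs the closest still-unmatched learned and test centers. The function name greedy_match_error is hypothetical.

```python
def greedy_match_error(learned, test):
    """Sum of squared distances under a greedy one-to-one matching (sketch)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    free_learned = set(range(len(learned)))
    free_test = set(range(len(test)))
    total = 0.0
    # Assumption: repeatedly take the closest remaining (learned, test) pair.
    while free_learned and free_test:
        d, i, j = min(
            (sq_dist(learned[i], test[j]), i, j)
            for i in free_learned
            for j in free_test
        )
        total += d
        free_learned.remove(i)
        free_test.remove(j)
    return total
```

A greedy matching is simple but is not guaranteed to minimize the total; a lower score still means the learned centers lie closer to the test centers.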
You can find a more full-featured k-means clustering algorithm by running vfkm with the -batch argument (for example, it lets you set initial centroid locations).
The features from vfkm should be moved into this learner, because new users will find this learner much easier to modify than vfkm.