This page is part of an archival collection and is no longer actively maintained.
It may contain outdated information and may not meet current or future
WCAG accessibility standards.
We provide this content, its subpages, and associated links for historical reference only.
If you need assistance, please contact support@cs.washington.edu
Converts continuous attributes into discrete ones.
Converts all continuous attributes in a data set to categorical ones. bindata makes two passes over the data: one to gather the statistics needed to pick bin boundaries, and one to do the conversion. (The first pass can be done on a sample of the data by using the -samples argument below.)
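The two-pass scheme can be sketched in memory as follows. This sketch assumes the rows have already been parsed into lists of floats, and it uses equal-width boundaries. The function names and the `sample_size` parameter are hypothetical. `sample_size` only loosely stands in for the -samples argument: it takes the first rows of the data, which may not match how bindata actually draws its sample.

```python
import bisect

def equal_width_boundaries(lo, hi, n_bins):
    """Interior cut points that split [lo, hi] into n_bins equal-width bins."""
    width = (hi - lo) / n_bins
    return [lo + k * width for k in range(1, n_bins)]

def bin_columns(rows, n_bins=4, sample_size=None):
    """Discretize every column of `rows` (lists of floats) in two passes.

    Pass 1 gathers per-column min/max, optionally from only the first
    `sample_size` rows (a hypothetical stand-in for -samples).
    Pass 2 maps every value to a bin index in 0..n_bins-1.
    """
    sample = rows if sample_size is None else rows[:sample_size]
    n_cols = len(sample[0])
    # Pass 1: find each attribute's lowest and highest value.
    lows = [min(r[c] for r in sample) for c in range(n_cols)]
    highs = [max(r[c] for r in sample) for c in range(n_cols)]
    cuts = [equal_width_boundaries(lows[c], highs[c], n_bins) for c in range(n_cols)]
    # Pass 2: replace each value with the index of the bin it falls in.
    # bisect_right puts values below the range seen in pass 1 into bin 0
    # and values above it into the last bin.
    return [[bisect.bisect_right(cuts[c], r[c]) for c in range(n_cols)] for r in rows]
```

For example, `bin_columns([[0.0, 10.0], [1.0, 20.0], [2.0, 30.0], [4.0, 50.0]])` yields `[[0, 0], [1, 1], [2, 2], [3, 3]]`. Values outside the range observed in a sampled first pass do not raise an error; they are clamped into the first or last bin.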
bindata uses one of two methods to select bin boundaries. The first, which is the default, finds the range of each attribute (by identifying its highest and lowest values) and divides that range into equal-width bins. The second assumes that the attribute was generated by a Gaussian distribution, estimates the Gaussian's mean and variance from the data, and sets the bin boundaries so that each bin holds an equal share of the Gaussian's probability mass.
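The Gaussian method can be sketched with Python's `statistics.NormalDist`. The k-th of the n-1 interior boundaries is the fitted Gaussian's quantile at cumulative probability k/n. The function name is hypothetical. It is also an assumption that the variance is estimated as the population variance; the description above does not say which estimator bindata uses.

```python
import statistics
from statistics import NormalDist

def gaussian_boundaries(values, n_bins):
    """Interior cut points that give each of n_bins bins an equal share
    (1 / n_bins) of the probability mass of a Gaussian fitted to `values`."""
    mu = statistics.fmean(values)
    # Assumption: population standard deviation; a sample estimate is equally plausible.
    sigma = statistics.pstdev(values, mu)
    if sigma == 0:
        # Degenerate attribute: every value is identical, so all cut points coincide.
        return [mu] * (n_bins - 1)
    dist = NormalDist(mu, sigma)
    # The k-th boundary is the Gaussian quantile at cumulative probability k / n_bins.
    return [dist.inv_cdf(k / n_bins) for k in range(1, n_bins)]
```

For example, `gaussian_boundaries([-1.0, 1.0], 4)` fits a standard normal and returns roughly `[-0.674, 0.0, 0.674]`. Equal-width bins would spread their boundaries evenly across the range. These boundaries crowd together near the mean instead, so bins are narrow where the fitted density is high and wide in the tails. The resulting cut points can be used with `bisect.bisect_right` to assign bin indices.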