decisionstump File Reference

Detailed Description

Learns a decision stump (a DecisionTree with only one split).

This is a very simple learner, but it may be useful as a baseline to compare your learner against.

The decisionstump learner works in time proportional to the number of training examples. It also requires memory that is proportional to the number of classes * number attributes * number of values. Note that this can be very large for continuous attributes (which, in the worst case, have a separate value for each training example). The maxThresholds argument can be used to control this.

The learner takes input and does output in c4.5 format. It expects to find the files <stem>.names and <stem>.data. Depending on command line argument, it will either output the decision stump or test its error rate on <stem>.test.

Arguments

-f <filestem>
- Set the stem name (default DF)
-source <dir>
- Set the directory that contains the dataset (default '.')
-u
- Test on the examples in <stem>.test and output in a format appropriate for interface with xvalidate and batchtest (defaults to off)
-a
- Outputs the name of the selected attribute
-outputGains
- Outputs the information gain (and best threshold) for each attribute
-outputStump
- Prints the learned stump. Defaults to output without the -u flag and not output with it
-maxThresholds <num>
- Use the first num values from the training set as thresholds for continuous attributes, allows this program to be run on very large data sets (default use all values)
-v
- Can be used multiple times to increase the debugging output

Example

decisionstump -f banana -source datasets/banana

Looks for a dataset named 'banana' in the 'datasets/banana' directory. Outputs the decision stump learned from the data set.

Generated for VFML by

hosted by