Learning is the ability to improve one's behaviour based on experience and represents an essential element of computational intelligence. Decision trees are a simple yet successful technique for supervised classification learning. This applet demonstrates how to build a decision tree using a training dataset and then use the tree to classify unseen examples in a test dataset.
This applet provides several sample datasets of examples to learn and classify; however, you can also create or import your own datasets. Before building a decision tree, the dataset can be viewed, and examples can be moved between the training set and the test set. The applet's Create Mode allows you to view and manipulate the dataset. In Solve Mode, you can watch as a decision tree is built automatically, or build the tree yourself. When building the tree manually, you can use several tools to gain information that can guide your decisions. Once the decision tree is built, you can test it against the unseen examples in your test set.
Decision Tree Options Menu
Acquiring a Dataset
Create Mode is used both for acquiring and for creating datasets. The "View/Edit Examples" button opens a window used to manipulate the dataset, and the dataset is displayed on the canvas.
The easiest way to obtain a dataset to build a tree for is to load a sample dataset. However, new datasets can also be created by entering the examples manually or by loading them from a text file.
Load a Sample Dataset: To load a sample, click "Load Sample Dataset" from the "File" menu. Then select a sample from the drop-down list and click "Load."
Creating a New Dataset: To begin creating a new dataset, click "Create New Dataset" from the "File" menu. The "Dataset Parameter Input" window will then appear, asking you for the parameters of the new dataset: the names of the input values that the tree will use to predict the output value, followed by a name for the output value itself. Separate the names with commas; the output value name must appear last.
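The parameter line described above can be read with a short helper. This is a minimal sketch, not the applet's code; the function name and the example parameter names (Outlook, Humidity, Wind, PlayTennis) are hypothetical:

```python
def parse_parameters(line):
    """Split a comma-separated parameter line into the input value
    names and the output value name (which must appear last)."""
    names = [name.strip() for name in line.split(",")]
    return names[:-1], names[-1]

inputs, output = parse_parameters("Outlook, Humidity, Wind, PlayTennis")
print(inputs)   # ['Outlook', 'Humidity', 'Wind']
print(output)   # PlayTennis
```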
Loading a Dataset From File: Example data can also be loaded from a text file. Click "Open Location" from the "File" menu to specify the address of the file you would like to load. If the file is in a valid format, it will be loaded into the program. A valid file contains one example per line, with the input values followed by the output value and separated by spaces, commas, or tabs.
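A line in this format can be split as follows. This is an illustrative sketch of the file format, not the applet's loader; the example values are hypothetical:

```python
import re

def parse_example(line):
    """Split one example line on spaces, commas, or tabs; the last
    field is the output value, the rest are the input values."""
    fields = [f for f in re.split(r"[ ,\t]+", line.strip()) if f]
    return fields[:-1], fields[-1]

print(parse_example("sunny,high,weak,no"))
# (['sunny', 'high', 'weak'], 'no')
print(parse_example("rain  normal\tstrong yes"))
# (['rain', 'normal', 'strong'], 'yes')
```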
Constructing the Decision Tree
Once you have created or acquired a dataset and distributed examples between the test and training sets (using the "View/Edit Examples" window), select the "Solve" tab in the upper-left corner of the screen to begin building the tree. The decision tree is visualized as a set of nodes, edges, and edge labels. Each node in the tree represents an input parameter that examples are split on. Red rectangles represent interior nodes, while green and blue rectangles represent leaf nodes. Blue rectangles are nodes that have not yet been split. The labels on the edges between nodes indicate the possible values of the parent's split parameter.
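The node-and-edge structure described above can be sketched as a small data structure. This is a conceptual illustration under assumed names (Node, classify, the Outlook parameter), not the applet's internals:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Node:
    """One rectangle on the canvas: interior nodes carry a split
    parameter and labeled edges to children; leaves carry a
    predicted output value."""
    split_param: Optional[str] = None                       # None for a leaf
    children: Dict[str, "Node"] = field(default_factory=dict)  # edge label -> child
    prediction: Optional[str] = None                        # output value at a leaf

def classify(node, example):
    """Follow the edge whose label matches the example's value for
    each split parameter; return None when no edge matches."""
    while node.split_param is not None:
        child = node.children.get(example[node.split_param])
        if child is None:
            return None
        node = child
    return node.prediction

# A two-leaf tree splitting on a single hypothetical parameter.
tree = Node(split_param="Outlook",
            children={"sunny": Node(prediction="no"),
                      "rain": Node(prediction="yes")})
print(classify(tree, {"Outlook": "rain"}))   # yes
```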
Constructing a Tree Automatically: To automatically generate a tree, first select the splitting function to use from the "Decision Tree Options" menu. This splitting function will determine how the program chooses a parameter to split examples on. You can choose Random, Information Gain, Gain Ratio, or GINI. See the Algorithms section for more information on these splitting functions. After selecting a splitting function, simply click "Step" to watch the program construct the tree. Clicking "Auto Create" will cause the program to continue stepping until the tree is complete or the "Stop" button is pressed.
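Information Gain, one of the splitting functions listed above, measures how much a split reduces the entropy of the output values. The following is a minimal sketch of that calculation under assumed names (examples are dicts mapping parameter names to values), not the applet's implementation:

```python
from collections import Counter
from math import log2

def entropy(examples, target):
    """Entropy (in bits) of the output-value distribution."""
    counts = Counter(ex[target] for ex in examples)
    n = len(examples)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def information_gain(examples, param, target):
    """Expected reduction in entropy from splitting on param."""
    n = len(examples)
    remainder = 0.0
    for value in {ex[param] for ex in examples}:
        subset = [ex for ex in examples if ex[param] == value]
        remainder += len(subset) / n * entropy(subset, target)
    return entropy(examples, target) - remainder

# A parameter that perfectly predicts the output gains the full entropy.
data = [{"Wind": "weak", "Play": "no"}, {"Wind": "weak", "Play": "no"},
        {"Wind": "strong", "Play": "yes"}, {"Wind": "strong", "Play": "yes"}]
print(information_gain(data, "Wind", "Play"))   # 1.0
```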
Conditions can be used to restrict splitting while automatically generating a tree. To set stopping conditions, click "Stopping Conditions..." from the "Decision Tree Options" menu. Clicking the checkbox beside a condition enables it and allows you to edit the parameter that appears to the right of the condition. The program will not split a node if any enabled stopping condition is met.
The minimum information gain condition will prevent splits that do not produce the information gain specified by the parameter. The minimum example count condition will not allow a node to be split if there are fewer than the specified number of examples mapped to it. Finally, the maximum depth condition will restrict splits if they will increase the maximum root-to-leaf depth of the tree beyond the specified value. Note that the root has depth 0.
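The logic of these three stopping conditions can be summarized in one check. This is an illustrative sketch (the function name and argument names are assumptions, and disabled conditions are represented by None), not the applet's code:

```python
def should_split(node_examples, best_gain, depth,
                 min_gain=None, min_examples=None, max_depth=None):
    """Return False if any enabled stopping condition is met.
    A disabled condition is passed as None. The root has depth 0,
    so splitting a node at `depth` creates children at depth + 1."""
    if min_gain is not None and best_gain < min_gain:
        return False
    if min_examples is not None and len(node_examples) < min_examples:
        return False
    if max_depth is not None and depth + 1 > max_depth:
        return False
    return True

print(should_split(["e1", "e2", "e3"], best_gain=0.4, depth=0,
                   min_examples=5))            # False: too few examples
print(should_split(["e1", "e2", "e3"], best_gain=0.4, depth=0))  # True
```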
Constructing a Tree Manually: You can also construct a tree by selecting the parameters to split on yourself. When "Split Node" is selected as the node option, you can click any blue rectangle node to split it. A window will appear with information about each of the parameters that the examples can be split on. When you have chosen a parameter, select its checkbox and click "Split."
Several tools are available to guide your splitting choices. When the "View Node Information" node option is selected, you can click any node to get summary information about the examples mapped to it, its entropy, and its GINI index. Clicking a node in "View Mapped Examples" mode will show you the examples that have been mapped to the node. "Toggle Histogram" mode allows you to quickly view the output-value probability distribution for a node.
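The histogram distribution and the GINI index mentioned above are simple functions of the output-value counts. A minimal sketch under assumed names (examples are dicts mapping parameter names to values):

```python
from collections import Counter

def output_distribution(examples, target):
    """Output-value probability distribution, as shown by the
    histogram view."""
    counts = Counter(ex[target] for ex in examples)
    n = len(examples)
    return {value: count / n for value, count in counts.items()}

def gini_index(examples, target):
    """GINI impurity: 1 minus the sum of squared class probabilities.
    0 for a pure node, larger for more mixed nodes."""
    dist = output_distribution(examples, target)
    return 1.0 - sum(p * p for p in dist.values())

pure = [{"Play": "yes"}, {"Play": "yes"}]
mixed = [{"Play": "yes"}, {"Play": "no"}]
print(gini_index(pure, "Play"))    # 0.0
print(gini_index(mixed, "Play"))   # 0.5
```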
The "Show Plot" button on the control panel opens a plot of training and test set error against the number of splits. This can be useful for evaluating whether further splitting is likely to improve the decision tree's predictions. By default, the plotted error is the sum of squared differences between the predicted distribution and the actual distribution; the tabs on the plot window let you switch to other error calculations, such as the sum of absolute differences.
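The two error calculations compare a predicted output distribution to the actual one. A minimal sketch with hypothetical function names and distributions:

```python
def sum_of_squares_error(predicted, actual):
    """Sum over output values of the squared differences between
    the predicted and actual probability distributions."""
    keys = set(predicted) | set(actual)
    return sum((predicted.get(k, 0.0) - actual.get(k, 0.0)) ** 2 for k in keys)

def sum_of_absolute_error(predicted, actual):
    """Sum of absolute differences between the two distributions."""
    keys = set(predicted) | set(actual)
    return sum(abs(predicted.get(k, 0.0) - actual.get(k, 0.0)) for k in keys)

predicted = {"yes": 0.75, "no": 0.25}
actual = {"yes": 1.0}                   # the example's true output is "yes"
print(sum_of_squares_error(predicted, actual))   # 0.125
print(sum_of_absolute_error(predicted, actual))  # 0.5
```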
When finished constructing the decision tree, test the tree against the test set of examples. Click the "Test" button to view the test set examples classified into the categories: "Correctly Predicted," "Incorrectly Predicted," and "No Prediction."
The "Mode" tab classifies examples as correct or incorrect based on whether they mapped to a leaf with the same output value as the test example. The pie chart at the bottom of the test results window provides a quick perspective on the performance of your decision tree. The "Probabilistic" tab instead classifies each test example as correct or incorrect based on whether its probabilistic error falls within an error threshold. A slider at the bottom of this window allows you to change the error threshold, and radio buttons allow you to choose between the two error calculations.
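The threshold-based bucketing on the "Probabilistic" tab can be sketched as follows. This is an illustration under assumed names (each example's error is precomputed, and None stands for an example that reached no matching leaf), not the applet's code:

```python
def classify_test_results(errors, threshold):
    """Bucket test examples by comparing each one's probabilistic
    error to the slider's threshold; None marks an example with
    no prediction."""
    results = {"Correctly Predicted": 0, "Incorrectly Predicted": 0,
               "No Prediction": 0}
    for err in errors:
        if err is None:
            results["No Prediction"] += 1
        elif err <= threshold:
            results["Correctly Predicted"] += 1
        else:
            results["Incorrectly Predicted"] += 1
    return results

print(classify_test_results([0.1, 0.6, None], threshold=0.5))
# {'Correctly Predicted': 1, 'Incorrectly Predicted': 1, 'No Prediction': 1}
```

Raising the threshold with the slider moves borderline examples from "Incorrectly Predicted" into "Correctly Predicted."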
The tree modes in Solve Mode allow you to inspect individual nodes of the tree. You can view mapped test examples, node information, and toggle the histogram view.
Before automatically creating a decision tree, you can choose from several splitting functions that are used to determine which attribute to split on. The available splitting functions are Random, Information Gain, Gain Ratio, and GINI.