General Help
Decision Trees


Contents

Overview

Menu Help

Create Mode

Solve Mode

Algorithms


Overview

Learning is the ability to improve one's behaviour based on experience and represents an essential element of computational intelligence. Decision trees are a simple yet successful technique for supervised classification learning. This applet demonstrates how to build a decision tree using a training dataset and then use the tree to classify unseen examples in a test dataset.

This applet provides several sample datasets of examples to learn and classify; however, you can also create or import your own datasets. Before building a decision tree, the dataset can be viewed, and examples can be moved between the training set and the test set. The applet's Create Mode allows you to view and manipulate the dataset. In Solve Mode, you can watch as a decision tree is built automatically, or build the tree yourself. When building the tree manually, you can use several tools to gain more information to guide your decisions. Once the decision tree is built, you can test it against the unseen examples in your test set.

Menu Help

File Menu

  • Create New Dataset - Discard all training and test examples and reset the tree. To create a new dataset, you must enter the parameter names and then add new examples using the "view/edit examples" window.
  • Load Sample Dataset - Load a set of sample examples.
  • Load From File - Load an example set from the local disk.
  • Load From URL - Open an example set from a WWW location.
  • Save Graph - Save the current example set to the local disk.
  • Print - Print the canvas as displayed in Solve mode.
  • Quit - Exit the Decision Tree Learning Applet.

Edit Menu

  • View/Edit Text Representation of Tree - Opens a window containing a text representation of the dataset. The text can be edited, and the tree can be updated from the text representation window by clicking the "Update" button. With this feature, a dataset can be loaded indirectly from a locally saved text file by pasting its text representation into the text representation frame.
  • View/Edit XML Representation of Tree - Opens a window containing an XML representation of the dataset. The XML text can be edited, and the tree can be updated from the XML representation window by clicking the "Update" button. With this feature, a dataset can be loaded indirectly from a locally saved XML file by pasting its XML representation into the XML representation frame.

View Menu

  • Font Size - Set the font used in the tree display.
  • Line Width - Set the line width used in the tree display.
  • Autoscale - Scale the decision tree to fit within the canvas.
  • Pan/Zoom - Select one of these two modes; right-clicking and dragging the mouse on the canvas then pans or zooms the view accordingly.
  • Enable Anti-Aliasing - Enable or disable anti-aliasing.
  • Show Message Panel - Show or hide the message prompts above the main canvas.
  • Show Button Text - Show or hide the text on the buttons in the toolbar.
  • Show Buttons - Show or hide the removable toolbar buttons.
  • Toggle Histograms - Set whether new nodes will display a histogram when created.

Decision Tree Options Menu

  • Set Auto Create Speed - Set the speed at which the animation of Auto Create takes place.
  • Stopping Conditions... - Opens a dialog that allows you to set conditions for the Auto Create feature to stop on.
  • Splitting Functions - Set the splitting algorithm. This determines how the nodes will be split.

Help Menu

  • Legend for Nodes/Edges - Opens a dialog with a legend describing what the graph shapes/colours mean.
  • Help - Opens this web page, which provides general help on the Decision Tree Learning applet.
  • Tutorials - Opens up the Tutorial web page. Tutorials walk through how to use the applet.
  • About CIspace - Provides brief information about the CIspace project.
  • About this Applet - Identifies the applet version and names of developers.

Create Mode

Acquiring a Dataset

Create mode is used for acquiring or creating datasets. The "View/Edit Examples" button opens a window used to manipulate the dataset, and the dataset itself is displayed on the canvas.

The easiest way to get a dataset to build a tree from is to load a sample dataset. However, new datasets can also be created by entering examples manually or by loading them from a text file.

Load a Sample Dataset: To load a sample, click "Load Sample Dataset" from the "File" menu. Then select a sample from the drop-down list and click "Load."

Creating a New Dataset: To begin creating a new dataset, click "Create New Dataset" from the "File" menu. A window will then appear asking for the "parameters" of the new dataset. These parameters are the names of the input values that the tree will use to predict the output value, which is also given a name and specified in the "Dataset Parameter Input" window. Parameter names are separated by commas, and the output value name must appear last.
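
For example, a parameter line for a dataset like the sample mail reading dataset might look like the following (the exact names are illustrative, not required by the applet):

    Author, Thread, Length, WhereRead, UserAction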

Loading a Dataset From File: Example data can also be loaded from a text file. Click "Load From File" from the "File" menu and specify the location of the file you would like to load. If the file is in a valid format, it will be loaded into the program. A valid file contains one example per line, with the input values followed by the output value, separated by spaces, commas, or tabs.
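
A small hypothetical file in this format, using the illustrative parameters above, might contain:

    known, new, long, home, skips
    unknown, new, short, work, reads
    unknown, old, long, work, skips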

Solve Mode

Constructing the Decision Tree

Once you have created or acquired a dataset and distributed examples between the test and training sets (using the "View/Edit Examples" window), select the "Solve" tab in the upper-left corner of the screen to begin building the tree. The decision tree is visualized as a set of nodes, edges, and edge labels. Each node in the tree represents an input parameter that examples are split on. Red rectangles represent interior nodes, while green and blue rectangles represent leaf nodes; blue rectangles are nodes that have not yet been split. The labels on the edges between nodes indicate the possible values of the parent's split parameter.
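
To illustrate how a finished tree classifies an example, here is a minimal Python sketch (not the applet's own code; the tree and parameter names are hypothetical):

    # A decision tree as nested tuples.
    # Interior node: ("split", parameter_name, {value: subtree, ...})
    # Leaf node:     ("leaf", output_value)
    tree = ("split", "Thread", {
        "new": ("leaf", "reads"),
        "old": ("split", "Length", {
            "long": ("leaf", "skips"),
            "short": ("leaf", "reads"),
        }),
    })

    def classify(node, example):
        """Follow the edge matching the example's value until a leaf is reached."""
        if node[0] == "leaf":
            return node[1]
        _, param, children = node
        if example[param] not in children:
            return None  # corresponds to the applet's "No Prediction" category
        return classify(children[example[param]], example)

    print(classify(tree, {"Thread": "old", "Length": "short"}))  # -> reads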

Constructing a Tree Automatically: To automatically generate a tree, first select the splitting function to use from the "Decision Tree Options" menu. This splitting function will determine how the program chooses a parameter to split examples on. You can choose Random, Information Gain, Gain Ratio, or GINI. See the Algorithms section for more information on these splitting functions. After selecting a splitting function, simply click "Step" to watch the program construct the tree. Clicking "Auto Create" will cause the program to continue stepping until the tree is complete or the "Stop" button is pressed.

Conditions can be used to restrict splitting while automatically generating a tree. To set stopping conditions, click "Stopping Conditions..." from the "Decision Tree Options" menu. Clicking the checkbox beside a condition enables it and allows you to edit the parameter that appears to the right of the condition. The program will not split a node if any enabled stopping condition is met.

The minimum information gain condition will prevent splits that do not produce the information gain specified by the parameter. The minimum example count condition will not allow a node to be split if there are fewer than the specified number of examples mapped to it. Finally, the maximum depth condition will restrict splits if they will increase the maximum root-to-leaf depth of the tree beyond the specified value. Note that the root has depth 0.
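
As a rough sketch of how these three conditions combine (in Python, with hypothetical parameter values; an illustration, not the applet's implementation):

    def should_stop(info_gain, example_count, node_depth,
                    min_gain=0.1, min_examples=2, max_depth=5):
        """Return True if any enabled stopping condition is met for a node."""
        if info_gain < min_gain:          # minimum information gain
            return True
        if example_count < min_examples:  # minimum example count
            return True
        if node_depth + 1 > max_depth:    # split would exceed maximum depth (root = 0)
            return True
        return False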

Constructing a Tree Manually: You can also construct a tree by selecting the parameters to split on yourself. When the node option is set to "Split Node," you can click on any blue rectangle node to split it. A window will appear with information about each of the parameters that the examples can be split on. When you have chosen a parameter, select its checkbox and click "Split."

Several tools are available to guide your splitting choices. When the "View Node Information" node option is selected, you can click any node to get summary information about the examples mapped to it, its entropy, and its GINI index. Clicking a node in "View Mapped Examples" mode shows the examples that have been mapped to that node. "Toggle Histogram" mode allows you to quickly view the output value probability distribution at a node.

The "Show Plot" button on the control panel opens a plot of training and test set error over the number of splits. This can be useful for evaluating whether or not further splitting is likely to improve the decision tree's predictions. The default error value is the sum of squares of differences. However, the type of error plotted can be changed via the tabs on the plot error window. Other error calculations include sum of absolute values of differences between the predicted distribution and the actual distribution.

When finished constructing the decision tree, test the tree against the test set of examples. Click the "Test" button to view the test set examples classified into the categories: "Correctly Predicted," "Incorrectly Predicted," and "No Prediction."

The "Mode" tab classifies examples as correct or incorrect based on whether they mapped to a leaf with the same output value as the test example. The pie chart at the bottom of the test results window provides a quick perspective on the performance of your decision tree. The "Probabilistic" tab classifies the test results by the probabilistic error of each example using an error threshold value to classify the examples as correct or incorrect. A slider at the bottom of this window allows you to change the error threshold, and radio buttons allow you to choose between the two error calculations.

The tree modes in Solve Mode allow you to inspect individual nodes of the tree. You can view mapped test examples, node information, and toggle the histogram view.

Algorithms

Before automatically creating a decision tree, you can choose from several splitting functions that are used to determine which attribute to split on. The following splitting functions are available:

  • Random - The attribute to split on is chosen randomly.

  • Information Gain - The attribute to split on is the one with the maximum information gain. To calculate the information gain for an attribute, first compute the information content of each partition it induces. For the attribute "Thread" in the mail reading example, the examples with "Thread = new" are partitioned into 3 where the user action is "skips" and 7 where the user action is "reads." The information content about the user action is then -0.3 * log(0.3) - 0.7 * log(0.7) = 0.881 (using log base 2). For "Thread = old", the information content is calculated in the same way, giving 0.811. The expected information gain is thus 1.0 - ((10/18) * 0.881 + (8/18) * 0.811) = 0.150, where 1.0 is the information content of the dataset before the split. (Note that there are 18 examples in total, 10 with thread value new and 8 with thread value old. See the sketch after this list.)

  • Gain Ratio - Selects the attribute with the highest ratio of information gain to number of input values, where the number of input values is the number of distinct values of the attribute occurring in the training set.

  • GINI - The attribute with the highest GINI index is chosen. The GINI index is a measure of impurity of the examples.
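
The following Python sketch reproduces the information gain numbers worked through above and shows one common way to compute a GINI index (for illustration only; it is not the applet's source code):

    import math

    def information_content(counts):
        """Entropy (base 2) of a list of class counts, e.g. [3, 7]."""
        total = sum(counts)
        return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

    def gini_index(counts):
        """GINI impurity of a list of class counts."""
        total = sum(counts)
        return 1.0 - sum((c / total) ** 2 for c in counts)

    # Mail reading example: 18 examples split on "Thread".
    # The 9/9 and 6/2 counts are implied by the 1.0 and 0.811 values above.
    before = information_content([9, 9])   # 1.0, the unsplit dataset
    new    = information_content([3, 7])   # 0.881, "Thread = new" (10 examples)
    old    = information_content([6, 2])   # 0.811, "Thread = old" (8 examples)

    gain = before - ((10 / 18) * new + (8 / 18) * old)
    print(round(gain, 3))                  # 0.15, the expected information gain

    print(round(gini_index([3, 7]), 3))    # 0.42, GINI impurity of "Thread = new"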
