Tutorials
Decision Trees

Tutorial 1: Acquiring A Dataset

You can acquire a dataset either by creating a new dataset or by loading one of the existing sample datasets.

1. Creating new datasets

New datasets can be created by entering the examples manually or by loading them from a text file.

To create a new dataset by entering examples, click "Create New Dataset" in the "File" menu. A window will appear asking for the "parameters" of the new dataset.

These parameters are the names of the input values that the tree will use to predict the output value, plus the name of the output value itself, all entered in the "Data Set Parameter Input" window. Separate the names with commas and list the output name last. A trivial example is: input1, input2, input3, output. Click "Ok" when finished. Below is an example of what the "Create New Dataset" dialog box looks like.
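To make the naming rule concrete, here is a minimal Python sketch of how a parameter line could be split into input names and an output name. The function `parse_parameters` is hypothetical and not part of the applet; it only restates the rule above (comma-separated names, output name last).

```python
def parse_parameters(line):
    """Split a comma-separated parameter line into (input_names, output_name).

    Hypothetical helper: names are separated by commas and the
    output name is always the last one.
    """
    # Strip surrounding spaces and drop empty entries (e.g. a trailing comma).
    names = [name.strip() for name in line.split(",") if name.strip()]
    if len(names) < 2:
        raise ValueError("need at least one input name and one output name")
    return names[:-1], names[-1]


inputs, output = parse_parameters("input1, input2, input3, output")
print(inputs)  # ['input1', 'input2', 'input3']
print(output)  # output
```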


[Image: the "Create New Dataset" (Data Set Parameter Input) dialog box]

To load a new dataset from a text file, click "Load From URL" in the "File" menu and enter the address of the file you want to load. If the file is in a valid format, it will be loaded into the program.

A valid file contains one example per line: the input values followed by the output value, separated by spaces, commas, or tabs. Training examples are marked with an "A" and test examples with a "B". Below is an example of a valid file.
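As a rough illustration of this format, the Python sketch below reads a file's text into training and test examples. It is not the applet's loader. The function name `parse_dataset` and the sample rows are hypothetical, and the sketch assumes the "A"/"B" marker is the first field on each line; the description above does not say where the marker goes, so check the example file before relying on this.

```python
import re


def parse_dataset(text, num_inputs):
    """Split dataset text into (training, test) lists of (inputs, output) pairs.

    Assumption (hypothetical): each non-empty line starts with an "A"
    (training) or "B" (test) marker, followed by the input values and
    then the output value.
    """
    training, test = [], []
    for line_no, line in enumerate(text.splitlines(), start=1):
        # Accept any mix of spaces, commas, and tabs as separators.
        fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
        if not fields:
            continue  # skip blank lines
        marker, values = fields[0], fields[1:]
        if len(values) != num_inputs + 1:
            raise ValueError(
                f"line {line_no}: expected {num_inputs + 1} values, got {len(values)}"
            )
        example = (values[:num_inputs], values[num_inputs])
        if marker == "A":
            training.append(example)
        elif marker == "B":
            test.append(example)
        else:
            raise ValueError(f"line {line_no}: unknown marker {marker!r}")
    return training, test


# Hypothetical rows with two inputs and one output, using all three separators.
sample = """A, red, small, yes
A\tblue\tlarge\tno
B green small yes
"""
train_examples, test_examples = parse_dataset(sample, num_inputs=2)
print(len(train_examples), len(test_examples))  # 2 1
```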


[Image: example of a dataset file in valid format]

2. Loading a dataset from already existing examples

The Decision Tree Learning Applet comes with several predefined sample datasets so that you can start working without having to create one yourself. To load one, open the "File" menu and select "Load Sample Dataset". A dialog box will open with a drop-down menu from which you can select a particular example.

[Image: the "Load Sample Dataset" dialog box]

  • Mail Reading: This example models whether a user will read an article based on whether the author is known, whether it is a new thread, its length, and where the user is reading it.
     
  • Mail Reading (simplified): Same as the above example except that there is no input "Where Read".
     
  • Boolean Example: This is a simple example modeling the XOR logical operation.
     
  • Classification of Animals: This example classifies animals based on characteristics such as number of legs, whether or not it has hair, whether or not it has teeth...
     
  • All Electronics: This example models whether a user will buy a piece of electronics based on the user's age, income, whether they are a student, and their credit rating.
     
  • Small Car Database: This example models whether a car is acceptable to a user based on its price, maintenance cost, number of doors, passenger capacity, trunk size, and safety rating.

  • Matching Pennies: This dataset is for predicting wins from observed properties of a game of matching pennies.

  • Likes TV: This predicts whether a person likes a TV program based on features of the TV program.

  • Holiday: This example predicts whether a person likes a holiday.

Load the "Small Car Database" example. The next step is to arrange the examples into training and test sets using the "View/Edit Examples" window.

Viewing and Editing Examples

Click the "View/Edit Examples" button on the toolbar. A dialog box like the one below should appear.


[Image: the "View/Edit Examples" dialog box]

You can move examples between the "Training Examples" and "Test Examples" lists. First, click "Select All" on the Test Examples side, then click "Remove" to delete all test examples. Next, use "Select % of Examples" to select a percentage of the examples, and select the first 50% of the training examples:


[Image: selecting the first 50% of the training examples]

Click the "--->" button to move the selected examples to the Test Examples side. They should now appear there, and you can proceed with building your decision tree. To learn how to build your decision tree, see Tutorial 2.
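The select-and-move step above amounts to a simple train/test split. The Python sketch below uses a hypothetical helper, `move_first_percent`, that mimics selecting the first N% of the training examples and moving them to the test side. It assumes the count is rounded down, which may differ from what the applet does.

```python
def move_first_percent(training, test, percent):
    """Move the first `percent`% of `training` to the end of `test`.

    Hypothetical helper mimicking "Select % of Examples" followed by "--->".
    Returns new (training, test) lists; the inputs are not modified.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Assumption: round the number of selected examples down.
    count = int(len(training) * percent // 100)
    moved, kept = training[:count], training[count:]
    return kept, test + moved


train_examples = [f"ex{i}" for i in range(1, 11)]
test_examples = []  # emptied first, as with "Select All" then "Remove"
train_examples, test_examples = move_first_percent(train_examples, test_examples, 50)
print(train_examples)  # ['ex6', 'ex7', 'ex8', 'ex9', 'ex10']
print(test_examples)   # ['ex1', 'ex2', 'ex3', 'ex4', 'ex5']
```

Because the helper takes the first examples in order rather than at random, any ordering in the original dataset carries over into the split.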
