Neural Networks


Contents

Overview

Menu Help

The Construction Wizard

Training and Test Sets

Create Mode

Solve Mode

Algorithms and Theory

DTD for the XML representation

Bibliography


Overview

The Neural Network applet visually demonstrates the feedforward backpropagation algorithm, giving visual feedback on weight adjustments and error analysis. The applet supports graphical creation and modification of neural networks. It allows for separate training and test sets, where the network is trained on the training set and the test set serves as a "control". It also has a "Construction Wizard" that can load plain comma-delimited text files as data and construct an appropriate neural network for them.

Menu Help

The File Menu

The File Menu has options to create graphs, load files, and save files, as well as to quit the program.

Notes on loading files:

It is possible to load either 'Graph and Data' or just 'Data'. Loading 'Graph and Data' loads graph information from a standard neural network file, along with its corresponding data. Loading 'Data' opens the Construction Wizard to generate a neural network automatically. Loading 'Data' works with both plain comma-delimited text files and standard neural network files (discarding the graph information), while loading 'Graph and Data' only works with the standard neural network files.

  • Create New Graph - clears the currently loaded network. All changes will be lost.

  • Load Sample Graph and Data - allows the user to load from a selection of pregenerated examples.

  • Load Sample Data - allows the user to use the Construction Wizard to load from a selection of pregenerated examples.

  • Load Graph and Data from URL - allows the user to load a file over the Internet by typing in a URL.

  • Load Data from URL - allows the user to use the Construction Wizard to load a data file by typing in a URL.

  • Save Graph - Saves the graph. The default format is XML.

  • Print - Prints the graph.

  • Quit - Quits the applet.

The Edit Menu

The Edit Menu allows the user to view a text representation of the neural network.

  • View Prolog Code - displays Prolog code that can be used to represent the network. It should be noted that the applet does not use this code; it uses its own Java implementation of backpropagation.

  • View/Edit Text Representation - displays the text representation of the currently loaded neural network.

  • View/Edit XML Representation - displays the XML representation of the currently loaded neural network. The DTD is defined here.

The View Menu

The View Menu allows the user to modify the appearance of the applet.

  • Font Size - Set the font size.

  • Line Width - Set the width of lines displayed.

  • Autoscale - Adjust the graph to fit in the canvas.

  • Pan/Zoom - Select which mode to be in. Right-click and drag the mouse on the canvas to pan or zoom.

  • Reset Label - edge labels can be moved separately from the edges they are associated with. Resetting edge labels snaps them back to their associated edges.

  • Enable Anti-Aliasing - enable or disable anti-aliasing.

  • Show Message Panel - Show or hide the message prompts above the main canvas.

  • Show Button Text - Show or hide the text on the buttons in the toolbar.

  • Show Buttons - Show or hide the removable toolbar buttons.

  • Show Parameters - shows or hides the weights on the edges and nodes.

The Neural Options Menu

The Neural Options Menu allows the user to modify the parameters of the backpropagation.

  • Auto Step Speed - sets the speed of the applet when multiple iterations of the backpropagation are performed. For large data sets, set the speed to 0.

  • Parameter Initialization Options - sets how the parameter values should be reset when initialized. They can either be set to a constant value or set to a random value within a bound.

  • Learning Options
    • - Learning Rate - this constant is multiplied into the update value for backprop. A higher rate may allow faster convergence, but it is usually a good idea to keep it small. Because of numerical issues, it should be set to a very small number (0.005 or smaller) when linear activation functions are used anywhere in the network. The default value is 1.

    • - Momentum - this is the fraction of the previous iteration's update that is added onto the new update, which helps the algorithm converge faster. A value of 0 means no momentum, and this is the default. (A sketch of the update rule with learning rate and momentum appears after this menu's options.)

  • Stopping Conditions
    • - Number of Iterations - the number of iterations that will be executed when the Step X button is pressed. The default is 50.

    • - Target Training Error - this is the target error at which the algorithm will stop when the Run Until Finished button is pressed. Keep in mind that the network may not reach this error value (if the learning rate is too high, for example). Also, the more examples there are, the higher the error value (as this is the total error, not the average). The default is 0.1.

  • Normalize Inputs - normalizes the inputs so that they are within the range [-1,1]. The default is on.
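
To make the effect of these options concrete, here is a minimal sketch in Java of a single weight update with a learning rate and a momentum term, together with input normalization to [-1,1]. This is an illustration only, not the applet's source; all names (updateWeight, prevDelta, and so on) are hypothetical.

// Illustrative sketch of the update rule controlled by the options above.
// Not the applet's code; names are hypothetical.
public class UpdateSketch {

    static double prevDelta = 0.0;  // remembered for the momentum term

    // One gradient-descent step for a single weight. learningRate scales
    // the raw update; momentum re-adds a fraction of the previous step so
    // that successive steps in the same direction accelerate convergence.
    static double updateWeight(double weight, double gradient,
                               double learningRate, double momentum) {
        double delta = learningRate * gradient + momentum * prevDelta;
        prevDelta = delta;
        return weight + delta;
    }

    // "Normalize Inputs": linearly rescale raw values into [-1, 1].
    static double[] normalize(double[] raw) {
        double min = raw[0], max = raw[0];
        for (double v : raw) { min = Math.min(min, v); max = Math.max(max, v); }
        double[] out = new double[raw.length];
        for (int i = 0; i < raw.length; i++)
            out[i] = (max == min) ? 0.0 : 2.0 * (raw[i] - min) / (max - min) - 1.0;
        return out;
    }

    public static void main(String[] args) {
        double w = 0.1;
        w = updateWeight(w, 0.2, 0.05, 0.9);  // first step: no previous delta
        w = updateWeight(w, 0.2, 0.05, 0.9);  // second step gains momentum
        System.out.println(w);                // 0.129
    }
}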

The Construction Wizard

The Neural Network Construction Wizard is designed to automate the creation of neural networks from raw, comma-delimited data. The only requirement is that the text file must start with a line defining the categories of the data, in the same order as the data. This line has to be of the form T:[category1],[category2], ..., [categoryN]; . For example, these are the first two lines of a data file for the Wizard:

T: price, maint-cost, doors, persons, trunk-size, safety, acceptable;

vhigh, vhigh, 2, 2, small, low, unacc

The Wizard can also load the normal applet data files, but it will ignore the graph information and just load the examples.
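
As an illustration of the header format described above (not the Wizard's actual parser), the following Java sketch extracts the category names from a 'T:' line; parseCategories is a hypothetical helper name.

import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for the "T: cat1, cat2, ..., catN;" header line.
public class HeaderSketch {

    static List<String> parseCategories(String headerLine) {
        String s = headerLine.trim();
        if (!s.startsWith("T:") || !s.endsWith(";")) {
            throw new IllegalArgumentException("Not a category header: " + headerLine);
        }
        s = s.substring(2, s.length() - 1);  // drop "T:" and the trailing ";"
        List<String> categories = new ArrayList<>();
        for (String name : s.split(",")) {
            categories.add(name.trim());
        }
        return categories;
    }

    public static void main(String[] args) {
        // The car-evaluation header from the example above.
        System.out.println(parseCategories(
            "T: price, maint-cost, doors, persons, trunk-size, safety, acceptable;"));
        // -> [price, maint-cost, doors, persons, trunk-size, safety, acceptable]
    }
}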

Once the file is loaded, the Wizard dialog will pop up and query the user for information about the neural network to build. The user inputs the number of hidden layers needed, and the number of nodes for each hidden layer; hidden layers can be selected using the pull-down choice menu. The number of nodes defaults to 2.

The user must choose which categories are outputs. Depress the radio button to the left of a category name to make it an output. Input categories become input nodes, and output categories become output nodes.

Also, it may be necessary for some non-numerical categories to be declared as "ordered" by depressing the corresponding checkbox beside the category name. What this means is that this category can be represented as a continuum of numbers. The Wizard will prompt for value mappings for each element of the category. For example, the category "University" with members "SFU, UBC, UVic" cannot be represented as such, but the category "Rating" with members "Low, Medium, High" can be (one can map them as numbers 0, .5, and 1). Numerical categories are already ordered, and hence are not affected if they are declared as ordered.
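
As an illustration, the value mapping for the "Rating" category above could be stored as a simple lookup table; this sketch is hypothetical, not the Wizard's implementation.

import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical value mapping for an "ordered" symbolic category.
public class OrderedCategorySketch {
    public static void main(String[] args) {
        Map<String, Double> rating = new LinkedHashMap<>();
        rating.put("Low", 0.0);     // user-chosen numeric codes
        rating.put("Medium", 0.5);
        rating.put("High", 1.0);
        // Each symbolic value in the data file is replaced by its code
        // before being fed to the network.
        System.out.println(rating.get("Medium"));  // 0.5
    }
}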

Once all mappings have been declared, the Wizard will create the specified neural network. Also, it will distribute the data evenly into the training and test sets.

Training and Test Sets

The neural network applet uses two sets of data for the network: the training set and the test set. The training set is the set of examples that are used to train the neural network. The test set is a "control" which allows the user to observe how the network can generalize from the training set to other data. The applet graphs both training and test set errors in the Plot Window, and the user can get more detailed statistics for the test set from the Summary Statistics window.

Create Mode

Create Mode allows the user to create a neural network manually. Click on a button on the toolbar to enable its function. Only one button can be depressed at a time. For best results, the neural network should be totally connected; a node should be connected to all the nodes of the layer below it.
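
As an illustration of what "totally connected" means, this hypothetical Java sketch generates an edge for every pair of nodes in two adjacent layers; the applet's own data structures may differ.

// Hypothetical sketch: connect every node in one layer to every node
// in the layer below it, as recommended above.
public class ConnectSketch {

    // Edges are stored as index pairs (parent -> child) for simplicity.
    static java.util.List<int[]> totallyConnect(int[] upperLayer, int[] lowerLayer) {
        java.util.List<int[]> edges = new java.util.ArrayList<>();
        for (int parent : upperLayer) {
            for (int child : lowerLayer) {
                edges.add(new int[] { parent, child });
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        // Two input nodes (0, 1) feeding three output nodes (2, 3, 4),
        // as in the Boolean example later on this page: 6 edges in all.
        System.out.println(totallyConnect(new int[] {0, 1},
                                          new int[] {2, 3, 4}).size());  // 6
    }
}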

  • Create Node - creates a node. Click on the canvas to create a node. Note that the applet automatically detects whether a node is an input, output, or hidden node.

  • Create Edge - creates an edge. Click on the parent node, then on the child node to create a directed edge between them.

  • Select - selects a node or an edge. Click on a node or an edge to select it and move it around.

  • Delete - deletes nodes or edges. Click on a node or an edge to delete it.

  • Set Properties - allows the user to specify starting weights for edges and nodes, and to specify what activation function a node will use. The available activation functions are sigmoid (logistic), exponential, linear, and hyperbolic tangent. It is recommended that the user use sigmoid activation functions for most purposes. The Algorithms and Theory section discusses this in more detail.

  • View/Edit Examples - shows the Edit Examples window.

Solve Mode

Solve mode allows the user to train the neural network that is in memory. It also allows the user to observe the training, and to add or delete examples from the training or test sets. Finally, the user can also access statistics on the network through solve mode.

  • Initialize Parameters - initializes the weights of the neural network with the options specified in the Parameter Initialization Options dialog. This is equivalent to clearing everything the network has learned so far and starting from scratch.

  • Step - does one iteration of the backpropagation algorithm.

  • Step X - does X iterations of the backpropagation algorithm, with X being the number of iterations specified in the Stopping Conditions dialog.

  • Step To Target Error - runs the backpropagation algorithm until the total sum of squares error reaches the target error specified in the Stopping Conditions dialog.

  • Stop Search - stops the backprop algorithm.

  • Calculate Output - pulls up a dialog that allows the user to enter input values and calculate the output values that the neural network would generate.

  • Show Plot - shows the Plot window.

  • Summary Statistics - shows the Summary Statistics window.

  • View/Edit Examples - shows the Edit Examples window.

The Plot Window

The plot window shows a graph of the error of the neural network. As the backpropagation algorithm is run, the error should be minimized. The plot window also has Initialize Parameter, Step, Step X, Step to Target Error, and Stop buttons, which work in the same way as their solve mode counterparts. In addition, there are buttons to close, redraw, clear, and print the plot window. There is also a checkbox to switch between logarithmic and standard display modes.

The error values of the training and the test sets are displayed on the right side of the plot. The blue plot is the training set error, while the orange plot is the test set error.

The Edit Examples Window

The Edit Examples window displays all the examples in both the training and test sets, and allows the user to add and remove examples, and switch examples between the training and test sets.

To select an example, click on the example. To select multiple examples, keep clicking on the desired examples until all of them are selected. One can also pull down the Select... choice box and choose between Select All, Select None, and Select Percentage.

To transfer an example to another set, click the appropriate arrow button on the dialog.

The Summary Statistics Window

The Summary Statistics Window displays statistics for the test set, and as such can only be pulled up once there is at least one example in the test set. The window displays all the test examples as a table, along with the predicted values. It classifies the examples as correct or incorrect depending on a classification range determined by the user; this "threshold" defaults to 0.5. The window also gives the percentage correct and incorrect, and allows the user to select which output's predicted value is displayed in the table.
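
The correct/incorrect classification can be thought of as the following test, sketched in Java under the assumption that a prediction counts as correct when it falls within the threshold of the expected value; the applet's exact rule may differ.

// Hypothetical sketch of the threshold classification used for the
// correct/incorrect statistics.
public class ThresholdSketch {

    static boolean isCorrect(double predicted, double expected, double threshold) {
        return Math.abs(predicted - expected) < threshold;
    }

    public static void main(String[] args) {
        System.out.println(isCorrect(0.8, 1.0, 0.5));  // true
        System.out.println(isCorrect(0.3, 1.0, 0.5));  // false
    }
}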

There is a color key at the bottom of the applet window. Red indicates a weight less than 0, and green a weight greater than 0. An edge with zero weight is clear, and the color becomes darker as the weight's magnitude increases.

There is a message at the top of the canvas that cues the user as to what the applet is doing as it is running.

Algorithms and Theory

This is a very basic introduction to neural networks and the feedforward backpropagation algorithm.

A neural network (in the context of this applet) is a set of nodes connected to each other by edges. Nodes can send information (usually numeric data) only through edges. Each signal is multiplied by the weight associated with the edge it travels along before it is received by the target node. A node sums up all of its received "signals", feeds the sum into an activation function, then sends the result to all its children.
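
In code, the computation performed by a single node might be sketched as follows. This is an illustration only; it assumes the node's own weight acts as a bias added to the weighted sum, and uses the sigmoid function described in the list below.

// Illustrative sketch of what a single node computes: a weighted sum
// of its parents' outputs, passed through its activation function.
public class NodeSketch {

    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    // parentOutputs[i] arrives over an edge with weight weights[i];
    // bias stands in for the node's own weight (an assumption here).
    static double nodeOutput(double[] parentOutputs, double[] weights, double bias) {
        double sum = bias;
        for (int i = 0; i < parentOutputs.length; i++) {
            sum += weights[i] * parentOutputs[i];
        }
        return sigmoid(sum);
    }

    public static void main(String[] args) {
        // Two parents with outputs 1.0 and 0.0, edge weights 0.4 and 0.3,
        // node weight 0.1: sigmoid(0.5), roughly 0.62.
        System.out.println(nodeOutput(new double[] {1.0, 0.0},
                                      new double[] {0.4, 0.3}, 0.1));
    }
}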

An activation function is simply a function that is used to introduce nonlinearity into the network. There are four functions supported by the applet: linear, sigmoid, exponential, and hyperbolic tangent. Note that all of them are differentiable - the backpropagation algorithm requires activation functions to be differentiable. (All four, with the derivatives backpropagation needs, are sketched in code after the list below.)

  • A linear activation function is equivalent to having no activation function at all (i.e. the sum of the "signals" is the result sent to a node's children).

  • A sigmoid function (also called the logistic) is an S-shaped curve that maps all input to [0,1]. It has a limit of 0 as x approaches negative infinity, and 1 as x approaches infinity.

  • A hyperbolic tangent function is similar to a sigmoid, but it maps all of its input to [-1,1]. It has a limit of -1 as x approaches negative infinity, and 1 as x approaches infinity.

  • An exponential function is the exponential function e^x.
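
For reference, here is a sketch of the four functions together with the derivatives that backpropagation needs. The derivative formulas are standard; the class itself is illustrative rather than the applet's code.

// The four activation functions and their derivatives (needed by
// backpropagation), written in terms of x for clarity.
public class Activations {

    static double linear(double x)  { return x; }
    static double dLinear(double x) { return 1.0; }

    static double sigmoid(double x)  { return 1.0 / (1.0 + Math.exp(-x)); }
    static double dSigmoid(double x) { double s = sigmoid(x); return s * (1.0 - s); }

    static double tanh(double x)  { return Math.tanh(x); }
    static double dTanh(double x) { double t = Math.tanh(x); return 1.0 - t * t; }

    static double exponential(double x)  { return Math.exp(x); }
    static double dExponential(double x) { return Math.exp(x); }

    public static void main(String[] args) {
        System.out.println(dSigmoid(0.0));  // 0.25
    }
}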

Nodes without parents are input nodes - the user must provide them with input. Nodes without children are output nodes. All other nodes are hidden nodes. This applet deals with a feedforward neural network - it requires that the graph be acyclic, and that a "layer" (a set of nodes that have the same depth) be totally connected to the layer below it. This means that each node should have an edge going to all the nodes of the layer below it, and nowhere else.

This network is trained by an algorithm called the backpropagation algorithm. The backprop algorithm is essentially a minimization technique that minimizes the error of a neural network.

The Feedforward Backpropagation Algorithm

The Neural Network applet demonstrates a widely used algorithm called the backpropagation algorithm. To train a neural network, a set of training examples is fed into the network. Each example will produce an output that may differ from the expected result. This error (usually the sum of squares error) is "backpropagated" through edges and hidden nodes; the magnitude of this error is used to determine how much, and in what direction, to adjust the weights. An epoch, or iteration, is one pass of the whole training set through the network in this way.
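
Here is a heavily simplified sketch of one such epoch for a network with no hidden layers (inputs feeding sigmoid output nodes directly, with a constant input standing in for a bias). It is an illustration of the idea, not the applet's implementation, which also handles hidden layers and momentum.

// Minimal sketch of one backpropagation epoch for a network with no
// hidden layers: inputs feed sigmoid output nodes directly.
// w[i][j] is the weight on the edge from input i to output j.
public class BackpropSketch {

    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    // Feeds every training example through the network once and adjusts
    // the weights; returns the total sum-of-squares error for the epoch.
    static double epoch(double[][] inputs, double[][] targets,
                        double[][] w, double learningRate) {
        double totalError = 0.0;
        for (int e = 0; e < inputs.length; e++) {
            double[] x = inputs[e], t = targets[e];
            for (int j = 0; j < t.length; j++) {
                // Forward pass for output node j.
                double net = 0.0;
                for (int i = 0; i < x.length; i++) net += w[i][j] * x[i];
                double out = sigmoid(net);
                double err = t[j] - out;
                totalError += err * err;
                // Backward pass: scale the error by the derivative of the
                // sigmoid, sigmoid'(net) = out * (1 - out), then adjust
                // each incoming weight in proportion to its input.
                double delta = err * out * (1.0 - out);
                for (int i = 0; i < x.length; i++)
                    w[i][j] += learningRate * delta * x[i];
            }
        }
        return totalError;
    }

    public static void main(String[] args) {
        // Learn logical AND; the constant third input acts as a bias.
        double[][] x = { {0,0,1}, {0,1,1}, {1,0,1}, {1,1,1} };
        double[][] t = { {0}, {0}, {0}, {1} };
        double[][] w = { {0.1}, {0.2}, {0.0} };
        double sse = 0.0;
        for (int i = 0; i < 1000; i++) sse = epoch(x, t, w, 0.5);
        System.out.println("Error after 1000 epochs: " + sse);  // small
    }
}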

Caveats and Warnings

If the network doesn't seem to learn, the learning rate may be too high. Adjust the learning rate to something smaller.

Be VERY careful with activation functions. Remember that the sigmoid function only has a range [0,1] and the hyperbolic tangent has a range [-1,1]. Set your output node activation functions to linear if the outputs are not within these ranges.

The backpropagation algorithm as it is implemented here is very vulnerable to numerical errors. This is because the total sum of squares error can be large if the training set is large. The algorithm requires that this error be fed into the derivative of the activation function. As the most common activation functions involve exponential functions, this leads to numerical errors.

The linear activation function does not suffer from this problem, but since it is not a "squashing" function (the sigmoid and hyperbolic tangent have finite ranges), it also has numerical problems with large training sets, as all the sum of squares errors add up to large numbers. To work around this, set the learning rate to something small (less than 0.005). This will usually bring the computation to a manageable level.

DTD Definition

<!DOCTYPE MLDBIF [
        <!ELEMENT MLDBIF ( DB ) >
        <!ELEMENT DB ( NETWORK, EXAMPLES ) >

        <!ELEMENT EXAMPLES ( PARAMETER+, EXAMPLE+ ) >
        <!ELEMENT PARAMETER ( #PCDATA ) >
                  <!ATTLIST PARAMETER type NMTOKEN #REQUIRED >
        <!ELEMENT EXAMPLE ( VALUE+ ) >
                  <!ATTLIST EXAMPLE type NMTOKEN #REQUIRED >
        <!ELEMENT VALUE ( #PCDATA ) >
                  <!ATTLIST VALUE parameter CDATA #REQUIRED >                  

        <!ELEMENT NETWORK ( NODE+, EDGE+ ) >
        <!ELEMENT NODE ( NAME, WEIGHT, XPOS, YPOS, INDEX, FUNCTION ) >
        <!ELEMENT NAME ( #PCDATA ) >
        <!ELEMENT WEIGHT ( #PCDATA ) >
        <!ELEMENT FUNCTION ( #PCDATA ) >
        <!ELEMENT INDEX ( #PCDATA ) >
        <!ELEMENT XPOS ( #PCDATA ) >
        <!ELEMENT YPOS ( #PCDATA ) >
                                        
        <!ELEMENT EDGE ( STARTINDEX, ENDINDEX, WEIGHT ) >
        <!ELEMENT STARTINDEX ( #PCDATA ) >
        <!ELEMENT ENDINDEX ( #PCDATA ) >        
]>
This is an example of the Neural Network XML (the Boolean example from the applet):

<?xml version="1.0" ?>
<MLDBIF>
<DB>

<!-- Neural Network Definition -->
<NETWORK>

   <!-- Node Definitions -->

   <NODE>
      <NAME>Input 1</NAME>
      <WEIGHT>0.0</WEIGHT>
      <XPOS>-121.04071</XPOS>
      <YPOS>-91.9425</YPOS>
      <INDEX>0</INDEX>
      <FUNCTION>sigmoid</FUNCTION>
   </NODE>
   <NODE>
      <NAME>Input 2</NAME>
      <WEIGHT>0.0</WEIGHT>
      <XPOS>118.37389</XPOS>
      <YPOS>-90.12185</YPOS>
      <INDEX>1</INDEX>
      <FUNCTION>sigmoid</FUNCTION>
   </NODE>
   <NODE>
      <NAME>Output (and)</NAME>
      <WEIGHT>0.1</WEIGHT>
      <XPOS>-184.50099</XPOS>
      <YPOS>91.16629</YPOS>
      <INDEX>2</INDEX>
      <FUNCTION>sigmoid</FUNCTION>
   </NODE>
   <NODE>
      <NAME>Output (or)</NAME>
      <WEIGHT>0.2</WEIGHT>
      <XPOS>2.7630477</XPOS>
      <YPOS>91.9425</YPOS>
      <INDEX>3</INDEX>
      <FUNCTION>sigmoid</FUNCTION>
   </NODE>
   <NODE>
      <NAME>Output (nor)</NAME>
      <WEIGHT>0.3</WEIGHT>
      <XPOS>185.50098</XPOS>
      <YPOS>91.16629</YPOS>
      <INDEX>4</INDEX>
      <FUNCTION>sigmoid</FUNCTION>
   </NODE>

   <!-- Edge Definitions -->

   <EDGE>
      <STARTINDEX>0</STARTINDEX>
      <ENDINDEX>2</ENDINDEX>
      <WEIGHT>0.4</WEIGHT>
   </EDGE>
   <EDGE>
      <STARTINDEX>0</STARTINDEX>
      <ENDINDEX>3</ENDINDEX>
      <WEIGHT>0.1</WEIGHT>
   </EDGE>
   <EDGE>
      <STARTINDEX>0</STARTINDEX>
      <ENDINDEX>4</ENDINDEX>
      <WEIGHT>0.2</WEIGHT>
   </EDGE>
   <EDGE>
      <STARTINDEX>1</STARTINDEX>
      <ENDINDEX>2</ENDINDEX>
      <WEIGHT>0.3</WEIGHT>
   </EDGE>
   <EDGE>
      <STARTINDEX>1</STARTINDEX>
      <ENDINDEX>3</ENDINDEX>
      <WEIGHT>0.4</WEIGHT>
   </EDGE>
   <EDGE>
      <STARTINDEX>1</STARTINDEX>
      <ENDINDEX>4</ENDINDEX>
      <WEIGHT>0.5</WEIGHT>
   </EDGE>

</NETWORK>

<!-- Example Database -->
<EXAMPLES>

   <!-- Parameter Definition -->
   <PARAMETER type="input">Input 1</PARAMETER>
   <PARAMETER type="input">Input 2</PARAMETER>
   <PARAMETER type="output">Output (and)</PARAMETER>
   <PARAMETER type="output">Output (or)</PARAMETER>
   <PARAMETER type="output">Output (nor)</PARAMETER>

   <!-- Examples -->
   <EXAMPLE type="training">
      <VALUE parameter="Input 1">0.0</VALUE>
      <VALUE parameter="Input 2">0.0</VALUE>
      <VALUE parameter="Output (and)">0.0</VALUE>
      <VALUE parameter="Output (or)">0.0</VALUE>
      <VALUE parameter="Output (nor)">1.0</VALUE>
   </EXAMPLE>
   <EXAMPLE type="training">
      <VALUE parameter="Input 1">0.0</VALUE>
      <VALUE parameter="Input 2">1.0</VALUE>
      <VALUE parameter="Output (and)">0.0</VALUE>
      <VALUE parameter="Output (or)">1.0</VALUE>
      <VALUE parameter="Output (nor)">0.0</VALUE>
   </EXAMPLE>
   <EXAMPLE type="training">
      <VALUE parameter="Input 1">1.0</VALUE>
      <VALUE parameter="Input 2">0.0</VALUE>
      <VALUE parameter="Output (and)">0.0</VALUE>
      <VALUE parameter="Output (or)">1.0</VALUE>
      <VALUE parameter="Output (nor)">0.0</VALUE>
   </EXAMPLE>
   <EXAMPLE type="training">
      <VALUE parameter="Input 1">1.0</VALUE>
      <VALUE parameter="Input 2">1.0</VALUE>
      <VALUE parameter="Output (and)">1.0</VALUE>
      <VALUE parameter="Output (or)">1.0</VALUE>
      <VALUE parameter="Output (nor)">0.0</VALUE>
   </EXAMPLE>
</EXAMPLES>

</DB>
</MLDBIF>

Bibliography

  1. Mitchell, Tom (1997). Machine Learning. McGraw-Hill.

  2. Sarle, W.S., ed. (1997). Neural Network FAQ, periodic posting to the Usenet newsgroup comp.ai.neural-nets. URL: ftp://ftp.sas.com/pub/neural/FAQ.html

  3. Poole, David, Alan Mackworth, and Randy Goebel (1998). Computational Intelligence: A Logical Approach. Oxford University Press.
