Tutorial 4: Algorithms
Algorithms and Theory
This is a very basic introduction to neural networks and the feedforward backpropagation algorithm.
A neural network (in the context of this applet) is a set of nodes that are connected to each other by edges. A node can only send information (usually numeric data) along an edge. Each signal is multiplied by the weight associated with its edge before it is received by the target node. A node sums all the signals it receives, passes the sum to an activation function, and sends the result to all of its children.
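The computation performed by a single node can be sketched in a few lines of Python. This is only an illustration: the names node_output and sigmoid are hypothetical and are not taken from the applet.

```python
import math

def sigmoid(x):
    """Logistic sigmoid, used here as an example activation function."""
    return 1.0 / (1.0 + math.exp(-x))

def node_output(incoming_signals, weights, activation):
    """Multiply each incoming signal by its edge weight, sum, then activate."""
    total = sum(s * w for s, w in zip(incoming_signals, weights))
    return activation(total)

# Two parents send 1.0 and 0.5 along edges with weights 0.4 and -0.2.
out = node_output([1.0, 0.5], [0.4, -0.2], sigmoid)
print(out)
```

The value returned by node_output is what the node would then send along each of its outgoing edges.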
An activation function is a function applied to a node's summed input, typically to introduce nonlinearity into the network. The applet supports four functions: linear, sigmoid, exponential, and hyperbolic tangent. Note that all of these functions are differentiable; the backpropagation algorithm requires differentiable activation functions.
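A rough sketch of the four activation functions and their derivatives follows. The function names are hypothetical, and "exponential" is assumed here to mean e^x, since the exact definition used by the applet is not stated.

```python
import math

def linear(x):
    return x

def linear_deriv(x):
    return 1.0

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_deriv(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def exponential(x):
    # Assumption: the "exponential" activation is taken to be e^x.
    return math.exp(x)

def exponential_deriv(x):
    return math.exp(x)

def tanh(x):
    return math.tanh(x)

def tanh_deriv(x):
    return 1.0 - math.tanh(x) ** 2

def numeric_deriv(f, x, h=1e-6):
    """Central-difference estimate, used only to sanity-check the formulas."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

for f, df in [(linear, linear_deriv), (sigmoid, sigmoid_deriv),
              (exponential, exponential_deriv), (tanh, tanh_deriv)]:
    print(f.__name__, df(0.5), numeric_deriv(f, 0.5))
```

Each pair of printed values should agree closely, which is a quick way to check a derivative before using it in training.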
Nodes without parents are input nodes; the user must provide them with input. Nodes without children are output nodes. All other nodes are hidden nodes. This applet deals with a feedforward neural network: the graph must be acyclic, and each "layer" (a set of nodes that have the same depth) must be fully connected to the layer below it. This means that each node must have an edge going to every node of the layer below it, and to no other node.
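Because every layer is fully connected to the next, a forward pass can be sketched as a loop over layers. The representation below (one list of weight rows per layer, one row per receiving node) is an assumption made for illustration and is not how the applet stores its graph.

```python
def forward(inputs, layers):
    """Propagate input values through fully connected layers.

    layers is a list of (weights, activation) pairs, where weights[j][i]
    is the weight on the edge from node i of the layer above to node j.
    """
    signal = list(inputs)
    for weights, activation in layers:
        signal = [activation(sum(w * s for w, s in zip(row, signal)))
                  for row in weights]
    return signal

identity = lambda x: x

# 2 input nodes -> 2 hidden nodes -> 1 output node, all linear for clarity.
network = [
    ([[1.0, 1.0], [1.0, -1.0]], identity),  # hidden layer
    ([[2.0, 3.0]], identity),               # output layer
]
result = forward([1.0, 2.0], network)
print(result)
```

Here the hidden nodes receive 3.0 and -1.0, and the single output node combines them using its two edge weights.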
This network is trained by an algorithm called the backpropagation algorithm. The backprop algorithm is essentially a minimization technique: it adjusts the edge weights to reduce the error of the neural network on its training examples.
The Feedforward Backpropagation Algorithm
The Neural Network applet demonstrates a widely used algorithm called the backpropagation algorithm. To train a neural network, a set of training examples is fed into the network. Each example produces an output that may differ from the expected result. This error (usually the sum of squares error) is "backpropagated" through the edges and hidden nodes; the magnitude of the error determines how much to adjust each weight, and in what direction. An epoch, or iteration, is one pass of the whole training set through the network in this way.
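The training loop described above can be sketched for a very small network. This is not the applet's implementation: the 2-2-1 layout, the tanh hidden nodes, the linear output node, the absence of bias inputs, the batch update once per epoch, the training data, and the name train_epoch are all assumptions chosen to keep the example short.

```python
import math

def train_epoch(examples, w_hidden, w_out, rate):
    """Run one epoch of batch backpropagation and update weights in place.

    Returns the sum of squares error measured before the update.
    """
    grad_hidden = [[0.0] * len(row) for row in w_hidden]
    grad_out = [0.0] * len(w_out)
    total_error = 0.0
    for inputs, target in examples:
        # Forward pass: tanh hidden nodes, one linear output node.
        hidden = [math.tanh(sum(w * x for w, x in zip(row, inputs)))
                  for row in w_hidden]
        output = sum(w * h for w, h in zip(w_out, hidden))
        err = output - target
        total_error += 0.5 * err ** 2
        # Backward pass: the output derivative is 1 (linear node).
        for j, h in enumerate(hidden):
            grad_out[j] += err * h
            delta_h = err * w_out[j] * (1.0 - h * h)  # tanh derivative
            for i, x in enumerate(inputs):
                grad_hidden[j][i] += delta_h * x
    # Move every weight against its accumulated gradient.
    for j in range(len(w_out)):
        w_out[j] -= rate * grad_out[j]
        for i in range(len(w_hidden[j])):
            w_hidden[j][i] -= rate * grad_hidden[j][i]
    return total_error

# Hypothetical training set: targets follow 0.5*x1 - 0.3*x2.
examples = [([1.0, 0.0], 0.5), ([0.0, 1.0], -0.3),
            ([1.0, 1.0], 0.2), ([-1.0, 1.0], -0.8)]
w_hidden = [[0.1, -0.2], [0.3, 0.1]]
w_out = [0.2, -0.1]
errors = [train_epoch(examples, w_hidden, w_out, rate=0.05)
          for _ in range(500)]
print(errors[0], errors[-1])
```

Comparing the first and last recorded errors shows whether the weight adjustments are reducing the error over the epochs.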
Caveats and Warnings
If the network doesn't seem to learn, maybe the learning rate is too high. Adjust the learning rate to something smaller.
Be VERY careful with activation functions. Remember that the sigmoid function only produces values between 0 and 1, and the hyperbolic tangent only produces values between -1 and 1. Set your output node activation functions to linear if the expected outputs are not within these ranges.
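To see why the output range matters, the following sketch compares a sigmoid output node and a linear output node against a target of 5. The target value and the sample inputs are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def linear(x):
    return x

target = 5.0

# However large the summed input, the sigmoid output never exceeds 1,
# so its error against a target of 5 cannot drop below 4.
best_sigmoid_error = min(abs(target - sigmoid(x))
                         for x in (-50.0, 0.0, 50.0))

# A linear output node passes its summed input through unchanged,
# so some choice of weights can reach the target exactly.
linear_error = abs(target - linear(5.0))

print(best_sigmoid_error, linear_error)
```

No amount of training removes the sigmoid node's error in this case, because the target lies outside the function's range.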
The backpropagation algorithm as it is implemented here is very vulnerable to numerical errors. This is because the total sum of squares error can be large if the training set is large. The algorithm requires that this error be fed into the derivative of the activation function. As the most common activation functions involve exponential functions, this leads to numerical errors.
The linear activation function does not suffer from this problem, but since it is not a "squashing" function (the sigmoid and hyperbolic tangent have finite ranges), it also has numerical problems with large training sets, because the sum of squares errors add up to large numbers. To work around this, set the learning rate to something small (less than 0.005). This will usually bring the computation to a manageable level.
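The effect of the learning rate on a linear node can be illustrated with a single weight and 100 training examples. The data (targets equal to 2x), the batch update, and the name train_linear are assumptions for this sketch, not details of the applet.

```python
def train_linear(rate, epochs=50):
    """Fit y = w * x with batch gradient descent on 100 examples."""
    xs = [k / 100.0 for k in range(1, 101)]
    examples = [(x, 2.0 * x) for x in xs]
    w = 0.0
    for _ in range(epochs):
        # The gradient is summed over the whole training set,
        # so it grows with the number of examples.
        grad = sum((w * x - y) * x for x, y in examples)
        w -= rate * grad
    return w

w_large_rate = train_linear(rate=0.1)
w_small_rate = train_linear(rate=0.005)
print(w_large_rate, w_small_rate)
```

With the larger rate each update overshoots by more than the previous one and the weight grows without bound, while the smaller rate lets the weight settle near 2. A larger training set would need a smaller rate still, since the summed gradient scales with the number of examples.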