4 Training Neural Networks

For an ANN to be of any use, appropriate values must be found for all of its weights, biases and slopes. There can be a large number of these: one layer of n neurons with m inputs has n*(m+2) parameters, since each neuron has m weights, one bias and one slope. It is quite common for ANNs to have hundreds of weights.
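The parameter count above can be checked with a few lines of Python. This is only a sketch: the function names and the example layer sizes are chosen for illustration and do not come from the text.

```python
def layer_parameter_count(n_neurons, n_inputs):
    """One layer: each neuron has n_inputs weights, one bias and one slope."""
    return n_neurons * (n_inputs + 2)


def network_parameter_count(layer_sizes):
    """layer_sizes = [number of inputs, neurons in layer 1, neurons in layer 2, ...].

    Each layer takes the outputs of the previous layer as its inputs.
    """
    return sum(layer_parameter_count(n, m)
               for m, n in zip(layer_sizes, layer_sizes[1:]))


# A hypothetical network: 8 inputs, 10 hidden neurons, 3 output neurons.
print(layer_parameter_count(10, 8))         # 10 * (8 + 2) = 100
print(network_parameter_count([8, 10, 3]))  # 100 + 3 * (10 + 2) = 136
```

Even this small example already has more than a hundred parameters to determine.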

Here we introduce two methods of optimising these parameters, otherwise known as training the ANN.

4.1 Optimising by reducing an absolute error

The simplest (but not particularly effective) method for determining the ANN parameters is called back-propagation (BP). To use BP one must have many input-output pairs, each of which is an example of how the ANN is supposed to react. For each input, the error between the desired output and the actual output is measured. These errors are used in a systematic way to adjust the parameters. Eventually a set of parameters is found such that the actual output is close to the desired output for the entire training set. Note that BP is an iterative numerical approach and can become trapped in local minima. It is therefore important to randomise the parameters before optimising, and to repeat the optimisation several times, keeping the best solution.
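The procedure described above might be sketched as follows. Several details are assumptions made for illustration rather than taken from the text: the XOR training set, the tiny network of two hidden neurons and one output neuron, the sigmoid activation, the learning rate and the number of restarts. For brevity the sketch trains only weights and biases and leaves out the slope parameters.

```python
import math
import random

# XOR training set: each entry is (inputs, desired output).
TRAINING_SET = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
                ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp so that exp() stays in range
    return 1.0 / (1.0 + math.exp(-z))


def random_params(rng):
    # Each neuron is [weight 1, weight 2, bias]; two hidden neurons, one output.
    return {"hidden": [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2)],
            "output": [rng.uniform(-1, 1) for _ in range(3)]}


def forward(params, x):
    h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in params["hidden"]]
    o = params["output"]
    return h, sigmoid(o[0] * h[0] + o[1] * h[1] + o[2])


def total_error(params):
    # Sum of squared differences between actual and desired outputs.
    return sum((forward(params, x)[1] - t) ** 2 for x, t in TRAINING_SET)


def train(params, rate=0.5, epochs=2000):
    for _ in range(epochs):
        for x, t in TRAINING_SET:
            h, y = forward(params, x)
            # Error signal at the output, then propagated back to the hidden layer.
            d_out = (y - t) * y * (1.0 - y)
            d_hid = [d_out * params["output"][j] * h[j] * (1.0 - h[j])
                     for j in range(2)]
            # Move every parameter a small step against its error gradient.
            for j in range(2):
                params["output"][j] -= rate * d_out * h[j]
                params["hidden"][j][0] -= rate * d_hid[j] * x[0]
                params["hidden"][j][1] -= rate * d_hid[j] * x[1]
                params["hidden"][j][2] -= rate * d_hid[j]
            params["output"][2] -= rate * d_out
    return params


def best_of_restarts(restarts=5, seed=0):
    rng = random.Random(seed)
    # Each run starts from fresh random parameters.
    runs = [train(random_params(rng)) for _ in range(restarts)]
    errors = [total_error(p) for p in runs]
    best = min(range(restarts), key=errors.__getitem__)
    return runs[best], errors


best_params, errors = best_of_restarts()
print("final error of each run:", errors)
print("best error:", min(errors))
```

Runs that start from different random parameters can finish with noticeably different errors, which is why the best of several runs is kept.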

Much more powerful optimisation algorithms exist, such as Conjugate Gradients and BFGS; details may be found online or in the Further Reading section.

One of the defining prerequisites for this approach is that there exists an absolute means of ranking potential solutions. In other words, it must be possible to summarise in a single number how well a particular ANN is performing on a given problem.
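As a small illustration of such a single-number summary, the sketch below scores candidates by mean squared error and ranks them. The choice of mean squared error, the three stand-in "networks" (plain functions) and the example data are all assumptions made for the example.

```python
def mean_squared_error(network, examples):
    """Summarise performance on the examples as one number (lower is better)."""
    return sum((network(x) - target) ** 2 for x, target in examples) / len(examples)


# Hypothetical examples of the desired behaviour: the output should be twice the input.
examples = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0)]

# Stand-ins for three candidate ANNs; any callable mapping input to output will do.
candidates = {
    "half": lambda x: 0.5 * x,
    "double": lambda x: 2.0 * x,
    "triple": lambda x: 3.0 * x,
}

scores = {name: mean_squared_error(net, examples) for name, net in candidates.items()}
ranking = sorted(scores, key=scores.get)
print(ranking)  # ['double', 'triple', 'half']
```

Because every candidate receives its own score, the whole set can be ranked without ever comparing two candidates directly.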

4.2 Optimising using Genetic Algorithms

Genetic Algorithms are an alternative approach to optimisation, motivated by Darwin's theory of evolution by natural selection. In these methods a population of competing ANNs is created, each initialised with weights chosen at random. The ANNs are then played against each other, and some percentage of the weakest are removed from the population. The surviving ANNs are then used to replenish the population, perhaps by random changes (mutation) and by merging the parameters of two ANNs (cross-over). This process is iterated until the improvements flatten out.
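The loop described above can be sketched roughly as follows. The toy task (a single neuron learning logical OR), the population size, the cull fraction and the mutation scale are assumptions chosen for illustration. Instead of playing the ANNs against each other, this sketch ranks them by a squared error, which is the simplest stand-in for a contest.

```python
import math
import random

# Toy task (an assumption): one neuron with two inputs learning logical OR.
DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 1.0)]


def neuron_output(params, x):
    w1, w2, bias, slope = params
    z = slope * (w1 * x[0] + w2 * x[1] + bias)
    z = max(-60.0, min(60.0, z))  # clamp so that exp() stays in range
    return 1.0 / (1.0 + math.exp(-z))


def error(params):
    return sum((neuron_output(params, x) - t) ** 2 for x, t in DATA)


def mutate(params, rng, scale=0.3):
    # Random change: add a little Gaussian noise to every parameter.
    return [p + rng.gauss(0.0, scale) for p in params]


def crossover(a, b, rng):
    # Merge two parents: each parameter is taken from one parent at random.
    return [rng.choice(pair) for pair in zip(a, b)]


def evolve(pop_size=30, cull_fraction=0.5, generations=40, seed=0):
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(pop_size)]
    history = []  # best error seen at the start of each generation
    for _ in range(generations):
        population.sort(key=error)
        history.append(error(population[0]))
        # Remove the weakest fraction; survivors are kept unchanged.
        survivors = population[:pop_size - int(pop_size * cull_fraction)]
        # Replenish the population from the survivors.
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = survivors + children
    population.sort(key=error)
    history.append(error(population[0]))
    return population[0], history


best, history = evolve()
print("best error per generation:", history)
```

Because the survivors are carried over unchanged, the best error never gets worse from one generation to the next. In practice the loop would stop once `history` flattens out rather than after a fixed number of generations.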

One of the defining advantages of this approach is that it does not require an absolute measure of the optimality of a particular ANN. Instead all that is required is a means of comparing solutions, and in particular of identifying the better solution. Of course, this may be achieved using an absolute measure if one exists and is reasonably easy to compute.
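A selection step that relies only on pairwise comparison might look like the sketch below. The function name `cull_by_comparison`, the random pairing scheme and the numeric "strengths" standing in for ANNs are assumptions for the example; in a real application `is_better` could be the outcome of a game between two ANNs.

```python
import random


def cull_by_comparison(population, is_better, rng):
    """Pair candidates off at random and keep the winner of each pair.

    Only is_better(a, b) is ever consulted; no candidate receives a score.
    """
    shuffled = population[:]
    rng.shuffle(shuffled)
    survivors = []
    for i in range(0, len(shuffled) - 1, 2):
        a, b = shuffled[i], shuffled[i + 1]
        survivors.append(a if is_better(a, b) else b)
    if len(shuffled) % 2 == 1:
        survivors.append(shuffled[-1])  # the odd one out goes through unopposed
    return survivors


# Stand-in candidates: numbers representing playing strength.
rng = random.Random(0)
population = [1, 2, 3, 4, 5, 6, 7, 8]
survivors = cull_by_comparison(population, lambda a, b: a > b, rng)
print(len(survivors))  # 4
```

With an even population this removes exactly half of the candidates. The strongest always survives and the weakest never does, yet no absolute measure was computed at any point.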

5 Further Reading

  1. Neural Networks for Pattern Recognition. Christopher M. Bishop. Clarendon Press, Oxford, 1995.

    An excellent mathematical treatment of neural networks and their application to pattern recognition. Focuses on MLPs, but also discusses Radial Basis Function networks.

  2. Neural Networks: A Comprehensive Foundation. Simon Haykin. Prentice Hall.

    A more general treatment dealing with many different types of artificial neural networks.
