4 Training Neural Networks
For an ANN to be of any use, appropriate values must be determined
for all of its weights, biases and slopes. There can be a large number
of these: one layer of n neurons, each with m inputs, has n*(m+2)
parameters, since every neuron has m weights, one bias and one slope.
It is quite common for ANNs to have hundreds of weights.
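As a quick check of the n*(m+2) count, the short sketch below adds up the parameters layer by layer. The function names and the example layer sizes are illustrative assumptions, not part of the original text.

```python
def layer_parameter_count(n_neurons, n_inputs):
    """One layer: each neuron has n_inputs weights, one bias and one slope."""
    return n_neurons * (n_inputs + 2)


def network_parameter_count(layer_sizes):
    """layer_sizes lists the input count followed by the neurons in each layer."""
    return sum(
        layer_parameter_count(n, m)
        for m, n in zip(layer_sizes, layer_sizes[1:])
    )


# Hypothetical example: 8 inputs, a layer of 10 neurons, then 3 output neurons.
print(layer_parameter_count(10, 8))         # 10 * (8 + 2) = 100
print(network_parameter_count([8, 10, 3]))  # 100 + 3 * (10 + 2) = 136
```

Even this small network already has more than a hundred parameters to determine.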
Here we introduce two methods of optimising these parameters, a process
otherwise known as training the ANN.
4.1 Optimising by reducing an absolute error
The simplest (but not particularly effective) method for determining
the ANN parameters is called back-propagation (BP). To use BP one must
have many input-output pairs, each of which is an example of how the
ANN is supposed to react. For each input, the error between the desired
output and the actual output is measured. These errors are used in a
systematic way to adjust the parameters. Eventually a set of parameters
is found such that the actual output is close to the desired output
for the entire training set. Note that BP is a numerical, iterative
approach and can become trapped in local minima. It is therefore
important to randomise the parameters before optimising, to repeat the
optimisation several times, and to choose the best solution found.
There are other, much more powerful optimisation algorithms, such as
Conjugate Gradients and BFGS, the details of which may be found online
or in the Further Reading section.
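The procedure described above can be sketched roughly as follows. This is only a minimal illustration under several assumptions that are not in the original text: a tiny network of two hidden neurons and one output neuron, the XOR function as the training set, slopes held fixed at 1 so that only weights and biases are adjusted, and an arbitrary learning rate and number of passes. All names are hypothetical.

```python
import math
import random

# Assumed training set: XOR, as (inputs, desired output) pairs.
TRAINING_SET = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
                ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def random_params(rng):
    """Each neuron holds [weight 1, weight 2, bias]; slopes are fixed at 1."""
    return {"hidden": [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2)],
            "output": [rng.uniform(-1, 1) for _ in range(3)]}


def forward(params, inputs):
    hidden = [sigmoid(w[0] * inputs[0] + w[1] * inputs[1] + w[2])
              for w in params["hidden"]]
    wo = params["output"]
    return hidden, sigmoid(wo[0] * hidden[0] + wo[1] * hidden[1] + wo[2])


def total_error(params):
    """Single number summarising the fit: summed squared error."""
    return sum((target - forward(params, x)[1]) ** 2 for x, target in TRAINING_SET)


def train(params, epochs=5000, rate=0.5):
    for _ in range(epochs):
        for inputs, target in TRAINING_SET:
            hidden, out = forward(params, inputs)
            # Error signal at the output neuron.
            delta_out = (target - out) * out * (1.0 - out)
            # Propagate the error signal back to the hidden neurons.
            delta_hidden = [delta_out * params["output"][j] * hidden[j] * (1.0 - hidden[j])
                            for j in range(2)]
            # Adjust the output neuron, then the hidden neurons.
            for j in range(2):
                params["output"][j] += rate * delta_out * hidden[j]
            params["output"][2] += rate * delta_out
            for j in range(2):
                params["hidden"][j][0] += rate * delta_hidden[j] * inputs[0]
                params["hidden"][j][1] += rate * delta_hidden[j] * inputs[1]
                params["hidden"][j][2] += rate * delta_hidden[j]
    return params


def best_of_restarts(n_restarts=5, seed=0):
    """Randomise, optimise, repeat; keep the run with the lowest error."""
    rng = random.Random(seed)
    runs = [train(random_params(rng)) for _ in range(n_restarts)]
    errors = [total_error(p) for p in runs]
    best = min(range(n_restarts), key=lambda i: errors[i])
    return runs[best], errors


best_params, errors = best_of_restarts()
print("error of each run:", errors)
print("lowest error:", min(errors))
```

Different random starting points may settle at different final errors, which is why the sketch keeps the best of several runs rather than trusting a single one.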
One of the defining prerequisites for this approach is that there
exists an absolute means of ranking potential solutions. In other
words, it must be possible to summarise in a single number how well a
particular ANN performs on a given problem.
4.2 Optimising using Genetic Algorithms
Genetic Algorithms are an alternative approach to optimisation,
motivated by Darwin's theory of evolution by natural selection. In these
methods a population of competing ANNs is created, each initialised with
weights chosen at random. The ANNs are then played against each other,
and the weakest fraction is removed from the population. The
surviving ANNs are then used to replenish the population, for example by
making random changes (mutation) or by merging the parameters of two
ANNs (cross-over). This process is iterated until the improvements
flatten out.
One of the defining advantages of this approach is that it does not
require an absolute measure of the optimality of a particular
ANN. Instead, all that is required is a means of comparing solutions,
and in particular of identifying the better solution. Of course, this
may be achieved using an absolute measure if one exists and is
reasonably easy to compute.
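A rough sketch of such a loop is given below. It treats each ANN simply as a list of parameters and relies only on a pairwise comparison, here called beats(a, b). Everything in it is an illustrative assumption rather than a prescribed method: the round-robin ranking, the fraction removed, the mutation size, and in particular the toy comparison, which stands in for playing two ANNs against each other by checking which parameter list is closer to an arbitrary target.

```python
import random


def mutate(params, rng, scale=0.1):
    """Return a copy with a small random change to every parameter."""
    return [p + rng.gauss(0.0, scale) for p in params]


def crossover(a, b, rng):
    """Merge two parents by taking each parameter from one of them at random."""
    return [rng.choice(pair) for pair in zip(a, b)]


def evolve(population, beats, rng, generations=50, cull_fraction=0.5):
    """beats(a, b) is True when a is the better solution; no score is needed."""
    size = len(population)
    for _ in range(generations):
        # Play everyone against everyone and count wins.
        wins = [sum(beats(a, b) for b in population if b is not a)
                for a in population]
        ranked = [p for _, p in sorted(zip(wins, population),
                                       key=lambda pair: pair[0], reverse=True)]
        # Remove the weakest fraction of the population.
        survivors = ranked[:max(2, int(size * (1.0 - cull_fraction)))]
        # Replenish the population by cross-over followed by mutation.
        children = []
        while len(survivors) + len(children) < size:
            mother, father = rng.sample(survivors, 2)
            children.append(mutate(crossover(mother, father, rng), rng))
        population = survivors + children
    return population


# Toy stand-in for a contest between two ANNs (an assumption for this demo).
TARGET = [0.5, -1.0, 2.0]


def distance(params):
    return sum((p - t) ** 2 for p, t in zip(params, TARGET))


def beats(a, b):
    return distance(a) < distance(b)


rng = random.Random(1)
initial = [[rng.uniform(-3, 3) for _ in TARGET] for _ in range(20)]
final = evolve(initial, beats, rng)
print("best initial distance:", min(distance(p) for p in initial))
print("distance of top-ranked survivor:", distance(final[0]))
```

Because the survivors are carried over unchanged, the top-ranked individual is never lost from one generation to the next. A fixed number of generations is used here for brevity; stopping when the improvements flatten out would need an extra check.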
5 Further Reading
Neural Networks for Pattern Recognition. Christopher M. Bishop.
Clarendon Press, Oxford, 1995.
An excellent mathematical treatment of neural networks and their
application to pattern recognition. Focuses on MLPs, but also discusses
Radial Basis Function networks.
Neural Networks: A Comprehensive Foundation. Simon Haykin.
Prentice Hall
A more general treatment dealing with many different types of
artificial neural networks.