2.3 Multi-layer perceptrons
The feed-forward ANN described above is also known as a Multi-Layer
Perceptron (MLP). (The name "perceptron" was introduced by Rosenblatt
in the late 1950s.) An MLP is capable of approximating any continuous
function to arbitrary accuracy over a bounded domain, provided there
are enough hidden neurons and the outputs are scaled to lie within the
range of the output sigmoids. "Function" is meant here in the strict
sense: a rule that specifies the outputs given any inputs within a
specified domain. For example, if x and y are inputs and z is an
output, then the function z = f(x,y) = sin(x)/exp(y) over the domain
-2 < x < 2, -3 < y < 3 could be approximated arbitrarily closely by an
MLP, given sufficient neurons.
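To make the example concrete, the target function can be written down
and sampled over its domain. The sketch below assumes the example is
read as z = sin(x)/exp(y) with -2 < x < 2 and -3 < y < 3. The names
target and sample_grid are illustrative only and not part of any
library. Samples like these are the kind of data an MLP would be
fitted to.

```python
import math

def target(x, y):
    """Example function from the text, read as z = sin(x) / exp(y)."""
    return math.sin(x) / math.exp(y)

def sample_grid(n):
    """Return (x, y, z) triples on an n-by-n grid strictly inside the
    domain -2 < x < 2, -3 < y < 3 (hypothetical helper, for illustration)."""
    samples = []
    for i in range(1, n + 1):
        x = -2.0 + 4.0 * i / (n + 1)
        for j in range(1, n + 1):
            y = -3.0 + 6.0 * j / (n + 1)
            samples.append((x, y, target(x, y)))
    return samples

if __name__ == "__main__":
    for x, y, z in sample_grid(3):
        print(f"x={x:+.2f}  y={y:+.2f}  z={z:+.4f}")
```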
2.3.1 The MLP formulae in matrix notation
We now present the formulae of an MLP using a matrix notation. This
section may be skipped without loss of continuity.
Consider the mapping from the MLP inputs to the inputs of the hidden
layer sigmoids. Write the vector of inputs as x, and the vector of
hidden layer sigmoid inputs as y1. Note that the mapping from x to y1
is a linear transformation:
y1 = W*x + C
where W is the matrix of weights and C is the vector of offsets (or
biases). Because there is a second linear transformation (between the
hidden nodes and the output nodes, with weights W1 and offsets C1), I
will write this first one as:
y1 = W2*x + C2
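As a rough illustration of this first step, the sketch below computes
y1 = W2*x + C2 in plain Python, with a matrix stored as a list of
rows. The helper name linear_map and all of the numbers are made up
for the example. In practice a numerical library would normally be
used instead.

```python
def linear_map(W, x, C):
    """Return W*x + C for a matrix W (list of rows), an input vector x
    and an offset vector C."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + c_i
            for row, c_i in zip(W, C)]

# Made-up numbers: 2 inputs feeding 3 hidden nodes.
W2 = [[0.5, -1.0],
      [1.5,  2.0],
      [0.0,  1.0]]
C2 = [0.1, -0.2, 0.3]
x = [1.0, 2.0]

y1 = linear_map(W2, x, C2)
print(y1)  # one value per hidden node
```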
The next step in the flow through the network is to apply the sigmoid
non-linearity to each element of the vector y1. Because this function
application is purely element-wise, it is convenient to write it
simply as:
f(y1), where f(x) = 1/(1+exp(-x))
--- the sigmoid function
To make sure I'm clear using an example, if y1 = [1, 2]' (a vector
containing 2 numbers - the ' meaning transpose i.e. this is actually a
column vector rather than a row vector), then
f([1, 2]') means [f(1), f(2)]'
Clearly this type of element-wise notation preserves the dimensions of
its input.
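The element-wise convention can be sketched in a few lines of Python.
The scalar helper sigmoid is a name chosen for this example. The
vector version is called f to match the text. The sketch makes no
attempt to guard against overflow for very large negative inputs.

```python
import math

def sigmoid(a):
    """Scalar sigmoid, 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + math.exp(-a))

def f(v):
    """Apply the sigmoid to each element of the vector v."""
    return [sigmoid(a) for a in v]

y1 = [1.0, 2.0]
y2 = f(y1)
print(y2)  # [f(1), f(2)]: one value per element of y1

# The output has the same number of elements as the input.
assert len(y2) == len(y1)
```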
Returning to the MLP, the output of the hidden nodes is the vector
y2 = f(y1)
This vector then undergoes another linear transformation in the same
way as for the input. Thus the input to the output layer sigmoids is
y3 = W1*y2 + C1
Lastly, applying the output sigmoids we reach the output of the MLP
z = f(y3)
In summary, the pipeline can be drawn (view with a fixed-width font):
x -> linear map -> y1 -> sigmoids -> y2 -> linear map -> y3 -> sigmoids -> z
       W2, C2                                W1, C1
where x, y1, y2, y3, and z are all vectors and the sigmoid functions
are applied element-wise (element-wise functions are sometimes called
universal functions).
Eliminating y1, y2, and y3 results in the reasonably compact notation:
z = f(W1 * f(W2*x + C2) + C1)
           `----y2----'
where to emphasize the structure I have tried to show y2, the output
from the hidden layer nodes.
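The compact formula translates almost directly into code. The
following is a minimal sketch in plain Python, not a definitive
implementation. The function names linear_map, f and mlp, and all of
the weight values, are assumptions made for the example. The
parameter names W2, C2, W1 and C1 follow the text.

```python
import math

def linear_map(W, x, C):
    """Return W*x + C for a matrix W stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + c for row, c in zip(W, C)]

def f(v):
    """Element-wise sigmoid of the vector v."""
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def mlp(x, W2, C2, W1, C1):
    """Compute z = f(W1 * f(W2*x + C2) + C1)."""
    y2 = f(linear_map(W2, x, C2))     # output of the hidden nodes
    return f(linear_map(W1, y2, C1))  # output of the MLP

# Made-up weights: 2 inputs, 3 hidden nodes, 1 output.
W2 = [[0.5, -1.0], [1.5, 2.0], [0.0, 1.0]]
C2 = [0.1, -0.2, 0.3]
W1 = [[1.0, -1.0, 0.5]]
C1 = [0.0]

z = mlp([1.0, 2.0], W2, C2, W1, C1)
print(z)  # a single value between 0 and 1
```

Because the last step is a sigmoid, every output lies strictly between
0 and 1.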
As a final "checksum", if there are n_in inputs, n_hidden hidden
nodes, and n_out output nodes, then:
W2 is a matrix with n_in columns and n_hidden rows
C2 is a column vector with n_hidden rows
W1 is a matrix with n_hidden columns and n_out rows
C1 is a column vector with n_out rows.
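The same checksum can be automated. The sketch below assumes matrices
are stored as lists of rows and vectors as flat lists. The helper
names matrix_shape and check_shapes are hypothetical.

```python
def matrix_shape(M):
    """Return (rows, columns) of a matrix stored as a list of equal-length rows."""
    widths = {len(row) for row in M}
    if len(widths) != 1:
        raise ValueError("matrix is empty or its rows have different lengths")
    return len(M), widths.pop()

def check_shapes(W2, C2, W1, C1, n_in, n_hidden, n_out):
    """Raise ValueError if any parameter has the wrong dimensions."""
    checks = [
        ("W2", matrix_shape(W2), (n_hidden, n_in)),
        ("C2", (len(C2),), (n_hidden,)),
        ("W1", matrix_shape(W1), (n_out, n_hidden)),
        ("C1", (len(C1),), (n_out,)),
    ]
    for name, actual, expected in checks:
        if actual != expected:
            raise ValueError(f"{name} has shape {actual}, expected {expected}")

# 2 inputs, 3 hidden nodes, 1 output: passes silently.
check_shapes([[0.0, 0.0]] * 3, [0.0] * 3, [[0.0] * 3], [0.0],
             n_in=2, n_hidden=3, n_out=1)
```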
2.4 Comparison with biological networks (and Turing machines)
It is important to remember that an MLP is a very simple model of a
biological neural network. In fact, in many ways its biologically
motivated background is irrelevant, and an MLP may simply be thought
of as a general purpose mathematical function.
There are two important ways in which an MLP is much simpler than a
biological network:
- no temporal effects,
- no maintained state.
It is the lack of these two properties that rules out feedback in an
MLP (unlike in a biological network): feedback needs a notion of time
and some state for the fed-back signal to act on. Similarly, the lack
of these properties means that an MLP is not a Turing machine, and
cannot be a general purpose computer. It has no RAM! The values of
the weights can be thought of as values in ROM.