
Generalization in MultiLayer Perceptrons
Introduction
This applet illustrates the generalization capabilities of
multilayer perceptrons. It lets you define two different sets
of data: one for training and the other for crossvalidation. Both
sets are needed to study generalization in a systematic
manner.
Credits
The original applet was written by
Olivier Michel.
Instructions
Use the popup menu to choose learning points for training or
crossvalidation. The graph displays the error on the training set
in black and the error on the crossvalidation set in green.
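To experiment with the same two error curves outside the applet, here is a minimal sketch in Python with NumPy. It is not the applet's code: the one-hidden-layer sigmoid network, the batch gradient-descent update, the mean squared error, and all names (`train`, `n_hidden`, `lr`, and so on) are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_bias(X):
    # Append a constant 1 to every input row so the bias is just another weight.
    return np.hstack([X, np.ones((X.shape[0], 1))])

def forward(X, W1, W2):
    H = sigmoid(add_bias(X) @ W1)   # hidden activations
    Y = sigmoid(add_bias(H) @ W2)   # network output in (0, 1)
    return H, Y

def mse(X, t, W1, W2):
    _, Y = forward(X, W1, W2)
    return float(np.mean((Y - t) ** 2))

def train(X_tr, t_tr, X_cv, t_cv, n_hidden=4, lr=0.5, epochs=100, seed=0):
    """Batch backprop; returns the per-epoch training and crossvalidation errors."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, size=(X_tr.shape[1] + 1, n_hidden))
    W2 = rng.normal(0.0, 0.5, size=(n_hidden + 1, 1))
    train_err, cv_err = [], []
    n = len(X_tr)
    for _ in range(epochs):
        H, Y = forward(X_tr, W1, W2)
        # Error signals for squared error with sigmoid units.
        delta_out = (Y - t_tr) * Y * (1.0 - Y)
        delta_hid = (delta_out @ W2[:-1].T) * H * (1.0 - H)
        # Only the training set drives the weight update.
        W2 -= lr * add_bias(H).T @ delta_out / n
        W1 -= lr * add_bias(X_tr).T @ delta_hid / n
        train_err.append(mse(X_tr, t_tr, W1, W2))
        cv_err.append(mse(X_cv, t_cv, W1, W2))
    return train_err, cv_err

def two_clusters(rng, n_per_cluster, spread=0.05):
    """Hypothetical data: a 'blue' cluster (target 0) and a 'red' cluster (target 1)."""
    blue = rng.normal([0.25, 0.25], spread, size=(n_per_cluster, 2))
    red = rng.normal([0.75, 0.75], spread, size=(n_per_cluster, 2))
    X = np.vstack([blue, red])
    t = np.vstack([np.zeros((n_per_cluster, 1)), np.ones((n_per_cluster, 1))])
    return X, t

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X_tr, t_tr = two_clusters(rng, 10)
    X_cv, t_cv = two_clusters(rng, 5)
    train_err, cv_err = train(X_tr, t_tr, X_cv, t_cv)
    best = int(np.argmin(cv_err))
    print("final training error:", train_err[-1])
    print("lowest crossvalidation error at epoch:", best)
```

The crossvalidation points are only ever passed to `mse`, never to the weight update, which is what makes their error a measure of generalization rather than of fit.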
Applet
Questions
For all questions except the last, leave the decay parameter at zero.
 Easy problem: Set up two simple clusters of training points, a red
one (1's) and a blue one (0's), that are linearly separable and
well separated. Then add crossvalidation points to each
cluster. To be realistic, the crossvalidation points should be of
the same color as the training points in the same cluster. Run
learning for about 100 iterations and observe the resulting error
graphs. What can you say about the two errors?
 More complicated problem: Use two similar simple
clusters, but set some crossvalidation points a little bit outside
the training clusters. Do you observe any change in the error
graphs? Why?
 Hard problem: Now create two linearly separable
clusters that are very close to each other. Create crossvalidation
points and put some of them slightly outside
the clusters, or even inside the other cluster. Run the learning and
comment on the results. Did you observe that the error graph reaches
a minimum and then rises again? How would you explain this?
 Nonlinearly separable problem: Set 3 blue
training points on the left-hand side of the space, 6 red training
points in the middle, and 3 blue training points on the right-hand
side. Add 3 crossvalidation points to the first group, 6 to the
second, and 3 to the last. Change the number of hidden units and
the learning parameters if necessary to obtain convergence to
zero error on the training set. Do you observe an error graph
similar to the one in question 1? Why?
 Getting more and more complicated: Try to solve
more complicated problems (e.g., similar to questions 2 and 3) with
nonlinearly separable clusters.
 General questions: How would you characterize the
evolution of the error on a crossvalidation set? How should a
training set be designed in order to get the best results?
 Weight elimination algorithm: As discussed
in class, the decay parameter controls an extra term in the weight
update step. Set the decay parameter to a small value such as
0.001 and use several (at least 4) units in the hidden layers.
(Don't forget to click Init each time you change any of the network
parameters.) Compare the training results with standard backprop
(decay=0.0).
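The page does not spell out the extra term that the decay parameter controls, so the following Python sketch shows two common candidates rather than the applet's actual rule. `decay_step` assumes plain weight decay, where each weight is also pulled toward zero in proportion to its size. `elimination_grad` assumes the weight-elimination penalty of Weigend, Rumelhart and Huberman, with a hypothetical scale parameter `w0`. All function and parameter names are illustrative.

```python
import numpy as np

def decay_step(w, grad, lr=0.1, decay=0.001):
    """One gradient step with a plain weight-decay term (assumed form).

    With decay=0.0 this reduces to the standard backprop update.
    """
    return w - lr * (grad + decay * w)

def elimination_penalty(w, w0=1.0):
    """Weight-elimination penalty: sum of (w/w0)^2 / (1 + (w/w0)^2)."""
    u = (w / w0) ** 2
    return float(np.sum(u / (1.0 + u)))

def elimination_grad(w, w0=1.0):
    """Gradient of the weight-elimination penalty with respect to each weight."""
    u = (w / w0) ** 2
    return (2.0 * w / w0 ** 2) / (1.0 + u) ** 2

def elimination_step(w, grad, lr=0.1, decay=0.001, w0=1.0):
    """One gradient step with the weight-elimination term scaled by decay."""
    return w - lr * (grad + decay * elimination_grad(w, w0))

if __name__ == "__main__":
    w = np.array([0.05, 1.0, -3.0])
    zero_grad = np.zeros_like(w)   # isolate the effect of the extra term
    print("plain decay:       ", decay_step(w, zero_grad))
    print("weight elimination:", elimination_step(w, zero_grad))
```

Under plain decay the pull grows with the size of the weight. Under weight elimination the pull is strongest for weights of magnitude comparable to `w0` and fades for much larger ones, so small weights are driven toward zero while large ones are mostly left alone. Both variants leave the error gradient `grad` itself unchanged.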

