Generalization in Multi-Layer Perceptrons
This applet illustrates the generalization capabilities of
multi-layer perceptrons. It allows you to define two different sets
of data: one for training and the other for cross-validation. The
two sets are necessary to study generalization in a systematic way.
The original applet was written by
Use the popup menu to choose learning points for training or
cross-validation. The graph displays the error on the training set
in black and the error on the cross-validation set in green.
For all questions except the last, leave the decay parameter at zero.
- Easy problem: Create two simple clusters of training points, a red
one (1's) and a blue one (0's), that are linearly separable and well
separated. Then add cross-validation points in each cluster. To be
realistic, the cross-validation points should be of the same color
as the training points in the same cluster. Run learning for about
100 iterations and observe the resulting error graphs. What can you
say about the two errors?
- More complicated problem: Use two similar simple
clusters, but set some cross-validation points a little bit outside
the training clusters. Do you observe any change in the error
graphs?
- Hard problem: Now, create two linearly separable
clusters, but very close to each other. Create cross-validation
points and put some of them slightly outside the clusters, even
inside the other cluster. Run the learning and comment on the
results. Did you observe that the cross-validation error reaches a
minimum and then rises again? How would you explain this?
- Non-linearly separable problem: Set 3 blue
training points on the left-hand side of the space, 6 red training
points in the middle and 3 blue training points on the right-hand
side. Add 3 cross-validation points in the first set, 6 in the
second and 3 in the last one. Change the number of hidden units and
the learning parameters if necessary so that training converges to
zero error on the training set. Do you observe an error graph
similar to the one in question 1? Why?
- Getting more and more complicated: Try to solve
more complicated problems (e.g., similar to questions 2 and 3) with
non-linearly separable clusters.
- General questions: How would you characterize the
evolution of the error on a cross-validation set? How should a
training set be designed in order to get the best results?
- Weight elimination algorithm: As discussed
in class, the decay parameter controls an extra term in the weight
update step. Set the decay parameter to a small value such as
0.001 and use several (at least 4) units in the hidden layers.
(Don't forget to click Init each time you change any of the network
parameters.) Compare the training results with standard backprop
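
The extra term in the weight update step can be sketched as follows.
This is a minimal illustration, not the applet's actual code: the
function names, the scale w0, and the exact form of the penalty are
assumptions. It shows a plain weight-decay step and a step using the
derivative of a weight-elimination penalty of the form
(w/w0)^2 / (1 + (w/w0)^2), scaled by the decay parameter.

```python
# Hypothetical sketch of a weight update with an extra decay term.
# Names (decay_step, elimination_step, w0) are illustrative assumptions.

def decay_step(w, grad, lr, decay):
    """One gradient step with simple weight decay:
    w <- w - lr * (grad + decay * w)."""
    return w - lr * (grad + decay * w)


def elimination_penalty_grad(w, w0=1.0):
    """Derivative with respect to w of the penalty
    (w/w0)^2 / (1 + (w/w0)^2)."""
    u = (w / w0) ** 2
    return (2.0 * w / w0 ** 2) / (1.0 + u) ** 2


def elimination_step(w, grad, lr, decay, w0=1.0):
    """One gradient step with the weight-elimination term added
    to the error gradient."""
    return w - lr * (grad + decay * elimination_penalty_grad(w, w0))


# With a zero error gradient only the extra term acts, so the weight
# slowly shrinks toward zero.
w = 0.5
for _ in range(1000):
    w = elimination_step(w, grad=0.0, lr=0.1, decay=0.001)
```

With decay set to zero both steps reduce to standard backprop. In the
weight-elimination form, the penalty gradient becomes small for weights
much larger than w0, so the term mainly pushes small weights toward
zero rather than shrinking all weights proportionally.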