Computational Intelligence, SS08
2 VO 442.070 + 1 RU 708.070

Homework 46: Backprop and Overfitting

[Points: 12.5; Issued: 2007/04/13; Deadline: 2007/05/08; Tutor: Roland Unterberger; Info hour: 2007/04/27, 15:15-16:15, HS i11; Review of graded work: 2007/05/25, 15:15-16:15, HS i11; Download: pdf; ps.gz]

Applying Different Methods Against Overfitting [12.5 points]

Analyze two heuristics, early stopping and weight decay, for avoiding overfitting when training multilayer neural networks with backpropagation.
  1. Use the Boston Housing dataset housing.mat contained in the archive. See also housing-description.txt for more information on the data set.
  2. Initialize the random number generator using the Matlab commands rand('state',<MatrNmr>); and randn('state',<MatrNmr>);.
  3. Split the dataset randomly (a useful command is randperm) into a training set $ D$ (50%), a validation set $ V$ (25%) and a test set $ T$ (25%). Normalize the data with prestd.
  4. Train a two-layer network with $ n_H$ hidden units on the training set $ D$ using the quasi-Newton method trainbfg
    1. without heuristics to avoid overfitting.
    2. with early stopping (hand over the validation set $ V$ to the function train).
    3. with weight decay (use net.performFcn = 'msereg' and net.performParam.ratio = 0.5).

    Repeat these three variants for $ n_H = 1,2,4,8,10$. Use the default parameters and train for at most 500 epochs.

  5. Create a plot that shows, for each of the three variants in step 4 (no heuristic, early stopping, weight decay), the MSE of the trained networks on the training set $ D$ as a function of $ n_H$. Interpret the plot and explain the differences in performance.
  6. Create a plot that shows, for the same three variants, the MSE of the trained networks on the test set $ T$ as a function of $ n_H$.
  7. Compare the plots of the error on the training set and on the test set. Can you see any qualitative differences? If so, why?
  8. Interpret the plot of the error on the test set. How big is the benefit of each method? Which method seems most favorable? What are the advantages and disadvantages of each method? Could the dataset be used more effectively for the weight decay heuristic?
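
Steps 2 and 3 above can be sketched as follows. This is only a minimal sketch under stated assumptions: matrNr is a hypothetical placeholder for <MatrNmr>, and the random matrices X and y merely stand in for the contents of housing.mat, whose variable names are not given here. Samples are assumed to be stored column-wise.

```matlab
% Sketch of steps 2-3: seed the generators, then split 50% / 25% / 25%.
matrNr = 1234567;                 % hypothetical placeholder for <MatrNmr>
rand('state', matrNr);
randn('state', matrNr);

% Placeholder data standing in for the contents of housing.mat
% (assumption: 13 input attributes, 1 target, 506 samples as columns).
X = randn(13, 506);
y = randn(1, 506);

N   = size(X, 2);
idx = randperm(N);                % random ordering of the sample indices
nD  = round(0.50 * N);            % size of the training set D
nV  = round(0.25 * N);            % size of the validation set V

iD = idx(1:nD);                   % indices for D
iV = idx(nD+1:nD+nV);             % indices for V
iT = idx(nD+nV+1:end);            % remaining indices for the test set T

PD = X(:, iD);  TD = y(iD);       % training inputs and targets
PV = X(:, iV);  TV = y(iV);       % validation inputs and targets
PT = X(:, iT);  TT = y(iT);       % test inputs and targets
```

Because the three index sets are consecutive slices of one permutation, they are disjoint and together cover every sample exactly once.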


  • Normalize the data using prestd and trastd.
  • Before training, set the network's weights and biases to small but nonzero values.
  • Present your results in a clear, structured and legible form. Document them in such a way that anybody can easily reproduce them.
  • Please hand in a printout of the Matlab program you have used.
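
The normalization and the three training variants might be set up roughly as shown below. This is a sketch, not a reference solution: it assumes the older Neural Network Toolbox interface to which prestd, trastd and msereg belong (newer releases have replaced several of these functions, so check the installed version). The placeholder data, the tansig/purelin transfer functions and the initialization scale 0.1 are assumptions that do not come from the assignment.

```matlab
% Placeholder split standing in for the real D, V and T (columns are samples).
PD = randn(13, 253);  TD = randn(1, 253);
PV = randn(13, 127);  TV = randn(1, 127);
PT = randn(13, 126);  TT = randn(1, 126);

% Normalize with the statistics of the training set only, then apply
% the same transformation to the validation and test sets.
[PDn, meanp, stdp, TDn, meant, stdt] = prestd(PD, TD);
PVn = trastd(PV, meanp, stdp);   TVn = trastd(TV, meant, stdt);
PTn = trastd(PT, meanp, stdp);   TTn = trastd(TT, meant, stdt);

VV.P = PVn;  VV.T = TVn;         % validation structure for early stopping

nH  = 4;                         % one of the values 1, 2, 4, 8, 10
net = newff(minmax(PDn), [nH 1], {'tansig', 'purelin'}, 'trainbfg');
net.trainParam.epochs = 500;     % train for at most 500 epochs

% Small nonzero initial weights and biases (the scale 0.1 is an assumption).
net.IW{1,1} = 0.1 * randn(size(net.IW{1,1}));
net.LW{2,1} = 0.1 * randn(size(net.LW{2,1}));
net.b{1}    = 0.1 * randn(size(net.b{1}));
net.b{2}    = 0.1 * randn(size(net.b{2}));

% Variant without heuristics.
netPlain = train(net, PDn, TDn);

% Variant with early stopping: hand the validation set to train.
netES = train(net, PDn, TDn, [], [], VV);

% Variant with weight decay.
netWD = net;
netWD.performFcn = 'msereg';
netWD.performParam.ratio = 0.5;
netWD = train(netWD, PDn, TDn);

% MSE on the normalized training and test sets, shown for one variant.
mseTrain = mean((sim(netES, PDn) - TDn).^2);
mseTest  = mean((sim(netES, PTn) - TTn).^2);
```

All three variants start from the same initial network here, so differences in the resulting errors are not caused by different initial weights. Wrapping the block in a loop over the values of $ n_H$ and storing mseTrain and mseTest per variant gives the data for the requested plots.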

Comparing Weight Decay with Regularization Term and the Additive Noise Heuristic [4* points]

Analyze the behavior of weight decay with a regularization term and of the additive noise heuristic. Train a two-layer feedforward neural network with 10 hidden neurons. Use trainbfg and the same training set as in assignment 3.1.
  1. Use the additive noise heuristic to avoid overfitting. Create a new training data set by adding 3 noisy versions of $ D$ to the training data. The additive noise should be drawn from the distribution sigma * randn(size(D)).
  2. Create a plot of the performance on the training set (without noise) and on the test set for different settings of $ \sigma$.
  3. Create a plot of the performance of weight decay with a regularization term for different settings of net.performParam.ratio.
  4. Interpret the plots. Are they qualitatively the same?
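
The augmented training set from step 1 could be built as in the sketch below. It assumes that the noise is added to the inputs only and that the targets are repeated unchanged; the assignment does not state this explicitly. The placeholder data and the list of sigma values are hypothetical.

```matlab
% Placeholder normalized training set standing in for D (columns are samples).
PD = randn(13, 253);
TD = randn(1, 253);

sigmas = [0.05 0.1 0.2 0.5];     % hypothetical settings of sigma to compare

for s = 1:numel(sigmas)
    sigma = sigmas(s);

    % Start from the clean data and append 3 noisy copies of the inputs.
    Paug = PD;
    Taug = TD;
    for k = 1:3
        Paug = [Paug, PD + sigma * randn(size(PD))];
        Taug = [Taug, TD];       % targets repeated unchanged (assumption)
    end

    % Paug now has 4 * 253 = 1012 columns. Train on (Paug, Taug), then
    % evaluate on the clean training set PD and on the test set.
end
```

For the comparison in step 3, the toolbox documentation describes msereg as ratio * mse + (1 - ratio) * msw, where msw is the mean of the squared weights. A ratio close to 1 therefore corresponds to little weight decay, which is worth keeping in mind when choosing the range of ratio values to plot.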