Computational Intelligence, SS08
2 VO 442.070 + 1 RU 708.070

Homework 37: Cross-validation and overfitting

[Points: 12.5; Issued: 2006/03/22; Deadline: 2006/05/10; Tutor: Armend Zeqiraj; Info hour: 2006/05/08, 13:00-14:00, HSi13; Inspection of graded work: 2006/05/22, 13:00-14:00, HSi13; Download: pdf; ps.gz]

Analyse heuristics for avoiding overfitting when training multilayer neural networks with backpropagation.

  1. Use the Boston Housing dataset housing.mat contained in the archive. See also housing-description.txt for more information on the data set.
  2. Initialize the random number generators using the Matlab commands rand('state',<MatrNmr>); and randn('state',<MatrNmr>); where <MatrNmr> is your matriculation number.
  3. Split the dataset randomly (a useful command is randperm) into a training set $D$ (75%) and a validation set for early stopping $V$ (25%).
  4. Perform a $k$-fold cross-validation ($k=10$) on the data set $D$ for a two-layer network with $n_H$ hidden units. Train the network with the Quasi-Newton method trainbfg
    1. without heuristics to avoid overfitting.
    2. by adding 3 noisy versions of $D_i$ to the training data $D_i$ in the $i$-th iteration of the cross-validation. Do not add noise to the validation data of the cross-validation. Draw the noise from a normal distribution with mean 0 and standard deviation 0.1 (use Di + 0.1*randn(size(Di)) to generate one noisy version of $D_i$).
    3. with early stopping (pass the validation set $V$ to the function train).
    4. with weight decay (use net.performFcn = 'msereg' and net.performParam.ratio = 0.5).

    Repeat these four points with $n_H = 1, 2, 4, 8, 10$. Use the default parameters and train for at most 500 epochs.

  5. Create a plot that shows, for each of the four training variants in step 4, the dependence of $E_{xval} = \frac{1}{k}\sum_{i=1}^{k} e_i$ on $n_H$, where $e_i$ is the mse performance on the validation set in the $i$-th iteration of the cross-validation.
  6. Interpret the plot. How big is the benefit of each method? Which method seems to be the most favorable? What are the advantages and disadvantages of each method? Could the dataset be used better for the weight decay heuristic?
  7. Hand in the first 10 elements of the data sets $D$ and $V$.
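
The data handling in steps 2 to 4 could be organized roughly as in the following sketch. It is only an illustration: X is a random placeholder matrix rather than the contents of housing.mat (whose variable names are not given here), matrNr is a hypothetical matriculation number, and the fold assignment via mod is one possible choice among several.

```matlab
% Sketch only: X is a random placeholder, not the contents of housing.mat.
matrNr = 1234567;                 % hypothetical matriculation number
rand('state', matrNr);            % seeding syntax as given in step 2
randn('state', matrNr);

N = 506;                          % placeholder number of samples
X = rand(N, 14);                  % placeholder data, one sample per row

% Step 3: random 75% / 25% split into D and V
perm = randperm(N);
nD   = round(0.75 * N);
D    = X(perm(1:nD), :);
V    = X(perm(nD+1:end), :);

% Step 4: assign every sample of D to one of k folds.
% D is already shuffled, so a cyclic assignment is sufficient.
k    = 10;
fold = mod(0:nD-1, k) + 1;        % fold labels 1..k, sizes differ by at most 1

for i = 1:k
    Dval = D(fold == i, :);       % validation part of fold i, kept noise-free
    Di   = D(fold ~= i, :);       % training part of fold i

    % Step 4.2: append 3 noisy copies of Di to the clean training data
    Daug = Di;
    for c = 1:3
        Daug = [Daug; Di + 0.1 * randn(size(Di))];
    end
    % ... train on Di (or Daug) here and evaluate on Dval ...
end
```

Since D comes from a random permutation, the cyclic fold labels already yield random folds. The fixed noise level of 0.1 is only meaningful relative to the scale of the data, which is one reason to normalize the data first.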


  • Normalize the data using prestd.
  • Before network training, set the weight and bias values to small but nonzero values.
  • Present your results in a clear, structured and legible form. Document them in such a way that anybody can easily reproduce them.
  • Please hand in a printout of the Matlab program you have used.
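
A single training run that follows these remarks might look like the sketch below. It assumes the older Neural Network Toolbox interface that the assignment names (prestd, newff, trainbfg); newer toolbox releases may deprecate or change these functions. The data are random placeholders, and the tansig/purelin transfer functions, the weight scale 0.1, and the choice to reuse the training statistics for the set V via trastd are assumptions of this sketch, not requirements of the assignment.

```matlab
% Sketch only: random placeholder data; older toolbox interface assumed.
P  = rand(13, 380);  T  = rand(1, 380);   % placeholder training data (columns = samples)
Pv = rand(13, 126);  Tv = rand(1, 126);   % placeholder early-stopping set V

% Normalize with the statistics of the training data and reuse them for V
[Pn, meanp, stdp, Tn, meant, stdt] = prestd(P, T);
VV.P = trastd(Pv, meanp, stdp);
VV.T = trastd(Tv, meant, stdt);

nH  = 4;                                  % one of the values 1, 2, 4, 8, 10
% Transfer functions are an assumption (linear output for regression)
net = newff(minmax(Pn), [nH 1], {'tansig', 'purelin'}, 'trainbfg');
net.trainParam.epochs = 500;              % train for at most 500 epochs

% Small non-zero initial weights and biases; the scale 0.1 is an assumption
s = 0.1;
net.IW{1,1} = s * randn(size(net.IW{1,1}));
net.LW{2,1} = s * randn(size(net.LW{2,1}));
net.b{1}    = s * randn(size(net.b{1}));
net.b{2}    = s * randn(size(net.b{2}));

% The weight decay variant (step 4.4) would additionally set:
%   net.performFcn = 'msereg';
%   net.performParam.ratio = 0.5;

% Early stopping variant (step 4.3): pass V as the sixth argument of train
[net, tr] = train(net, Pn, Tn, [], [], VV);

% Mean squared error on a normalized data set; within the cross-validation
% the held-out fold would take the place of VV here
err = VV.T - sim(net, VV.P);
e   = mean(err .^ 2);
```

For the variant without heuristics, train(net, Pn, Tn) is called without the validation structure. Repeating such a run over all folds and all values of $n_H$ yields the $e_i$ that enter $E_{xval}$.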