Computational Intelligence, SS08, 2 VO 442.070 + 1 RU 708.070

# Homework 46: Backprop and Overfitting

[Points: 12.5; Issued: 2007/04/13; Deadline: 2007/05/08; Tutor: Roland Unterberger; Info hour: 2007/04/27, 15:15-16:15, HS i11; Exam review (Einsichtnahme): 2007/05/25, 15:15-16:15, HS i11; Download: pdf; ps.gz]

# Applying Different Overfitting Methods [12.5 points]

Analyze two heuristics, early stopping and weight decay, for avoiding overfitting when training multilayer neural networks with backpropagation.
1. Use the Boston Housing dataset `housing.mat` contained in the archive `housing.zip`. See `housing-description.txt` for more information on the dataset.
2. Initialize the random number generator using the Matlab commands `rand('state',<MatrNmr>);` and `randn('state',<MatrNmr>);`.
3. Split the dataset randomly (a useful command is `randperm`) into a training set (50%), a validation set (25%), and a test set (25%). Normalize the data with `prestd`.
4. Train a two-layer network with the Quasi-Newton method `trainbfg` and n_h hidden units on the training set
   1. without any heuristic to avoid overfitting,
   2. with early stopping (hand the validation set over to the function `train`),
   3. with weight decay (set `net.performFcn = 'msereg'` and `net.performParam.ratio = 0.5`).

Repeat these three variants for different values of n_h. Use the default parameters and train for at most 500 epochs.

5. Create a plot which shows, for (a)-(c), the MSE of the trained networks on the training set as a function of n_h. Interpret the plot and explain the differences in performance.
6. Create a plot which shows, for (a)-(c), the MSE of the trained networks on the test set as a function of n_h.
7. Compare the plots of the error on the training and the test set. Can you see any qualitative differences? If yes, why?
8. Interpret the plot of the error on the test set. How big is the benefit of each method? Which method seems most favorable? What are the advantages and disadvantages of each method? Could the dataset be used more effectively for the weight decay heuristic?
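The steps above might be sketched as follows, assuming the old Neural Network Toolbox API (`prestd`/`newff`/`train`) used in this course. The variable names `p` and `t` for the inputs and targets in `housing.mat`, the placeholder `MatrNmr` for your matriculation number, and the hidden-layer size `nh` are assumptions, not part of the assignment text:

```matlab
% Sketch of steps 1-4 (not a definitive solution). Assumes housing.mat
% provides inputs p (features x examples) and targets t.
load housing.mat
rand('state', MatrNmr); randn('state', MatrNmr);   % MatrNmr: your matriculation number

N   = size(p, 2);                  % number of examples (columns)
idx = randperm(N);                 % random permutation for the split
ntr = round(0.50 * N);  nva = round(0.25 * N);
ptr = p(:, idx(1:ntr));           ttr = t(:, idx(1:ntr));
pva = p(:, idx(ntr+1:ntr+nva));   tva = t(:, idx(ntr+1:ntr+nva));
pte = p(:, idx(ntr+nva+1:end));   tte = t(:, idx(ntr+nva+1:end));

% Normalize with the training-set statistics only
[ptr_n, mp, sp, ttr_n, mt, st] = prestd(ptr, ttr);
pva_n = trastd(pva, mp, sp);  tva_n = trastd(tva, mt, st);
pte_n = trastd(pte, mp, sp);

% (a) plain training, nh hidden units, at most 500 epochs
net = newff(minmax(ptr_n), [nh 1], {'tansig', 'purelin'}, 'trainbfg');
net.trainParam.epochs = 500;
net_a = train(net, ptr_n, ttr_n);

% (b) early stopping: pass the validation set to train
net = newff(minmax(ptr_n), [nh 1], {'tansig', 'purelin'}, 'trainbfg');
net.trainParam.epochs = 500;
val.P = pva_n;  val.T = tva_n;
net_b = train(net, ptr_n, ttr_n, [], [], val);

% (c) weight decay via the regularized performance function
net = newff(minmax(ptr_n), [nh 1], {'tansig', 'purelin'}, 'trainbfg');
net.trainParam.epochs = 500;
net.performFcn = 'msereg';
net.performParam.ratio = 0.5;
net_c = train(net, ptr_n, ttr_n);
```

Note that the normalization statistics are computed on the training set only and then applied to the validation and test sets with `trastd`, so that no information from the held-out data leaks into the preprocessing.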

## Hints

• Normalize the data using `prestd` and `trastd`.
• Before network training set the weight and bias values to small but nonzero values.
• Present your results in a clear, structured, and legible way. Document them so that anybody can easily reproduce them.
• Please hand in a printout of the Matlab program you used.
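For the plots in steps 5 and 6, the evaluation might look as follows. This is a hypothetical fragment, not the required solution: `net` is one trained network, `ptr_n`/`ttr_n` and `pte_n`/`tte_n` are the normalized training and test data, and `mse_tr`/`mse_te` are assumed matrices with one row per tried value of n_h and one column per variant (a)-(c):

```matlab
% Inside the loop over hidden-unit counts (row i) and variants (column):
ytr_n = sim(net, ptr_n);                        % network output, normalized units
yte_n = sim(net, pte_n);
mse_tr(i, variant) = mean((ytr_n - ttr_n).^2);  % MSE on the training set
mse_te(i, variant) = mean((yte_n - tte_n).^2);  % MSE on the test set

% After the loop: one curve per variant, as a function of n_h
figure;
plot(nh_values, mse_te);
legend('no heuristic', 'early stopping', 'weight decay');
xlabel('number of hidden units n_h');
ylabel('MSE on the test set');
```

Computing the MSE in normalized units is fine for the comparison; alternatively, `poststd` can be used to map the outputs back to the original target scale before computing the error.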

# Comparing Weight Decay with Regularization Term and with the Additive Noise Heuristic [4* points]

Analyze the behavior of weight decay with a regularization term and of the additive noise heuristic. Train a two-layer feedforward neural network with 10 hidden neurons. Use `trainbfg` and the same training set as in assignment 3.1.
1. Use the additive noise heuristic to avoid overfitting. Create a new training set by adding 3 noisy versions of the training data D to the original training data. The additive noise should be drawn as `sigma * randn(size(D))`.
2. Create a plot of the performance on the training set (without noise) and on the test set for different settings of `sigma`.
3. Create a plot for the performance of the weight decay with regularization term method with different settings of `net.performParam.ratio`.
4. Interpret the plots: are they qualitatively the same?
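A minimal sketch of the noise-augmentation step, assuming `D` holds the (normalized) training inputs as columns and `T` the corresponding targets; the names `Dnoisy`/`Tnoisy` are illustrative:

```matlab
% Original data plus 3 independently perturbed copies; each copy adds
% fresh Gaussian noise of scale sigma to the inputs only.
Dnoisy = [D, ...
          D + sigma * randn(size(D)), ...
          D + sigma * randn(size(D)), ...
          D + sigma * randn(size(D))];
Tnoisy = [T, T, T, T];            % targets are repeated unchanged

net = newff(minmax(Dnoisy), [10 1], {'tansig', 'purelin'}, 'trainbfg');
net.trainParam.epochs = 500;
net = train(net, Dnoisy, Tnoisy);
```

The noise is added to the inputs while the targets stay unchanged, so the network sees several slightly different inputs mapped to the same output, which discourages it from fitting individual training points exactly.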