Computational Intelligence, SS08 (2 VO 442.070 + 1 RU 708.070)

# Homework 17: Backprop and cross-validation

[Points: 8; Issued: 2004/05/13; Deadline: 2004/05/26; Tutor: Armend Zeqiraj; Info hour: 2004/05/24, 12:00-13:00, Seminarraum IGI; Review of graded work: 2004/06/14, 12:00-13:00, Seminarraum IGI; Download: pdf; ps.gz]

This homework assignment asks you to apply cross-validation to determine the optimal number of hidden units in a two-layer neural network.

1. Use the Pima Indians data set. The archive diabetes.zip contains the data as the file diabetes.mat. See the file diabetes-description.txt, which is also contained in diabetes.zip, for more information on the data set. The task is a classification problem.
2. Initialize the random number generator using the Matlab commands rand('state',<MatrNmr>); and randn('state',<MatrNmr>);.
3. Split the data randomly into a training set (70%) and a test set (30%) (use the function randperm). Make sure that each set consists of an equal number of examples of each class. Normalize the data with prestd.
4. Perform a $k$-fold cross-validation on the data set for a two-layer network with a given number of hidden units. Repeat this for each number of hidden units under consideration. For the training, use the Levenberg-Marquardt (trainlm) algorithm and a method of your choice to avoid overfitting.
5. Before training the network, set the weight and bias values to small but nonzero values.
6. Report the training parameters used (net.trainParam and net.performParam).
7. Generate a diagram showing how the cross-validation error depends on the number of hidden units, where the error in the $i$-th iteration of the cross-validation is the percentage of incorrectly classified examples on the validation set.
8. The optimal number of hidden units is the one for which the cross-validation error in the diagram is minimal. Determine this value. Explain its meaning in the context of model selection, hypothesis classes and generalization.
9. Generate a diagram showing how the test error depends on the number of hidden units, where the test error is the classification error on the test set of a network with that number of hidden units, trained on the training set as outlined above.
10. The optimal number of hidden units with respect to the test set is the one for which the test error in the diagram is minimal. Determine this value. Explain the difference between it and the value determined by cross-validation, in particular the relevance of the test data.
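
The data handling in steps 2 to 4 and the selection rule in step 8 can be sketched as follows. This is only an illustrative outline under several assumptions, not a reference solution:

- The data are synthetic, because the layout of diabetes.mat is not described here.
- The seed, the number of folds `k` and the candidate list `hiddenUnits` are placeholder values.
- The "equal number of examples of each class" requirement is read as a per-class (stratified) 70/30 split.
- The normalization is done by hand, with statistics taken from the training portion, in place of prestd.
- The fold errors are averaged to obtain one cross-validation error per candidate.
- `evalError` is a hypothetical stand-in (a majority-class predictor) for training a network with trainlm and measuring its validation error.

```matlab
% Sketch only: synthetic stand-in data. X is (features x examples), y holds labels 0/1.
seed = 12345;                          % placeholder for <MatrNmr>
rand('state', seed);                   % legacy seeding syntax, as in the assignment
randn('state', seed);

nExamples = 200;
X = randn(8, nExamples);               % hypothetical feature matrix
y = double(rand(1, nExamples) < 0.35); % hypothetical class labels

% Stratified 70/30 split: shuffle each class separately with randperm
trainIdx = [];
testIdx  = [];
classes  = unique(y);
for c = classes
    idx = find(y == c);
    idx = idx(randperm(numel(idx)));   % random order within this class
    nTrain = round(0.7 * numel(idx));
    trainIdx = [trainIdx, idx(1:nTrain)];
    testIdx  = [testIdx,  idx(nTrain+1:end)];
end

% Zero-mean, unit-variance scaling per feature, using training statistics only
mu    = mean(X(:, trainIdx), 2);
sigma = std(X(:, trainIdx), 0, 2);
Xn = (X - repmat(mu, 1, nExamples)) ./ repmat(sigma, 1, nExamples);

% k-fold partition of the training set
k = 5;                                 % placeholder; use the value from the assignment
perm   = trainIdx(randperm(numel(trainIdx)));
foldOf = mod(0:numel(perm)-1, k) + 1;  % fold label 1..k for each shuffled example

hiddenUnits = [1 2 4 8];               % placeholder candidates

% Hypothetical stand-in: predicts the majority class of the training fold and
% returns the percentage of misclassified validation examples. It ignores n;
% replace it with network training and evaluation.
evalError = @(n, Xtr, ytr, Xva, yva) 100 * mean(yva ~= mode(ytr));

cvErr = zeros(numel(hiddenUnits), k);
for h = 1:numel(hiddenUnits)
    for i = 1:k
        va = perm(foldOf == i);        % validation fold
        tr = perm(foldOf ~= i);        % remaining folds used for training
        cvErr(h, i) = evalError(hiddenUnits(h), Xn(:, tr), y(tr), Xn(:, va), y(va));
    end
end

meanCvErr = mean(cvErr, 2);            % one averaged error per candidate
[~, best] = min(meanCvErr);
nBest = hiddenUnits(best);             % candidate with minimal cross-validation error
```

The test indices are never touched inside the cross-validation loop, so the test set stays available for the separate evaluation in steps 9 and 10. Plotting `meanCvErr` against `hiddenUnits` gives a diagram of the kind asked for in step 7.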

### Remarks

• Present your results in a clear, structured and legible way. Document them so that anybody can reproduce them effortlessly.
• Please hand in a printout of the Matlab program you have used.