
[Points: 8; Issued: 2005/03/18; Deadline: 2005/04/28; Tutor:
Arian; Info hour: 2005/04/25, 13:00-14:00, HSi13;
Inspection of graded work: 2005/05/16, 13:00-14:00, HSi13; Download: pdf;
ps.gz]
This homework assignment asks you to apply cross-validation to
determine the optimal number of hidden units in a two-layer neural
network.
 Use the Pima Indians data set. The archive diabetes.zip contains the data in the file
diabetes.mat. See the file
diabetesdescription.txt, which is also contained in
diabetes.zip, for more information on the data set. The
task is a classification problem.
 Initialize the random number generator using the Matlab
commands
rand('state',<MatrNmr>); and
randn('state',<MatrNmr>); where <MatrNmr> is your matriculation number.
 Split the data randomly into a training set (70%) and a test set (30%) (use the function
randperm).
Make sure that each set consists of an equal number of examples of
each class. Normalize the data with prestd.
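
The split and normalization step can be sketched as follows. This is only a minimal illustration under stated assumptions: the data are random stand-ins (the variable names inside diabetes.mat are not given here), the seed 12345 is a placeholder for <MatrNmr>, and the class requirement is read as a stratified split, so that both sets keep the same class proportions. The normalization is written with base functions and uses training-set statistics only; it is meant to mirror the zero-mean, unit-standard-deviation scaling that prestd performs.

```matlab
% Stand-in data (hypothetical): 8 features x 100 examples, labels 0/1.
% Replace X and y with the contents of diabetes.mat.
seed = 12345;                 % placeholder for <MatrNmr>
rand('state', seed);
randn('state', seed);
X = randn(8, 100);
y = [zeros(1, 60), ones(1, 40)];

trainIdx = [];
testIdx  = [];
for c = unique(y)
    idx = find(y == c);
    idx = idx(randperm(numel(idx)));       % shuffle the examples of class c
    nTrain = round(0.7 * numel(idx));      % 70% of this class for training
    trainIdx = [trainIdx, idx(1:nTrain)];
    testIdx  = [testIdx,  idx(nTrain+1:end)];
end

% Zero mean and unit standard deviation per feature (row), computed on the
% training set and then applied unchanged to the test set.
mu    = mean(X(:, trainIdx), 2);
sigma = std(X(:, trainIdx), 0, 2);
Xtrain = (X(:, trainIdx) - repmat(mu, 1, numel(trainIdx))) ./ repmat(sigma, 1, numel(trainIdx));
Xtest  = (X(:, testIdx)  - repmat(mu, 1, numel(testIdx)))  ./ repmat(sigma, 1, numel(testIdx));
```

Reusing the training mean and standard deviation for the test set is a design choice of this sketch; it keeps information about the test data out of the preprocessing.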
 Perform a $k$-fold
cross-validation ($k$ = […])
on the data set for a
two-layer network with $n$ hidden units. Do this for each of the values
$n$ = […]. For
the training use the Levenberg-Marquardt (
trainlm )
algorithm and a method of your choice to avoid overfitting.
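
The fold bookkeeping of a cross-validation run can be sketched as below. The number of folds, the data size, and the labels are hypothetical stand-ins, and the "model" is a deliberately trivial majority-class predictor so that the sketch runs without any toolbox; in the assignment, that step would be replaced by training a network with trainlm on the training folds and evaluating it on the held-out fold.

```matlab
k = 5;                          % hypothetical number of folds
N = 50;                         % hypothetical number of training examples
rand('state', 12345);           % 12345 is a placeholder for <MatrNmr>
y = double(rand(1, N) > 0.5);   % stand-in class labels (0/1)

perm   = randperm(N);           % random order of the examples
foldId = mod(0:N-1, k) + 1;     % fold number for each position in perm

e = zeros(1, k);                % validation error (%) per fold
allVal = [];                    % collected only to check the partition
for i = 1:k
    valIdx = perm(foldId == i); % held-out fold
    trnIdx = perm(foldId ~= i); % remaining k-1 folds
    allVal = [allVal, valIdx];

    % Stand-in model: always predict the majority class of the training folds.
    majority = mean(y(trnIdx)) >= 0.5;
    pred = repmat(majority, 1, numel(valIdx));

    e(i) = 100 * mean(pred ~= y(valIdx));
end
eCv = mean(e);                  % cross-validation error for this model
```

Running this loop once per candidate number of hidden units gives one cross-validation error per candidate.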
 Before training the network, set the weight and bias values to small
but nonzero values.
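
One possible way to produce such initial values is to draw them from a narrow Gaussian, as in the sketch below. The scale 0.1 and the hidden layer size are assumptions chosen for illustration, not values prescribed by the assignment; the 8 inputs correspond to the 8 attributes of the Pima Indians data set.

```matlab
nInputs = 8;       % attributes in the Pima Indians data set
nHidden = 4;       % hypothetical number of hidden units
scale   = 0.1;     % hypothetical scale for "small" values
randn('state', 12345);                  % 12345 is a placeholder for <MatrNmr>

W1 = scale * randn(nHidden, nInputs);   % input-to-hidden weights
b1 = scale * randn(nHidden, 1);         % hidden biases
W2 = scale * randn(1, nHidden);         % hidden-to-output weights
b2 = scale * randn(1, 1);               % output bias

% With a Neural Network Toolbox network object net, the values would be
% assigned as:
%   net.IW{1,1} = W1;  net.b{1} = b1;
%   net.LW{2,1} = W2;  net.b{2} = b2;
```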
 Report the training parameters used
( net.trainParam
and net.performParam ).
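 Generate a diagram
showing the dependence of the cross-validation error $e_{cv}$
on the number of hidden units $n$, where $e_{cv}$ is the average over all
folds of $e_i$, and $e_i$ is the percentage
of incorrectly classified examples on the validation set in the
$i$-th iteration of the
cross-validation.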
 The optimal value $n^*_{cv}$ for the number of hidden units
is the one for which the cross-validation error
in the diagram above
is minimal.
Determine $n^*_{cv}$.
Explain the meaning of $n^*_{cv}$ in the context of model selection, hypothesis
classes and generalization.
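
Selecting the number of hidden units with the lowest cross-validation error amounts to a single call to min, as sketched below. The candidate sizes and error values are made-up placeholders for illustration only; they are not results for the Pima Indians data.

```matlab
% Hypothetical candidate sizes and cross-validation errors (in percent).
nCandidates = [1 2 4 8 16];
eCv         = [35 28 24 26 30];

[eMin, iMin] = min(eCv);        % smallest error and its position
nOpt = nCandidates(iMin);       % number of hidden units at the minimum
```

The same two vectors can be passed to plot(nCandidates, eCv, '-o') to draw the requested diagram. If several candidates share the minimal error, min returns the first one, which in a list sorted by size is the smallest network.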
 Generate a diagram
showing the dependence of $e_{test}$ on $n$, where $e_{test}$ is the classification error on the test set
of a network with
$n$ hidden units
trained on the training set as outlined above.
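
The classification error on a set of examples can be computed from the network outputs as in the following sketch. It assumes a single output unit with values in [0, 1], targets coded as 0/1, and a decision threshold of 0.5; the output and target values are hypothetical.

```matlab
out = [0.9 0.2 0.6 0.4 0.1];    % hypothetical network outputs
t   = [1   0   0   1   0  ];    % corresponding target labels

pred = out >= 0.5;                      % threshold the outputs at 0.5
errPercent = 100 * mean(pred ~= t);     % percentage of misclassified examples
```

Here the third and fourth examples are misclassified, so the error is 2 out of 5, or 40%.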
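 The optimal value $n^*_{test}$ for the number of hidden units
is the one for which the test error
in the diagram above
is minimal.
Determine $n^*_{test}$.
Explain the difference between $n^*_{cv}$ (the number of hidden units with minimal cross-validation error) and $n^*_{test}$, in particular the relevance of the test data.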
 Present your results in a clear,
structured and legible form. Document them in such a way that anybody
can reproduce them effortlessly.
 Please hand in a printout of the
Matlab program you have used.
