Compare different conjugate gradient and quasi-Newton methods for a digit classification task. You are required to use MATLAB for this assignment.

- a)
Download the digit data set `digits.zip`^{4}. The file `digits.mat` contains training samples (learn.X, learn.C) and test samples (test.X, test.C). Each sample consists of 64 pixel values in the range .

(Use `d=reshape(learn.X(7,:),8,8)'; imagesc(d); colormap(1-gray);` to visualize training sample 7.)

Normalize the data using `mapstd` or `prestd` and `trastd`.

- b)
Initialize the MATLAB random number generator with `rand('state',MatrNmr);` and `randn('state',MatrNmr);` (use only the MatrNmr of one team member).

- c)
Use a conjugate gradient method of your choice (e.g. `traincgf`) and a quasi-Newton method (e.g. `trainlm`; see `help nnet` for a list of available training functions) to train a network of your choice (choose an appropriate network architecture and activation functions) so that it generalizes well to the test data. If necessary, adjust the training parameters `net.trainParam` of the network (e.g. see `help traincgf` for a list of training parameters for `traincgf`). Avoid overfitting by means of a method of your choice (early stopping or weight decay). Use `train(net,X,T,[],[],V)` to hand over a validation set to the training algorithm and automatically activate early stopping.

Choose an appropriate training and validation set from learn.X and learn.C. (*Hint: Because of the huge size of the training set, do not use all training samples to train the network.*)

Before network training, set the weight and bias values to small but nonzero values.

Justify and explain your choice of network parameters and training functions.
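The steps of a) to c) could be wired together roughly as in the following sketch. It is not a reference solution. It assumes that learn.C and test.C are vectors of integer class labels 0..9 (the assignment does not state the label format), that the training set holds at least 1500 samples, and that the legacy `newff`/`train` interface of the Neural Network Toolbox is available. The seed value, the subset sizes, the hidden layer size, and the weight scale 0.1 are placeholders chosen for illustration only.

```matlab
% Sketch only; see the assumptions stated in the text above.
load digits.mat                          % provides the structs learn and test

MatrNmr = 1234567;                       % placeholder, use a real MatrNmr
rand('state',MatrNmr);
randn('state',MatrNmr);

% The toolbox expects one sample per column, so transpose the data.
% If some pixels are constant over all samples, their standard deviation
% is zero; it may be necessary to drop them first (e.g. removeconstantrows).
[Xn,ps] = mapstd(learn.X');              % zero mean, unit std per pixel

% One-of-10 target matrix (ASSUMPTION: labels are the integers 0..9).
labels = double(learn.C(:))';
N = numel(labels);
T = zeros(10,N);
T(sub2ind(size(T), labels+1, 1:N)) = 1;

% Random, disjoint training and validation subsets (sizes are placeholders).
idx   = randperm(N);
trIdx = idx(1:1000);
vaIdx = idx(1001:1500);
V.P = Xn(:,vaIdx);                       % validation inputs
V.T = T(:,vaIdx);                        % validation targets

% 64-20-10 network trained with Fletcher-Reeves conjugate gradient.
net = newff(minmax(Xn), [20 10], {'tansig','logsig'}, 'traincgf');

% Overwrite the default initialization with small nonzero values.
net.IW{1,1} = 0.1*randn(size(net.IW{1,1}));
net.LW{2,1} = 0.1*randn(size(net.LW{2,1}));
net.b{1}    = 0.1*randn(size(net.b{1}));
net.b{2}    = 0.1*randn(size(net.b{2}));

net.trainParam.epochs = 200;
[net,tr] = train(net, Xn(:,trIdx), T(:,trIdx), [], [], V);  % early stopping

% Apply the training-set normalization to the test data and evaluate.
Xt = mapstd('apply', test.X', ps);
Y  = sim(net, Xt);
[dummy,pred] = max(Y, [], 1);            % index of the largest output
errRate = mean(double((pred-1) ~= double(test.C(:))'));
fprintf('Test error rate: %.3f\n', errRate);
```

The normalization settings `ps` are estimated on the training data only and then reused for the test data, so that no test-set statistics enter the training process.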

- d)
Compare the convergence speed of the different training methods for the network architecture obtained in c).
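One possible way to set up the comparison is sketched below. The function `compare_training` is a hypothetical helper, not part of the toolbox. It trains the same architecture with each training function from identical initial weights, records the wall-clock time, and plots training and validation error on a logarithmic scale. The hidden layer size of 20 and the weight scale 0.1 are placeholders, and the legacy `newff`/`train` interface is assumed.

```matlab
function recs = compare_training(Xtr, Ttr, V, seed)
% COMPARE_TRAINING  Hypothetical helper (not a toolbox function).
%   Xtr, Ttr : training inputs and targets, one sample per column
%   V        : validation struct with fields P and T
%   seed     : seed for randn, so every method starts from the same weights
trainFcns = {'traincgf','trainlm'};
styles    = {'b','r'};
recs      = cell(1, numel(trainFcns));

for k = 1:numel(trainFcns)
    randn('state',seed);                 % identical initial weights
    net = newff(minmax(Xtr), [20 size(Ttr,1)], ...
                {'tansig','logsig'}, trainFcns{k});
    net.IW{1,1} = 0.1*randn(size(net.IW{1,1}));
    net.LW{2,1} = 0.1*randn(size(net.LW{2,1}));
    net.b{1}    = 0.1*randn(size(net.b{1}));
    net.b{2}    = 0.1*randn(size(net.b{2}));

    tic;
    [net,tr] = train(net, Xtr, Ttr, [], [], V);
    secs = toc;
    recs{k} = tr;                        % keep the training record
    fprintf('%s: %d epochs, %.1f s, final validation MSE %.4g\n', ...
            trainFcns{k}, tr.epoch(end), secs, tr.vperf(end));
end

% Plot after all runs, so the curves do not collide with any plots
% that train itself may draw during training.
figure; hold on;
for k = 1:numel(recs)
    plot(recs{k}.epoch, recs{k}.perf,  [styles{k} '-']);   % training error
    plot(recs{k}.epoch, recs{k}.vperf, [styles{k} '--']);  % validation error
end
set(gca, 'YScale', 'log');
xlabel('epoch'); ylabel('MSE');
legend('traincgf train','traincgf val','trainlm train','trainlm val');
```

Epoch counts alone can be misleading here: one `trainlm` epoch is typically far more expensive than one `traincgf` epoch, which is why the sketch also reports elapsed time. The memory needed by `trainlm` grows with the number of training samples times the number of outputs times the number of weights, so a smaller training subset may be required for it.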