Computational Intelligence, SS08, 2 VO 442.070 + 1 RU 708.070

# Homework 45: Digit classification with backprop

[Points: 12.5; Issued: 2007/03/30; Deadline: 2007/05/08; Tutor: Susanne Rexeis; Info hour: 2007/04/27, 15:15-16:15, HSi11; Review (Einsichtnahme): 2007/05/25, 15:15-16:15, HSi11; Download: pdf; ps.gz]

## Neural Networks as Feature Generator [5 points]

Show that the hidden units of a network may find meaningful feature groupings in the following problem based on optical digit recognition.
• Let your input space consist of an 8x8 pixel grid. Generate 100 training patterns per category in the following way. Start with a block-letter representation of a prototype digit, where black pixels have the value 0 and white pixels the value +1. Generate 100 different versions of this prototype by adding independent random noise to each pixel; let the noise be uniformly distributed over a fixed interval. Repeat the above procedure for the digit 0 and for a third digit, the latter obtained by removing some black pixels from the original (noise-free) prototype. This gives you a dataset of 300 training patterns.
• Train a 64-2-3 network with logsig activation functions for this classification task. Use the training method traingdx1 with standard parameters and train the network for 500 epochs.
• Display the input-to-hidden weights as 8x8 images, separately for each hidden unit.
• Can you find any useful features in the weight patterns (features are, in this case, areas with the same weight value)? Interpret your results; in particular, discuss why such a feature representation has been chosen by the hidden layer.
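The steps above can be organized in MATLAB roughly as follows. This is only a sketch: the prototype matrices `proto1`–`proto3` and the noise bounds `a`, `b` are placeholders for the values specified in the assignment, and the network creation uses the pre-2008 `newff` calling convention.

```matlab
% Sketch only: prototypes and noise interval [a,b] are placeholders --
% substitute the values given in the assignment.
nPerClass = 100;
protos = {proto1, proto2, proto3};   % 8x8 block-letter prototypes (0/+1)
P = []; T = [];
for c = 1:3
  for k = 1:nPerClass
    noise = a + (b - a) * rand(8, 8);       % i.i.d. uniform noise per pixel
    P = [P, reshape(protos{c} + noise, 64, 1)];
    t = zeros(3, 1); t(c) = 1;              % 1-of-3 target coding
    T = [T, t];
  end
end

% 64-2-3 feed-forward network, logsig units, trained with traingdx
net = newff(minmax(P), [2 3], {'logsig', 'logsig'}, 'traingdx');
net.trainParam.epochs = 500;
net = train(net, P, T);
```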

### Hints

• Before network training set the weight and bias values to small but nonzero values.
• To visualize the hidden-layer weights you can use the commands

    i = 1;
    hiddenW = reshape(net.IW{1}(i, :), 8, 8);
    imagesc(hiddenW); colormap(1-gray);

to produce an image of the weights of the i-th hidden neuron.
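To inspect all hidden units at once, the same commands can be wrapped in a loop (a small convenience sketch; it assumes a trained network `net` is already in the workspace):

```matlab
% Display the incoming weights of every hidden unit as an 8x8 image
nHidden = size(net.IW{1}, 1);
for i = 1:nHidden
  subplot(1, nHidden, i);
  imagesc(reshape(net.IW{1}(i, :), 8, 8));
  colormap(1 - gray);
  title(sprintf('hidden unit %d', i));
end
```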

## Digit Classification with real world data [7.5 points]

Similarly to the Optical Character Recognition tutorial, you are asked to train a feed-forward network to perform a digit classification task. The difference from the tutorial is the data set: here you use the Digits data set. The file digits.mat (obtained by unzipping digits.zip) contains about 4000 images (8 pixel x 8 pixel, 16 colors) of training samples (learn.P) of hand-drawn digits (0, 1, ..., 9) together with their classifications (learn.T), and about 2000 images of test samples (test.P, test.T).

• Report the performance (i.e., the percentage of test examples classified correctly) of 3 network architectures: a network without hidden units, one with 4 hidden units, and one with 20 hidden units.
• Use the training function traingdx2. Find good training parameters (learning rate and momentum term) such that optimal test performance is achieved.
• Train the network without heuristics to avoid overfitting (don't use early stopping or weight decay).
• Discuss the relationship between the network architecture and the performance on the test set.
• What can one conclude about the complexity of the task if you consider the performance of the network without any hidden units?
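One way to organize this experiment in MATLAB is sketched below. It assumes that learn.T and test.T use 1-of-10 target coding (one output unit per digit); the training parameters shown are starting points only, not the "good" values you are asked to find.

```matlab
load digits              % provides learn.P, learn.T, test.P, test.T

% Normalize the inputs; apply the training-set statistics to the test set
[Pn, meanp, stdp] = prestd(learn.P);
Ptn = trastd(test.P, meanp, stdp);

for nHidden = [0 4 20]
  if nHidden == 0      % single-layer network: no hidden units
    net = newff(minmax(Pn), [10], {'logsig'}, 'traingdx');
  else
    net = newff(minmax(Pn), [nHidden 10], {'logsig', 'logsig'}, 'traingdx');
  end
  net.trainParam.epochs = 500;   % placeholder values -- tune these
  net.trainParam.lr = 0.01;      % learning rate
  net.trainParam.mc = 0.9;       % momentum term
  net = train(net, Pn, learn.T);

  % Test accuracy: predicted class = output unit with maximal activation
  [dummy, pred]  = max(sim(net, Ptn));
  [dummy, truth] = max(test.T);
  fprintf('%2d hidden units: %.1f%% correct\n', ...
          nHidden, 100 * mean(pred == truth));
end
```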

### Hints

• Normalize the data using prestd and trastd.
• Before network training set the weight and bias values to small but nonzero values.
• To see what the digits look like, you can use the commands

    i = 5;
    p = learn.P(:, i);
    x = reshape(p, 8, 8)';
    imagesc(x); colormap(1-gray);

to produce an image of the i-th training example.

### Remarks

• Present your results clearly, in a well-structured and legible form. Document them in such a way that anybody can reproduce them effortlessly.
• Use the matriculation number (Matrikelnummer) of one of your team members to initialize the random number generator.
• Please hand in a printout of the Matlab program you have used.
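For the reproducibility remark above, the random generators of MATLAB versions from this period can be seeded like this (the number shown is a hypothetical example, not a real matriculation number):

```matlab
matrikelnr = 123456;         % hypothetical example number
rand('state', matrikelnr);   % seeds the uniform generator (weight init, noise)
randn('state', matrikelnr);  % seeds the normal generator
```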

#### Footnotes

1. traingdx: Gradient descent with momentum and adaptive learning rate.
2. traingdx: Gradient descent with momentum and adaptive learning rate.