Computational Intelligence, SS08
2 VO 442.070 + 1 RU 708.070
Course Notes (Skriptum)
Neural Network Toolbox


A Classification Task

Figure 1: Data set X projected to two dimensions.

As an example, our task is to create and train a perceptron that correctly classifies points belonging to three different classes. First we load the data from the file winedata.mat

>> load winedata X C
Each row of X represents a sample point whose class is specified by the corresponding element (row) of C. Next, the data is transposed into the input format used by the Neural Network Toolbox (one sample per column)
>> P=X';
where P(:,i) is the ith sample point. Since we want to distinguish three different classes, we use a layer of three perceptrons, each one responsible for one class. The corresponding target matrix is generated by
>> T=ind2vec(C);
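To see what ind2vec produces, here is a small example that is independent of the wine data: a vector of class indices is turned into sparse one-of-N target columns, with one row per class and a single 1 per column.

```matlab
% ind2vec returns a sparse matrix; full() makes the result visible.
% Column i contains a 1 in the row given by the ith class index.
full(ind2vec([1 3 2]))
% ans =
%      1     0     0
%      0     0     1
%      0     1     0
```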
To create the perceptron layer with the correct input ranges, type
>> net=newp(minmax(P),size(T,1));

The difference between train and adapt

Both functions, train and adapt, are used for training a neural network, and most of the time either can be used for the same network. The most important difference is incremental training (updating the weights after the presentation of each single training sample) versus batch training (updating the weights after one presentation of the complete data set).
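The two modes can be illustrated without the toolbox. The following sketch (plain Matlab with hypothetical variable names, not the toolbox's actual implementation) applies the perceptron rule to a toy OR problem, once incrementally and once in batch mode:

```matlab
% One pass of the perceptron rule over 4 samples of the OR problem.
P = [0 0 1 1; 0 1 0 1];              % 2 inputs, 4 samples (columns)
T = [0 1 1 1];                       % targets
w = zeros(1,2); b = 0;

% Incremental: the weights change immediately after every sample.
for i = 1:size(P,2)
    a = double(w*P(:,i) + b >= 0);   % hardlim activation
    e = T(i) - a;                    % error for this sample
    w = w + e*P(:,i)';               % update right away
    b = b + e;
end

% Batch: accumulate the updates, apply them once after the whole set.
dw = zeros(1,2); db = 0;
for i = 1:size(P,2)
    a = double(w*P(:,i) + b >= 0);   % all outputs use the same weights
    e = T(i) - a;
    dw = dw + e*P(:,i)'; db = db + e;
end
w = w + dw; b = b + db;
```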


First, set net.adaptFcn to the desired adaptation function. We'll use trains, which presents the training samples in sequential order and allows a separate learning function for each weight and bias. Again, check the Matlab documentation for a complete overview of possible adaptation functions.

>> net.adaptFcn = 'trains';
Next, since we're using trains, we'll have to set the learning function for all weights and biases:
>> net.inputWeights{1,1}.learnFcn = 'learnp';
>> net.biases{1}.learnFcn = 'learnp';
where learnp is the Perceptron learning rule. Finally, a useful parameter is net.adaptParam.passes, which is the maximum number of times the complete training set may be used for updating the network:
>> net.adaptParam.passes = 1;
When using adapt, both incremental and batch training can be used. Which one is actually used depends on the format of your training set. If it consists of two matrices of input and target vectors, like
>> [net,y,e] = adapt(net,P,T);
the network will be updated using batch training. Note that all elements of the matrix y are one, because the weights are not updated until the complete training set has been presented.

If the training set is given in the form of a cell array
>> for i = 1:length(P), P2{i} = P(:,i); T2{i}= T(:,i); end
>> net = init(net);
>> [net,y2,e2] = adapt(net,P2,T2);
then incremental training will be used. Note that the weights had to be reinitialized before the adaptation was started. Since adapt takes a lot more time than train, we continue our analysis with the latter.


When using train, on the other hand, only batch training will be used, regardless of the format of the data (both formats are accepted). The advantage of train is that it provides a lot more choice of training functions (gradient descent, gradient descent with momentum, Levenberg-Marquardt, etc.), which are implemented very efficiently. So for static networks (no tapped delay lines), train is usually the better choice.

We set

>> net.trainFcn = 'trainb';
for batch learning and
>> net.trainFcn = 'trainc';
for on-line learning. Which training parameters are present generally depends on your choice of training function. In our case two useful parameters are net.trainParam.epochs, which is the maximum number of times the complete data set may be used for training, and net.trainParam.show, which is the number of epochs between status reports of the training function. For example,
>> net.trainParam.epochs = 1000;
>> net.trainParam.show = 100;
We initialize and train the network with
>> net = init(net);
>> [net,tr] = train(net,P,T);
The training error is calculated with
>> Y=sim(net,P);
>> train_error=mae(Y-T)

train_error =
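The mean absolute error counts each wrong perceptron output separately. To count misclassified samples instead, one can compare predicted and true class indices; a sketch, assuming net, P and the label vector C loaded from winedata.mat are still in the workspace:

```matlab
Y = sim(net, P);              % one row of 0/1 outputs per class
[dummy, Yc] = max(Y, [], 1);  % index of the class with the largest output
num_errors = sum(Yc ~= C(:)') % number of misclassified samples
```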
So we see that the three classes of the data set are not linearly separable. The best moment to stop learning would have been at
>> [min_perf,min_epoch]=min(tr.perf)

min_perf =

min_epoch =
Figure 2: Performance of the learning algorithm train over 1000 epochs.