Computational Intelligence, SS08
2 VO 442.070 + 1 RU 708.070

Homework 44: Neural Networks and Gradient Descent

[Points: 12.5; Issued: 2007/03/30; Deadline: 2007/05/08; Tutor: Sabine Sternig; Infohour: 2007/04/27, 15:15-16:15, HS i11; Review (Einsichtnahme): 2007/05/25, 15:15-16:15, HS i11; Download: pdf; ps.gz]

Linear Networks [4 points]

Suppose you had a neural network with only linear activation functions, one hidden layer and one output layer. Assume that $ W_I$ is the weight matrix from the input to the hidden layer and $ W_H$ is the weight vector from the hidden to the output layer; neither layer uses any bias terms. Write down the equation for the output value as a function of the input vector $ \vec{x} = (x_1, x_2, ... x_n)$ and the weights of the network, without explicitly mentioning the output of the hidden layer. Prove or disprove that this 2-layer neural network can be represented by a single linear unit. What can be said about networks with an arbitrary number of hidden layers with linear activation functions?
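As a numerical sanity check of the claim you are asked to prove, the following sketch composes a two-layer linear network and compares it with a single linear unit whose weights are the product of the two weight matrices. The dimensions (3 inputs, 4 hidden units) and the random initialization are illustrative assumptions, not part of the assignment.

```python
import numpy as np

rng = np.random.default_rng(0)
W_I = rng.standard_normal((4, 3))   # input -> hidden weight matrix (4 hidden units, 3 inputs)
W_H = rng.standard_normal(4)        # hidden -> output weight vector

x = rng.standard_normal(3)          # an arbitrary input vector

# Two-layer linear network (no biases): y = W_H . (W_I x)
y_two_layer = W_H @ (W_I @ x)

# Candidate single linear unit with collapsed weights w = W_H W_I
w = W_H @ W_I
y_single = w @ x

# Both computations give the same output for any x
assert np.isclose(y_two_layer, y_single)
```

The same argument applies recursively: composing any number of linear layers is still a product of matrices, i.e. a single linear map.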

Backpropagation [8.5 points]

Consider a 2-layer feedforward neural network with 2 hidden units, 2 inputs and 1 linear output unit. The output of each hidden unit is given by:
$\displaystyle A\Big(\sum_{i = 1}^2 w_i \cdot (x_i + x_i^2)\Big), $
where $ A$ is an arbitrary sigmoidal activation function. Derive a gradient descent learning rule for the weights from the input to the hidden layer and for the weights from the hidden to the output layer which minimizes the mean squared error (MSE) for a single example $ (\vec{x}, b)$.


  • You do not need to write the formula of the sigmoidal function explicitly. Just use the term $ A(x)$ for the function and $ A'(x)$ for its derivative.
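The gradient descent update you are asked to derive can be sketched as follows. With hidden net inputs $net_j = \sum_i w_{ji}(x_i + x_i^2)$, hidden outputs $h_j = A(net_j)$, linear output $y = \sum_j v_j h_j$ and error $E = \frac{1}{2}(y - b)^2$, the chain rule gives $\partial E / \partial v_j = (y-b)\,h_j$ and $\partial E / \partial w_{ji} = (y-b)\,v_j\,A'(net_j)\,(x_i + x_i^2)$. The code below is a minimal check of these rules; the logistic function as the concrete choice of $A$, the learning rate, the initialization and the example $(\vec{x}, b)$ are all illustrative assumptions.

```python
import numpy as np

def A(x):
    # Placeholder sigmoidal activation (assumption: logistic function)
    return 1.0 / (1.0 + np.exp(-x))

def A_prime(x):
    s = A(x)
    return s * (1.0 - s)

rng = np.random.default_rng(1)
w = rng.standard_normal((2, 2))   # input -> hidden weights, w[j, i]
v = rng.standard_normal(2)        # hidden -> output weights
eta = 0.1                         # learning rate (assumed)

x = np.array([0.5, -1.0])         # example input (assumed)
b = 0.3                           # target value (assumed)

# Forward pass with the transformed inputs x_i + x_i^2
z = x + x**2
net = w @ z                       # net input of each hidden unit
h = A(net)
y = v @ h                         # linear output unit

# Backward pass for E = 1/2 (y - b)^2
delta_out = y - b
grad_v = delta_out * h                               # dE/dv_j
grad_w = np.outer(delta_out * v * A_prime(net), z)   # dE/dw_ji

# Gradient descent step
v -= eta * grad_v
w -= eta * grad_w
```

A quick way to validate a derivation like this is to verify that one update step reduces the error on the training example, or to compare the analytic gradients against finite differences.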