Computational Intelligence, SS08
2 VO 442.070 + 1 RU 708.070

Homework 1: Linear Regression and Gradient Descent

[Points: 12.5; Issued: 2008/03/14; Deadline: 2008/05/05; Tutor: Sabine Sternig; Info hour: 2008/04/29, 13:00-14:00, seminar room, Infeldgasse 16b, 1st floor; Review of graded work: 2008/05/16, 15:30-16:30, HS i11; Download: pdf; ps.gz]

Polynomial Regression [6.5 points]

Consider a polynomial model of degree 10. Use an additive model with 11 basis functions $ \phi_k(x) = x^k$ ($ k = 0, \dots, 10$) and the following error function for your training examples:
$\displaystyle E(\vec{w}) = \sum_{i = 1}^N (y_i - \sum_{k=0}^{10} \phi_k(x_i) w_k)^2 + \alpha \sum_{k = 0}^{10} w_k^2$
  • Write the error function in matrix form. Explicitly state the dimensions of the vectors and matrices.
  • Derive a closed-form solution for the optimal weight vector. (Hint: use the identity $ \alpha \vec{w} = \alpha \vec{I} \vec{w}$, where $ \vec{I}$ is the identity matrix.)
  • Write a MATLAB script which implements your learning rule. Use the following data set, which contains the training data (input: x_train, output: y_train) and the test data (input: x_test, output: y_test).
  • Train your model on the training data using $ \alpha$ values of 0:0.01:10.
  • Plot the mean squared error on the training set and on the test set for the given values of $ \alpha$.
  • Plot the learned functions for $ \alpha = 0$, $ \alpha = 10.0$, and the $ \alpha$ that gives the lowest error on the test set. Interpret your results.
  • Plot the mean absolute weight values for the given values of $ \alpha$ (use a semilogy plot for better illustration).
  • Interpret your results. What is the purpose of $ \alpha$?
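As a rough illustration of the workflow in the list above, the following sketch fits the regularized polynomial model in closed form and prints the training error, the test error, and the mean absolute weight for a few values of $ \alpha$. Setting the gradient of $ E(\vec{w})$ to zero leads to the linear system $ (\Phi^T \Phi + \alpha \vec{I}) \vec{w} = \Phi^T \vec{y}$, where $ \Phi$ is the $ N \times 11$ matrix with entries $ \phi_k(x_i)$. The sketch is written in plain Python rather than MATLAB and is meant only as an illustration: the synthetic data, the helper names (design_row, solve, fit, predict, mse, error), and the hand-written linear solver are assumptions and not part of the course material.

```python
import math
import random

DEGREE = 10  # polynomial degree from the exercise: basis functions x^0 .. x^10


def design_row(x, degree=DEGREE):
    """Return the basis function values [x^0, x^1, ..., x^degree] for one input x."""
    return [x ** k for k in range(degree + 1)]


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - s) / M[i][i]
    return w


def fit(xs, ys, alpha):
    """Closed-form minimizer of E(w): solve (Phi^T Phi + alpha I) w = Phi^T y."""
    Phi = [design_row(x) for x in xs]
    d = DEGREE + 1
    A = [[sum(row[i] * row[j] for row in Phi) + (alpha if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(row[i] * y for row, y in zip(Phi, ys)) for i in range(d)]
    return solve(A, b)


def predict(w, x):
    """Model output sum_k phi_k(x) * w_k."""
    return sum(wk * phi for wk, phi in zip(w, design_row(x)))


def mse(w, xs, ys):
    """Mean squared error of the model on the data (xs, ys)."""
    return sum((y - predict(w, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def error(w, xs, ys, alpha):
    """Regularized error E(w): sum of squared errors plus alpha * sum of w_k^2."""
    return len(xs) * mse(w, xs, ys) + alpha * sum(wk ** 2 for wk in w)


# Synthetic stand-in data (an assumption, NOT the course data set).
random.seed(0)
x_train = [random.uniform(-1, 1) for _ in range(15)]
y_train = [math.sin(3 * x) + random.gauss(0, 0.1) for x in x_train]
x_test = [random.uniform(-1, 1) for _ in range(50)]
y_test = [math.sin(3 * x) + random.gauss(0, 0.1) for x in x_test]

# Columns: alpha, training mse, test mse, mean absolute weight.
for alpha in (0.01, 0.1, 1.0, 10.0):
    w = fit(x_train, y_train, alpha)
    mean_abs_w = sum(abs(wk) for wk in w) / len(w)
    print(alpha, mse(w, x_train, y_train), mse(w, x_test, y_test), mean_abs_w)
```

The sketch only uses $ \alpha > 0$, which keeps the linear system well conditioned for the simple solver. In MATLAB, the solve step would more likely be a single backslash solve on the same system.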


  • For a single training example $ x$, the basis functions can be easily created by

Gradient Descent [6 points]

Consider the following feedforward neural network with a $ 1$-dimensional input, $ K$ outputs and $ M$ hidden units, where the $ k$th output is given by
$\displaystyle y_k(x) = \sum_{j = 1}^M w_{kj} f((x - \mu_j)^2), $
where $ f$ is an arbitrary function. Derive a gradient descent learning rule for the weights $ w_{kj}$ and the parameters $ \mu_j$ that minimizes the mean squared error (mse) on a single example $ (x, \vec{b})$.


  • Use $ f'$ to denote the derivative of the function $ f$.
  • The chain rule is your friend... use it!
  • Please state the whole weight update rule; the gradient alone is not sufficient.
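A hand-derived gradient can be sanity-checked numerically by comparing it with finite differences. The following Python sketch does this for the network above and then runs a few gradient descent updates on one example. Several things in it are assumptions made only for illustration: the concrete choice $ f(z) = e^{-z}$ (the exercise leaves $ f$ arbitrary), the error normalization $ E = \frac{1}{K} \sum_{k} (y_k - b_k)^2$, the learning rate eta, the toy numbers, and all function names.

```python
import math


def f(z):
    """Example choice f(z) = exp(-z) (an assumption; the exercise leaves f arbitrary)."""
    return math.exp(-z)


def f_prime(z):
    """Derivative of the example f."""
    return -math.exp(-z)


def forward(x, W, mu):
    """y_k(x) = sum_j W[k][j] * f((x - mu[j])^2) for every output k."""
    h = [f((x - m) ** 2) for m in mu]
    return [sum(wkj * hj for wkj, hj in zip(row, h)) for row in W]


def mse(x, b, W, mu):
    """Mean squared error over the K outputs for one example (x, b)."""
    y = forward(x, W, mu)
    return sum((yk - bk) ** 2 for yk, bk in zip(y, b)) / len(b)


def gradients(x, b, W, mu):
    """Analytic gradients of the mse with respect to W and mu (chain rule)."""
    K = len(b)
    y = forward(x, W, mu)
    delta = [2.0 * (yk - bk) / K for yk, bk in zip(y, b)]  # dE/dy_k
    # dE/dw_kj = dE/dy_k * f((x - mu_j)^2)
    grad_W = [[dk * f((x - m) ** 2) for m in mu] for dk in delta]
    # dE/dmu_j = (sum_k dE/dy_k * w_kj) * f'((x - mu_j)^2) * (-2) * (x - mu_j)
    grad_mu = [
        sum(delta[k] * W[k][j] for k in range(K))
        * f_prime((x - mu[j]) ** 2) * (-2.0) * (x - mu[j])
        for j in range(len(mu))
    ]
    return grad_W, grad_mu


def step(x, b, W, mu, eta=0.1):
    """One gradient descent update: parameter <- parameter - eta * gradient."""
    grad_W, grad_mu = gradients(x, b, W, mu)
    W = [[w - eta * g for w, g in zip(rw, rg)] for rw, rg in zip(W, grad_W)]
    mu = [m - eta * g for m, g in zip(mu, grad_mu)]
    return W, mu


# Toy example with K = 2 outputs and M = 3 hidden units (made-up numbers).
x, b = 0.5, [1.0, -1.0]
W = [[0.2, -0.1, 0.4], [0.3, 0.5, -0.2]]
mu = [-1.0, 0.0, 1.0]

# Finite-difference check of dE/dmu_1 against the analytic gradient.
eps = 1e-6
mu_plus = [mu[0], mu[1] + eps, mu[2]]
mu_minus = [mu[0], mu[1] - eps, mu[2]]
numeric = (mse(x, b, W, mu_plus) - mse(x, b, W, mu_minus)) / (2 * eps)
analytic = gradients(x, b, W, mu)[1][1]
print("numeric vs analytic:", numeric, analytic)

before = mse(x, b, W, mu)
for _ in range(200):
    W, mu = step(x, b, W, mu)
after = mse(x, b, W, mu)
print("mse before and after:", before, after)
```

If the two printed gradient values disagree noticeably, the derivation (or its implementation) most likely contains a sign or chain-rule error.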