Next: Properties of Kernels [6 Up: MLA_Exercises_2007 Previous: EM Algorithm for Mixtures

Clustering [5 P]

In this homework example you should cluster images derived from the Olivetti face database with the cluster algorithms 'affinity propagation', 'k-means' and 'k-medoids'. You can download the dataset and its description from the course homepage3.

• Download the MATLAB software for affinity propagation from the webpage of Brendan J. Frey and Delbert Dueck4.

• Normalize the pixels in each 50 by 50 image (each column of the design matrix) to have mean 0 and variance 0.1.

• Calculate a similarity matrix , where is the negative squared Euclidean distance between data point and data point with .

• Apply the affinity propagation algorithm to the datatset. Vary the scalar parameter that indicates the preference that a data point is chosen as an exemplar (assuming that the preferences of all data points are equal to ). Create a plot showing the dependence of the average squared error (squared Euclidean distance between a cluster center and a point within the cluster) on the number of clusters (similar to Fig. 2b of the paper Brendan J. Frey and Delbert Dueck, Clustering by Passing Messages Between Data Points, Science (2007), 315:972). State the value for for which you obtained 10 clusters.

• Apply k-means and k-medoids to the dataset. For k-means use the MATLAB function kmeans. For k-medoids you have to write your own MATLAB function. For both algorithms use the squared Euclidean distance. Analyze similar to point 3 the dependence of the average squared error on the number of clusters.

• Analyze for the cluster centers and outliers (points with the largest squared errors) for all three algorithms. Explain the differences in the results.

Next: Properties of Kernels [6 Up: MLA_Exercises_2007 Previous: EM Algorithm for Mixtures
Haeusler Stefan 2007-12-03