
[Points: 12.5; Issued: 2008/05/16; Deadline: 2008/06/06; Tutor:
Michael Wiesenegger; Infohour: 2008/05/30,
15:3016:30, HS i11; Einsichtnahme: 2008/06/20, 15:3016:30, HS
i11; Download: pdf; ps.gz]
Go through the tutorial ``Gaussian Statistics and Unsupervised
Learning'' and download the accompanying MATLAB
programs and data. The tasks issued in this homework are a subset
of the tasks in the tutorial.
You can use a printout of this homework designation as a basis
for your elaboration, and fill in your results and answers. Add
printouts of all requested figures. Do not forget to add your name
and ``Matr.Nr.'' on top.
Please add the printout of your MATLAB scripts
used for producing the results to your report.
 Load data from file ``pendigit.mat''. Supposing that the whole
database covers adequately an imaginary language made up only of
/a/'s, /e/'s, /i/'s, /o/'s and /y/'s, compute the probability
of each class
/a/ /e/ /i/ /o/ /y/. Which is the most common and which is the least
common phoneme in the language?
 Most common phoneme:
 Least common phoneme:
 Plot the clouds of observed feature points
for all
vowels:
» plotvow
Train the Gaussian models corresponding to each class (use directly
the mean and cov commands). Then compute and plot
the Gaussian models:
» mu_a = mean(a);
» sigma_a = cov(a);
» plotgaus(mu_a,sigma_a,[0 1 1]);
Print a plot showing the pdfs of the Gaussian models for the five
phoneme classes of our imaginary language, label each Gaussian with
the according class and the axes, and append the plot to this
homework.
 Now, we have modeled each phoneme class with a Gaussian pdf (by
computing means and variances), we know the probability
of each class
in the imaginary language, and we assume that the speech feature
vectors
(as opposed to the phoneme classes) are equiprobable.
What is the most probable class for the speech feature points
given in the table below? Symbol denotes the last two digits of your Matrikelnummer. Use the Bayesian discriminant
function
,
proportional to the log of the posterior probability of
belonging to class
.
i 






Most prob. class 
1 







2 







3 







4 







5 







6 







We will now use two algorithms for unsupervised learning to
cluster the feature vectors
for all phonemes of the imaginary
language, i.e., we try to automatically find the distributions of
the feature vectors for the 5 phoneme classes.
Launch the means
explorer kmeans with the data sample allvow
(stored in file pendigit.mat) which gathers the feature
vectors for all phonemes of our imaginary language in one matrix.
Do several runs with each of the following different
initializations of the algorithm:
 5 initial clusters determined according to the default
heuristic;
 some initial MEANS values equal to some data
points;
 some initial MEANS values equal to
.
Iterate the algorithm until its convergence. Observe the evolution
of the cluster centers, of the datapoints attribution chart and of
the total squared Euclidean distance. (If you enable button ``Zoom
in'' in the ``Tools'' menu it is possible to zoom the plots: left
click inside the axes to zoom centered on the point under the mouse; right
click to zoom out; click and drag to zoom into an area; double
click to reset the figure to the original size.) Observe the mean
values found after the convergence of the algorithm.
» kmeans(allvow,5);
 or 
» means = {mu_a, mu_e, mu_i, mu_o, mu_y};
» kmeans(allvow,means);
Enlarge the window, then push the buttons, zoom etc. After the
convergence, use:
» for k=1:5, disp(kmeans_result_means{k}); end
to see the resulting means.
 Does the final solution of the means algorithm depend on the initialization
(Advice: Use e.g. 10 clusters)?
 Describe the evolution of the total squared Euclidean
distance.
 What is the nature of the discriminant surfaces corresponding
to a minimum Euclidean distance classification scheme?
 Is the algorithm suitable for fitting Gaussian clusters?
Why/why not?
Launch the EM explorer emalgo with the same dataset
allvow. Do several runs with different initializations of
the algorithm:
 5 clusters determined according to the default heuristic;
 some initial MEANS values equal to some data points,
and some random VARS values (e.g., cov(allvow)
for all the classes);
 the initial MEANS and VARS values found by
the means
algorithm.
 the initial MEANS values equal to
,
initial VARS values equal to
,
and initial PRIORS values equal to
/a//e//i//o//y/;
 some initial MEANS and VARS values chosen by
yourself.
Iterate the algorithm until the total likelihood reaches an
asymptotic convergence. Observe the evolution of the clusters and
of the total likelihood curve. (In the EM case, the data points
attribution chart is not given because all data points participate
in the update of each cluster.) Observe the mean, variance and
prior values found after the convergence of the algorithm. Compare
them with the real mean and covariance values of the vowels.
Describe your observations.
(If you have time, also increase the number of clusters and play
again with the algorithm.)
» emalgo(allvow,5);
 or 
» means = {mu_a,mu_e,mu_i,mu_o,mu_y};
» vars =
{sigma_a,sigma_e,sigma_i,sigma_o,sigma_y};
» emalgo(allvow,means,vars);
Enlarge the window, then push the buttons, zoom etc. After
convergence, use:
» for k=1:5, disp(emalgo_result_means{k}); end
» for k=1:5, disp(emalgo_result_vars{k}); end
» for k=1:5, disp(emalgo_result_priors(k));
end
to see the resulting means, variances and priors.
 Does the final solution of the EM algorithm depend on the
initialization of the algorithm?
 Describe the evolution of the total likelihood. Is it (always)
monotonic?
 In terms of optimization of the likelihood, what does the final
solution correspond to?
 Is the algorithm suitable for fitting Gaussian clusters?
Why/why not?
