Computational Intelligence, SS08
2 VO 442.070 + 1 RU 708.070

Homework 6: Gaussian Statistics and Unsupervised Learning



[Points: 12.5; Issued: 2008/05/16; Deadline: 2008/06/06; Tutor: Michael Wiesenegger; Infohour: 2008/05/30, 15:30-16:30, HS i11; Review of graded work (Einsichtnahme): 2008/06/20, 15:30-16:30, HS i11; Download: pdf; ps.gz]





Go through the tutorial ``Gaussian Statistics and Unsupervised Learning'' and download the accompanying MATLAB programs and data. The tasks issued in this homework are a subset of the tasks in the tutorial.

You can use a printout of this homework assignment as a basis for your report and fill in your results and answers. Add printouts of all requested figures. Do not forget to add your name and matriculation number (``Matr.Nr.'') at the top.

Please attach printouts of the MATLAB scripts you used to produce the results to your report.

Statistical Pattern Recognition

  1. Load the data from the file ``pendigit.mat''. Assuming that the whole database adequately covers an imaginary language made up only of /a/'s, /e/'s, /i/'s, /o/'s and /y/'s, compute the probability $P(q_k)$ of each class $q_k$, $k \in \{\text{/a/}, \text{/e/}, \text{/i/}, \text{/o/}, \text{/y/}\}$. Which is the most common and which is the least common phoneme in the language? (A MATLAB sketch of this computation is given after this task list.)

    $P(q_{\text{/a/}})=\qquad\quad P(q_{\text{/e/}})=\qquad\quad P(q_{\text{/i/}})=\qquad\quad P(q_{\text{/o/}})=\qquad\quad P(q_{\text{/y/}})=$



    Most common phoneme:
    Least common phoneme:
  2. Plot the clouds of observed feature points $\mathbf{x}=[F_1,F_2]^{\mathsf T}$ for all vowels:

    » plotvow

    Train the Gaussian model corresponding to each class (use the mean and cov commands directly), then compute and plot the Gaussian models:

    » mu_a = mean(a);

    » sigma_a = cov(a);

    » plotgaus(mu_a,sigma_a,[0 1 1]);



    Print a plot showing the pdfs of the Gaussian models for the five phoneme classes of our imaginary language, label each Gaussian with its class, label the axes, and append the plot to this homework.
  3. We have now modeled each phoneme class with a Gaussian pdf (by computing its mean and covariance), we know the probability $P(q_k)$ of each class in the imaginary language, and we assume that the speech feature vectors $\mathbf{x}$ (as opposed to the phoneme classes) are equiprobable.

    What is the most probable class $q_k$ for the speech feature points $\mathbf{x}_i=[F_1,F_2]^{\mathsf T}+[A,A]^{\mathsf T}$ given in the table below? The symbol $A$ denotes the last two digits of your matriculation number (Matrikelnummer). Use the Bayesian discriminant function $f_k(\mathbf{x}_i) = \log p(\mathbf{x}_i\vert q_k,\boldsymbol{\Theta}) + \log P(q_k\vert\boldsymbol{\Theta})$, which equals the log posterior probability of class $q_k$ given $\mathbf{x}_i$ up to the additive constant $-\log p(\mathbf{x}_i)$, which is the same for all classes and can therefore be dropped. (A sketch of this computation is also given after this task list.)



i   $\mathbf{x}_i=[F_1,F_2]^{\mathsf T}$   $f_{\text{/a/}}(\mathbf{x}_i)$   $f_{\text{/e/}}(\mathbf{x}_i)$   $f_{\text{/i/}}(\mathbf{x}_i)$   $f_{\text{/o/}}(\mathbf{x}_i)$   $f_{\text{/y/}}(\mathbf{x}_i)$   Most probable class $q_k$
1   $[200+A,\,1500+A]^{\mathsf T}$
2   $[400+A,\,1000+A]^{\mathsf T}$
3   $[530+A,\,1000+A]^{\mathsf T}$
4   $[600+A,\,1000+A]^{\mathsf T}$
5   $[670+A,\,1000+A]^{\mathsf T}$
6   $[700+A,\,2000+A]^{\mathsf T}$
  • The iso-likelihood lines of the Gaussian pdfs ${\cal N}\left(\boldsymbol{\mu}_{\text{/i/}},\boldsymbol{\Sigma}_{\text{/i/}}\right)$ and ${\cal N}\left(\boldsymbol{\mu}_{\text{/e/}},\boldsymbol{\Sigma}_{\text{/e/}}\right)$, which we used above to model the classes /i/ and /e/, are plotted in Figure 1 of the tutorial. A second graph shows the iso-likelihood lines of ${\cal N}\left(\boldsymbol{\mu}_{\text{/i/}},\boldsymbol{\Sigma}_{\text{/e/}}\right)$ and ${\cal N}\left(\boldsymbol{\mu}_{\text{/e/}},\boldsymbol{\Sigma}_{\text{/e/}}\right)$ (two pdfs with the same covariance matrix $\boldsymbol{\Sigma}_{\text{/e/}}$).

    On the figures, use a colored pen to join the intersections of the level lines that correspond to equal likelihoods. (If the parameters of your Gaussian models for /e/ and /i/ are in the workspace, you can also use isosurf in MATLAB to create a color plot.)

    • What is the nature of the surface that separates class /i/ from class /e/ when the two models have different covariance matrices? Can you explain why it has this form?
    • What is the nature of the surface that separates class /i/ from class /e/ when the two models share the same covariance matrix? Why does it differ from the previous discriminant surface?
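
For reference, here is a minimal MATLAB sketch for tasks 1 and 3. It assumes, as suggested by the mean(a)/cov(a) commands above, that pendigit.mat provides one matrix per class (a, e, i, o, y) with one $[F_1,F_2]$ observation per row; the variable names counts, priors, mus, sigmas and f are purely illustrative and not part of the provided tools.

  % Task 1: estimate the class priors P(q_k) from the relative class frequencies.
  counts = [size(a,1), size(e,1), size(i,1), size(o,1), size(y,1)];
  priors = counts / sum(counts);       % [P(q_/a/) P(q_/e/) P(q_/i/) P(q_/o/) P(q_/y/)]

  % Task 3: Bayesian discriminant f_k(x) = log p(x|q_k) + log P(q_k),
  % with p(x|q_k) the 2-D Gaussian N(mu_k, Sigma_k) trained above.
  mus    = {mean(a); mean(e); mean(i); mean(o); mean(y)};
  sigmas = {cov(a);  cov(e);  cov(i);  cov(o);  cov(y)};

  A = 0;                               % replace 0 by the last two digits of your Matr.Nr.
  x = [200 + A; 1500 + A];             % first feature point from the table, as a column vector

  f = zeros(1,5);
  for k = 1:5
      mu = mus{k}(:);                  % class mean as a column vector
      S  = sigmas{k};                  % class covariance matrix
      d  = x - mu;
      loggauss = -log(2*pi) - 0.5*log(det(S)) - 0.5*(d' * (S \ d));   % log N(x; mu, S) in 2-D
      f(k)     = loggauss + log(priors(k));
  end
  [fmax, kmax] = max(f);               % kmax is the index of the most probable class

Repeating the loop for each $\mathbf{x}_i$ in the table and comparing the five values $f_k(\mathbf{x}_i)$ fills in the requested columns.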

Unsupervised Learning

We will now use two algorithms for unsupervised learning to cluster the feature vectors $\mathbf{x}$ for all phonemes of the imaginary language, i.e., we try to automatically find the distributions of the feature vectors for the 5 phoneme classes.

Clustering using the $K$-means algorithm

Launch the $K$-means explorer kmeans with the data sample allvow (stored in the file pendigit.mat), which gathers the feature vectors for all phonemes of our imaginary language in one matrix. Do several runs with each of the following initializations of the algorithm:

  1. 5 initial clusters determined according to the default heuristic;
  2. some initial MEANS values equal to some data points;
  3. some initial MEANS values equal to $\{\boldsymbol{\mu}_{\text{/a/}}, \boldsymbol{\mu}_{\text{/e/}}, \boldsymbol{\mu}_{\text{/i/}}, \boldsymbol{\mu}_{\text{/o/}}, \boldsymbol{\mu}_{\text{/y/}}\}$.
Iterate the algorithm until convergence. Observe the evolution of the cluster centers, of the data-point attribution chart, and of the total squared Euclidean distance. (If you enable the ``Zoom in'' button in the ``Tools'' menu, you can zoom the plots: left-click inside the axes to zoom in $2\times$, centered on the point under the mouse; right-click to zoom out; click and drag to zoom into an area; double-click to reset the figure to its original size.) Observe the mean values found after convergence of the algorithm.

Example:



» kmeans(allvow,5);

- or -

» means = {mu_a, mu_e, mu_i, mu_o, mu_y};

» kmeans(allvow,means);

Enlarge the window, then push the buttons, zoom etc. After the convergence, use:

» for k=1:5, disp(kmeans_result_means{k}); end

to see the resulting means.
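
For reference, the following sketch spells out the two steps that a standard $K$-means iteration alternates on allvow: assign every point to the nearest mean under the squared Euclidean distance, then move every mean to the centroid of its assigned points. It only illustrates the update rule; the variable names are assumptions, and the kmeans explorer above should be used for the actual experiments.

  K = 5;
  N = size(allvow, 1);
  M = allvow(1:K, :);                          % some initial means (here: the first K data points)
  for it = 1:100
      % assignment step: squared Euclidean distance from every point to every mean
      D = zeros(N, K);
      for k = 1:K
          diffs  = allvow - repmat(M(k,:), N, 1);
          D(:,k) = sum(diffs.^2, 2);
      end
      [dmin, idx] = min(D, [], 2);             % idx(n) = cluster assigned to point n
      J = sum(dmin);                           % total squared Euclidean distance (non-increasing over iterations)
      % update step: move each mean to the centroid of its assigned points
      for k = 1:K
          if any(idx == k)
              M(k,:) = mean(allvow(idx == k, :), 1);
          end
      end
  end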

Answer the following questions:

  1. Does the final solution of the $K$-means algorithm depend on the initialization? (Hint: use, e.g., 10 clusters.)
  2. Describe the evolution of the total squared Euclidean distance.
  3. What is the nature of the discriminant surfaces corresponding to a minimum Euclidean distance classification scheme?
  4. Is the algorithm suitable for fitting Gaussian clusters? Why/why not?

Clustering using the EM algorithm

Launch the EM explorer emalgo with the same dataset allvow. Do several runs with different initializations of the algorithm:

  1. 5 clusters determined according to the default heuristic;
  2. some initial MEANS values equal to some data points, and some random VARS values (e.g., cov(allvow) for all the classes);
  3. the initial MEANS and VARS values found by the $K$-means algorithm;
  4. the initial MEANS values equal to $\{\boldsymbol{\mu}_{\text{/a/}}, \boldsymbol{\mu}_{\text{/e/}}, \boldsymbol{\mu}_{\text{/i/}}, \boldsymbol{\mu}_{\text{/o/}}, \boldsymbol{\mu}_{\text{/y/}}\}$,

    the initial VARS values equal to $\{\boldsymbol{\Sigma}_{\text{/a/}}, \boldsymbol{\Sigma}_{\text{/e/}}, \boldsymbol{\Sigma}_{\text{/i/}}, \boldsymbol{\Sigma}_{\text{/o/}}, \boldsymbol{\Sigma}_{\text{/y/}}\}$,

    and the initial PRIORS values equal to $[P(q_{\text{/a/}}), P(q_{\text{/e/}}), P(q_{\text{/i/}}), P(q_{\text{/o/}}), P(q_{\text{/y/}})]$;
  5. some initial MEANS and VARS values chosen by yourself.

Iterate the algorithm until the total likelihood converges asymptotically. Observe the evolution of the clusters and of the total likelihood curve. (In the EM case, the data-point attribution chart is not given because all data points participate in the update of each cluster.) Observe the mean, variance and prior values found after convergence of the algorithm. Compare them with the true mean and covariance values of the vowels. Describe your observations.

(If you have time, also increase the number of clusters and play again with the algorithm.)

Example:



» emalgo(allvow,5);

- or -

» means = {mu_a,mu_e,mu_i,mu_o,mu_y};

» vars = {sigma_a,sigma_e,sigma_i,sigma_o,sigma_y};

» emalgo(allvow,means,vars);

Enlarge the window, then push the buttons, zoom etc. After convergence, use:

» for k=1:5, disp(emalgo_result_means{k}); end

» for k=1:5, disp(emalgo_result_vars{k}); end

» for k=1:5, disp(emalgo_result_priors(k)); end

to see the resulting means, variances and priors.
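
Analogously, here is a bare-bones sketch of a single EM iteration for a $K$-component Gaussian mixture on allvow, showing the E-step (responsibilities), the total log-likelihood, and the M-step (weighted updates of means, covariances and priors). The variables M, S, p and gamma are illustrative assumptions, not the internals of emalgo, which should be used for the actual experiments.

  [N, dim] = size(allvow);
  K = 5;
  % M (K x dim), S{1..K} (dim x dim) and p (1 x K) are assumed to be initialized,
  % e.g. with the K-means result or with the per-vowel means, covariances and priors.
  gamma = zeros(N, K);
  for k = 1:K
      diffs = allvow - repmat(M(k,:), N, 1);
      quad  = sum((diffs / S{k}) .* diffs, 2);              % squared Mahalanobis distances
      gauss = exp(-0.5 * quad) / ((2*pi)^(dim/2) * sqrt(det(S{k})));
      gamma(:,k) = p(k) * gauss;                            % prior times class-conditional pdf
  end
  loglik = sum(log(sum(gamma, 2)));                         % total log-likelihood (non-decreasing over EM iterations)
  gamma  = gamma ./ repmat(sum(gamma, 2), 1, K);            % E-step: normalized responsibilities
  for k = 1:K                                               % M-step: responsibility-weighted updates
      Nk     = sum(gamma(:,k));
      M(k,:) = gamma(:,k)' * allvow / Nk;
      diffs  = allvow - repmat(M(k,:), N, 1);
      S{k}   = (diffs' * (diffs .* repmat(gamma(:,k), 1, dim))) / Nk;
      p(k)   = Nk / N;
  end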

Questions:

  1. Does the final solution of the EM algorithm depend on the initialization of the algorithm?
  2. Describe the evolution of the total likelihood. Is it (always) monotonic?
  3. In terms of optimization of the likelihood, what does the final solution correspond to?
  4. Is the algorithm suitable for fitting Gaussian clusters? Why/why not?