Computational Intelligence, SS08
2 VO 442.070 + 1 RU 708.070

Homework 9: Gaussian Statistics and Unsupervised Learning



[Points: 10; Issued: 2003/05/09; Deadline: 2003/05/23; Tutor: Alen Idrizovic; Info hour: 2003/05/21, 14:00-15:00, Seminarraum IGI; Review session (Einsichtnahme): 2003/06/11, 14:00-15:00, Seminarraum IGI; Download: pdf; ps.gz]





  1. Statistical Pattern Recognition
    • Load the data from the file ``vowels.mat''. Assuming that the whole database adequately covers an imaginary language made up only of /a/'s, /e/'s, /i/'s, /o/'s and /y/'s, compute the probability $ P(q_k)$ of each class $ q_k$, $ k \in \{/a/,/e/,/i/,/o/,/y/\}$. Which are the most common and the least common phonemes in the language?
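
      A minimal sketch of this computation, assuming vowels.mat contains one matrix per vowel class (a, e, i, o, y; one observation per row, as in the commands further below) and that $ P(q_k)$ is estimated as the relative number of observations in each class (the variable names N and P_a, ..., P_y are illustrative):

      » load vowels.mat
      » N = size(a,1) + size(e,1) + size(i,1) + size(o,1) + size(y,1);
      » P_a = size(a,1)/N; P_e = size(e,1)/N; P_i = size(i,1)/N;
      » P_o = size(o,1)/N; P_y = size(y,1)/N;
      » [P_a P_e P_i P_o P_y]     % largest/smallest entry: most/least common phoneme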

    • Train the Gaussian models corresponding to each class (use the mean and cov commands directly). Then plot the Gaussian models:

      » mu_a = mean(a);

      » sigma_a = cov(a);

      » plotgaus(mu_a,sigma_a,[0 1 1]);
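
      The later examples use mu_e, sigma_e, ..., mu_y, sigma_y, which can be obtained in exactly the same way (the RGB color triples passed to plotgaus are arbitrary choices here):

      » mu_e = mean(e); sigma_e = cov(e); plotgaus(mu_e,sigma_e,[1 0 1]);
      » mu_i = mean(i); sigma_i = cov(i); plotgaus(mu_i,sigma_i,[1 1 0]);
      » mu_o = mean(o); sigma_o = cov(o); plotgaus(mu_o,sigma_o,[0 1 0]);
      » mu_y = mean(y); sigma_y = cov(y); plotgaus(mu_y,sigma_y,[1 0 0]);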

    • Now that we have modeled each vowel class with a Gaussian pdf (by computing means and covariance matrices), we know the probability $ P(q_k)$ of each class in the imaginary language, and we assume that the speech features (as opposed to the speech classes) are equiprobable. What is the most probable class $ q_k$ for each of the speech feature points $ x=(F_1,F_2)^T$ given in the table below?



      x    $ F_1$  $ F_2$  $ \log P(q_{/a/}\vert x)$  $ \log P(q_{/e/}\vert x)$  $ \log P(q_{/i/}\vert x)$  $ \log P(q_{/o/}\vert x)$  $ \log P(q_{/y/}\vert x)$  Most prob. class
      1.   400     1800
      2.   400     1000
      3.   530     1000
      4.   600     1300
      5.   670     1300
      6.   420     2500
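
      A minimal sketch of the computation needed to fill the table, using Bayes' rule $ \log P(q_k\vert x) = \log p(x\vert q_k) + \log P(q_k) - \log p(x)$; since the features are assumed equiprobable, $ \log p(x)$ is the same constant for every class and does not change which class wins. The Gaussian log-density is written out explicitly, and variable names such as P_a and logpost_a are illustrative:

      » x = [400 1800]';                    % first feature point (F1, F2)
      » d = x - mu_a';                      % mu_a is a row vector, as returned by mean(a)
      » loglik_a = -0.5*d'*(sigma_a\d) - 0.5*log(det(2*pi*sigma_a));
      » logpost_a = loglik_a + log(P_a)     % repeat for /e/, /i/, /o/, /y/ and compare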
    • The iso-likelihood lines for the Gaussian pdfs $ {\cal N}\left(\mu_{/i/},\Sigma_{/i/}\right)$ and $ {\cal N}\left(\mu_{/e/},\Sigma_{/e/}\right)$, which we used above to model the classes /i/ and /e/, are plotted in Figure 1 of the tutorial. On a second graph, the iso-likelihood lines for $ {\cal N}\left(\mu_{/i/},\Sigma_{/e/}\right)$ and $ {\cal N}\left(\mu_{/e/},\Sigma_{/e/}\right)$ (two pdfs with the same covariance matrix $ \Sigma_{/e/}$) are shown. On these figures, use a colored pen to join the intersections of the level lines that correspond to equal likelihoods. (You can also use isosurf in MATLAB to create a color plot.)
    • What is the nature of the surface that separates class /i/ from class /e/ when the two models have different covariance matrices? Can you explain the origin of this shape?

      What is the nature of the surface that separates class /i/ from class /e/ when the two models have the same covariance matrix? Why does it differ from the previous discriminant surface?

  2. Unsupervised Learning
    • Use the K-means explorer utility:
       KMEANS K-means algorithm exploration tool
      
         Launch it with KMEANS(DATA,NCLUST) where DATA is the matrix
         of observations (one observation per row) and NCLUST is the
         desired number of clusters.
      
         The clusters are initialized with a heuristic that spreads
         them randomly around mean(DATA) with standard deviation
         sqrtm(cov(DATA)).
      
         If you want to set your own initial clusters, use
         KMEANS(DATA,MEANS) where MEANS is a cell array containing
         NCLUST initial mean vectors.
      
         Example: for two clusters
           means{1} = [1 2]; means{2} = [3 4];
           kmeans(data,means);
      
      Launch it with the data sample allvow, which is part of the file vowels.mat and gathers all the simulated vowel data. Do several runs with different initializations of the algorithm:
      1. 5 initial clusters determined according to the default heuristic;
      2. some initial MEANS values equal to some data points;
      3. some initial MEANS values equal to $ \{\mu_{/a/}, \mu_{/e/}, \mu_{/i/}, \mu_{/o/}, \mu_{/y/}\}$.
      Iterate the algorithm until convergence. Observe the evolution of the cluster centers, of the data-point attribution chart, and of the total squared Euclidean distance. (You can zoom these plots: left-click inside the axes to zoom $ 2\times$, centered on the point under the mouse; right-click to zoom out; click and drag to zoom into an area; double-click to reset the figure to the original view.) Observe the mean values found after the convergence of the algorithm.

Example:

» kmeans(allvow,5);

- or -

» means = { mu_a, mu_e, mu_i, mu_o, mu_y };

» kmeans(allvow,means);

Enlarge the window, then push the buttons, zoom, etc. After convergence, use:

» for k=1:5, disp(kmeans_result_means{k}); end

to see the resulting means.
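
For initialization case 2 (initial MEANS equal to some data points), one possible sketch is to pick random rows of allvow; the random choice is an assumption here, not prescribed by the assignment:

» idx = randperm(size(allvow,1));

» for k=1:5, means{k} = allvow(idx(k),:); end

» kmeans(allvow,means);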

Think about the following questions:

  1. Does the final solution depend on the initialization of the algorithm?
  2. Describe the evolution of the total squared Euclidean distance.
  3. Is the algorithm suitable for fitting Gaussian clusters?
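
For reference in question 2, the total squared Euclidean distance is, in the standard K-means formulation, $ J = \sum_{k=1}^{5} \sum_{x \in C_k} \Vert x - \mu_k\Vert^2$, i.e. the sum over all data points of the squared distance to their currently assigned cluster center (this assumes the tool plots the standard K-means objective).
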
Use the EM explorer utility:
 EMALGO EM algorithm explorer

   Launch it with EMALGO(DATA,NCLUST) where DATA is the matrix
   of observations (one observation per row) and NCLUST is the
   desired number of clusters.

   The clusters are initialized with a heuristic that spreads
   them randomly around mean(DATA) with standard deviation
   sqrtm(cov(DATA)*10). Their initial covariance is set to cov(DATA).

   If you want to set your own initial clusters, use
   EMALGO(DATA,MEANS,VARS) where MEANS and VARS are cell arrays
   containing respectively NCLUST initial mean vectors and NCLUST
   initial covariance matrices. In this case, the initial a-priori
   probabilities are set equal to 1/NCLUST.

   To set your own initial priors, use EMALGO(DATA,MEANS,VARS,PRIORS)
   where PRIORS is a vector containing NCLUST a priori probabilities.

   Example: for two clusters
     means{1} = [1 2]; means{2} = [3 4];
     vars{1} = [2 0;0 2]; vars{2} = [1 0;0 1];
     emalgo(data,means,vars);
Launch it again with the same dataset allvow. Do several runs with different initializations of the algorithm:
  1. 5 clusters determined according to the default heuristic;
  2. some initial MEANS values equal to $ \{\mu_{/a/}, \mu_{/e/}, \mu_{/i/}, \mu_{/o/}, \mu_{/y/}\}$ and VARS values equal to $ \{\Sigma_{/a/}, \Sigma_{/e/}, \Sigma_{/i/}, \Sigma_{/o/}, \Sigma_{/y/}\}$;
  3. some initial MEANS and VARS values chosen by yourself.
(If you have time, also increase the number of clusters and play again with the algorithm.)

Iterate the algorithm until the total likelihood reaches asymptotic convergence. Observe the evolution of the clusters and of the total likelihood curve. (In the EM case, the data-point attribution chart is not shown, because each data point contributes to the update of every cluster.) Observe the mean, variance and prior values found after the convergence of the algorithm. Compare them with the real mean and covariance values of the vowels. Describe your observations.

Example:

» emalgo(allvow,5);

- or -

» means = { mu_a, mu_e, mu_i, mu_o, mu_y };

» vars = { sigma_a, sigma_e, sigma_i, sigma_o, sigma_y };

» emalgo(allvow,means,vars);

Enlarge the window, then push the buttons, zoom etc. After convergence, use:

» for k=1:5, disp(emalgo_result_means{k}); end

» for k=1:5, disp(emalgo_result_vars{k}); end

» for k=1:5, disp(emalgo_result_priors(k)); end

to see the resulting means, variances and priors.
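
As a check of the asymptotic value of the total likelihood, one possible sketch is given below; it assumes the Statistics Toolbox function mvnpdf is available and that the emalgo_result_means{k} are row vectors (the cluster numbering found by the algorithm need not match the vowel order):

» lik = zeros(size(allvow,1),1);

» for k=1:5, lik = lik + emalgo_result_priors(k)*mvnpdf(allvow, emalgo_result_means{k}, emalgo_result_vars{k}); end

» total_loglik = sum(log(lik))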

Questions:

  1. Does the final solution depend on the initialization of the algorithm?
  2. Describe the evolution of the total likelihood. Is it monotonic?
  3. Is the algorithm suitable for fitting Gaussian clusters?