Computational Intelligence, SS08
2 VO 442.070 + 1 RU 708.070


The LMS Adaptation Algorithm

The LMS (least mean squares) algorithm is an approximation of the steepest descent algorithm which uses an instantaneous estimate of the gradient vector of a cost function. The estimate of the gradient is based on sample values of the tap-input vector and an error signal. The algorithm iterates over each coefficient in the filter, moving it in the direction of the approximated gradient [1].

For the LMS algorithm it is necessary to have a reference signal $ d[n]$ representing the desired filter output. The difference between the reference signal and the actual output of the transversal filter (eq. 2) is the error signal

$\displaystyle e[n] = d[n] - \mathbf{c}^{\mathsf H}[n]\mathbf{x}[n].$ (3)

A schematic of the learning setup is depicted in fig. 2.
Figure 2: Adaptive transversal filter learning

The task of the LMS algorithm is to find the set of filter coefficients $ \mathbf{c}$ that minimizes the expected value of the quadratic error signal, i.e., that achieves the least mean squared error (hence the name). The squared error and its expected value are (for simplicity of notation we drop the dependence of all variables on the time $ n$ in eqs. 4 to 7)

$\displaystyle e^2 = \left(d - \mathbf{c}^{\mathsf H}\mathbf{x}\right)^2 = d^2 - 2d\,\mathbf{c}^{\mathsf H}\mathbf{x} + \mathbf{c}^{\mathsf H}\mathbf{x}\,\mathbf{x}^{\mathsf H}\mathbf{c},$ (4)

$\displaystyle {\mathrm{E}}(e^2)$ $\displaystyle =$ $\displaystyle {\mathrm{E}}(d^2) - {\mathrm{E}}(2d\,\mathbf{c}^{\mathsf H}\mathbf{x}) + {\mathrm{E}}(\mathbf{c}^{\mathsf H}\mathbf{x}\,\mathbf{x}^{\mathsf H}\mathbf{c})$  
  $\displaystyle =$ $\displaystyle {\mathrm{E}}(d^2) - \mathbf{c}^{\mathsf H}\,2{\mathrm{E}}(d\,\mathbf{x}) + \mathbf{c}^{\mathsf H}\,{\mathrm{E}}(\mathbf{x}\,\mathbf{x}^{\mathsf H})\,\mathbf{c}$ (5)

Note that the squared error $ e^2$ is a quadratic function of the coefficient vector $ \mathbf{c}$, and thus has a single (global) minimum and no local minima, which could in principle be found exactly if the expected values in eq. 5 were known.

The gradient descent approach demands that the position on the error surface corresponding to the current coefficients be moved in the direction of the `steepest descent', i.e., in the direction of the negative gradient of the cost function $ J={\mathrm{E}}(e^2)$ with respect to the coefficient vector

$\displaystyle -\nabla_{\mathbf{c}} J = 2 {\mathrm{E}}(d\,\mathbf{x}) - 2 {\mathrm{E}}(\mathbf{x}\,\mathbf{x}^{\mathsf H}) \mathbf{c}.$ (6)

The expected values in this equation, $ {\mathrm{E}}(d\,\mathbf{x})=\mathbf{p}$, the cross-correlation vector between the desired output signal and the tap-input vector, and $ {\mathrm{E}}(\mathbf{x}\,\mathbf{x}^{\mathsf H})=\mathbf{R}$, the auto-correlation matrix of the tap-input vector, would usually be estimated using a large number of samples from $ d$ and $ \mathbf{x}$. In the LMS algorithm, however, a very short-term estimate is used by only taking into account the current samples: $ {\mathrm{E}}(d\,\mathbf{x})\approx d\,\mathbf{x}$, and $ {\mathrm{E}}(\mathbf{x}\,\mathbf{x}^{\mathsf H})\approx \mathbf{x}\,\mathbf{x}^{\mathsf H}$, leading to an update equation for the filter coefficients

$\displaystyle \mathbf{c}^{\text{new}}$ $\displaystyle =$ $\displaystyle \mathbf{c}^{\text{old}} + \mu/2\ \left(-\nabla_{\mathbf{c}} J(\mathbf{c})\right)$  
  $\displaystyle =$ $\displaystyle \mathbf{c}^{\text{old}} + \mu\; \mathbf{x}\; (d - \mathbf{x}^{\mathsf H}\mathbf{c})$  
  $\displaystyle =$ $\displaystyle \mathbf{c}^{\text{old}} + \mu\; \mathbf{x}\; e^{\ast}.$ (7)

Here, we introduced the `step-size' parameter $ \mu$, which controls the distance we move along the error surface. In the LMS algorithm the update of the coefficients, eq. 7, is performed at every time instant $ n$,
$\displaystyle \mathbf{c}[n+1] = \mathbf{c}[n]+ \mu \, e^{\ast}[n] \, \mathbf{x}[n].$ (8)
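As an illustration, a single update step of eq. 8 can be sketched in NumPy (the function name `lms_update` is ours, not part of the course material; `np.vdot` conjugates its first argument, so `np.vdot(c, x)` computes $ \mathbf{c}^{\mathsf H}\mathbf{x}$):

```python
import numpy as np

def lms_update(c, x, d, mu):
    """One LMS step: filter output, error signal (eq. 3), and
    coefficient update (eq. 8). Illustrative sketch only."""
    y = np.vdot(c, x)               # y[n] = c^H[n] x[n]
    e = d - y                       # e[n] = d[n] - y[n]
    c_new = c + mu * np.conj(e) * x  # c[n+1] = c[n] + mu e*[n] x[n]
    return c_new, e
```

Iterating this step with fresh samples of $ \mathbf{x}$ and $ d$ drives the coefficients towards the minimum of the error surface.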

Choice of step-size

The `step-size' parameter $ \mu$ introduced in eqs. 7 and 8 controls how far we move along the error surface at each update step. Certainly, $ \mu$ has to be chosen positive, $ \mu>0$ (otherwise we would move the coefficient vector towards larger squared error). Also, $ \mu$ should not be too large: since the LMS algorithm uses only a local approximation of $ \mathbf{p}$ and $ \mathbf{R}$ in the computation of the gradient, the cost function at each time instant may differ from the accurate global cost function.

Furthermore, too large a step-size causes the LMS algorithm to become unstable, i.e., the coefficients do not converge to fixed values but oscillate. Closer analysis [1] reveals that the upper bound on $ \mu$ for stable behavior of the LMS algorithm depends on the largest eigenvalue $ \lambda_\text{max}$ of the tap-input auto-correlation matrix $ \mathbf{R}$, and thus on the input signal. For stable adaptation behavior the step-size has to satisfy

$\displaystyle \mu < \frac{2}{\lambda_\text{max}}.$ (9)

Since we still do not want to compute an estimate of $ \mathbf{R}$ and its eigenvalues, we first approximate $ \lambda_\text{max} \approx {\mathrm{tr}}(\mathbf{R})$ (where $ {\mathrm{tr}}(\mathbf{R})$, the trace of the matrix $ \mathbf{R}$, is the sum of the elements on its diagonal), and then - in the same way as we approximated the expected values in the cost function - $ {\mathrm{tr}}(\mathbf{R})\approx \Vert\mathbf{x}[n]\Vert^2$, the tap-input power at the current time $ n$. Hence, the upper bound on $ \mu$ for stable behavior depends on the signal power.
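This practical bound can be sketched as a small NumPy fragment (`stable_mu_bound` is a hypothetical helper name; it applies eq. 9 with $ \lambda_\text{max}$ replaced by the instantaneous tap-input power):

```python
import numpy as np

def stable_mu_bound(x):
    """Practical upper bound on the step-size mu for the current
    tap-input vector x: 2 / ||x||^2, approximating
    lambda_max ~ tr(R) ~ ||x[n]||^2."""
    power = np.linalg.norm(x) ** 2   # ||x[n]||^2, the tap-input power
    return 2.0 / power
```

In practice $ \mu$ is chosen well below this bound, since the bound itself rests on two rather coarse approximations.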

Summary of the LMS algorithm

  1. Filter operation:
    $\displaystyle y[n] = \mathbf{c}^{\mathsf H}[n] \mathbf{x}[n] $
  2. Error calculation:
    $\displaystyle e[n] = d[n] - y[n] $
    where $ d[n]$ is the desired output
  3. Coefficient adaptation:
    $\displaystyle \mathbf{c}[n+1] = \mathbf{c}[n]+ \mu \, e^{\ast}[n] \, \mathbf{x}[n] $
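The three steps above can be combined into a complete pass over a signal. The following NumPy sketch (the function name `lms_filter` is ours) identifies an unknown FIR system from its input and desired output, under the assumption of a white, noise-free reference:

```python
import numpy as np

def lms_filter(x, d, num_taps, mu):
    """Run the LMS algorithm over input signal x with reference d.
    Returns the final coefficient vector and the error signal.
    Illustrative sketch of steps 1-3 above."""
    c = np.zeros(num_taps, dtype=complex)
    e = np.zeros(len(x), dtype=complex)
    for n in range(num_taps - 1, len(x)):
        x_n = x[n - num_taps + 1:n + 1][::-1]  # tap-input vector x[n]
        y = np.vdot(c, x_n)                    # 1. filter: y[n] = c^H x[n]
        e[n] = d[n] - y                        # 2. error: e[n] = d[n] - y[n]
        c = c + mu * np.conj(e[n]) * x_n       # 3. adaptation (eq. 8)
    return c, e
```

With a white input and a reference generated by a short FIR filter, the coefficients converge to the true impulse response and the error signal decays towards zero, provided $ \mu$ stays below the bound of eq. 9.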