Université de Neuchâtel, Master in Statistics, 2009–2010, Multivariate Analysis (Ludovic Lebart) (Chapter 1)
Updated version of "Multivariate Descriptive Statistical Analysis", by L. Lebart, A. Morineau, K. Warwick, John Wiley, New York, 1984.

 

General Analysis  and  Singular Value Decomposition

 

 

(Compacting a rectangular array  X = (xij)  (i ≤ n,  j ≤ p) )

 

In the following chapters, we present a variety of related methods for analyzing a data matrix, which can be loosely described under the heading of “principal axes methods”. In this introductory chapter we develop the theoretical basis common to all of these methods. This stems from the work of Eckart and Young (1936) on the singular value decomposition of a matrix. We emphasize the geometrical aspects of this technique.

 

Problem: assuming that the (n, p) matrix X is not a pure random matrix, is it possible to summarize the "information" underlying X with fewer than np parameters?

 

 

 

A reminder: the history of the SVD. 1904: a breakthrough.

 

The paper by Charles Spearman (1904): “General intelligence, objectively determined and measured”.


In 1904, Charles Spearman proposed to “explain” the mark xij obtained by individual i (i ≤ n) on topic j (j ≤ p) by a simple formula involving a single value fi attached to the individual and a series of p coefficients bj related only to the topics: xij ≈ fi bj. Thus, up to a residual, np values are approximated by n + p values. Although it was not recognized as such at the time, this may be the first attempt to perform a “structural compression” of a data table.

 

 

1) Fitting the data-points in Rp:

 

The n row-points of X belong to Rp, a p-dimensional vector space.

 

Let u be a unit vector of   Rp  :        u'u = 1

The vector  y whose components are the n projections of the n "row-points" of  X  on the one-dimensional subspace defined by  u is the vector:

y  = Xu

 

 

Figure 1. Fitting the cloud of points.

 

The sum of squares of these projections,   y'y, reads :

 

y'y =  (Xu)' Xu  =  u' X'X u

 

To determine u , we must therefore find the maximum of the quadratic form

 {u' X'X u}

 under the constraint

u'u = 1

 

Let  u1 be the solution.

The best two-dimensional subspace is bound to contain  u1 (exercise...)

Let us now find u such that     u'u1 = 0   and     u'u = 1

which maximizes the quadratic form       {u' X'X u}

Let u2 designate the vector u satisfying these conditions.

This argument is easily extended to the best q-dimensional subspace, which turns out to be spanned by

u1,   u2,   ...,  uq-1,    and    u    such that

u'ua = 0        (a = 1, 2, 3, ..., q-1)    and    u'u = 1

and such that   {u' X'X u}   is maximized.

This is a classical mathematical problem :

ua is the a-th eigenvector of the symmetric matrix X'X corresponding to the eigenvalue λa.
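As a small numerical illustration (not from the original text; the matrix size, random seed, and numpy routines are arbitrary choices of this sketch), the leading eigenvector of X'X can be computed with a standard eigensolver and checked against random unit vectors:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))                # hypothetical (n, p) data matrix

    # Eigendecomposition of the symmetric matrix X'X (eigenvalues ascending)
    eigvals, eigvecs = np.linalg.eigh(X.T @ X)
    u1 = eigvecs[:, -1]                          # eigenvector of the largest eigenvalue

    def criterion(u):
        # Sum of squared projections u'X'Xu of the row-points on the axis u
        return u @ X.T @ X @ u

    trials = rng.normal(size=(1000, 5))
    trials /= np.linalg.norm(trials, axis=1, keepdims=True)
    print(criterion(u1))                         # equals the largest eigenvalue
    print(max(criterion(u) for u in trials))     # never exceeds criterion(u1)

No random unit vector beats u1, in agreement with the result above.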

 

 

Proofs

Problem 1

Maximizing {u' X'X u} with u'u = 1

 

Let us compute the vector of the partial derivatives of the Lagrangian f(u) with respect to u :

   f(u) = u' X'X u  -  λ (u'u - 1)

We must have :

 2 X'X u  -  2 λ u   =   0

or :                              X'X u  =  λ u

    with   λ = u' X'X u    since   u'u = 1

Therefore the maximum is necessarily attained at an eigenvector of X'X, and the absolute maximum u1 corresponds to the largest eigenvalue λ1 of this matrix.

 

Problem 2

Maximizing {u' X'X u} 

with       u'u = 1         and        u'u1 = 0

 

Let us compute the derivative of the quantity

f(u) = u' X'X u  -  λ (u'u - 1)  -  μ (u'u1)

Setting it to zero, we get :

 2 X'X u  -  2 λ u  -  2 μ u1   =   0

Premultiplying the two sides of this equation by u'1 (noting that u'1 u = 0 and u'1 X'X u = λ1 u'1 u = 0) implies that μ = 0,

which leaves    X'X u  =  λ u

 with, as before,    λ = u' X'X u    since   u'u = 1

λ will be the second eigenvalue λ2 of the matrix X'X, and u = u2 the corresponding eigenvector.

 

More generally  (exercise):

The best q-dimensional subspace is spanned by the first q eigenvectors of the matrix :

 X'X

 corresponding to the q largest eigenvalues

λ1, λ2, ..., λq

 

In the space Rp dealt with up to now, the vectors considered were the n rows of the matrix X.

If we now consider the space Rn, in which the p columns of the matrix X are points, shall we obtain similar results?

What relationship exists between the fits performed in these two spaces?

 

2) Best fit in the vector space Rn:

 

Let v be a unit vector of    Rn  :        v'v = 1

The vector z whose components are the p projections of the p column-points of X (the row-points of the transposed matrix X') onto the axis defined by v is the vector:

z  = X'v

The sum of squares of the projections,   z'z, is written :

z'z =  (X'v)' X'v  =  v' XX' v

The condition :

Maximizing  { v' XX' v }  with     v'v = 1

again defines the best one-dimensional subspace (in a least squares sense), this time in the space Rn.

In an analogous fashion, it can be shown that the best q-dimensional subspace in the least squares sense is spanned by

v1,   v2,    ....  vq-1,       and     v    such that

  v'va = 0  ( a = 1, 2, 3, ... q-1)     and       v'v = 1

and such that the quantity  {v' XX' v} is maximum.

Therefore :

va is the a-th eigenvector of the symmetric matrix XX' corresponding to the eigenvalue λ'a.

 

3) Relationships between the fits in Rn and in Rp:

 

In        Rn :               XX' v  =  λ' v

Premultiplying the two sides by the matrix X' :

                      X'X (X'v)  =  λ' (X'v)

But we have already obtained :

                      X'X u  =  λ u    (with λ maximum)

The vector X'v is therefore an eigenvector of X'X, associated with the eigenvalue λ', and u is proportional to X'v.

Since λ is the largest eigenvalue of X'X :            λ' ≤ λ

Similarly, starting from the last equation, we conclude that the vector v must also be proportional to the vector Xu, which leads to the converse inequality :

                      λ ≤ λ'

In the end :

                      λ = λ' ,

and for every axis a :                λa = λ'a

 

We have thus proved the proportionality of :

                              v  and Xu, on the one hand,    and

                              u  and  X' v   on the other.

va   =  k(a)  X ua

ua  = k'(a)  X' va

The proportionality coefficients k(a) and k'(a) can be computed using the fact that  ua and va   are unit vectors, and using also the following relationships :

            λa = ua' X'X ua    =    va' XX' va

This leads immediately to the equations :

k(a)  =  k'(a)  =  1 / √λa

which induce the so-called transition formulae :

va  =  (1 / √λa) X ua

ua  =  (1 / √λa) X' va
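The transition formulae, together with the equality of the eigenvalues in the two spaces, are easy to verify numerically; the following sketch (sizes and seed are illustrative assumptions) uses numpy:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(8, 5))                   # hypothetical small (n, p) matrix

    lam, U = np.linalg.eigh(X.T @ X)              # eigenpairs in Rp (ascending order)
    lam2, V = np.linalg.eigh(X @ X.T)             # eigenpairs in Rn

    # The p nonzero eigenvalues of XX' equal those of X'X: lambda_a = lambda'_a
    print(np.allclose(lam, lam2[-5:]))

    # Transition formulae, for the axis of the largest eigenvalue
    u_a = U[:, -1]
    v_a = (X @ u_a) / np.sqrt(lam[-1])            # v_a = X u_a / sqrt(lambda_a)
    print(np.isclose(np.linalg.norm(v_a), 1.0))   # v_a is indeed a unit vector
    print(np.allclose(X.T @ v_a, np.sqrt(lam[-1]) * u_a))  # u_a = X' v_a / sqrt(lambda_a)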

 

 

The original data array X and its transpose serve as transformation matrices linking the axes of best fit in the two vector spaces Rp and Rn.

The coordinates of the projections on these axes in the two spaces Rp and Rn are respectively the components of the vectors

 X ua    and    X' va

The relationships

X ua  =  √λa va        and        X' va  =  √λa ua

pinpoint the fact that the coordinates in one space are proportional to the components of the axis in the other space.

 

4)  Recreating the Original Data

 

As before, we call ua the a-th eigenvector of the matrix X'X, which has norm 1 and corresponds to the eigenvalue λa; and we call va the a-th eigenvector of norm 1 of XX'.

 

 

We start from the transition formula X ua = √λa va. We postmultiply the two sides of this equation by ua', and sum over all of the axes (some of which may correspond to a zero eigenvalue; they complete the orthonormal basis formed by the preceding axes):

X Σa ua ua'  =  Σa √λa va ua'        (sum over a = 1, ..., p)

Let us call U the matrix of order (p, p) whose columns are the eigenvectors ua of X'X. Since these vectors are orthogonal and of norm 1, we have U'U = I (identity matrix), and therefore UU' = I.

However,

Σa ua ua'  =  UU'  =  I

Recalling that the eigenvalues λa are always ranked in descending order, the above formula becomes

X  =  Σa √λa va ua'        (a = 1, ..., p)

This formula may be considered a formula for recreating matrix X on the basis of the eigenvalues and their associated eigenvectors.
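Numerically, the reconstruction formula can be checked as follows (arbitrary test matrix; the comparison with numpy's library SVD is an addition of this sketch):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(6, 4))                  # hypothetical (n, p) matrix

    lam, U = np.linalg.eigh(X.T @ X)
    order = np.argsort(lam)[::-1]                # rank eigenvalues in descending order
    lam, U = lam[order], U[:, order]
    V = (X @ U) / np.sqrt(lam)                   # transition formula: v_a = X u_a / sqrt(lam_a)

    # X = sum over a of sqrt(lam_a) v_a u_a'
    X_rebuilt = sum(np.sqrt(lam[a]) * np.outer(V[:, a], U[:, a]) for a in range(4))
    print(np.allclose(X, X_rebuilt))             # True

    # The sqrt(lam_a) are exactly the singular values of X
    print(np.allclose(np.sqrt(lam), np.linalg.svd(X, compute_uv=False)))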

 


 

Figure 2. Reconstitution of X: singular value decomposition.

 

This formula represents the singular value decomposition of X (cf. Sylvester, 1889, for square matrices, and Eckart and Young, 1936, 1939, for rectangular matrices).



 


 

If the p - q smallest eigenvalues are "very small", the summation can be limited to the first q terms:

X*  =  Σa √λa va ua'        (a = 1, ..., q)

If q is much smaller than p, it is easy to see the advantage by comparing the two sides of this equation: the np terms of X are then approximated by terms constructed from the q(n + p) values contained on the right-hand side.
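A sketch of this compression (the dimensions and the planted low-rank structure are illustrative assumptions), with the truncated SVD playing the role of the rank-q approximation X*:

    import numpy as np

    rng = np.random.default_rng(0)
    n, p, q = 200, 50, 5
    # Hypothetical data: noise plus a strong rank-1 structure, so few axes suffice
    X = rng.normal(size=(n, p)) + 10 * np.outer(rng.normal(size=n), rng.normal(size=p))

    V, s, Ut = np.linalg.svd(X, full_matrices=False)   # singular values, descending
    X_star = V[:, :q] * s[:q] @ Ut[:q, :]              # keep the q largest terms only

    print(n * p)                                       # 10000 values in X ...
    print(q * (n + p))                                 # ... approximated from 1250 values
    print(np.linalg.norm(X - X_star) / np.linalg.norm(X))  # relative residual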

 

The recovery of the original data can be evaluated by the quantity

τq  =  tr(X*'X*) / tr(X'X)

(tr is the trace). Again, we have

tr(X'X)  =  λ1 + λ2 + ... + λp

By substituting X and X* with their values in the previous equations, we obtain:

τq  =  (λ1 + ... + λq) / (λ1 + ... + λp)

The quantity τq, which is less than or equal to 1, is called the percentage of variance relative to the first q factors. Its interpretation as a measure of the numerical quality of the recovery is clear, but we will see later that the problem of its statistical significance is a sensitive issue.
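A quick check of τq on a random matrix (the sizes and q = 3 are arbitrary assumptions of this sketch): computed from the eigenvalues of X'X, it coincides with the share of the squared norm of X recovered by X*:

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(30, 8))
    q = 3

    lam = np.sort(np.linalg.eigvalsh(X.T @ X))[::-1]   # descending eigenvalues of X'X
    tau_q = lam[:q].sum() / lam.sum()                  # percentage of variance, first q axes

    # Same number via the rank-q reconstruction error: tau_q = 1 - ||X - X*||^2 / ||X||^2
    V, s, Ut = np.linalg.svd(X, full_matrices=False)
    X_star = V[:, :q] * s[:q] @ Ut[:q, :]
    print(tau_q)
    print(1 - np.linalg.norm(X - X_star) ** 2 / np.linalg.norm(X) ** 2)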

 

 

5) Generalization: Metric M and Criterion of Fit N

 

Fitting the data-points in  Rp:

 

M is a (p, p) symmetric positive definite matrix allowing for the definition of a scalar product (i.e. a metric) in the space Rp. N is a diagonal (n, n) matrix of positive weights.

Let u be a unit vector of Rp for the metric M; hence, u is such that:    u'Mu = 1.

The vector y whose n components are the n “M-projections” of the n "row-points" of X on the one-dimensional subspace defined by u is the vector:

y  = XMu

 

The weighted sum of squares of these projections,   y'Ny, reads :

y'Ny =  (XMu)' NXMu  =  u' MX'NXMu

 

To determine u , we must therefore find the maximum of the quadratic form

 {u' MX'NXMu}

 under the constraint

u'Mu = 1

This is almost the same classical mathematical problem: ua is the a-th eigenvector of the M-symmetric matrix X'NXM corresponding to the eigenvalue λa.

 

Proof

Maximizing {u' MX'NXM u} with   u'Mu = 1

 

Let us compute the vector of the partial derivatives of the Lagrangian f(u) with respect to u :

f(u) = u' MX'NXM u  -  λ (u'Mu - 1)

We must have :

2 MX'NXM u  -  2 λ Mu   =   0

or (premultiplying by M⁻¹) :        X'NXM u  =  λ u

or, putting  φ = Mu :        MX'NX φ  =  λ φ

φ = Mu is called the "factor"; it is the projection operator onto the axis u (the coordinates of the projections are y = XMu = Xφ).

 

φ belongs to the dual of the vector space Rp that contains u.

If M = I, the two spaces (Rp and its dual) coincide.

φ is a unit vector for the dual metric M⁻¹, since φ' M⁻¹ φ = u'Mu = 1.

 

As before, λ is the quantity to be maximized :   λ = u' MX'NXM u    since   u'Mu = 1
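A minimal sketch of this generalized problem, assuming (purely for illustration) diagonal choices of M and N; the M-symmetric eigenproblem is symmetrized through the change of variable w = M^(1/2) u:

    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 50, 4
    X = rng.normal(size=(n, p))

    # Illustrative assumptions: uniform row weights, inverse-variance column metric
    N = np.eye(n) / n
    M = np.diag(1.0 / X.var(axis=0))

    # X'NXM is M-symmetric; with w = M^(1/2) u the problem becomes symmetric
    M_half = np.sqrt(M)                      # elementwise sqrt is valid: M is diagonal here
    S = M_half @ X.T @ N @ X @ M_half
    lam, W = np.linalg.eigh(S)

    u1 = np.linalg.solve(M_half, W[:, -1])   # back to the original basis: u = M^(-1/2) w
    print(u1 @ M @ u1)                       # u'Mu = 1
    print(np.allclose(X.T @ N @ X @ (M @ u1), lam[-1] * u1))    # X'NXM u = lambda u

    phi1 = M @ u1                            # the "factor" phi = Mu
    print(np.allclose(M @ X.T @ N @ X @ phi1, lam[-1] * phi1))  # MX'NX phi = lambda phi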




Figure 3. Grayscale photograph (200 × 300 pixels); excerpt of the pixel matrix:

 

   143   144   133   130   143   153   159   175   192   201   188   162   135   116   101   106   118
   123   112   116   130   143   147   162   183   166   135   123   120   116   116   129   140   159
   133   151   162   166   170   188   166   128   116   132   140   126   143   151   144   155   176
   160   168   166   159   135   101    93    98   120   128   126   147   154   158   176   181   181
   154   155   153   144   126   106   118   133   136   153   159   153   162   162   154   143   128
   159   153   147   159   150   154   155   153   158   170   159   147   130   136   140   150   150
   151   144   147   176   188   170   166   183   170   166   153   130   132   154   162   120   135
   155   181   183   162   144   147   147   144   126   120   123   129   130   112   101   135   150
   126   120   143   145   162   153   155   175   154   144   136   130   120   112   123   123   144
   144   159   155   155   162   166   158   147   140   147   126   123   132   135   136   144   147
   136   143   162   175   136   110   112   135   120   118   126   151   150   130   129   133   147
   133   151   143   106    85    93   128   136   140   140   144   143   126   117   116   129   124
   ……………………………..etc.

 

 

          List of the eigenvalues      (variant of SVD – Correspondence Analysis)

 

Eigenvalues (rank: eigenvalue, percentage, cumulated percentage):

    1:        0.045    28.549     28.549  ***********************************/../**
    2:        0.028    17.695     46.243  ******************************
    3:        0.019    12.205     58.448  *********************
    4:        0.012     7.306     65.754  ************
    5:        0.007     4.674     70.428  ********
    6:        0.006     3.516     73.944  ******
    7:        0.005     2.944     76.888  *****
    8:        0.003     2.179     79.067  ***
    9:        0.003     1.869     80.936  ***
   10:        0.002     1.531     82.467  **
   11:        0.002     1.371     83.838  **
   12:        0.002     1.106     84.944  *
   13:        0.002     1.066     86.010  *
   14:        0.002     0.956     86.965  *
   15:        0.001     0.791     87.756  *
   16:        0.001     0.758     88.514  *
   17:        0.001     0.690     89.204  *
   18:        0.001     0.567     89.771
   19:        0.001     0.554     90.325
   20:        0.001     0.477     90.801
   21:        0.001     0.422     91.223
   22:        0.001     0.406     91.629

 

 

  

Figure 4. SVD reconstitution of the cheetah with 2, 4, 6, 8, 10, 12, 20, 30, 40 principal axes.
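Figure 4 can be reproduced along the following lines. This sketch uses a plain SVD (whereas the eigenvalue list above comes from a correspondence-analysis variant), and the file name cheetah.png and the Pillow loading step are assumptions, not part of the original text:

    import numpy as np
    from PIL import Image

    # "cheetah.png" is a placeholder name; any grayscale image works the same way
    X = np.asarray(Image.open("cheetah.png").convert("L"), dtype=float)
    V, s, Ut = np.linalg.svd(X, full_matrices=False)

    for q in (2, 4, 6, 8, 10, 12, 20, 30, 40):
        X_q = V[:, :q] * s[:q] @ Ut[:q, :]        # rank-q reconstitution of the image
        Image.fromarray(X_q.clip(0, 255).astype(np.uint8)).save(f"cheetah_{q}_axes.png")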



End of Chapter 1: Singular Value Decomposition