Université de Neuchâtel,
Master in Statistics, 2009 – 2010, Multivariate Analysis (Ludovic
Lebart) (Chapter 1)
Updated version of "Multivariate Descriptive Statistical Analysis", by L. Lebart, A. Morineau, K. Warwick,
John Wiley, New York, 1984.
General Analysis and Singular Value Decomposition
(Compacting a rectangular array X = (xij), i ≤ n, j ≤ p)
In the following chapters, we present a variety of
related methods for analyzing a data matrix, which can be loosely described
under the heading of “principal axes methods”. In this introductory chapter we
develop the theoretical basis common to all of these methods. This stems from
the work of Eckart and Young (1936) on the singular value decomposition of a
matrix. We emphasize the geometrical aspects of this technique.
Problem: Assuming that the (n, p) matrix X is not a pure random matrix, is it then possible to summarize the "information" underlying X with fewer than np parameters?
A reminder: the history of SVD
1904: a breakthrough
Paper by Charles Spearman (1904), "General intelligence, objectively determined and measured".
In 1904, Charles Spearman proposed to "explain" the mark xij obtained by individual i (i ≤ n) on topic j (j ≤ p) by a simple formula involving a single individual value fi and a series of p coefficients relating only to the topics involved. Thus, up to a residual, np values are approximated by n + p values. Although it was not recognised as such at the time, this may well be the first attempt to perform a "structural compression" of a data table.
1) - Fitting the data-points in Rp:
The vector space Rp
is a p-dimensional vector space.
Let u be a unit vector of Rp :
u'u = 1
The vector y whose components are
the n projections of the n "row-points" of X on
the one-dimensional subspace defined by u is the vector:
y = Xu
Figure 1
Fitting the cloud of points
The sum of squares of these projections, y'y, reads:
y'y = (Xu)' Xu = u' X'X u
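As a quick numerical check (a minimal numpy sketch; the small matrix X and the vector u below are arbitrary illustrations, not data from the text), the sum of squared projections can be computed either as y'y or as u'X'Xu:

import numpy as np

# Arbitrary small data matrix (n = 5 rows, p = 3 columns), purely for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))

# A unit vector u of Rp
u = np.array([1.0, 2.0, -1.0])
u = u / np.linalg.norm(u)                    # enforce u'u = 1

y = X @ u                                    # projections of the n row-points on u
print(np.allclose(y @ y, u @ (X.T @ X) @ u)) # y'y equals u'X'Xu -> True
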
To determine u , we must therefore find the
maximum of the quadratic form
{u' X'X u}
under the constraint
u'u = 1
Let u1 be the solution.
The best two-dimensional subspace is bound to
contain u1 (exercise...)
Let us now find u such that u'u1 = 0 and u'u = 1 which maximizes the quadratic form {u' X'X u}.
Let u2 designate the vector u satisfying these conditions.
This argument is easily extended to the best q-dimensional subspace, which turns out to be spanned by u1, u2, ..., uq-1 and the vector u such that
u'uα = 0 (α = 1, 2, ..., q-1) and u'u = 1
and such that {u' X'X u} is maximized.
This is a classical mathematical problem:
uα is the α-th eigenvector of the symmetric matrix X'X corresponding to the eigenvalue λα.
Proofs
Problem 1
Maximizing {u' X'X u} with u'u = 1
Let us compute the vector of the partial derivatives of the Lagrangian f(u) with respect to u:
f(u) = u' X'X u - λ (u'u - 1)
We must have:
2 X'X u - 2 λ u = 0
or:
X'X u = λ u
with λ = u' X'X u, since u'u = 1.
Therefore the maximum is necessarily an eigenvalue of X'X, and the absolute maximum u1 corresponds to the largest eigenvalue λ1 of this matrix.
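This can be verified numerically. The following sketch (illustrative data only) checks that the first eigenvector of X'X attains the maximum of the quadratic form among unit vectors, here by comparison with a batch of random unit vectors:

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 4))
S = X.T @ X                                  # the symmetric matrix X'X

# np.linalg.eigh returns the eigenvalues of a symmetric matrix in ascending order
eigvals, eigvecs = np.linalg.eigh(S)
u1 = eigvecs[:, -1]                          # unit eigenvector of the largest eigenvalue

# The quadratic form at u1 equals the largest eigenvalue, and no random
# unit vector does better
quad = lambda w: w @ S @ w
trials = rng.normal(size=(1000, 4))
trials /= np.linalg.norm(trials, axis=1, keepdims=True)
print(np.isclose(quad(u1), eigvals[-1]))            # True
print(quad(u1) >= max(quad(w) for w in trials))     # True
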
Problem 2
Maximizing {u' X'X u} with u'u = 1 and u'u1 = 0
Let us compute the derivative of the quantity
f(u) = u' X'X u - λ (u'u - 1) - μ (u'u1)
and set it to zero. We get:
2 X'X u - 2 λ u - 2 μ u1 = 0
Premultiplying both sides of this equation by u1' implies that μ = 0, since u1' X'X u = λ1 u1'u = 0 and u1'u1 = 1. This leaves
X'X u = λ u
with, as before, λ = u' X'X u since u'u = 1.
λ is then the second eigenvalue λ2 of the matrix X'X, and u = u2 the corresponding eigenvector.
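The same argument can be checked numerically for u2: restricting the maximization to the subspace orthogonal to u1 (done below by deflating X'X, an equivalent device not used in the text) recovers the second eigenvalue and eigenvector. Illustrative data again:

import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(20, 4))
S = X.T @ X
eigvals, eigvecs = np.linalg.eigh(S)              # eigenvalues in ascending order
l1, u1 = eigvals[-1], eigvecs[:, -1]

# Removing the contribution of u1 and maximizing again yields the second
# eigenvalue and the second eigenvector of X'X
S_deflated = S - l1 * np.outer(u1, u1)
d_vals, d_vecs = np.linalg.eigh(S_deflated)
print(np.isclose(d_vals[-1], eigvals[-2]))                   # same eigenvalue lambda2
print(np.isclose(abs(d_vecs[:, -1] @ eigvecs[:, -2]), 1.0))  # same direction as u2
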
More generally (exercise):
The best q-dimensional subspace is generated by the first q eigenvectors of the matrix X'X,
corresponding to the q largest eigenvalues
λ1, λ2, ..., λq
In the space Rp dealt with up to now, the vectors considered were the n rows of the matrix X.
If we now consider the space Rn, generated by the p columns of the matrix X, shall we obtain similar results?
What relationship exists between the fits performed in these two spaces?
2) - Best fit in the vector
space Rn:
Let v be a unit vector of Rn: v'v = 1
The vector z of the p
projections of the p column-points of X (row-points of the
transposed matrix X') onto the axis v is the vector:
z = X'v
The sum of squares of the projections, z'z, is written:
z'z = (X'v)' X'v = v' XX' v
The condition:
Maximizing {v' XX' v} with v'v = 1
defines again the best one-dimensional subspace (in a least squares context) in the space Rn.
In an analogous fashion, it can be shown that the best q-dimensional subspace in a least squares sense is generated by v1, v2, ..., vq-1 and the vector v such that
v'vα = 0 (α = 1, 2, ..., q-1) and v'v = 1
and such that the quantity {v' XX' v} is maximum.
Therefore:
vα is the α-th eigenvector of the symmetric matrix XX' corresponding to the eigenvalue λ'α.
3) - Relationships between
the fits in Rn and in Rp:
In Rn: XX' v = λ' v
Premultiplying both sides by the matrix X':
X'X (X' v) = λ' (X' v)
But we have already obtained:
X'X u = λ u (with λ maximum)
u is therefore proportional to the vector X' v.
Since λ is maximum: λ' ≤ λ
Similarly, starting from the last equation, we conclude that the vector v must also be proportional to the vector X u, which leads to the new inequality:
λ ≤ λ'
In the end:
λ = λ',
and for every axis α:
λα = λ'α
We have thus proved the proportionality of v and Xu, on the one hand, and of u and X'v on the other:
vα = k(α) X uα
uα = k'(α) X' vα
The proportionality coefficients k(α) and k'(α) can be computed using the fact that uα and vα are unit vectors, together with the relationship:
λα = uα' X'X uα = vα' XX' vα
This leads immediately to the equations:
k(α) = k'(α) = 1 / √λα
which induce the so-called transition formulae:
vα = (1 / √λα) X uα
uα = (1 / √λα) X' vα
The original data array X and its transpose serve as transformation matrices linking the axes of best fit in the two vector spaces Rp and Rn.
The coordinates of the projections on these axes in the two spaces Rp and Rn are respectively the components of the vectors
X uα and X' vα
These relationships pinpoint the fact that the coordinates in one space are proportional to the components of the axis in the other space.
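A short numpy sketch (illustrative data; eigenvectors are defined up to sign, hence the absolute values) can be used to check both results: the nonzero eigenvalues of X'X and XX' coincide, and the leading axes satisfy the transition formula:

import numpy as np

rng = np.random.default_rng(3)
n, p = 8, 5
X = rng.normal(size=(n, p))

# Eigen-decompositions in the two spaces (eigenvalues in ascending order)
lam_p, U = np.linalg.eigh(X.T @ X)    # in Rp : X'X u = lambda u
lam_n, V = np.linalg.eigh(X @ X.T)    # in Rn : XX' v = lambda' v

# The nonzero eigenvalues coincide: lambda_alpha = lambda'_alpha
print(np.allclose(lam_p, lam_n[-p:]))

# Transition formula for the leading axis: v1 is proportional to X u1,
# with coefficient 1 / sqrt(lambda1) (up to the sign of the eigenvector)
u1, lam1, v1 = U[:, -1], lam_p[-1], V[:, -1]
print(np.allclose(np.abs(X @ u1) / np.sqrt(lam1), np.abs(v1)))
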
4) Recreating the Original Data
As before, we call uα the α-th eigenvector of the matrix X'X, of norm 1, corresponding to the eigenvalue λα; and we call vα the α-th eigenvector of norm 1 of XX'. The transition formula can be written X uα = √λα vα.
We postmultiply both sides of this equation by uα' and sum over all of the axes (some of which may correspond to a zero eigenvalue; they complement the orthonormal basis formed by the preceding axes):
X Σα uα uα' = Σα √λα vα uα'
Let us call U the matrix of order (p, p) whose columns are the eigenvectors uα of X'X. Since these vectors are orthogonal and of norm 1, we have U'U = I (identity matrix), and therefore UU' = I.
However, Σα uα uα' = UU' = I.
Recalling that the eigenvalues λα are always ranked in descending order, the above formula becomes
X = Σα √λα vα uα'   (α = 1, ..., p)
This formula may be considered a formula for recreating matrix X
on the basis of the eigenvalues and their associated eigenvectors.
Figure 2. Reconstitution of X: singular value decomposition.
This formula represents the singular value decomposition of X (cf. Sylvester, 1889, for square matrices, and Eckart and Young, 1936, 1939, for rectangular matrices).
If the p - q smallest eigenvalues are "very small", the summation can be limited to the first q terms:
X* = Σα √λα vα uα'   (α = 1, ..., q)
If q is much smaller than p, it is easy to see the advantage by comparing the two sides of this equation: the np terms of X are approximated by terms constructed from the q(n + p) values contained on the right-hand side.
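A minimal sketch of this rank-q approximation, using numpy's SVD routine on illustrative data (the singular values returned by numpy are the √λα of the text):

import numpy as np

rng = np.random.default_rng(4)
n, p, q = 30, 10, 3
X = rng.normal(size=(n, p))

# Full decomposition X = sum_alpha sqrt(lambda_alpha) v_alpha u_alpha'
# (numpy returns the singular values s_alpha = sqrt(lambda_alpha) in descending order)
V, s, Ut = np.linalg.svd(X, full_matrices=False)

# Keep only the first q terms of the sum
X_q = V[:, :q] @ np.diag(s[:q]) @ Ut[:q, :]

# The np entries of X are approximated from about q(n + p) stored values
print(X.size, "entries approximated from", V[:, :q].size + q + Ut[:q, :].size, "values")
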
The recovery of the original data can be evaluated by the quantity
τq = tr(X*'X*) / tr(X'X)
Again, we have
tr(X'X) = λ1 + λ2 + ... + λp
(tr is the trace). By substituting X and X* with their values in the previous equations, we obtain:
τq = (λ1 + ... + λq) / (λ1 + ... + λp)
The quantity τq, which is less than or equal to 1, is called the percentage of variance relative to the first q factors. Its interpretation as a measure of the numerical quality of the recovery is clear, but we shall see later that the problem of its statistical significance is a sensitive issue.
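The quantity τq can be computed directly from the singular values, whose squares are the eigenvalues λα; a minimal illustrative sketch:

import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(30, 10))
q = 3

s = np.linalg.svd(X, compute_uv=False)      # singular values sqrt(lambda_alpha)
lam = s ** 2                                # eigenvalues of X'X, in descending order

tau_q = lam[:q].sum() / lam.sum()           # percentage of variance of the first q axes
print(round(tau_q, 3))

# Consistency check: the sum of all eigenvalues equals tr(X'X)
print(np.isclose(lam.sum(), np.trace(X.T @ X)))
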
5) Generalization: Metric M and Criterion of Fit N
Fitting the data-points in Rp:
M is a (p, p) symmetric positive definite matrix allowing for the definition of a scalar product (i.e. a metric) in the space Rp. N is a diagonal matrix of positive weights.
Let u be a unit vector of Rp for the metric M; hence, u is such that u'Mu = 1.
The vector y whose n components are the "M-projections" of the n "row-points" of X on the one-dimensional subspace defined by u is the vector:
y = XMu
The weighted sum of squares of these projections, y'Ny, reads:
y'Ny = (XMu)' N XMu = u' MX'NXM u
To determine u, we must therefore find the maximum of the quadratic form
{u' MX'NXM u}
under the constraint
u'Mu = 1
This is almost the same classical mathematical problem: uα is the α-th eigenvector of the M-symmetric matrix X'NXM corresponding to the eigenvalue λα.
Proof
Maximizing {u' MX'NXM u} with u'Mu = 1
Let us compute the vector of the partial derivatives of the Lagrangian f(u) with respect to u:
f(u) = u' MX'NXM u - λ (u'Mu - 1)
We must have:
2 MX'NXM u - 2 λ Mu = 0
or:
X'NXM u = λ u
or, putting φ = Mu:
MX'NX φ = λ φ
φ = Mu is called the "factor"; it is the projection operator onto the axis u.
φ belongs to the dual of the vector space Rp that contains u.
If M = I, the two spaces (Rp and its dual) coincide.
φ is a unit vector for the dual metric M-1.
As before, λ is the quantity to be maximized: λ = u' MX'NXM u, since u'Mu = 1.
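A hedged sketch of this generalized problem: the matrices M and N below are arbitrary illustrative choices, and the M-symmetric eigen-problem X'NXM u = λu is solved here through an ordinary symmetric eigen-problem via a Cholesky factor of M (one possible device among others):

import numpy as np

rng = np.random.default_rng(6)
n, p = 12, 4
X = rng.normal(size=(n, p))

# Illustrative choices (assumptions, not taken from the text):
# M symmetric positive definite, N diagonal with positive weights
A = rng.normal(size=(p, p))
M = A @ A.T + p * np.eye(p)
N = np.diag(rng.uniform(0.5, 1.5, size=n))

# Solve X'NXM u = lambda u through the symmetric matrix L'(X'NX)L, where M = LL'
L = np.linalg.cholesky(M)
S = L.T @ (X.T @ N @ X) @ L
eigvals, W = np.linalg.eigh(S)                         # ascending order
lam1, w1 = eigvals[-1], W[:, -1]
u1 = np.linalg.solve(L.T, w1)                          # back to the original coordinates

print(np.allclose(X.T @ N @ X @ M @ u1, lam1 * u1))    # the eigen-equation holds
print(np.isclose(u1 @ M @ u1, 1.0))                    # u1 is M-unit: u'Mu = 1
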
Figure 3. Grayscale photograph (200 x 300 pixels). Excerpt of the table of pixel intensities:
143 144 133 130 143 153 159 175 192 201 188 162 135 116 101 106 118
123 112 116 130 143 147 162 183 166 135 123 120 116 116 129 140 159
133 151 162 166 170 188 166 128 116 132 140 126 143 151 144 155 176
160 168 166 159 135 101  93  98 120 128 126 147 154 158 176 181 181
154 155 153 144 126 106 118 133 136 153 159 153 162 162 154 143 128
159 153 147 159 150 154 155 153 158 170 159 147 130 136 140 150 150
151 144 147 176 188 170 166 183 170 166 153 130 132 154 162 120 135
155 181 183 162 144 147 147 144 126 120 123 129 130 112 101 135 150
126 120 143 145 162 153 155 175 154 144 136 130 120 112 123 123 144
144 159 155 155 162 166 158 147 140 147 126 123 132 135 136 144 147
136 143 162 175 136 110 112 135 120 118 126 151 150 130 129 133 147
133 151 143 106  85  93 128 136 140 140 144 143 126 117 116 129 124
... etc.
List of the eigenvalues (variant of SVD: Correspondence Analysis)

Number   Eigenvalue   Percentage   Cumulated percentage
  1       0.045        28.549        28.549
  2       0.028        17.695        46.243
  3       0.019        12.205        58.448
  4       0.012         7.306        65.754
  5       0.007         4.674        70.428
  6       0.006         3.516        73.944
  7       0.005         2.944        76.888
  8       0.003         2.179        79.067
  9       0.003         1.869        80.936
 10       0.002         1.531        82.467
 11       0.002         1.371        83.838
 12       0.002         1.106        84.944
 13       0.002         1.066        86.010
 14       0.002         0.956        86.965
 15       0.001         0.791        87.756
 16       0.001         0.758        88.514
 17       0.001         0.690        89.204
 18       0.001         0.567        89.771
 19       0.001         0.554        90.325
 20       0.001         0.477        90.801
 21       0.001         0.422        91.223
 22       0.001         0.406        91.629
Figure 4. SVD reconstitution of the Cheetah with 2, 4, 6, 8, 10, 12, 20, 30, 40 principal axes
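As a rough sketch of how such a reconstitution can be computed (plain SVD rather than the correspondence-analysis variant used for the eigenvalue list above; the cheetah photograph itself is not reproduced here, so a synthetic pixel array stands in for it):

import numpy as np

# Stand-in grayscale image: the original 200 x 300 cheetah photograph is not
# reproduced here, so a synthetic array of pixel intensities is used instead
rng = np.random.default_rng(7)
img = rng.integers(80, 200, size=(200, 300)).astype(float)

def reconstitute(img, q):
    """Rank-q reconstitution of the image from its first q singular triplets."""
    V, s, Ut = np.linalg.svd(img, full_matrices=False)
    return V[:, :q] @ np.diag(s[:q]) @ Ut[:q, :]

for q in (2, 4, 6, 8, 10, 12, 20, 30, 40):
    rel_err = np.linalg.norm(img - reconstitute(img, q)) / np.linalg.norm(img)
    print(q, round(rel_err, 3))
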
End of Chapter 1: Singular Value Decomposition