Updated version of "Multivariate Descriptive Statistical Analysis", by L. Lebart, A. Morineau, K. Warwick, John Wiley, New York, 1984.
Correspondence Analysis
1. Introduction
Correspondence analysis can be presented in several different ways. It is difficult to trace the method's history accurately (see, e.g., Hill, 1974; Benzécri, 1976; Nishisato, 1980). The underlying theory probably dates back to Fisher's (1940) work on contingency tables. However, since Benzécri's (1965, 1973) work, the emphasis has been mostly on the algebraic and geometric properties of the descriptive tool provided by the method.
This analysis is not simply a special case of principal components analysis, although it is possible to use that technique by performing appropriate transformations on the variables, provided each space is treated separately. Correspondence analysis can also be viewed as finding the best simultaneous representation of two data sets that comprise the rows and columns of a data matrix.
Correspondence analysis and principal components analysis are used under different circumstances:
- Principal components analysis is used for tables consisting of continuous measurements.
- Correspondence analysis is best applied to contingency tables (cross-tabulations). By extension, it also provides a satisfactory description of tables with binary coding.
We use a concrete example: a small contingency table cross-tabulating eye and hair colors (Snee, 1974).
We will see how the structure of the data itself dictates the way the configuration of points is constructed, the distance is chosen, and a goodness of fit criterion is selected.
Table 1
Snee (1974) eye-hair color data

+-----------+-------------------------------------+-------+
|           |              Hair color             |       |
| Eye color |   Black  Brunette     Red    Blonde | Total |
+-----------+-------------------------------------+-------+
| Brown     |      68       119      26        7  |  220  |
| Hazel     |      15        54      14       10  |   93  |
| Green     |       5        29      14       16  |   64  |
| Blue      |      20        84      17       94  |  215  |
+-----------+-------------------------------------+-------+
| Total     |     108       286      71      127  |  592  |
+-----------+-------------------------------------+-------+
2. Coordinates, masses, metric
Construction of the configurations of points
First, let us consider the space Rp, that is R4 (one dimension per hair color), where the matrix is represented by the 4 eye-color points (its rows). The data matrix K has n rows and p columns; kij represents the number of individuals with eye color i and hair color j.
Notation
Let
$$k_{i.} = \sum_j k_{ij} \qquad (1)$$
be the total number of persons with eye color i. Let
$$k_{.j} = \sum_i k_{ij} \qquad (2)$$
be the total number of persons with hair color j, and let
$$k = \sum_{i,j} k_{ij} \qquad (3)$$
be the sample total.
We take as the jth component of the ith vector of Rp:
$$k_{ij} / k_{i.} \quad \text{for all } i = 1, 2, \ldots, n \qquad (4)$$
We take as the ith component of the jth vector of Rn:
$$k_{ij} / k_{.j} \quad \text{for all } j = 1, 2, \ldots, p \qquad (5)$$
Note a fundamental difference between correspondence analysis and principal components analysis: here the transformations performed on the raw data are of the same kind in the two spaces (because the data sets being placed in correspondence play similar roles). But they differ analytically: the table of new coordinates in Rp is not the simple transpose of the table of new coordinates in Rn (whereas in principal components analysis, the same analytical formula is derived from very different geometrical transformations).
We denote the relative frequencies as follows:
$$f_{ij} = k_{ij} / k \quad \left( \sum_i \sum_j f_{ij} = 1 \right) \qquad (6)$$
and, in a similar fashion,
$$f_{i.} = \sum_j f_{ij} = k_{i.} / k \qquad (7)$$
$$f_{.j} = \sum_i f_{ij} = k_{.j} / k \qquad (8)$$
From now on, we deal almost exclusively with relative frequencies.
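To make these definitions concrete, here is a minimal sketch in Python (numpy is assumed to be available; all variable names are ours) that computes the relative frequencies and the two clouds of profiles for the data of Table 1:

```python
import numpy as np

# Snee (1974) eye-hair color data of Table 1: rows = eye colors, columns = hair colors
K = np.array([[68, 119, 26,  7],    # brown
              [15,  54, 14, 10],    # hazel
              [ 5,  29, 14, 16],    # green
              [20,  84, 17, 94]])   # blue

k = K.sum()                # sample total k = 592            (3)
F = K / k                  # relative frequencies f_ij       (6)
fi = F.sum(axis=1)         # row margins f_i.                (7)
fj = F.sum(axis=0)         # column margins f_.j             (8)

row_profiles = F / fi[:, None]   # f_ij / f_i. : the n points of the cloud in Rp
col_profiles = F / fj[None, :]   # f_ij / f_.j : the p points of the cloud in Rn
```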
Choice of distances
Having chosen profiles (which are conditional frequencies) as the coordinates of the configurations of points, and wishing to obtain results that are invariant under certain regroupings of categories, we adopt a distance that differs from the customary Euclidean distance.
The distance between two eye colors i and i' is given by
$$d^2(i, i') = \sum_j \frac{1}{f_{.j}} \left( \frac{f_{ij}}{f_{i.}} - \frac{f_{i'j}}{f_{i'.}} \right)^2 \qquad (10)$$
In symmetric fashion, the distance between two hair colors j and j' is written
$$d^2(j, j') = \sum_i \frac{1}{f_{i.}} \left( \frac{f_{ij}}{f_{.j}} - \frac{f_{ij'}}{f_{.j'}} \right)^2 \qquad (11)$$
The distance thus defined is called the chi-square distance (or χ² distance).
In fact, this distance differs from the usual Euclidean distance between profiles only in that each squared difference is weighted by the inverse of the marginal frequency corresponding to its term.
Essentially, the reason for choosing the chi-square distance is that it satisfies the property of distributional equivalence, expressed as follows:
1. If two columns having identical profiles are aggregated, then the distances between rows remain unchanged.
2. If two rows having identical distribution profiles are aggregated, then the distances between columns remain unchanged.
This property is important, because it guarantees a satisfactory invariance of the results irrespective of how the variables were originally coded into categories.
From a strictly technical point of view, it is indeed logical to treat two points that coincide in space as a single point corresponding to the totals of the two aggregated categories.
Let us demonstrate this property in the case of aggregating two eye colors i1 and i2, having identical profiles, into a single one i0, whose relative frequencies satisfy the relationship
$$f_{i_0 j} = f_{i_1 j} + f_{i_2 j} \quad \text{for all } j \quad (\text{hence } f_{i_0 .} = f_{i_1 .} + f_{i_2 .}).$$
In expressing the squared distance d²(j, j') between two hair colors j and j', only two terms, called A(i1) and A(i2), involve i1 and i2:
$$A(i_1) = f_{i_1 .} \left( \frac{f_{i_1 j}}{f_{i_1 .} f_{.j}} - \frac{f_{i_1 j'}}{f_{i_1 .} f_{.j'}} \right)^2 , \qquad A(i_2) = f_{i_2 .} \left( \frac{f_{i_2 j}}{f_{i_2 .} f_{.j}} - \frac{f_{i_2 j'}}{f_{i_2 .} f_{.j'}} \right)^2 .$$
After aggregation they are replaced by A(i0) such that
$$A(i_0) = f_{i_0 .} \left( \frac{f_{i_0 j}}{f_{i_0 .} f_{.j}} - \frac{f_{i_0 j'}}{f_{i_0 .} f_{.j'}} \right)^2 .$$
Note that, since the two aggregated rows have identical profiles, the quantities between brackets are all equal; let us call them B. It is then easy to see that
$$A(i_1) + A(i_2) = (f_{i_1 .} + f_{i_2 .})\, B^2 = f_{i_0 .}\, B^2 = A(i_0),$$
so the distance d(j, j') remains unchanged.
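The following sketch (continuing the one above; the 40/60 split of a column is an arbitrary illustration of our own) computes the chi-square distances (10) between row profiles and checks distributional equivalence numerically, by splitting a column into two columns with identical profiles:

```python
def chi2_row_distances(K):
    """Pairwise chi-square distances (10) between the row profiles of K."""
    F = K / K.sum()
    fi, fj = F.sum(axis=1), F.sum(axis=0)
    P = F / fi[:, None]                     # row profiles
    diff = P[:, None, :] - P[None, :, :]    # all pairwise profile differences
    return np.sqrt((diff**2 / fj).sum(axis=2))

d_before = chi2_row_distances(K)

# Split the last column into two columns with identical profiles (40% / 60%
# of each count): the distances between rows must remain unchanged.
K_split = np.column_stack([K[:, :3], 0.4 * K[:, 3], 0.6 * K[:, 3]])
print(np.allclose(d_before, chi2_row_distances(K_split)))   # True
```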
Goodness of fit criterion
In constructing the configurations of points in the spaces Rp and Rn, choosing the profiles as coordinates gives every hair color and every eye color the same importance, whatever its frequency.
In order to calculate goodness of fit, it is natural to give each point a mass that is proportional to its frequency, so as not to overrepresent categories with small totals, and consequently to reflect the real population distribution.
Point i of Rp thus has a mass of fi. , whereas the mass of point j of Rn is f.j .
Summary
We always designate by K the (n, p) matrix of frequencies, and by F the matrix of relative frequencies:
$$F = (1/k)\, K$$
We designate by Dp the (p, p) diagonal matrix whose jth diagonal element is f.j, and by Dn the (n, n) diagonal matrix whose ith diagonal element is fi.. With these two matrices and the matrix F, we complete our summarization of the initial elements of the analysis (Table 2).
Table 2: Coordinates, goodness of fit criteria, distances

+--------------------------------+---------------------+--------------------------------+
| Cloud of n row-points          |    Basic elements   | Cloud of p column-points       |
| in space Rp                    |                     | in space Rn                    |
+--------------------------------+---------------------+--------------------------------+
| p coordinates of row-point i:  | Analysis of table X | n coordinates of column-point  |
| fij / fi.  (rows of Dn^-1 F)   |                     | j: fij / f.j (rows of Dp^-1 F')|
+--------------------------------+---------------------+--------------------------------+
| Dp^-1 (chi-square distance)    | with the metric M   | Dn^-1 (chi-square distance)    |
+--------------------------------+---------------------+--------------------------------+
| mass of point i: fi.           | and the criterion N | mass of point j: f.j           |
+--------------------------------+---------------------+--------------------------------+
3. Fitting the data points
We limit our presentation to one space, Rp, since the proofs in the other space are deduced by permutation of the indices i and j (that is, by transposition of F and interchange of the matrices Dn and Dp).
We wish to represent graphically the distances between profiles. Therefore we place ourselves, in both spaces, at the centers of gravity (mean point) of the configurations of points.
However, and this is one of the peculiarities of correspondence analysis, one can perform the analysis either relative to the origin or to the center of gravity (provided that, in the former case, the principal axis connecting the origin to the center of gravity is ignored).
Analysis in Rp: calculation of factors
In this space the n points are the n rows of $D_n^{-1} F$. Let u be a unit vector for the distance in Rp, that is, such that:
$$u' D_p^{-1} u = 1$$
The vector v of the n projections on axis u is written:
$$v = D_n^{-1} F D_p^{-1} u$$
The quantity to be maximized is the weighted sum of squares:
$$v' D_n v = u' D_p^{-1} F' D_n^{-1} F D_p^{-1} u$$
with the constraint $u' D_p^{-1} u = 1$.
Therefore u is the eigenvector of
$$S = F' D_n^{-1} F D_p^{-1}$$
corresponding to the largest eigenvalue λ: $S u = \lambda u$.
The general term sjj' of S is written:
$$s_{jj'} = \sum_i \frac{f_{ij}\, f_{ij'}}{f_{i.}\, f_{.j'}}$$
S is not symmetric, but the problem is easily reduced to finding the eigenvectors and eigenvalues of a symmetric matrix: S has the same eigenvalues as the symmetric matrix $T = D_p^{-1/2} F' D_n^{-1} F D_p^{-1/2}$, and if w is a unit eigenvector of T, then $u = D_p^{1/2} w$ is the corresponding eigenvector of S.
Vector u is the first principal axis.
Vector φ = Dp^-1 u is called the first factor.
Factor φ is an eigenvector of the matrix A = Dp^-1 F' Dn^-1 F.
The projections of the n points on the principal axis u are the components of
$$D_n^{-1} F D_p^{-1} u = D_n^{-1} F \varphi \qquad (12)$$
More generally, if ur is the eigenvector of S corresponding to the eigenvalue λr, then ur is the rth principal axis, φr = Dp^-1 ur is the rth factor, and the projections of the n points on axis ur are the components of Dn^-1 F φr.
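As an illustration, the following sketch (continuing the previous ones; names are ours) diagonalizes the symmetric matrix mentioned above and recovers the principal axes u, the factors φ, and the row coordinates (12). The sign of each eigenvector, hence the orientation of each axis, is arbitrary.

```python
# T = Dp^-1/2 F' Dn^-1 F Dp^-1/2 is symmetric and has the same eigenvalues as S;
# if w is a unit eigenvector of T, then u = Dp^1/2 w is an eigenvector of S.
T = F.T @ (F / fi[:, None])
T = T / np.sqrt(fj)[:, None] / np.sqrt(fj)[None, :]

eigval, W = np.linalg.eigh(T)
order = np.argsort(eigval)[::-1]             # sort eigenvalues in decreasing order
lam, W = eigval[order], W[:, order]          # lam[0] = 1 is the trivial eigenvalue

U = np.sqrt(fj)[:, None] * W                 # principal axes u (u' Dp^-1 u = 1)
Phi = U / fj[:, None]                        # factors phi = Dp^-1 u
row_coords = (F / fi[:, None]) @ Phi         # projections Dn^-1 F phi        (12)
```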
Relationship with analysis in Rn
Similarly, in Rn we must maximize:
$$v' D_n^{-1} F D_p^{-1} F' D_n^{-1} v$$
with the constraint $v' D_n^{-1} v = 1$.
The first principal axis v is therefore the eigenvector of the matrix
$$Q = F D_p^{-1} F' D_n^{-1}$$
corresponding to the largest eigenvalue.
Let us rewrite the equation $S u = \lambda u$:
$$F' D_n^{-1} F D_p^{-1} u = \lambda u$$
Let us premultiply the two sides by $F D_p^{-1}$:
$$F D_p^{-1} F' D_n^{-1} (F D_p^{-1} u) = \lambda (F D_p^{-1} u)$$
It appears that v is proportional to $F D_p^{-1} u$.
Since the squared $D_n^{-1}$-norm of $F D_p^{-1} u$ is equal to λ, and since we must have $v' D_n^{-1} v = 1$, we must set:
$$v = \frac{1}{\sqrt{\lambda}}\, F D_p^{-1} u \qquad (13)$$
In analogous fashion:
$$u = \frac{1}{\sqrt{\lambda}}\, F' D_n^{-1} v \qquad (14)$$
Vector ψ = Dn^-1 v is likewise called the first factor (in Rn).
Table 3: Matrices to be diagonalized and coordinates in both spaces

+---------------------------+------------------------------+------------------------------+
|          Spaces           | In Rp                        | In Rn                        |
+---------------------------+------------------------------+------------------------------+
| Matrix to be diagonalized | S = F' Dn^-1 F Dp^-1         | Q = F Dp^-1 F' Dn^-1         |
+---------------------------+------------------------------+------------------------------+
| Principal axes            | u (u' Dp^-1 u = 1)           | v (v' Dn^-1 v = 1)           |
+---------------------------+------------------------------+------------------------------+
| Coordinates               | Dn^-1 F φ, with φ = Dp^-1 u  | Dp^-1 F' ψ, with ψ = Dn^-1 v |
+---------------------------+------------------------------+------------------------------+
The projections of the p points on the principal axis v are the components of
$$D_p^{-1} F' D_n^{-1} v = D_p^{-1} F' \psi \qquad (15)$$
For any α, if vα is the eigenvector of Q corresponding to the eigenvalue λα, then vα is the αth principal axis, ψα = Dn^-1 vα is the αth factor, and the projections of the p points on axis vα are the components of Dp^-1 F' ψα.
The relationships (13) and (14) can be written using the factors instead of the principal axes (and restoring the index α):
$$\psi_\alpha = \frac{1}{\sqrt{\lambda_\alpha}}\, D_n^{-1} F \varphi_\alpha \qquad (16)$$
$$\varphi_\alpha = \frac{1}{\sqrt{\lambda_\alpha}}\, D_p^{-1} F' \psi_\alpha \qquad (17)$$
These last equations, the transition equations, show that the projections of the points on the principal axes in one space (as given by formulas 12 and 15) are proportional to the components of the factors relating to the other space.
The transition relationships are written explicitly for the factors:
$$\psi_{i\alpha} = \frac{1}{\sqrt{\lambda_\alpha}} \sum_j \frac{f_{ij}}{f_{i.}}\, \varphi_{j\alpha} \qquad (18)$$
$$\varphi_{j\alpha} = \frac{1}{\sqrt{\lambda_\alpha}} \sum_i \frac{f_{ij}}{f_{.j}}\, \psi_{i\alpha} \qquad (19)$$
Thus, if we ignore the coefficient $1/\sqrt{\lambda_\alpha}$, the representative points of one space are, on each axis α, the barycenters of the representative points of the other space.
We have met such transition relationships previously, in the chapter devoted to principal components analysis. But here the interpretation of such relationships is much more straightforward, since the coefficients involved are positive and sum to 1. Moreover, we can present correspondence analysis directly as a technique giving, in some sense, the best simultaneous representation of the rows and columns of a contingency table (next section).
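Numerically, the transition relations are immediate to check (continuing the sketch above, for the first non-trivial axis; recall that lam[0] is the trivial eigenvalue):

```python
a = 1                                                    # first non-trivial axis
psi = (F / fi[:, None]) @ Phi[:, a] / np.sqrt(lam[a])    # (18): psi from phi
phi_back = (F / fj).T @ psi / np.sqrt(lam[a])            # (19): phi from psi
print(np.allclose(phi_back, Phi[:, a]))                  # True
```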
4. Properties of Correspondence Analysis; Other Approaches
Finding the best simultaneous representation:
Let us consider again the contingency table cross-tabulating eye and hair colors. We wish to represent, on the same axis, the entire sets of hair colors and eye colors so as to approximate the following situation:
- a) Each hair color point j is a barycenter of the eye color points, the latter being weighted by their percentages within the profile of hair color j, in other words by the masses:
$$m_i = f_{ij} / f_{.j} \quad \left( \sum_i m_i = 1 \right)$$
- b) Each eye color point i is simultaneously a barycenter of the hair color points, the latter being weighted by their percentages within the profile of eye color i. This leads to the system of masses:
$$m'_j = f_{ij} / f_{i.} \quad \left( \sum_j m'_j = 1 \right)$$
This ideal situation is usually impossible, because it implies that each set is entirely contained within the range of the other.
If we designate by φj and ψi the coordinates of hair color j and eye color i on the axis, we would have the (impossible) strictly barycentric equations:
$$\psi_i = \sum_j \frac{f_{ij}}{f_{i.}}\, \varphi_j \quad (\text{in matrix notation: } \psi = D_n^{-1} F \varphi)$$
$$\varphi_j = \sum_i \frac{f_{ij}}{f_{.j}}\, \psi_i \quad (\text{in matrix notation: } \varphi = D_p^{-1} F' \psi)$$
To realize these two sets of relationships simultaneously, we must introduce a coefficient b, as close as possible to 1, such that the following equations hold:
$$\psi = b\, D_n^{-1} F \varphi \quad \text{and} \quad \varphi = b\, D_p^{-1} F' \psi$$
The coefficient b is necessarily greater than or equal to 1; otherwise the equations would be all the more impossible. We must therefore find the smallest b compatible with the above equations.
If we replace, in the second equation, ψ by its value drawn from the first one:
$$\varphi = b^2\, D_p^{-1} F' D_n^{-1} F \varphi$$
φ is therefore an eigenvector of $D_p^{-1} F' D_n^{-1} F$ associated with the largest eigenvalue $\lambda = 1/b^2$ (the smallest b corresponding to the largest λ). Since b must be greater than or equal to 1, we have established that the eigenvalues λ issued from C.A. are less than or equal to 1.
We note also that a particular solution is given by:
$$\varphi = 1_p, \quad \psi = 1_n \quad (\varphi_j = 1 \text{ for all } j,\ \psi_i = 1 \text{ for all } i) \quad \text{and} \quad b = 1.$$
This solution is often designated as the "trivial solution" in C.A.
The presentation of C.A. as a best simultaneous representation can be extended to more than one dimension, taking into account that the barycentric relationships are valid for each axis separately.
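A quick numerical check of the trivial solution and of the bound λ ≤ 1, still with the matrices of the previous sketches:

```python
A = (F.T / fj[:, None]) @ (F / fi[:, None])           # A = Dp^-1 F' Dn^-1 F
print(np.allclose(A @ np.ones(4), np.ones(4)))        # phi = 1_p has eigenvalue 1
print(np.linalg.eigvals(A).real.max() <= 1 + 1e-12)   # no eigenvalue exceeds 1
```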
Reconstitution formula (Fisher's identity)
We can write equation (16) as follows:
$$\sqrt{\lambda_\alpha}\, \psi_\alpha = D_n^{-1} F \varphi_\alpha \qquad (20)$$
Suppose p is the smaller dimension of the analysed contingency table (p ≤ n).
We can collect the p factors φα and ψα as the columns of two matrices Φ and Ψ, and the eigenvalues λα as the diagonal elements of a diagonal matrix Λ. (Some eigenvalues could be equal to 0, but that does not matter here; the corresponding eigenvectors must only be chosen so as to complete the basis made up by the first ones.)
The p equations (20) (for α = 1, 2, ..., p) can then be written in compact form:
$$\Psi \Lambda^{1/2} = D_n^{-1} F \Phi \qquad (21)$$
Note that Φ is such that $\Phi' D_p \Phi = I_p$, hence $\Phi \Phi' = D_p^{-1}$.
If we post-multiply the two members of equation (21) by Φ':
$$\Psi \Lambda^{1/2} \Phi' = D_n^{-1} F D_p^{-1}$$
which can be written:
$$F = D_n \Psi \Lambda^{1/2} \Phi' D_p \qquad (22)$$
This formula is known as the reconstitution formula, or Fisher's identity.
It can be written explicitly:
$$f_{ij} = f_{i.} f_{.j} \sum_{\alpha=1}^{p} \sqrt{\lambda_\alpha}\, \psi_{i\alpha}\, \varphi_{j\alpha}$$
Taking into account the trivial solution presented previously (first eigenvalue equal to 1, with φj = ψi = 1), it reads:
$$f_{ij} = f_{i.} f_{.j} \left( 1 + \sum_{\alpha=2}^{p} \sqrt{\lambda_\alpha}\, \psi_{i\alpha}\, \varphi_{j\alpha} \right)$$
This formula was established by Fisher in 1940 [quoted by Maung (1942)].
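Fisher's identity can be verified numerically in a few lines (continuing the previous sketches; Psi collects the factors ψα for all p axes, computed via (16), which is legitimate here because no eigenvalue of our example is zero):

```python
Psi = (F / fi[:, None]) @ Phi / np.sqrt(lam)      # all factors psi_alpha, by (16)

# Reconstitution formula (22): F = Dn Psi Lambda^(1/2) Phi' Dp, i.e. entry by entry
# f_ij = f_i. f_.j * sum_alpha sqrt(lam_alpha) psi_i,alpha phi_j,alpha.
F_rebuilt = fi[:, None] * ((Psi * np.sqrt(lam)) @ Phi.T) * fj
print(np.allclose(F_rebuilt, F))                  # True
```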
Supplementary elements
As was already shown in the case of principal components analysis, it often proves useful to project supplementary elements (rows or columns) onto the factorial axes or planes after the analysis.
The components of the jth supplementary column, with entries f+ij and total f+.j = Σi f+ij, are written:
$$\frac{f^+_{ij}}{f^+_{.j}} \quad (i = 1, \ldots, n)$$
The projections of the supplementary columns and rows are obtained from the formulas:
$$\varphi^+_{j\alpha} = \frac{1}{\sqrt{\lambda_\alpha}} \sum_i \frac{f^+_{ij}}{f^+_{.j}}\, \psi_{i\alpha} \qquad (23)$$
$$\psi^+_{i\alpha} = \frac{1}{\sqrt{\lambda_\alpha}} \sum_j \frac{f^+_{ij}}{f^+_{i.}}\, \varphi_{j\alpha} \qquad (24)$$
Formulas (23) and (24) are derived from the transition formulas (18) and (19) after a slight modification: they allow us to project supplementary profiles onto the original axes.
ψ+iα designates the coordinate on axis α of a supplementary row i whose profile is (f+ij / f+i.), whereas φ+jα designates the coordinate on the same axis of a supplementary column j whose profile is (f+ij / f+.j).
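For instance, a hypothetical supplementary row (the counts below are invented purely for illustration) is positioned with formula (24), continuing the previous sketches:

```python
k_sup = np.array([10, 20, 5, 15])           # hypothetical extra row of counts
profile = k_sup / k_sup.sum()               # its profile f+_ij / f+_i.
psi_sup = profile @ Phi / np.sqrt(lam)      # coordinates on every axis      (24)
# psi_sup[1:] are the useful coordinates; index 0 corresponds to the trivial axis
```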
Other Approaches
Many authors have suggested generalizations or alterations of C.A. to adapt it to particular contexts of application. An excellent review of the corresponding papers can be found in Escoufier (1985).
We confine ourselves here to mentioning three of them.
The generalization by B. Escofier (1984), used by van der Heijden (1987). Briefly, the reconstitution formula, instead of:
$$f_{ij} = f_{i.} f_{.j} \left( 1 + \sum_\alpha \sqrt{\lambda_\alpha}\, \psi_{i\alpha}\, \varphi_{j\alpha} \right)$$
becomes:
$$f_{ij} = w_{ij} + g_i h_j \sum_\alpha \sqrt{\lambda_\alpha}\, \psi_{i\alpha}\, \varphi_{j\alpha}$$
where the vectors g and h and the matrix W = (wij) are not necessarily those provided by the usual model, but must satisfy some supplementary constraints.
The method suggested by Domenges and Volle (1979) (spherical factor analysis), which substitutes a Hellinger metric for the chi-square metric. We have, for example, the following distance between two rows i and i':
$$d^2(i, i') = \sum_j \left( \sqrt{\frac{f_{ij}}{f_{i.}}} - \sqrt{\frac{f_{i'j}}{f_{i'.}}} \right)^2$$
The method proposed by Lauro et al. (1983) (non-symmetrical correspondence analysis), which leads in particular, with the above notation, to diagonalizing the matrix F' Dn^-1 F.
In this type of analysis, the Goodman-Kruskal criterion, better adapted to situations where the two sets in correspondence play non-symmetric roles, is substituted for the classical chi-square. The various properties of this method can be found in the referenced paper.
5. Interpretation Rules and Numerical Example
The processing of the small contingency table crossing eye and hair colors will serve as an example of the interpretation rules of C.A. and of the usual computer outputs. (Of course, because of its very limited size, such a table does not need to be described through correspondence analysis: a visual inspection would generally suffice.)
The trace and the eigenvalues
Table 4
Eigenvalues, percentages of variance
+--------+------------+----------+-----------+
| NUMBER | EIGENVALUE | PERCENT. | CUMULATED |
|        |            |          | PERCENT.  |
+--------+------------+----------+-----------+
|    1   |   .2088    |  89.37   |   89.4    |
|    2   |   .0222    |   9.51   |   98.9    |
|    3   |   .0026    |   1.11   |  100.0    |
+--------+------------+----------+-----------+
| TRACE  |   .2336    |          |           |
+--------+------------+----------+-----------+
The matrix S to be diagonalized has the same eigenvalues as the centered matrix S* (if we except the trivial eigenvalue 1 of S, which corresponds to the eigenvalue 0 of S*), whose general term is written:
$$s^*_{jj'} = \sum_i \frac{(f_{ij} - f_{i.} f_{.j})(f_{ij'} - f_{i.} f_{.j'})}{f_{i.}\, f_{.j'}}$$
The trace of S* is:
$$\text{tr}\, S^* = \sum_i \sum_j \frac{(f_{ij} - f_{i.} f_{.j})^2}{f_{i.}\, f_{.j}}$$
The quantity k tr S* is nothing but the classical chi-square χ² with (n-1)(p-1) degrees of freedom computed on contingency tables.
We note that:
$$\text{tr}\, S^* = \sum_\alpha \lambda_\alpha$$
(the sum of the non-trivial eigenvalues).
Here, the trace value is 0.2336, and k, the total number of observations, is 592. The product of these two quantities is 138.3.
In the case of independence of the rows and the columns of the table, this would be a value of a chi-square with (4-1)(4-1) = 9 degrees of freedom. The observed value is therefore highly significant.
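The same arithmetic can be done by machine (continuing the sketches above; scipy is assumed to be available for the comparison):

```python
from scipy.stats import chi2_contingency

trace = lam[1:].sum()                 # sum of the non-trivial eigenvalues: .2336
print(K.sum() * trace)                # k * tr(S*), approximately 138.3
chi2, pval, dof, _ = chi2_contingency(K, correction=False)
print(chi2, dof, pval)                # ~138.29, 9 degrees of freedom, p << .001
```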
We note that the first axis is dominant, the first eigenvalue explaining almost 90% of the total variance. Under the hypothesis of independence of the rows and columns of the table, it has been shown (Lebart, 1976) that the quantity kλα is approximately distributed like the αth eigenvalue of a Wishart matrix with parameters (p-1) and (n-1). Charts for the largest eigenvalue are published in Lebart et al. (1984). This test is partly irrelevant here, since the hypothesis of independence has already been rejected.
Masses, distances, coordinates, contributions
The following listing (Table 5) displays the most usual parameters.
The "masses" printed in the first column of the listing are the quantities fi. and f.j (multiplied by 100).
Table 5
Coordinates and interpretation parameters for the active rows and the active columns
ACTIVE COLUMNS        COORDINATES          CONTRIBUTIONS       SQUARED COSINES
            MASS     1     2     3        1     2     3        1    2    3
HAIR
BLACK       18.2   -.50   .21  -.06     22.2  37.9  21.6      .84  .15  .01
BRUNETTE    48.1   -.15  -.03   .05      5.1   2.3  44.3      .86  .04  .09
RED         11.9   -.13  -.32  -.08      1.0  55.1  31.9      .13  .81  .05
BLONDE      21.8    .84   .07  -.02     71.7   4.7   2.2      .99  .01  .00
_______________________________________________________________________________
ACTIVE ROWS           COORDINATES          CONTRIBUTIONS       SQUARED COSINES
            MASS     1     2     3        1     2     3        1    2    3
EYES
brown       37.0   -.49   .09  -.02     43.1  13.0   6.7      .97  .03  .00
hazel       15.6   -.21  -.17   .10      3.4  19.8  61.1      .54  .34  .12
green       10.8    .16  -.34  -.09      1.4  55.9  31.9      .18  .77  .05
blue        36.6    .55   .08   .00     52.1  11.2    .3      .98  .02  .00
Note that the squared distances of the row-points to the origin of the axes (where Gp designates the center of gravity of the row cloud in Rp) are given by:
$$d^2(i, G_p) = \sum_j \frac{1}{f_{.j}} \left( \frac{f_{ij}}{f_{i.}} - f_{.j} \right)^2$$
In symmetric fashion, for the columns (where Gn designates the center of gravity of the column cloud in Rn):
$$d^2(j, G_n) = \sum_i \frac{1}{f_{i.}} \left( \frac{f_{ij}}{f_{.j}} - f_{i.} \right)^2$$
The so-called "CONTRIBUTIONS" are measurement of the contribution of each element (row-point or column-point) to the total variance explained by each axis.
For example, the contribution of the row-point i (whose coordinates on axis a are fia √la) to axis r is given by: ca (i) = fi. fia2 / Si fi. fia2
Similarly: ca (j) = f.j yja2 / Sj f.j yja2
Note that: Si fi. fia2 = Sj f.j yja2 = 1
and that: Si ca (i) = Sj ca (j) = 1 , for each axis a.
To make them more readable, the values of the contributions are multiplied by 100 on the listings of outputs.
The squared cosines (or squared correlations)
If we denote by dα²(i, G) the part of the squared distance of point i to the center of gravity that is carried by axis α, we have the relationship (Pythagorean theorem):
$$d^2(i, G) = \sum_\alpha d_\alpha^2(i, G)$$
The quantity
$$\cos_\alpha^2(i) = d_\alpha^2(i, G) / d^2(i, G) \quad \text{(squared cosine)}$$
represents the contribution of axis α to the explanation of point i.
Note that $\sum_\alpha \cos_\alpha^2(i) = 1$ (sum over the axes).
We can see, for instance, that the contribution of the hair color "brunette" to the first axis is rather low (5.1%), whereas its squared cosine is high (.86). This means that, although this point did not participate much in building the first axis, it is nevertheless very well represented by this axis.
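The columns of Table 5 can be recomputed from the previous sketches in a few lines (here for the hair colors; the signs of the coordinates may differ from the listing, since the orientation of each axis is arbitrary):

```python
col_coords = np.sqrt(lam[1:]) * Phi[:, 1:]     # principal coordinates, axes 1..3
ctr = 100 * fj[:, None] * Phi[:, 1:]**2        # contributions, in percent
d2 = (col_coords**2).sum(axis=1)               # squared distance to the centroid
cos2 = col_coords**2 / d2[:, None]             # squared cosines
```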
Graphical display of the two sets of points on the first two axes
Figure 1 shows the simultaneous representation of the two sets of points (eye and hair colors) on the plane of axes 1 and 2.
Figure 1. Sketch of the superimposed graphical displays of the row-points and of the column-points of Table 1
The superimposition of the displays of row-points and column-points helps in interpreting proximities: the proximity between two row-points can be explained by looking at the locations of all the column-points. The transition relationships allow us to interpret the position of one row-point with respect to all the column-points (and vice versa).
But it is not straightforward to interpret the proximity between one row-point and one column-point, since these points do not belong to the same space. This is the reason why some users prefer separate displays.
End of Chapter 4: Correspondence Analysis