Updated version of "Multivariate Descriptive Statistical Analysis", by L. Lebart, A. Morineau, K. Warwick, John Wiley, New York, 1984.

 

 

 

Descriptive Principal Components Analysis

 

1. Introduction

 

There are several different ways of looking at principal components analysis. The classical statistician considers principal components analysis to be the determination of the major axes of an ellipsoid derived from a multivariate normal distribution, the axes being estimated from a sample. It was in this context that Harold Hotelling (1933) originally described principal components analysis, and most texts on classical multivariate analysis present the topic in this fashion (see Anderson, 1958; Kendall and Stuart, 1968; Dempster, 1969).

 

Psychologists, or more precisely, psychometricians consider principal components analysis to be one specific type of factor analysis. Factor analysis, in its many forms, makes a variety of assumptions in dealing with communalities, specific item variance, and error variance. Detailed accounts of psychometric thinking about factor analysis can be found in Horst (1965), Lawley and Maxwell (1970), Mulaik (1972), Harman (1976), and Cattell (1978).

 

More recently, data analysts have taken an entirely different point of view about principal components analysis. They use it without making any assumptions about distributions or an underlying statistical model, treating it instead as a technique for describing a set of data in which certain algebraic or geometric criteria are optimized. The availability of high-speed digital computers has accelerated the growth of this approach, although it is, in fact, the original method outlined by Karl Pearson (1901). This way of looking at principal components analysis is implicit in Pearson's original formulation, but Rao's (1964) pioneering review article presents the topic in the way most closely allied to the philosophy described in this course.

 

2. Scope of the Method

 

The data analyst is frequently faced with the task of analyzing a rectangular matrix in which the columns represent variables (e.g., anthropometric measurements, psychological test scores, or economic indicators) and the rows represent measurements of specific objects or individuals on these variables.

 

For example, in biometry it is not unusual to collect a variety of measurements on a group of animals or a sample of organs. Similarly, in economics, data may be collected on a variety of measures related to household expenditures or corporate activity. Displayed in tabular form, such data are often unwieldy and difficult to comprehend: the information is not readily absorbed because of its sheer volume.

 

The methods to be described in this course are designed to summarize the information contained in tables like those described above and simultaneously to provide the analyst with a clear visual or geometric representation of the information.

 

Principal components analysis in the context of this course differs from correspondence analysis in one important respect. In principal components analysis the columns of the data matrix are generally a set of variables or measurements, whereas the rows are a relatively homogeneous sample of objects or individuals on which measurements have been made. Correspondence analysis, on the other hand, treats the rows and columns of the data matrix in a symmetric fashion (the leading case being a contingency table).

 

Principal components analysis can be considered a theoretically appropriate method for the analysis of data that derives from a multivariate normal distribution. Correspondence analysis, in contrast, is theoretically the more appropriate method for the analysis of data in the form of a contingency table. In succeeding chapters we discuss the analysis of data matrices that do not precisely meet these requirements.

 

3. Applying Principal Components Analysis

 

We now show how to adapt the general analysis presented previously to a situation where the problem is to describe rather than to simply summarize large data sets.

 

Let us consider a matrix R whose columns are 300 responses or measurements taken on 2000 individuals. The order of the matrix is then n = 2000, p = 300. Specifically, let us assume that the data consist of the values of 300 types of annual expenditures made by 2000 individuals.

We would like to understand how the 300 expenditures are related to one another as well as whether similarities exist among the behaviour patterns of the individuals.

 

Figure 1. Location of the cloud of points in Rp

 

 

3.1. Analysis in Rp

 

In the space Rp (whose p dimensions correspond to the variables), we attempt to fit the set of n individual points, first in a one-dimensional, then in a two-dimensional subspace, so as to obtain on a graphical display the most faithful possible visual representation of the distances between the individuals with respect to their p expenditures.

 

The problem is no longer one of maximizing the sum of squares of the distances of the projected points from the origin. What now has to be maximized is the sum of squares of the distances between all pairs of projected individuals. In other words, the best-fitting line H1 is no longer required to pass through the origin, as H0 was in the previous chapter.

 

Figure 2. Projection onto H1.

Let h_i and h_j represent the values of the projections of two individual points i and j on H1. We have the following equation:

    \sum_{i} \sum_{j} (h_i - h_j)^2 = 2n \sum_{i} (h_i - \bar{h})^2

with:

    \bar{h} = \frac{1}{n} \sum_{i} h_i

The summation is over all i's and j's, and \bar{h} designates the mean of the projections of the individuals, which is also the projection of the mean point G of the individuals in the full space, whose coordinates are given by:

    \bar{r}_j = \frac{1}{n} \sum_{i=1}^{n} r_{ij} \qquad (j = 1, \ldots, p)

Thus maximizing the sum of the squared distances between all pairs of projected individuals amounts to maximizing the sum of the squared deviations of the projections about their mean \bar{h}.

 

If the origin is placed at G, the quantity to be maximized becomes once again the sum of squares of the distances to the origin, which brings us back to the problem discussed in Chapter 1.

 

The required subspace is obtained by first transforming the data matrix R into a matrix X whose general term is

    x_{ij} = r_{ij} - \bar{r}_j

and then performing the analysis on X.

The distance between two individuals k and k' is then written, in Rp:

    d^2(k, k') = \sum_{j=1}^{p} (r_{kj} - r_{k'j})^2

In this summation there may be values of the index j for which the corresponding variables are of very different orders of magnitude: for example, expenditures on stamps versus expenditures on rent. It may therefore be necessary, particularly when the units of measurement differ, to give each variable the same weight in defining the distances among individuals. Normalized principal components analysis is used in this situation: the measurement scales are standardized by using the following distance measure:

    d^2(k, k') = \sum_{j=1}^{p} \frac{(r_{kj} - r_{k'j})^2}{s_j^2}

In this formula, s_j is the standard deviation of variable j:

    s_j^2 = \frac{1}{n} \sum_{i=1}^{n} (r_{ij} - \bar{r}_j)^2

Finally, we note that the normalized analysis in Rp of the raw data matrix R is also the general analysis of the matrix X whose general term is:

    x_{ij} = \frac{r_{ij} - \bar{r}_j}{s_j \sqrt{n}}
In this space, we must therefore diagonalize the matrix C = X'X, whose generic term is written:

    c_{jj'} = \sum_{i=1}^{n} x_{ij} x_{ij'}

that is:

    c_{jj'} = \frac{1}{n} \sum_{i=1}^{n} \frac{(r_{ij} - \bar{r}_j)(r_{ij'} - \bar{r}_{j'})}{s_j s_{j'}}

c_{jj'} is the correlation coefficient between variables j and j'. The coordinates of the n individual points on the principal axis u_α (the αth eigenvector of matrix C) are the n components of the vector v_α = X u_α.

The abscissa of the individual point i on axis α is explicitly written as:

    \sum_{j=1}^{p} x_{ij}\, u_{\alpha j}

3.2. Analysis in Rn

 

The general analysis developed in Chapter 1 shows that fitting a set of points in one space implies fitting the associated set of points in the other space as well.

The purpose of transforming the initial matrix R was twofold: first, to obtain a fit that accounts in the best possible way for the distances between individual points; second, to give equal weight to each of the variables in defining the distances among the individuals.

We note that the equation transforming R into X:

    x_{ij} = \frac{r_{ij} - \bar{r}_j}{s_j \sqrt{n}}

does not treat the rows and columns of the initial matrix R in a symmetric fashion.

 

What is the meaning of the distance between two variables j and j' in Rn if the columns of the transformed matrix X are used as coordinates for the variables?

Let us calculate the Euclidean distance between two variables j and j':

    d^2(j, j') = \sum_{i=1}^{n} (x_{ij} - x_{ij'})^2

that is:

    d^2(j, j') = \sum_{i=1}^{n} x_{ij}^2 + \sum_{i=1}^{n} x_{ij'}^2 - 2 \sum_{i=1}^{n} x_{ij} x_{ij'}
When we substitute for x_{ij} its value taken from the equation transforming R into X, taking into account the relationship:

    \sum_{i=1}^{n} (r_{ij} - \bar{r}_j)^2 = n s_j^2

we see that:

    \sum_{i=1}^{n} x_{ij}^2 = 1

(All the variable points are located on a sphere of radius 1 whose centre is at the origin of the axes.) We also see that:

    \sum_{i=1}^{n} x_{ij} x_{ij'} = c_{jj'}

We thus obtain the relationship between the distance of two variable points j and j' and these variables' correlation coefficient c_{jj'}:

    d^2(j, j') = 2 (1 - c_{jj'})

and therefore:

    0 ≤ d^2(j, j') ≤ 4

The distances between the transformed variables have the following characteristics (see Figure 3):

1. Two variables that are highly correlated are located either very near one another (c_jj' close to 1.0) or as far away as possible from one another (c_jj' close to -1.0), depending on whether they are positively or negatively correlated.

2. Two variables that are uncorrelated (orthogonal, c_jj' = 0) are at an intermediate distance (√2) from one another.

 

                                         

 

                cjj' ≈ 1                                      cjj' ≈ 0                                          cjj' ≈ -1

       d(j,j') ≈ 0                                d(j,j') ≈ √2                                     d(j,j') ≈ 2

Figure 3. Correlations and distances between “variable-points”

 

Note that the analysis in the space Rn is not performed relative to the centroid (mean point) of the variable points (whereas in the space Rp the analysis is performed relative to the mean point of the individuals).

 

 

We have seen that it is unnecessary to diagonalize the matrix XX', of order (n, n), once the eigenvalues λ_α and eigenvectors u_α of the matrix C = X'X are known. The reason is that the vector

    w_\alpha = \frac{1}{\sqrt{\lambda_\alpha}} X u_\alpha

is the unit eigenvector of XX' relative to the eigenvalue λ_α.

The abscissas of the variable points on axis α are the components of the vector

    X' w_\alpha = \frac{1}{\sqrt{\lambda_\alpha}} X'X u_\alpha = \sqrt{\lambda_\alpha}\, u_\alpha

and are by-products of the computations that have already been performed in the space Rp.

 

Note. The cosine of the angle between two variable vectors in the space Rn is the correlation coefficient between the two variables: since the variable points are located at a distance of 1 from the origin (i.e., they have unit variance), the cosine is simply their scalar product.

Thus the p correlation coefficients between the coordinates of the individual points on an axis and the p variables are precisely the p components of the previous vector:

    \sqrt{\lambda_\alpha}\, u_\alpha
The abscissa of a variable point on an axis is therefore the correlation coefficient between this variable and the new variable (the linear combination of the initial variables) whose values are the coordinates of the individuals on the corresponding axis.

 

It is important to note that the interpretation of the equation transforming R into X differs greatly in the two spaces Rn and Rp.

 

Consider, for instance, the operation of centering the variables:

    x_{ij} = r_{ij} - \bar{r}_j

1. In Rp the transformation is equivalent to translating the origin of the axes to the centre of gravity (or centroid) of the points.

2. In Rn the transformation is a projection parallel to the first bisector of the axes. (The (n, n) matrix P associated with this transformation is the classical idempotent centering matrix, whose general term is \delta_{ii'} - 1/n.)

 

4. Supplementary Variables and Supplementary Individuals

 

It often happens, in practice, that additional information is available that might be added to matrix R. We might have some data on the n individuals that could be added to the p variables being analyzed but that is of a somewhat different nature from the p variables. For example, we might wish to add the income level and the age of the individuals to a data set consisting of food consumption variables.

 

On the other hand, we may have data on the p variables for a number of additional individuals who belong to a control group and therefore cannot be included in the sample.

Matrix R thus may have additional rows and columns:

 

1. A matrix R⁺ with n rows and p_s columns, appended to the columns of R.

2. A matrix R₊ with n_s rows and p columns, appended to the rows of R.

 

Figure 4. Supplementary rows and columns

 

The matrix R⁺₊, in which both the individuals and the variables are supplementary, is not used in the analysis; see Figure 4.

Matrices R⁺ and R₊ are transformed, respectively, into matrices X⁺ and X₊ in order to make the new rows and columns comparable to those of X.

 

(a) In Rn, we have p_s supplementary variable points. In order to remain consistent in interpreting inter-variable distances in terms of correlations, the following transformation must be performed (normalized principal components analysis):

    x^{+}_{ij} = \frac{r^{+}_{ij} - \bar{r}^{+}_j}{s^{+}_j \sqrt{n}}

We simply compute the new means and the new standard deviations relative to the supplementary variables, in order to place these supplementary variables on the sphere of unit radius.

 

The abscissas of the p_s supplementary variables on axis α are therefore the p_s components of the vector X⁺' w_α = X⁺' v_α / \sqrt{\lambda_\alpha}.

 

(b) In Rp, the placement of the supplementary individuals in relation to the others consists of positioning them relative to the centroid of the active points (which has already been calculated) and then dividing the coordinates by the standard deviations of the variables (already calculated for the n active individuals). The following transformation is therefore performed:

    x^{+}_{kj} = \frac{r^{+}_{kj} - \bar{r}_j}{s_j \sqrt{n}}

where the means \bar{r}_j and standard deviations s_j are those of the n active individuals.
The projection operator on axis α of Rp is still the unit vector u_α.

 

 

The variables or individuals involved directly in fitting the data are called active elements (active variables or active individuals). The elements that are merely projected afterwards are called illustrative or supplementary elements.

 

 

5. Nonparametric Analysis

 

The only difference between nonparametric PCA methods and what we have already described is that they require a preliminary transformation of the data. These techniques should be used when the data are heterogeneous. They provide extremely robust results, and also lend themselves to statistical interpretation.

 

5.1. Analysis of Ranks

Let us transform the original data matrix into a matrix of ranks: the general term q_{ij} is the rank of observation i among the n observations ranked on variable j. Under these circumstances the distance between two variables j and j' is defined by the formula

    d^2(j, j') = \frac{6}{n(n^2 - 1)} \sum_{i=1}^{n} (q_{ij} - q_{ij'})^2

The reader will recognize that 1 - d^2(j, j') is Spearman's rank-correlation coefficient.

Rank-order analysis is best applied in the following situations:

 

(1) The basic data consist of ranks, in which case there is no choice.

(2) The variables' scales of measurement are so different from one another that the usual standardization remains inadequate. However, this procedure is not a remedy for asymmetric distributions. Finally, analyzing a set of ranks is easier to justify than analyzing an extremely heterogeneous set of measurements.

(3) The implicit assumptions made about the measurements are weak, and consequently less arbitrary. The distribution of the distances is nonparametric: confidence levels, as in any nonparametric test, depend only upon the hypothesis that the observations are continuously distributed, which is more plausible in this context than the assumption of normality.

(4) Finally, this method is robust, in the sense that it is insensitive to the presence of outliers, which is often an appreciable advantage.

 

The results are interpreted in the same way as a regular principal components analysis, since a regular principal components analysis is what is performed after the transformation into ranks. Note that standardizing the variables is unnecessary here, since all of the ranks have the same variance. The distance between two variables is interpreted in terms of correlation: two variables are close to one another if their ranks are similar across all the observations; conversely, two variables are distant from one another if their ranks are almost totally opposite. Two observations are close when their ranks are similar for each of the variables. When variables and observations are mapped simultaneously, the relative location of a variable vis-à-vis all of the observations gives an idea of the configuration of the observations' ranks for this variable.

Finally, the nonparametric nature of the results allows us to perform some tests of validity on the eigenvalues. This is because the distribution of the eigenvalues of a matrix of ranks depends only on n and p, the numbers of rows and columns of the matrix. It is therefore possible to construct tables to determine the significance levels of the eigenvalues.

 

5.2. Robust Axes Analysis

The least squares criterion is best suited to a normal distribution; when the distribution is uniform, excessive weight is given to the observations at the two extremes. In this case the analysis becomes more robust if we apply a transformation that "normalizes" the uniform distribution of the ranks.

Let us consider the kth observation among the n ranked observations.

Let F be the distribution function of the standard normal distribution. We replace the observation of rank k with the value y_k obtained from the inverse distribution function:

    y_k = F^{-1}\left( \frac{k}{n+1} \right)

Figure 5. Transformation of a uniform distribution into a normal distribution

 

This type of transformation can be found in Fisher and Yates (1949); see Figure 5.

For large n, the transformation is equivalent to replacing the kth observation with the expected value of the kth order statistic in a ranked sample of n normal observations.

 

6. Example of Outputs for Principal Components Analysis

 

Data set: example “EX_A01” in the set of examples “DtmVic_Examples_A_Start” attached to the software DtmVic.

 

The CESP (Centre d'Étude des Supports de Publicité, a joint industry committee for the audit of audience measurements) carried out a time-budget sample survey in France during the years 1991-1992. Sample size: 17,665.

 

Variables: measurements of media contacts (radio, television, press: magazines and daily newspapers) and the durations of 60 activities during the day preceding the interview.

 

Description of Table 1

Columns: 16 active continuous variables (minutes per day; identifiers in French).

Rows: 27 fictitious individuals that are in fact groupings of respondents according to age, gender, educational level, and size of town.

Identifiers of the 27 rows:

- The 1st character is the age (1 = young, 2 = medium, 3 = old).

- The 2nd character (gender) is always 1 (active males in this sub-sample).

- The 3rd character is the educational level (1 = primaire, 2 = secondaire, 3 = supérieur).

- The 4th character is the type of town (1 = rural areas; 2 = small towns; 3 = large towns; 4 = Paris and suburbs; 5, 6, 7 = other categories).

 

 

Table 1. Daily time budget: p = 16 activities (columns) for n = 27 groups of active males (rows).

IDENT     Somm    Repo    Reps   Repr    Trar     Ména   Visi   Jard   Lois   Disq   Lect   Cour  Prom    A pi   Voit    Fréq

 

1111      463.8    23.8   107.3    4.8    300.0    21.3   51.0   82.3   10.0    1.2     .0   41.3   6.9    7.1    52.1   135.8

1115      515.6    58.5   102.7   10.4    208.8    41.9   30.0   32.9    2.1    4.6     .6   33.7   8.3   24.6    29.4   225.8

1121      463.3    34.2    84.8   17.1    298.3    18.1   37.8   55.8   18.4    5.9    2.6   30.7   5.9    8.8    56.7   135.8

1122      456.4    43.1    74.2   21.9    239.0    26.0   51.2   59.7   18.4    3.6    4.6   52.2   9.5   10.8    72.7   142.3

1123      478.0    44.2    76.7   15.2    212.3    22.3   42.0   43.7   18.4    2.3    6.4   48.3  14.7   15.5    72.8   167.7

1124      465.1    41.6    85.2   23.7    226.0    37.0   42.5   16.3   10.7    8.7    9.4   44.3  13.7   19.8    59.0   145.1

1136      458.4    47.4    94.7   15.1    314.3    25.3   39.1   42.4   16.9     .9   16.7   34.5   4.6    6.4    61.5   103.4

1133      457.2    30.7    82.0   26.2    269.8    52.1   37.6   35.6   25.6    6.0    8.0   42.8  10.4   12.0    81.4   107.6

1134      465.2    40.2    78.6   31.1    268.6    36.3   21.6    4.0   19.4    6.0   14.8   46.9  10.7   21.9    48.3    82.4

2111      449.0    42.1    86.2    7.9    312.5    15.1   16.1  112.9   15.4     .0    2.2   32.1   7.6    8.1    60.1   153.9

2112      450.2    63.1    86.7    9.8    249.6    40.4   55.6   83.3    3.0    2.2     .0   45.0   9.4   10.4    61.9   145.4

2117      455.2    47.4    95.6    9.0    250.8    30.4   13.5   57.3    7.9    2.9    7.0   52.2  15.1   15.7    49.1   194.8

2121      461.9    39.3    90.3    8.5    323.5    14.9   21.7   81.8   15.4    1.2    5.3   26.0   3.8    7.4    59.6   130.8

2122      453.7    44.7    97.5   18.7    269.0    23.1   39.6   93.5    3.1    3.4   12.1   42.0  12.1   10.6    62.4   129.1

2123      433.1    49.8    91.7   12.6    283.7    22.4   21.0   62.9   13.1    6.2    7.3   38.1  11.6   11.7    47.6   168.6

2124      438.3    32.8   102.3   11.1    338.3    28.0    6.5   64.8   13.8    1.4   19.8   34.9   7.4   14.1    53.2   130.5

2131      457.7    44.0    87.9    6.9    313.0    24.4   23.2   63.8    9.2     .6   11.8   30.0   7.3    7.5    69.7   108.3

2132      455.0    47.0    78.9   31.6    380.6    23.9    7.5   40.0   13.0     .0   10.3   23.3   1.4    9.4    59.4   100.0

2133      467.3    37.5    86.9   21.9    264.0    40.8   27.6   33.4   11.9    1.6   10.8   45.3   6.7   10.7    72.8   135.2

2134      433.5    35.6    76.1   17.1    355.0    34.1   13.4   31.7   12.6    3.2   13.2   37.5   8.5   22.3    57.5    96.5

3116      473.0    51.5    99.3    6.3    356.3    21.2   27.6   82.1    8.6     .0    1.5   35.7  13.4    7.1    40.6   107.7

3117      461.9    60.0   103.7    9.1    240.5    35.3   14.5   83.4    1.4    2.0    7.4   46.1   5.7   16.6    53.3   183.7

3121      453.4    45.6    86.2    7.8    358.7    12.9   18.5   54.4    4.2     .0    4.9   34.3   3.3   10.3    48.7   143.1

3122      485.1    53.5    86.0     .3    222.4    24.7   23.2   91.9    8.5     .0    3.7   52.9   7.1    9.9    75.3   166.3

3123      456.7    43.2    94.6   12.1    265.3    30.5   23.7   61.1    9.1    2.3   11.6   50.1  17.6   13.2    46.3   185.3

3136      444.2    53.6    90.7    7.2    302.4    31.7   16.4   97.6    4.7    2.4    4.3   38.8  13.6   11.4    61.8   127.2

3137      438.4    50.7    81.0   11.2    306.6    19.3   23.8   10.5   13.6     .0   18.4   67.6   8.3   18.6    63.1   143.3

 

 

Table 2. Summary (number of “individuals”: 27)

 

 

 IDEN — Label                                      Mean  Std.dev.  Minimum   Maximum

Active Variables

 

 Somm — Sommeil  (sleep)                         458.91   16.47     433.10    515.60

 Repo — Repos  (rest)                             44.63    8.90      23.80     63.10

 Reps - Repas chez soi  (meals at home)           89.18    8.90      74.20    107.30

 Repr - Repas restaurant (restaurant)             13.87    7.82        .30     31.60

 Trar - Travail rémunéré (work)                  286.27   46.75     208.80    380.60

 Ména - Ménage (housework)                        27.90    9.29      12.90     52.10

 Visi - Visite à amis (friends)                   27.64   13.26       6.50     55.60

 Jard - Jardinage, Bricolage (gardening, DIY)     58.49   27.39       4.00    112.90

 Lois - Loisirs extérieur (leisure)               11.42    5.95       1.40     25.60

 Disq - Disque cassette (records)                  2.54    2.32        .00      8.70

 Lect - Lecture livre (reading)                    7.95    5.47        .00     19.80

 Cour - Courses  démarches (errands)              40.99    9.47      23.30     67.60

 Prom - Promenade (stroll)                         9.06    3.88       1.40     17.60

 A pi - Déplacement à pied (walk)                 12.66    5.01       6.40     24.60

 Voit - Déplacement en Voiture (ride)             58.38   11.29      29.40     81.40

 Fréq - Fréquentation Média (media contacts)     140.58   32.56      82.40    225.80

______________________________________________________________________

 

Supplementary numerical variables

 

 Autr — Autres activités  (other)                 12.71    5.70       2.10     25.90

 Domi — Total Domicile  (total home)             928.73   49.92     826.00   1034.00

 Tdep — Total Déplacement (total transport)       88.45   14.65      67.50    122.10

IDEN — Label                                      Mean  Std.dev.  Minimum   Maximum

Supplementary numerical variables (continuation)

 

 Habitudes Cinema   (cinema)                        .14     .14        .00       .60

 Habitudes Radio  (radio)                          1.92     .23       1.49      2.64

 Habitudes Télévision  (telly)                     3.20     .37       2.13      3.90

 Habitudes Presse Quotidienne (daily press)         .18     .14        .03       .53

 Habitudes Presse magazine (magazine)              3.56     .74       2.00      5.31

 Habitudes Hebdomadaires News  (news)               .31     .18        .00       .67

 

 

 

Table 3. Correlation matrix and eigenvalues

 

  Sommeil |   1.00                                      Correlation matrix

  Repos   |    .21   1.00

  Repas c.|    .21    .10   1.00

  Repas r.|   -.08   -.30   -.53   1.00

  Travail |   -.52   -.28   -.02   -.01   1.00

  Ménage  |    .20    .08   -.01    .39   -.46   1.00

  Visites |    .27   -.08   -.07    .10   -.47    .15   1.00

  Jardin. |   -.09    .19    .43   -.64    .08   -.37   -.02   1.00

  Loisirs |   -.17   -.61   -.55    .52    .10   -.01    .12   -.39   1.00

  Disques |    .07   -.17   -.15    .52   -.46    .50    .30   -.42    .25   1.00

  Lecture |   -.44   -.21   -.15    .38    .24    .08   -.36   -.51    .27   -.01   1.00

  Courses |   -.04    .18   -.17   -.03   -.56    .23    .24   -.24   -.01    .08    .18   1.00

  Promen. |    .00    .09    .04   -.02   -.45    .27    .18   -.01   -.05    .40   -.03    .48   1.00

  A pied  |    .17    .15   -.14    .28   -.38    .49   -.18   -.62   -.09    .48    .27    .37    .30   1.00

  Voiture |   -.19   -.22   -.55    .21   -.15    .10    .27    .03    .44   -.09    .15    .23   -.11   -.33   1.00

  Fréq.med|    .40    .42    .37   -.44   -.62    .05    .01    .18   -.45    .07   -.38    .30    .28    .28   -.33   1.00

 

+——————+—————————+———————+———————+           Histogram of the 16 eigenvalues

|NUMBER|  Eigen  |PERCEN |PERCEN |

|      |  value  | TAGES |(CUMU) |                                                                                 

+——————+—————————+———————+———————+—————————————————————————————————————————————————————————————

|   1  |   3.871 | 24.20 | 24.20| ******************************************************//*****

|   2  |   3.660 | 22.88 | 47.07| *************************************************//********    

|   3  |   2.006 | 12.54 | 59.61| ******************************************                                      

|   4  |   1.514 |  9.47 | 69.08| ********************************                                                

|   5  |   1.126 |  7.04 | 76.12| ************************                                                        

|   6  |    .837 |  5.23 | 81.35| ******************                                                              

|   7  |    .766 |  4.79 | 86.15| ****************                                                                

|   8  |    .596 |  3.73 | 89.87| *************                                                                   

|   9  |    .444 |  2.78 | 92.65| **********                                                                       

|  10  |    .374 |  2.34 | 94.99| ********                                                                        

|  11  |    .246 |  1.54 | 96.53| ******                                                                          

|  12  |    .222 |  1.39 | 97.92| *****                                                                           

|  13  |    .161 |  1.01 | 98.93| ****                                                                            

|  14  |    .114 |   .72 | 99.64| ***                                                                             

|  15  |    .037 |   .23 | 99.88| *                                                                                

|  16  |    .019 |   .12 |100.00| *                                                                               

 

 

 

 

 

 

 

 


 

 

Table 4. Coordinates of active variables (axes 1 to 3)

 

Variables                 Coordinates            Unitary axes

                           1      2      3         1     2     3  

 Sommeil                  .22   -.52    .18       .11  -.27   .13 

 (sleep)

 Repos                    .46   -.40   -.17       .23  -.21  -.12 

 (rest)

 Repas chez soi           .67   -.15   -.23       .34  -.08  -.17 

 (meals at home)

 Repas restaurant        -.84    .00   -.07      -.43   .00  -.05 

 (restaurant)

 Travail rémunéré         .05    .88   -.34       .03   .46  -.24 

 (work)

 Ménage                  -.40   -.57   -.08      -.20  -.30  -.06 

 (house work)

 Visite à amis           -.13   -.33    .73      -.07  -.17   .52 

 (calling friends)

 Jardinage, Bricolage     .76    .22    .35       .39   .11   .25 

 (gardening, DIY)

 Loisirs extérieur       -.72    .30    .30      -.37   .16   .21 

 (leisure out)

 Disque  cassette        -.53   -.53    .01      -.27  -.27   .01 

 (records, cassettes)

 Lecture livre           -.54    .24   -.50      -.27   .12  -.36 

 (reading books)  

 Courses  démarches      -.21   -.54    .11      -.11  -.28   .08 

 (Errands)

 Promenade               -.10   -.58    .04      -.05  -.30   .03 

 (walk, stroll, ride)

 A pied                  -.37   -.62   -.57      -.19  -.33  -.40 

 (walk)

 En Voiture              -.41    .22    .65      -.21   .11   .46 

 (ride)

 Fréquentation Média      .49   -.68   -.05       .25  -.36  -.03 

 (contacts media)

 

 

Figure 6. The 16 active variables in the plane spanned by axes 1 and 2

 

 

 

 

Table 5. Coordinates of supplementary (illustrative) variables on axes 1 to 3

   VARIABLES                  Coordinates

                            1      2      3

 Autres  activités        .08    .16    .04

 Total Domicile           .67   -.50   -.21

 Total Déplacement       -.72    .05    .14

 Habitudes Cinema        -.87   -.11   -.14

 Habitudes Radio         -.27   -.57    .07

 Habitudes Télévision     .04   -.55    .34

 Habitudes Presse Quot   -.39    .01   -.70

 Habitudes Presse mag    -.24   -.38   -.26

 Habitudes Hebdo-News    -.46    .20   -.48

 

Figure 7. Plot of the supplementary variables (same plane as Figure 6)

 

 


Table 6. Coordinates, contributions and squared cosines of individuals on axes 1 and  2

 

 INDIVIDUAL           COORDINATES     CONTRIBUT.   SQUARED COS.

 IDENTIF.   DISTO       1      2        1    2       1    2   

 1111       19.89     2.01    .85     3.8   .7     .20  .04  

 1115       47.51     2.26  -5.11     4.9 26.4     .11  .55  

 1121       10.55     -.71   1.01      .5  1.0     .05  .10  

 1122       13.29    -1.86   -.64     3.3   .4     .26  .03  

 1123       14.49    -1.28  -1.81     1.6  3.3     .11  .23  

 1124       19.06    -2.72  -2.93     7.1  8.7     .39  .45  

 1136       10.68     -.56   1.97      .3  3.9     .03  .36  

 1133       27.04    -4.21   -.30    17.0   .1     .66  .00  

 1134       25.35    -4.29   -.91    17.6   .8     .73  .03  

 2111       12.86     1.91   2.12     3.5  4.5     .28  .35  

 2112       17.27     1.43  -1.68     2.0  2.8     .12  .16  

 2117       10.89     1.03  -2.16     1.0  4.7     .10  .43  

 2121       10.96     1.27   2.55     1.5  6.6     .15  .59  

 2122        7.92      .62   -.21      .4   .0     .05  .01  

 2123        8.33      .30   -.33      .1   .1     .01  .01  

 2124       15.54     -.12   2.06      .0  4.3     .00  .27  

 2131        7.39      .55   2.03      .3  4.2     .04  .56  

 2132       24.45    -1.17   3.53     1.3 12.6     .06  .51  

 2133        7.85    -1.63   -.11     2.5   .0     .34  .00  

 2134       17.19    -2.54   1.36     6.2  1.9     .37  .11  

 3116       16.19     2.68    .96     6.9   .9     .45  .06  

 3117       15.96     2.43  -1.84     5.7  3.4     .37  .21  

 3121       13.00     1.90   2.11     3.4  4.5     .28  .34  

 3122       17.31     2.12   -.95     4.3   .9     .26  .05  

 3123       10.26      .56  -1.74      .3  3.1     .03  .30  

 3136        9.09     1.56    .09     2.3   .0     .27  .00  

 3137       21.68    -1.55    .08     2.3   .0     .11  .00  

 

Table 7. Test-values and coordinates of supplementary categories on axes 1 and 2

 

       

                  Categories              Test-values       Coordinates

     Wording                 Number     Axis 1   Axis 2    Axis 1   Axis 2

 

        . Age  

 

   A-35 - Jeunes (young)          9      -2.3   -1.6      -1.26   -.87

   A+35 - Age-Moy (middle-aged)  11        .3    1.8        .15    .83

   A+50 - Ages (older)            7       2.1    -.3       1.39   -.18

 

        . Niveau d'éducation (educational level)

 

   prim - primaire (primary)      7       3.0   -1.5       1.96   -.98

   seco - secondaire (secondary) 11        .0    -.2        .01   -.08

   supe - superieur (higher)      9      -2.8    1.6      -1.54    .86

 

        . Agglomération (size of town; excerpts)

 

   AGG1 - under 20,000            6       1.6    2.5       1.15   1.78

   AGG2 - 20,000 to 100,000       5        .3     .0        .23    .01

   AGG3 - over 100,000            5      -1.5   -1.1      -1.25   -.86

   AGG4 - Paris                   4      -2.6    -.1      -2.42   -.11

 

 

 


 

Figure 8. Location of individuals (4-digit symbols) and categorical variables in the principal plane of the PCA (axes 1 and 2)

 



End of Chapter 2: Principal Components Analysis