Example A.3: EX_A03.MultCorAnalysis (MCA)

(Multiple Correspondence Analysis)


Example A.3 aims at describing a set of categorical variables through MCA.

The corresponding data are located in the subdirectory EX_A03.MultCorAnalysis of the directory DtmVic_Examples_A_Start.

The data are an excerpt from a “sample survey about living conditions and aspirations of the French” [carried out by the CREDOC (www.credoc.fr) in 1986]. They comprise the responses of a (small) subset of 315 individuals to 49 questions. Some questions concern objective characteristics of the respondent or his/her household (age, status, gender, facilities). Other questions relate to attitudes or opinions.

The principal axes visualization will be complemented with a clustering, including an automatic description of the clusters. The importance of the distinction between active variables and supplementary variables is stressed.




1) Looking at both files: dictionary and data.


1.1) Dictionary file:

To have a look at the dictionary, search for the examples directory DtmVic_Examples_A_Start, and, in that directory, open the directory of example A.3 named “EX_A03.MultCorAnalysis”.

Then open the dictionary file: “MCA_Eng_dic.txt” (for a dictionary in French, open: “MCA_Fr_dic.txt”).

The dictionary file MCA_Eng_dic.txt contains the identifiers of the 51 variables. For the internal data and dictionary format of DtmVic, please refer to the first examples above, or to the Introduction of Tutorial D, devoted to data importation.


1.2) Data file:

The data file comprises 315 rows and 50 columns (identifier of rows [between quotes] + 49 values [corresponding either to numerical variables or to item numbers of categorical variables] separated by at least one blank space).
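As an illustration of this layout, a row of such a file could be read outside DtmVic with a few lines of Python. This is only a sketch: the function name and the sample row are hypothetical (the row is not taken from MCA_dat.txt), and it assumes that identifiers are enclosed in plain single or double quotes and contain no embedded quote character.

```python
import shlex

def parse_dtm_row(line):
    """Split one data row into (identifier, list of numeric values)."""
    # shlex honours quotes, so an identifier containing blanks stays in one piece,
    # and any run of blank spaces counts as a single separator.
    fields = shlex.split(line)
    identifier = fields[0]
    values = [float(v) for v in fields[1:]]
    return identifier, values

# Hypothetical row, shortened to 4 values for readability.
ident, vals = parse_dtm_row("'ind 001'  2 1   3 45")
print(ident, vals)
```

Applied to every line of the file, such a function would make it easy to check that each row carries the same number of values.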



2) Generation of a command file (or: “parameter file”)

Open DtmVic.


2.1) Click the button: “Create a command file” of the main menu: Basic Steps, section: “Command File”.

A window “Choosing among some basic analyses” appears.


2.2) Click then the button: “MCA– Multiple correspondence analysis”

– located in the paragraph: “Numerical data” .


2.3) Click the button: “Open a dictionary (Dtm format)”

To open the dictionary, search again for the examples directory DtmVic_Examples_A_Start, and, in that directory, open the directory of example A.3, named “EX_A03.MultCorAnalysis”. Then open the dictionary file: “MCA_Eng_dic.txt” (for a dictionary in French, open: “MCA_Fr_dic.txt”). The dictionary file is displayed in a window. Another window indicates the status of each variable (numerical or categorical).


2.4) Click the button: “Open a data file (Dtm format)”

Open the data file “MCA_dat.txt”.

A new window displays the data file.


2.5) Click the button: “Continue (select active and supplementary elements)”.

A new window is displayed, allowing for the selection of active variables.

We suggest selecting the following set of categorical variables as active variables [of course, the reader is free to select another set of categorical variables]:


Suggested set of active categorical variables (sample of opinions)


8 . family_is_the_only_place..
9 . opinion_about_marriage
10 . house_work
11 . satisfaction_dwelling
12 . satisfaction_envir.
21 . headache
22 . backache
23 . nervousness
24 . depression
25 . health_satisfaction
34 . society_needs_changes?
48 . About_justice
49 . People_like_me_feel_alone

Suggested set of supplementary variables (socio-demographic characteristics)


3 . gender
50 . Age_categ
51 . Educ_3_categ

The active categorical variables are in this case 13 questions (opinions and attitudes) about family, housing expenditure, society, health problems, and anxiety.

Click the button: “Continue”

A new window devoted to the selection of active observations (rows) is displayed. Click on the button: “All the observations will be active”.


2.7 The window “Create a starting parameter file” is displayed.


2.7.1 Click on: “1) Select some options”.

A new window entitled “Options Bootstrap and/or clustering of observations” is displayed. Click “yes” for the “Bootstrap validation”, and then click “Enter” to confirm the default number of replicates (25). Ignore the other suggested bootstrap options.

Then select the number of clusters (we suggest 5), and click on: “Enter” and on: “Continue”.

You are now back in the previous window.


2.7.2 Click on: “2) Create a parameter file for MCA”.

A parameter file is displayed in the memo [such a parameter file can be edited by advanced users. It allows the same analysis to be performed again later on, if needed].

Important: The parameter file is saved as “Param_MCA.txt” in the current directory.

If you wish, you could now exit from DtmVic, and, later on, use the button of the main menu “Open an existing command file” (line: “Command file” ) to open directly “Param_MCA.txt” , and, in so doing, reach this point of the process.


2.7.3 Click then on: “3) Execute”.

This step will run the basic computation steps present in the command file: archiving data and dictionary, selection of active elements, multiple correspondence analysis of the selected table, bootstrap replications of the table, brief description of the axes, clustering procedure with a thorough description of the clusters.

After the execution has taken place, a small window summarizes the different steps of computation.



3) Basic numerical results


Click the “Basic numerical results” button.

The button opens a created (and saved) html file named “imp.html” which contains the main results of the previous basic computation steps. After perusing these numerical results, return to the main menu. Note that this file is also saved under another name. The name “imp.html” is concatenated with the date and time of the analysis (continental notation): “imp_08.07.09_14.45.html” means July 8th, 2009, at 2:45 p.m. That file keeps the main numerical results as an archive, whereas the file “imp.html” is replaced for each new analysis performed in the same directory.

This file is also saved in a simple text format, under the name “imp.txt”, and likewise with a name including the date and time of execution.
Return.
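When many archived result files accumulate in a directory, the date and time can be recovered from their names. The following Python sketch decodes the naming pattern described above; the function name is hypothetical, and the layout day.month.year_hour.minute (two-digit year, 24-hour clock) is inferred from the single example given in this section.

```python
from datetime import datetime

def parse_archive_name(filename):
    """Extract the analysis date and time from a name like 'imp_08.07.09_14.45.html'."""
    stem = filename.rsplit(".", 1)[0]   # drop the extension (.html or .txt)
    stamp = stem.split("_", 1)[1]       # keep what follows 'imp_': '08.07.09_14.45'
    # Assumed layout: day.month.two-digit-year _ hour.minute
    return datetime.strptime(stamp, "%d.%m.%y_%H.%M")

when = parse_archive_name("imp_08.07.09_14.45.html")
print(when.isoformat())
```

Sorting the archives by the decoded value rather than by name gives the true chronological order, since the day comes first in the file name.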




4) Steps VIC (Visualization, Inference, Classification)



4.1) Click the “AxeView” button … and follow the sub-menus. In fact, only three tabs are relevant for this example: “Active variables”, “Supplementary categories” and “Individuals (observations)”. After clicking on “View” for each case, one obtains the set of principal coordinates along each axis.

Clicking on a column header produces a ranking of all the rows according to the values of that column. In this particular example, this is somewhat redundant with the printed results of the step “DEFAC” (see files “imp.txt” or “imp.html” through the buttons “Basic numerical results” ). The use of the AxeView menu is profitable when the data set is very large.
Return.


4.2) Click the “PlaneView Research” button … and follow the sub-menus. In this example, six items of the menu are relevant: “Active columns (variables or categories)”, “Supplementary categories”, “Rows (individuals, observations)”, “Active columns + Rows”, “Rows, Individuals (density)” and “Active columns + Supplementary categories”.

The graphical displays of the selected pairs of axes are then produced.

In the “Rows, Individuals (density)” display, the identifiers of individuals are replaced by a single character [case of a very large set of individuals... useful to detect and identify outliers]. This display shows mainly the shape of the cloud of individuals, but the original identifiers can be produced by clicking the right button of the mouse. All the displays concern the planes spanned by the chosen pairs of axes.

The roles of the different buttons are straightforward, except perhaps the button: “Rank”, which is useful only in the case of very intricate displays (which is far from being the case here!). This button converts the two coordinates of the current display into ranks. For instance, the n values of the abscissa are converted into n integers, from 1 to n, having the same order as the original values. Thus the two distributions are uniform, and the identifiers overlap much less and become more legible (at the cost of a substantial distortion of the display). This example is in fact a counterexample to that property: an MCA derived from a few active categorical variables produces many superimposed points, which coincide exactly in the display of “individuals” and differ only slightly in the display of ranks (according to the option chosen here, they occupy consecutive or neighbouring ranks).
Return.
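The rank conversion performed by the “Rank” button can be illustrated by a short Python sketch. It is not DtmVic's own code: the function name and the sample coordinates are hypothetical, and ties are assumed to be broken by order of appearance, which gives superimposed points consecutive ranks.

```python
def to_ranks(values):
    """Replace n values by the integers 1..n, keeping their order."""
    # Indices of the values, sorted by increasing value.
    # sorted() is stable, so equal values keep their original relative order.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks

# Hypothetical abscissas; the first and third points are superimposed.
x = [0.12, -1.50, 0.12, 2.30]
print(to_ranks(x))  # [2, 1, 3, 4]
```

The two superimposed points receive ranks 2 and 3: they are now distinct but remain neighbours, while the spacing between the other points no longer reflects the original distances.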



4.3) Click “BootstrapView” button.

This button opens the DtmVic-Bootstrap-Stability windows.


4.3.1 Click on: “LoadData”. In this case (partial bootstrap), the two files of replicated coordinates to be opened are named “ngus_var_boot.txt” and “ngus_sup_cat_boot.txt” (look at the small panel below the menu bar, which recalls the names of the relevant files).

In fact, in this version, the file ngus_var_boot.txt contains both active and supplementary categories. The file ngus_sup_cat_boot.txt contains only supplementary categories, for which the bootstrap procedure is more meaningful.


4.3.2 Click on: “Confidence Areas” , submenu, and choose the pair of axes to be displayed (select axes 1 and 2 [default option] to begin with).


4.3.3 In the window that then appears, displaying the dictionaries of variables, tick the white boxes of the elements whose location should be assessed, and press the button “Select”.


4.3.4 Click on: “Confidence Ellipses” to obtain the graphical display of the active category points (in blue colour), and of the supplementary category points (in red).

In this display, we learn for example that in this principal space (built as a “space of opinions”, due to the selection of active questions), male and female [two supplementary categories that did not participate in building the axes] occupy distinct locations (the ellipses do not overlap at all).

To test such a hypothesis (independence between the pattern of opinions and the gender) it is convenient (i.e. more legible) to tick only the two categories “male” and “female”.

In the same vein, we can tick the classes of ages, and observe that the extreme categories (“under 30” and “over 60”) correspond to clearly separated confidence ellipses.


4.3.5 Close the display window, and press “Convex hulls”. The ellipses are now replaced by the convex hulls of the replicates for each point. The convex hulls take into account the peripheral points, whereas the ellipses are drawn using the density of the clouds of replicates. The two pieces of information are complementary.
Go back to the main submenu “VIC”.
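To make the notion of convex hull of the replicates concrete, here is a Python sketch that computes the hull of a small cloud of points in a plane. It uses Andrew's monotone chain method as a stand-in (the algorithm actually used by DtmVic is not documented here), and the replicated coordinates are invented for the illustration.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Return the hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:                      # build the lower chain, left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):            # build the upper chain, right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]

# Hypothetical replicates of one category point on axes 1 and 2.
replicates = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.5, 0.5), (0.4, 0.6)]
hull = convex_hull(replicates)
print(hull)
```

Only the four outermost replicates are kept as vertices. The two interior replicates play no role in the hull, whereas they would contribute to an ellipse based on the density of the cloud.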



4.4. Click “ClusterView”


4.4.1 Choose the axes (1 and 2 to begin with), and “Continue” .


4.4.2 Click on the button “View” . The centroids of the 5 clusters appear on the first principal plane (Steps RECIP and PARTI of the created command file “Param_MCA.txt” , i.e.: Clustering using reciprocal neighbours (RECIP), then cut of the dendrogram and optimization of the cut through k-means (PARTI) ).
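The k-means optimization of an initial cut can be sketched as follows in Python. This is a generic illustration of the principle, not the PARTI step itself: the function name, the toy coordinates and the initial partition are hypothetical, and the sketch simply stops with an error if a cluster becomes empty.

```python
def kmeans_consolidate(points, labels, k, max_iter=20):
    """Improve an initial partition by alternating centroid update and reassignment."""
    labels = list(labels)
    centroids = []
    for _ in range(max_iter):
        # Centroid of each cluster: mean of the coordinates of its members.
        centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if not members:
                raise ValueError("cluster %d is empty" % c)
            centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
        # Reassign every point to its nearest centroid (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:       # stable partition: stop
            break
        labels = new_labels
    return labels, centroids

# Toy coordinates on two axes; the initial cut misplaces the third point.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels, centroids = kmeans_consolidate(points, [0, 0, 1, 1, 1, 1], k=2)
print(labels)
```

In this toy case the third point is moved into the first cluster at the first reassignment, and the partition is stable from then on.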


4.4.3 Activate the button “Categorical”, and, pointing with the mouse on a specific cluster, press the right button of the mouse. A description of the cluster involving the most characteristic response items appears. This description is somewhat redundant with that of the Step DECLA (see files “imp.txt” or “imp.html” using the button “Basic numerical results” ). But we do have in front of us the pattern of clusters and their relative locations. One can easily imagine the usefulness of the tool for a survey with 3000 individuals, hundreds of variables, and, say, 20 clusters.

In the context of this example, the other items of the main menu are not relevant.



General remark.

As you can observe when looking at the content of the example’s directory, several files have been created and saved [these files are briefly described in the memo “Help about files” in the toolbar of the main menu]. If you need to use the buttons of the paragraph VIC of the main menu again after having closed DtmVic, just click on the button “Open an existing command file” from the line “command file”, select and open the saved command file: “Param_MCA.txt”, and close it. It is not necessary to click on: “execute” again. You can then continue your investigation (axes views, graphs, maps, etc.).

Advanced users can also edit the parameter file “Param_MCA.txt” (using the memo “Help about parameters” in the toolbar of the main menu) to perform a new analysis in which the parameters are given new values. All the intermediate files will be replaced (except the file “imp_date_time.txt”, which is the only saved archive).




End of example A3 (MCA)