Example A.2: EX_A02.SimpleCorAnalysis (SCA)

(two-way Correspondence Analysis)


Example A.2 aims at describing a contingency table through Correspondence Analysis (CA).

DtmVic generates several intermediate text-files related to the application. It is recommended to use one specific directory for each application.

The small data table of Example A.2 (SCA_dat_Eng.txt) comes from a “multi-media sample survey” carried out by the CESP in 1992 [about the CESP, see: www.cesp.org]. It describes the distribution of six media (radio, television, national dailies, regional dailies, magazines, TV magazines) among eight socio-economic categories of respondents (first eight rows). The six media are the columns and the eight categories are the rows of the contingency table. Cell (i, j) of the contingency table contains the number of contacts, during the previous day, between respondents belonging to category i and medium j. Some supplementary rows give the number of contacts according to three further categorical variables: gender, age, and educational level.




1) Looking at the two files: dictionary and data.


In the example directory “DtmVic_Examples_Start”, the sub-directory of example A.2 is named “EXA02.SimpleCorAnalysis”. At the outset, this directory must contain at least two files:


a) the dictionary file,

b) the data file.


To look at these files, use a text editor outside DtmVic (Notepad, Notepad++, UltraEdit, TotalEdit) or simply the text editor within DtmVic: button “Open an existing command file” of the main menu.

1.1) Dictionary file:

The dictionary name is “SCA_dic_Eng.txt” (“SCA_dic_Fr.txt” for a French version).

This particularly simple example of a dictionary file contains the identifiers of the 6 categories that are the columns of the contingency table. In this internal format of DtmVic, the identifiers of categories must begin at “column 6” [a fixed-width font (also known as a teletype font), such as Courier, should be used to make this kind of format easier to handle].


1.2) Data file:

In a similar fashion, open the data file “SCA_dat_Eng.txt” (“SCA_dat_Fr.txt” for a French version).

The active part of the data file “SCA_dat_Eng.txt” comprises 8 rows and 7 columns; the supplementary rows follow. Each row contains the row identifier [between quotes] followed by 6 values, the absolute frequencies for the 6 media, separated by at least one blank space.
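The row layout described above (a quoted identifier followed by blank-separated values) can be read with a few lines of Python. This is only a sketch: the function name parse_dtm_rows is hypothetical, the two sample rows are invented for illustration and are not taken from “SCA_dat_Eng.txt”, and it assumes the identifiers are enclosed in ordinary single or double quotes.

```python
import shlex

def parse_dtm_rows(text):
    """Parse rows of the form: 'identifier' v1 v2 ... (blank-separated)."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        tokens = shlex.split(line)  # a quoted identifier stays one token
        label = tokens[0]
        values = [float(t) for t in tokens[1:]]
        rows.append((label, values))
    return rows

# Hypothetical sample: labels and counts are invented for illustration
sample = """
'Farmer'          96 118  2  71  50  17
'Skilled worker' 262 331 24 259 261 145
"""
rows = parse_dtm_rows(sample)
for label, values in rows:
    print(label, values)
```

Using shlex.split keeps an identifier containing a blank, such as 'Skilled worker', as a single field, which a plain split on whitespace would not do.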



2) Generation of a command file (or: “parameter file”)


2.1) Click the button: “Create” of the main menu “Basic Steps”, line “Command File”.

A window “Choosing among some basic analyses” appears.


2.2) Then click the button “SCA – Simple correspondence analysis”, located in the paragraph “Numerical data”.


2.3) Click the button “Open a dictionary (Dtm format)”

To open the dictionary, search for the examples directory DtmVic_Examples_Start and, in that directory, open the directory of example A.2 named “EXA02.SimpleCorAnalysis”. Then open the dictionary file “SCA_dic_Eng.txt” (or “SCA_dic_Fr.txt”). The dictionary file is displayed in a window. Another window indicates the status of each variable (all the variables have the status “numerical” in this case).


2.4) Click the button “Open a data file (Dtm format)”

Open the data file “SCA_dat_Eng.txt” (or “SCA_dat_Fr.txt” for the French version).

A new window displays the data file (the button “more data” is of no use for such a small data set).


2.5) Click the button “Continue (select active and supplementary elements)”

A new window is displayed, allowing for the selection of active variables. In this simple case, select all the variables in the “memo” named “Variables to be selected”, and click the upper blue arrow to give the selected subset the status of “active variables” (there are no supplementary variables in this example).


2.6) Click the button “Continue”

A new window devoted to the selection of active observations (rows) is displayed. Click on the button “The observations will be selected from a list”. Then select the first eight rows (occupations) as “active observations” (upper blue arrow) and the remaining rows as “supplementary observations”. Then click on “Continue”.


2.7) The window “Create a starting parameter file” is displayed.


2.7.1 Click on the button: “1) Select some options” .

A new window entitled “Options Bootstrap and/or clustering of observations” is displayed. Click “yes” for the “Bootstrap validation”, then click “Enter” to confirm the default number of replicates (25). Ignore the suggested bootstrap options. Click “Enter” for 0 clusters, then click on “Continue”.


2.7.2 Back to the previous window, click on the button: “2) Create a parameter file for SCA” .

A parameter file is displayed in the memo [advanced users can edit that parameter file; it also makes it possible to run the same analysis again later on, if needed].


Important: The parameter file is saved as “Param_SCA.txt” in the current directory. If you wish, you can now exit from DtmVic and, later on, use the button “Open an existing command file” of the main menu (line “Command file”) to open “Param_SCA.txt” directly. You are then back at this point of the process and can use the “Execute” command of the main menu.


2.7.3 Click then on the button: “3) Execute”.

The basic computation steps mentioned in the command file are: archiving data and dictionary, selection of active elements, correspondence analysis of the selected table, bootstrap replications of the table to build confidence regions for column-points and row-points, brief description of the axes. After the execution has taken place, a small window summarizes the different steps of computation.
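To give an idea of what a “bootstrap replication of the table” can look like, here is a minimal Python sketch. It assumes plain multinomial resampling of the cells, which is one common scheme; the text does not state which scheme DtmVic uses internally. The function name bootstrap_table and the small table of counts are hypothetical. Only the resampling step is shown, not the positioning of the replicates on the axes.

```python
import random

def bootstrap_table(table, rng):
    """Draw one replicate table with the same grand total as `table`."""
    n_rows, n_cols = len(table), len(table[0])
    cells = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    weights = [table[i][j] for i, j in cells]
    total = sum(weights)
    replicate = [[0] * n_cols for _ in range(n_rows)]
    # Each of the `total` contacts is redrawn with probability n_ij / total
    for i, j in rng.choices(cells, weights=weights, k=total):
        replicate[i][j] += 1
    return replicate

rng = random.Random(0)                 # fixed seed, for reproducibility
table = [[20, 5, 10], [8, 30, 12]]     # hypothetical integer counts
replicates = [bootstrap_table(table, rng) for _ in range(25)]
print(replicates[0])
```

The value 25 matches the default number of replicates confirmed in step 2.7.1; each replicate keeps the grand total of the original table while its cell counts fluctuate.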



3) Basic numerical results


Click the “Basic numerical results” button.


The button allows the user to browse a created (and saved) HTML file named “imp.html”, which contains the main results of the previous basic computation steps. After perusing these numerical results, “return” to the main menu. Note that this file is also saved under another name: “imp” is concatenated with the date and time of the analysis (continental notation), so that “imp_08.07.09_14.45.html” means July 8th, 2009, at 2:45 p.m. That file keeps the main numerical results as an archive, whereas the file “imp.html” is replaced for each new analysis performed in the same directory.

This file is also saved in a simple text format, under the name “imp.txt”, and likewise under a name including the date and time of execution.
Return.
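
The naming pattern of the archived result files (day.month.year_hour.minute) can be reproduced or decoded with the Python standard library, for instance to sort archives by date. The two helper names below are hypothetical; the pattern itself is inferred from the single example “imp_08.07.09_14.45.html” given above.

```python
from datetime import datetime

def archive_name(moment, extension="html"):
    """Build 'imp_DD.MM.YY_HH.MM.<extension>' from a datetime."""
    return "imp_" + moment.strftime("%d.%m.%y_%H.%M") + "." + extension

def parse_archive_name(name):
    """Recover the datetime encoded in an archive file name."""
    stem = name.rsplit(".", 1)[0]      # drop the extension
    return datetime.strptime(stem, "imp_%d.%m.%y_%H.%M")

print(archive_name(datetime(2009, 7, 8, 14, 45)))     # imp_08.07.09_14.45.html
print(parse_archive_name("imp_08.07.09_14.45.html"))  # 2009-07-08 14:45:00
```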




4) Steps VIC (Visualization, Inference, Classification)


4.1) Click the “AxeView” button

and follow the sub-menus. In fact, only two tabs are relevant for this first simple example: “Active variables” and “Individuals (observations)”. After clicking on “View” in both cases, one obtains the set of principal coordinates along each axis.
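
As a reminder, the principal coordinates can be related to the contingency table through the usual textbook formulation of correspondence analysis, summarized below. The notation is an assumption of this note (N is the active table, n its grand total, D_r and D_c the diagonal matrices of row and column masses); DtmVic's own conventions for the sign or scaling of the axes may differ.

```latex
\begin{aligned}
P &= \tfrac{1}{n}\, N, \qquad r = P\,\mathbf{1}, \qquad c = P^{\top}\mathbf{1},\\
S &= D_r^{-1/2}\,\bigl(P - r\,c^{\top}\bigr)\,D_c^{-1/2} = U\,\Sigma\,V^{\top},\\
F &= D_r^{-1/2}\,U\,\Sigma \quad \text{(principal coordinates of the rows)},\\
G &= D_c^{-1/2}\,V\,\Sigma \quad \text{(principal coordinates of the columns)}.
\end{aligned}
```

The squared singular values (the diagonal of \Sigma^2) are the eigenvalues, that is, the part of the total inertia carried by each axis.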

Clicking on a column header produces a ranking of all the rows according to the values of that column. In this particular example, this is somewhat redundant with the results of the step “DEFAC” printed in the log-file “imp.txt”.

Evidently, the use of the AxeView menu makes sense when the data set is very large.
Return.



4.2) Click the “PlaneView Research” button

and follow the sub-menus.

In this example, only three items are relevant: “Active columns (variables or categories)”, “Active rows (individuals, observations)”, and “Active columns + Active rows” (respectively columns of the data table, rows of the data table, and simultaneous representation of rows and columns). The graphical displays of the chosen pairs of axes are then produced.

The roles of the different buttons are straightforward, except perhaps the button “Rank”, which is useful only in the case of very intricate displays (which is far from being the case here!): this button converts the two coordinates of the current display into ranks. For instance, the n values of the abscissa are converted into n integers, from 1 to n, having the same order as the original values. Thus the two distributions are uniform, and the identifiers overlap much less and are more legible (often at the cost of a substantial distortion of the display).
Return.
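
The rank conversion performed by the “Rank” button can be illustrated with a short Python sketch. The function name to_ranks and the coordinates are hypothetical; ties are broken here by position in the list, and how DtmVic itself treats ties is not stated in this text.

```python
def to_ranks(values):
    """Replace each value by its rank (1 = smallest), item by item."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0] * len(values)
    for rank, k in enumerate(order, start=1):
        ranks[k] = rank
    return ranks

# Hypothetical coordinates of four points on a pair of axes
x = [0.82, -0.10, 0.05, -0.07]
y = [0.01, 0.40, -0.33, 0.02]
print(to_ranks(x))   # [4, 1, 3, 2]
print(to_ranks(y))   # [2, 4, 1, 3]
```

After the conversion the points are evenly spread along each axis, which is why crowded labels separate, and also why distances on the ranked display no longer reflect the original distances.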



4.3) Click the “BootstrapView” button

This button opens the DtmVic-Bootstrap-Stability windows.


4.3.1 Click on “LoadData” . In this case (partial bootstrap), the replicated coordinates file to be opened is named “ngus_var_boot.txt” .

4.3.2 Click on: “Confidence Areas”, submenu, and choose the pair of axes to be displayed (select axes 1 and 2 [default option] to begin with).


4.3.3 In the window that then appears, displaying the dictionaries of variables, tick the white boxes of the elements whose location should be assessed, and press the button “Select”.


4.3.4 Click on “Confidence Ellipses” to obtain the graphical display of the column points (or variable points) in red colour, and of the row points (or individuals or observations) in blue.


In this display, we learn for example that all the occupation groups (row points) have distinct “media-contact-profiles”, except the categories “Skilled worker” and “Unskilled worker” on the one hand, and “Skilled worker” and “Employees” on the other, whose confidence areas are largely overlapping.


4.3.5 Close the display window and, again in the blue window, press “Convex hulls”. The ellipses are now replaced with the convex hulls of the replicates for each point. The convex hulls take into account the peripheral points, whereas the ellipses are drawn using the density of the clouds of replicates. The two pieces of information are complementary.
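
To make the notion of a convex hull of replicates concrete, here is a minimal Python sketch using the classical monotone chain algorithm. It is only an illustration: the method DtmVic uses to draw the hulls is not described in this text, and the replicate coordinates below are invented.

```python
def cross(o, a, b):
    """Cross product of vectors o->a and o->b (positive for a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Return the hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:                       # build the lower chain, left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):             # build the upper chain, right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]      # drop the duplicated end points

# Hypothetical replicates of one point on axes 1 and 2
replicates = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0), (1.0, 1.0), (1.0, 0.5)]
hull = convex_hull(replicates)
print(hull)
```

Only the outermost replicates end up on the hull, so a single outlying replicate enlarges it, whereas an ellipse based on the density of the cloud is less sensitive to such a point.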

In the context of this example, the other items of the main menu are not relevant.



General remark.


As you can observe when looking at the content of the example’s directory, several files have been created and saved [these files are briefly described in the memo “Help about files” in the toolbar of the main menu]. If you need to use the buttons of the paragraph VIC of the main menu again after having closed DtmVic, just click on the button “Open an existing command file” from the line “Command file”, select and open the saved command file “Param_SCA.txt”, and close it. It is not necessary to click on “Execute” again. You can then continue your investigation (axes views, graphs, maps, etc.).


Advanced users can also edit the parameter file “Param_SCA.txt” (using the memo “Help about parameters” in the toolbar of the main menu) to perform a new analysis in which the parameters are given new values. All the intermediate files will be replaced (except the file “imp_date_time.txt”, which is the only saved archive).



End of example A.2 (SCA)