A new window displays the data file.
A new window is displayed, allowing for
the selection of active variables.
We suggest selecting the following set of categorical
variables as active variables [of course, the reader is free to
select another set of categorical variables]:
Suggested set of supplementary variables (socio-demographic characteristics):
3. gender
50. Age_categ
51. Educ_3_categ
In this case, the active categorical variables are 13
questions (opinions and attitudes) about family, housing expenditure,
society, health problems, and anxiety.
Click the button:
“Continue”
A new window devoted to the selection of active observations (rows) is
displayed. Click on the button:
“All the observations will be active”.
2.7 The window “Create a starting parameter file”
is displayed.
2.7.1 Click on: “1) Select some options”.
A new window entitled “Options
Bootstrap and/or clustering of observations”
is displayed. Click “yes”
for the “Bootstrap validation”, and then click “Enter”
to confirm the default number of replicates (25). Ignore the other suggested bootstrap options.
Then select the number of clusters (we suggest 5), click on:
“Enter”
and then on: “Continue”.
2.7.2 Back in the previous window, click on:
“2) Create a parameter file for MCA”.
A parameter file is displayed in the memo [such a
parameter file can be edited by advanced users; it allows
the same analysis to be performed again later on, if needed].
Important:
The parameter file is saved as
“Param_MCA.txt”
in the current directory.
If you wish, you can now exit DtmVic and, later on, use the
main-menu button
“Open an existing command file” (line
“Command file”) to open
“Param_MCA.txt” directly,
thereby returning to this point of the process.
2.7.3 Then click on: “3) Execute”.
This step runs the basic computation steps listed in the command file:
archiving the data and dictionary, selecting the active elements, multiple
correspondence analysis of the selected table, bootstrap replications
of the table, a brief description of the axes, and the clustering procedure
with a thorough description of the clusters.
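The multiple correspondence analysis step operates on the complete disjunctive (indicator) coding of the active variables. The minimal Python sketch below illustrates only that coding and the total inertia of the resulting table; it is not DtmVic's code. It assumes each respondent is a tuple of categorical answers, the function names `disjunctive_table` and `total_inertia` are hypothetical, and the extraction of the axes (a singular value decomposition of the standardized table) is omitted. It checks the classical property that an indicator table with J categories and Q variables has a total inertia of J/Q - 1.

```python
from fractions import Fraction


def disjunctive_table(rows):
    """Complete disjunctive (indicator) coding of categorical rows (hypothetical helper)."""
    q = len(rows[0])
    # One column per (variable index, category) pair, in a stable order.
    categories = sorted({(j, row[j]) for row in rows for j in range(q)},
                        key=lambda t: (t[0], str(t[1])))
    index = {cat: k for k, cat in enumerate(categories)}
    z = []
    for row in rows:
        line = [0] * len(categories)
        for j in range(q):
            line[index[(j, row[j])]] = 1  # exactly one 1 per variable
        z.append(line)
    return categories, z


def total_inertia(z):
    """Chi-square total inertia of the table, computed exactly with fractions."""
    n = sum(sum(line) for line in z)
    r = [Fraction(sum(line), n) for line in z]                        # row masses
    c = [Fraction(sum(line[k] for line in z), n) for k in range(len(z[0]))]  # column masses
    inertia = Fraction(0)
    for i, line in enumerate(z):
        for k, x in enumerate(line):
            expected = r[i] * c[k]
            inertia += (Fraction(x, n) - expected) ** 2 / expected
    return inertia
```

With three hypothetical variables having 2, 3 and 2 categories (J = 7, Q = 3), `total_inertia` returns exactly 4/3, whatever the responses are, which is why MCA eigenvalues are usually interpreted relative to this fixed total.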
After the execution has taken place, a small
window summarizes the different steps of computation.
3) Basic numerical results
Click the “Basic numerical results” button.
This button opens the HTML file
“imp.html”, created and saved during the execution,
which contains the main results of the previous basic computation
steps. After perusing these numerical results, return to the main
menu. Note that this file is also saved under a second name, in which
“imp” is followed by the date and time of the analysis (continental
day.month.year notation): “imp_08.07.09_14.45.html”
means July 8th,
2009, at 2:45 p.m. That file serves as an archive of the main numerical
results, whereas the file “imp.html”
is replaced by each new
analysis performed in the same directory.
The results are also saved in plain text format, under the name
“imp.txt”,
and likewise under a name including the date and time of execution.
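When many analyses accumulate in the same directory, the archived result files can be sorted or filtered by the date stamp in their names. The sketch below assumes the pattern shown in the example above (prefix, then day.month.two-digit-year, then hour.minute); the helper name `archive_timestamp` is hypothetical.

```python
from datetime import datetime
from pathlib import PurePath


def archive_timestamp(filename):
    """Parse the date/time embedded in an archived results file name (assumed pattern)."""
    stem = PurePath(filename).stem            # "imp_08.07.09_14.45.html" -> "imp_08.07.09_14.45"
    _, date_part, time_part = stem.split("_")
    # Continental notation: day.month.year (two-digit year), then hour.minute.
    return datetime.strptime(f"{date_part}_{time_part}", "%d.%m.%y_%H.%M")
```

For example, `archive_timestamp("imp_08.07.09_14.45.html")` gives July 8, 2009, 14:45, and the same call works for the `.txt` archive.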
Return.
4) Steps VIC (Visualization, Inference, Classification)
4.1) Click the “AxeView” button …
and follow the sub-menus. Only three tabs are relevant for
this example: “Active
variables”, “Supplementary
categories” and “Individuals
(observations)”. After
clicking on “View”
in each tab, one obtains the principal coordinates of the elements along
each axis.
Clicking on a column header ranks all the rows according to
the values of that column. In this particular example, this is
somewhat redundant with the printed results of the step “DEFAC”
(see the files “imp.txt”
or “imp.html” through
the button “Basic numerical results”). The
AxeView menu is most useful when the data set is very large.
Return.
4.2) Click the “PlaneView Research” button … and
follow the sub-menus. In this example, six items of the menu are relevant:
“Active columns (variables or categories)”,
“Supplementary categories”, “Rows (individuals, observations)”,
“Active columns + Rows”,
“Rows, Individuals (density)”
and “Active
columns + Supplementary categories”.
The graphical displays for the selected pairs of axes are then produced.
In the display “Rows,
Individuals (density)”,
the identifiers of individuals are replaced by a single character
[useful for very large sets of individuals, and to detect and
identify outliers]. This display mainly shows the shape of the cloud
of individuals, but the original identifiers can be retrieved by
right-clicking. All the displays concern the
planes spanned by the chosen pairs of axes.
The roles of the different buttons are straightforward, except perhaps
the button “Rank”,
which is useful only for very cluttered displays (far from
the case here!): this button converts the two
coordinates of the current display into ranks. For instance, the n
values of the abscissa are converted into n integers, from 1 to n,
in the same order as the original values. The two marginal
distributions thus become uniform, and the identifiers overlap much
less and are more legible (at the cost of a substantial
distortion of the display). This example is in fact a counterexample
to that property: MCA based on a few active categorical variables
produces many points that are perfectly superimposed in the display
of “individuals” (individuals with identical response patterns have
identical coordinates) and only slightly separated in the display of ranks
(with the option chosen here, they occupy consecutive or
neighbouring ranks).
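The rank transformation described above can be sketched as follows. How DtmVic breaks ties is not stated here; this sketch assumes ties are broken by the order of the points in the list (Python's sort is stable), so identical points receive consecutive ranks. The names `to_ranks` and `rank_display` are hypothetical.

```python
def to_ranks(values):
    """Replace n values by the integers 1..n, in the same order (ties broken by position)."""
    order = sorted(range(len(values)), key=lambda i: values[i])  # stable sort
    ranks = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


def rank_display(points):
    """Convert both coordinates of a planar display into ranks."""
    xs = to_ranks([x for x, _ in points])
    ys = to_ranks([y for _, y in points])
    return list(zip(xs, ys))
```

For example, the points (0.5, -1.0), (0.5, -1.0), (-0.2, 2.0), (1.3, 0.0) become (2, 1), (3, 2), (1, 4), (4, 3): the two superimposed points are now separated, but only by one rank on each axis, which is the effect observed in this example.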
Return.
4.3) Click the “BootstrapView”
button.
This button opens the DtmVic-Bootstrap-Stability
window.
4.3.1 Click on: “LoadData”.
In this case (partial bootstrap), the two files of replicated coordinates
to be opened are named “ngus_var_boot.txt”
and “ngus_sup_cat_boot.txt”
(see the small panel below the menu bar, which recalls the names of
the relevant files).
In this version, the file
ngus_var_boot.txt contains both active and supplementary
categories, whereas the file ngus_sup_cat_boot.txt
contains only the supplementary categories, for which the bootstrap
procedure is more meaningful.
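The principle of a partial bootstrap can be sketched as follows: the original axes are kept fixed, the individuals are resampled with replacement, and each replicated category point is recomputed on those fixed axes. This is a simplified illustration, not DtmVic's procedure. It assumes the row principal coordinates of the individuals and the eigenvalues of the axes are available, and it positions a category with the MCA transition formula (mean coordinate of the individuals holding the category, divided by the square root of the eigenvalue). The function and parameter names are hypothetical.

```python
import random
from math import sqrt


def partial_bootstrap(coords, labels, eigenvalues, n_replicates=25, seed=0):
    """Replicated category points on fixed (original) axes: a simplified sketch.

    coords: row principal coordinates, one tuple per individual
    labels: category of each individual for one (supplementary) variable
    eigenvalues: eigenvalues of the retained axes
    """
    rng = random.Random(seed)
    n = len(coords)
    replicates = {cat: [] for cat in set(labels)}
    for _ in range(n_replicates):
        sample = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        for cat in replicates:
            members = [coords[i] for i in sample if labels[i] == cat]
            if not members:
                continue  # the category is absent from this replicate
            # Transition formula: barycentre scaled by 1 / sqrt(eigenvalue).
            point = tuple(
                sum(p[s] for p in members) / len(members) / sqrt(eigenvalues[s])
                for s in range(len(eigenvalues))
            )
            replicates[cat].append(point)
    return replicates
```

The default of 25 replicates mirrors the default number proposed in the options window; each category then gets a small cloud of replicated points, which is what the ellipses and convex hulls summarize.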
4.3.2 Click on the
“Confidence Areas” submenu, and choose the pair
of axes to be displayed (select axes 1 and 2 [default option] to begin with).
4.3.3 In the window that then appears, displaying the dictionaries
of variables, tick the boxes of the elements
whose locations should be assessed, and press the button
“Select”.
4.3.4 Click on: “Confidence
Ellipses” to obtain the
graphical display of the active category points (in blue colour),
and of the supplementary category points (in red).
In this display, we learn, for example, that in this
principal space (built as a “space of opinions”, owing to
the selection of active questions), male and female [two
supplementary categories that did not participate in building the
axes] occupy distinct locations (the ellipses do not overlap at all).
To test such a hypothesis (independence between the
pattern of opinions and gender), it is more convenient (i.e. more
legible) to tick only the two categories “male” and
“female”.
In the same vein, we can tick the age classes and
observe that the extreme categories (“under 30” and “over
60”) correspond to clearly separated confidence ellipses.
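The text does not say exactly how DtmVic draws the ellipses. One common construction, assumed in this sketch, treats the cloud of replicates of a point as approximately bivariate normal: the ellipse is centred on the mean of the replicates, its axes follow the eigenvectors of their 2 x 2 covariance matrix, and the semi-axes are scaled by the square root of the chi-square quantile with 2 degrees of freedom, which has the closed form -2 ln(1 - level). The function name and the 0.95 default are assumptions.

```python
from math import atan2, log, sqrt


def confidence_ellipse(points, level=0.95):
    """Centre, semi-axes and major-axis angle (radians) for a 2-D cloud of replicates.

    Assumes the replicates are approximately bivariate normal.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Closed-form eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2
    root = sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam1, lam2 = half_trace + root, half_trace - root
    angle = 0.5 * atan2(2 * sxy, sxx - syy)  # orientation of the major axis
    k = sqrt(-2 * log(1 - level))            # chi-square quantile, 2 degrees of freedom
    return (mx, my), (k * sqrt(lam1), k * sqrt(max(lam2, 0.0))), angle
```

Two categories whose ellipses do not overlap, such as “male” and “female” above, then have replicate clouds that are clearly separated at the chosen level.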
4.3.5 Close the display window and press
“Convex hulls”.
The ellipses are now replaced by the convex hulls of the replicates
of each point.
The convex hulls take into account the peripheral points, whereas the
ellipses are drawn using the density of the clouds of replicates. The
two pieces of information are complementary.
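A convex hull of the replicates of one point can be computed with any standard planar algorithm; the algorithm DtmVic uses is not stated here. The sketch below uses Andrew's monotone chain as one such choice, with a hypothetical function name.

```python
def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive for a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]
```

Because only the extreme replicates end up as vertices, a single outlying replicate can enlarge the hull considerably while barely moving an ellipse based on the whole cloud; this is why the two displays complement each other.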
Go back to the main submenu “VIC”.
4.4) Click the “ClusterView” button.
4.4.1 Choose the axes (1 and 2 to begin with), and click
“Continue”.
4.4.2 Click on the button “View”.
The centroids of the 5 clusters appear on the first principal plane
(steps RECIP and
PARTI of the created command file
“Param_MCA.txt”,
i.e.: hierarchical clustering using reciprocal neighbours
(RECIP), then cutting of the
dendrogram and optimization of the cut through k-means
(PARTI)).
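The consolidation of the dendrogram cut (step PARTI) can be sketched as k-means iterations that start from the partition produced by the cut rather than from random centres. The reciprocal-neighbours hierarchical step is not sketched. Using Euclidean distances on the factor coordinates is an assumption of this sketch, and the function name is hypothetical.

```python
def kmeans_consolidation(points, labels, max_iter=100):
    """Refine an initial partition (e.g. a dendrogram cut) with k-means iterations.

    points: coordinates of the individuals (tuples); labels: initial cluster 0..k-1.
    """
    labels = list(labels)
    k = max(labels) + 1
    dim = len(points[0])
    centroids = [None] * k
    for _ in range(max_iter):
        # Centroids of the current clusters; an emptied cluster keeps its previous centroid.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(p[d] for p in members) / len(members) for d in range(dim))
        # Reassign every point to its nearest centroid (squared Euclidean distance).
        new_labels = [
            min((c for c in range(k) if centroids[c] is not None),
                key=lambda c: sum((p[d] - centroids[c][d]) ** 2 for d in range(dim)))
            for p in points
        ]
        if new_labels == labels:
            break  # the partition is stable
        labels = new_labels
    return labels, centroids
```

For instance, with one-dimensional points 0, 1, 2, 10, 11, 12 and a poor initial cut [0, 0, 1, 1, 1, 1], the point 2 moves to the first cluster and the centroids settle at 1 and 11.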
4.4.3 Activate the button
“Categorical”,
and, pointing the mouse at a specific cluster,
right-click. A description of the cluster
listing its most characteristic response
items appears. This description is somewhat redundant with that of
the step DECLA
(see the files “imp.txt”
or “imp.html” via the
button “Basic numerical results”).
However, here we see the pattern of clusters and their relative
locations at a glance. One can easily imagine the usefulness of this tool for a
survey with 3000 individuals, hundreds of variables, and, say, 20
clusters.
In the context of this example,
the other items of the main menu are not relevant.
General remark.
As you can observe when looking
at the content of the example’s directory, several files have been
created and saved [these files are briefly described in the memo
“Help about files” in the toolbar of
the main menu]. If you want to use the buttons of
the VIC paragraph of the main menu again after closing DtmVic, just
click on the button “Open an existing command file”
on the line “Command file”,
select and open the saved command file
“Param_MCA.txt”,
and close it. It is not necessary to click on
“Execute” again. You can then
continue your investigation (axes views, graphs, maps, etc.).
Advanced users can also edit the parameter file “Param_MCA.txt”
(using the memo “Help
about parameters” in the toolbar of the main menu) to perform a
new analysis with new parameter values. All the intermediate
files will then be replaced (except the file
“imp_date_time.txt”,
which is the only saved archive).
End of example A3 (MCA)