Example A.2 describes a contingency table by means of Correspondence
Analysis (CA).
DtmVic generates several intermediate text files for each application,
so it is recommended to use a separate directory for each application.
The dictionary file is named “SCA_dic_Eng.txt”
(“SCA_dic_Fr.txt” for the French version).
This particularly simple dictionary file contains the identifiers of the
6 categories that form the columns of the contingency table. In this
internal DtmVic format, the category identifiers must begin at column 6
[a fixed-width font, also known as a teletype font, such as “Courier”
makes this kind of format easier to use].
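To make the fixed-column convention concrete, here is a minimal Python sketch that reads category identifiers from dictionary lines. It assumes that each identifier starts at column 6 (index 5 when counting from 0) and that the first five columns can simply be skipped; what DtmVic stores in those columns is not described here. The function name `read_dtm_dictionary` and the labels `Category_1`, `Category_2` are hypothetical.

```python
def read_dtm_dictionary(lines):
    """Return category identifiers from fixed-column dictionary lines.

    Assumption: each identifier starts at column 6 (1-based), so the
    first five characters of every line are skipped.
    """
    identifiers = []
    for line in lines:
        label = line.rstrip("\n")[5:].strip()  # keep the text from column 6 on
        if label:  # ignore blank or too-short lines
            identifiers.append(label)
    return identifiers


# Hypothetical sample: identifiers aligned on column 6 (5 leading blanks).
sample = [" " * 5 + "Category_1\n", " " * 5 + "Category_2\n"]
print(read_dtm_dictionary(sample))  # ['Category_1', 'Category_2']
```

With this kind of slicing, a label typed one column too early silently loses its first character, which is why a fixed-width font helps when editing such files.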
1.2) Data file:
In a similar fashion, open the data file “SCA_dat_Eng.txt”
(“SCA_dat_Fr.txt” for the French version).
The data file “SCA_dat_Eng.txt” comprises 8 rows and 7 columns.
Each row contains the row identifier (between quotes) followed by 6
values, the absolute frequencies of the 6 media categories, separated
by at least one blank space.
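The row format just described (a quoted identifier followed by 6 counts separated by blanks) can be parsed with Python's standard `shlex` module, which keeps a quoted identifier containing spaces as a single field. This is a hedged sketch, not DtmVic's own reader: the function name `read_dtm_data` is hypothetical, the counts are assumed to be integers, and the numbers in the usage line are invented for illustration.

```python
import shlex


def read_dtm_data(lines, n_values=6):
    """Parse rows made of a quoted identifier followed by n_values counts.

    Fields are separated by at least one blank; shlex.split keeps a
    quoted identifier such as "Skilled worker" as a single field.
    Assumption: the counts are integers.
    """
    row_ids, table = [], []
    for line in lines:
        fields = shlex.split(line)
        if not fields:
            continue  # skip empty lines
        if len(fields) != n_values + 1:
            raise ValueError(f"expected {n_values + 1} fields, got {len(fields)}: {line!r}")
        row_ids.append(fields[0])
        table.append([int(v) for v in fields[1:]])
    return row_ids, table


# Hypothetical row: the identifier comes from the example, the counts do not.
ids, counts = read_dtm_data(['"Skilled worker"  12 7 30 4 9 15'])
```

Checking the field count on every row catches a missing value or an unquoted identifier containing a space, both of which would otherwise shift the columns.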
2) Generation of a command file (or: “parameter file”)
2.1) Click the button “Create” of the main menu “Basic Steps”,
line “Command File”.
A window “Choosing among some basic analyses”
appears.
2.2) Then click the button “SCA – Simple correspondence analysis”,
located in the paragraph “Numerical data”.
2.3) Click the button “Open a dictionary (Dtm format)”
To open the dictionary, locate the examples directory
DtmVic_Examples_Start and, within it, open the directory of example A.2,
named “EXA02.SimpleCorAnalysis”.
Then open the dictionary file “SCA_dic_Eng.txt” (or
“SCA_dic_Fr.txt”). The dictionary file is displayed in a window.
Another window shows the status of each variable (in this case, all
the variables have the status “numerical”).
2.4) Click the button “Open a data file (Dtm format)”.
Open the data file “SCA_dat_Eng.txt” (or “SCA_dat_Fr.txt” for the
French version).
A new window displays the data file (the button “more data” is not
needed for such a small data set).
2.5) Click the button “Continue (select active and supplementary elements)”.
A new window is displayed for selecting the active variables.
In this simple case, select all the variables in the memo named
“Variables to be selected”, then click the upper blue arrow to give
the selected subset the status of “active variables” (there are no
supplementary variables in this example).
2.6) Click the button “Continue”.
A new window for selecting the active observations (rows) is
displayed. Click the button “The observations will be selected from a list”.
Then select the first eight rows (occupations) as “active
observations” (upper blue arrow) and the remaining rows as “supplementary
observations”. Click “Continue”.
2.7) The window “Create a starting parameter file”
is displayed.
2.7.1 Click the button “1) Select some options”.
A new window entitled “Options Bootstrap and/or clustering of observations”
is displayed. Click “yes” for “Bootstrap validation”, then click
“Enter” to confirm the default number of replicates (25). Ignore the
suggested bootstrap options. Click “Enter” to accept 0 clusters,
then click “Continue”.
2.7.2 Back in the previous window, click the button
“2) Create a parameter file for SCA”.
A parameter file is displayed in the memo [advanced users can edit this
parameter file; it also allows the same analysis to be performed again
later, if needed].
Important:
The parameter file is saved as “Param_SCA.txt” in the current directory.
If you wish, you can now exit DtmVic and, later on, use the main-menu
button “Open an existing command file” (line “Command file”) to open
“Param_SCA.txt” directly; the “Execute” command of the main menu then
brings you back to this point of the process.
2.7.3 Then click the button “3) Execute”.
The basic computation steps specified in the command file are: archiving
of the data and dictionary, selection of the active elements,
correspondence analysis of the selected table, bootstrap replication of
the table to build confidence regions for the column points and row
points, and a brief description of the axes. Once execution is complete,
a small window summarizes the computation steps.
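The correspondence analysis step itself is not detailed in this example. As a sketch under stated assumptions (NumPy available; the function names `correspondence_analysis` and `project_supplementary_rows` are hypothetical), the classical computation takes the singular value decomposition of the standardized residuals of the table, and supplementary rows, such as those declared in step 2.6, are placed on the active axes with the transition formula. This illustrates the textbook method, not DtmVic's source code.

```python
import numpy as np


def correspondence_analysis(N):
    """Simple CA of a contingency table N (active rows x columns).

    Returns row principal coordinates F, column principal coordinates G
    and the principal inertias (squared singular values).
    """
    N = np.asarray(N, dtype=float)
    P = N / N.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    E = np.outer(r, c)                   # expected proportions under independence
    S = (P - E) / np.sqrt(E)             # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    k = min(N.shape) - 1                 # number of non-trivial axes
    U, sv, Vt = U[:, :k], sv[:k], Vt[:k]
    F = U * sv / np.sqrt(r)[:, None]     # row principal coordinates
    G = Vt.T * sv / np.sqrt(c)[:, None]  # column principal coordinates
    return F, G, sv ** 2


def project_supplementary_rows(N_sup, G, inertias):
    """Place supplementary rows on the active axes (transition formula).

    Each row profile is multiplied by the column principal coordinates and
    divided by the singular value of each axis; assumes no zero inertia.
    """
    N_sup = np.asarray(N_sup, dtype=float)
    profiles = N_sup / N_sup.sum(axis=1, keepdims=True)
    return profiles @ G / np.sqrt(inertias)


table = [[10, 5, 3], [4, 8, 6], [2, 3, 9], [7, 7, 7]]  # hypothetical counts
F, G, lam = correspondence_analysis(table)
F_sup = project_supplementary_rows([[5, 5, 1]], G, lam)
```

The principal inertias sum to the total inertia of the active table, chi-square divided by the grand total n, and projecting an active row as if it were supplementary returns its own principal coordinates.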
3) Basic numerical results
Click the “Basic numerical results” button.
This button lets the user browse the HTML file “imp.html”, created
(and saved) by the previous computation steps, which contains their
main results. After perusing these numerical results, “return”
to the main menu. Note that this file is also saved under a second
name, obtained by appending the date and time of the analysis
(continental notation) to “imp”: for example, “imp_08.07.09_14.45.html”
means July 8th, 2009, at 2:45 p.m. This time-stamped file is kept as an
archive of the main numerical results, whereas “imp.html” is overwritten
by each new analysis performed in the same directory.
The same results are also saved in plain text format under the name
“imp.txt”, and likewise under a name including the date and time of
execution.
Return.
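The naming rule for the archive files can be reproduced with Python's `datetime.strftime`. The pattern day.month.two-digit-year_hour.minute is inferred from the single example “imp_08.07.09_14.45.html”, so treat it as an assumption; `archive_name` is a hypothetical helper.

```python
from datetime import datetime


def archive_name(base, ext, when):
    """Build a time-stamped archive name such as imp_08.07.09_14.45.html.

    Assumed pattern: day.month.two-digit-year_hour.minute (24-hour clock).
    """
    return f"{base}_{when.strftime('%d.%m.%y_%H.%M')}.{ext}"


print(archive_name("imp", "html", datetime(2009, 7, 8, 14, 45)))
# prints imp_08.07.09_14.45.html
```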
4) Steps VIC (Visualization, Inference, Classification)
4.1) Click the “AxeView” button …
and follow the sub-menus. Only two tabs are relevant for this first
simple example: “Active variables” and “Individuals (observations)”.
After clicking “View” in either case, one obtains the principal
coordinates of the elements along each axis.
Clicking a column header ranks all the rows according to the values in
that column. In this particular example, this is somewhat redundant
with the results of step “DEFAC” printed in the log file “imp.txt”.
The AxeView menu becomes really useful when the data set is very large.
Return.
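The column-header ranking in AxeView amounts to sorting the elements by their coordinate on one axis. As a small hypothetical sketch (the function `rank_rows_by_axis`, the labels and the coordinates are invented; whether AxeView sorts in ascending or descending order is not stated here):

```python
def rank_rows_by_axis(labels, coords, axis, descending=False):
    """Return (label, coordinate) pairs sorted on the chosen axis.

    coords[i][axis] is the coordinate of element labels[i] on that axis.
    """
    pairs = [(label, row[axis]) for label, row in zip(labels, coords)]
    return sorted(pairs, key=lambda pair: pair[1], reverse=descending)


# Hypothetical coordinates on axes 1 and 2 (indices 0 and 1).
coords = [[0.5, -0.1], [-0.2, 0.3], [0.1, 0.0]]
print(rank_rows_by_axis(["A", "B", "C"], coords, axis=0))
# [('B', -0.2), ('C', 0.1), ('A', 0.5)]
```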
4.2) Click the “PlaneView Research” button …
and follow the sub-menus.
In this example, only three items are relevant: “Active columns
(variables or categories)”, “Active rows (individuals, observations)”
and “Active columns + Active rows” (respectively the columns of the
data table, its rows, and the simultaneous representation of rows and
columns).
Graphical displays of the chosen pairs of axes are then produced.
The roles of the different buttons are self-explanatory, except perhaps
for the button “Rank”, which is useful only for very cluttered displays
(far from the case here!). This button converts the two coordinates of
the current display into ranks: for instance, the n values of the
abscissa are replaced by the n integers 1 to n, in the same order as
the original values. Both coordinate distributions thus become uniform,
and the identifiers overlap much less and are more legible (often at
the cost of a substantial distortion of the display).
Return.
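The “Rank” transformation described above can be sketched in a few lines of Python. The names `to_ranks` and `rank_display` are hypothetical, and ties are broken by original position, an assumption since the text does not say how DtmVic handles equal coordinates.

```python
def to_ranks(values):
    """Replace n values by the integers 1..n in the same order.

    Ties are broken by original position (sorted() is stable); this tie
    rule is an assumption, not documented DtmVic behaviour.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


def rank_display(xs, ys):
    """Convert both coordinates of a display to ranks."""
    return to_ranks(xs), to_ranks(ys)


print(to_ranks([0.42, -1.3, 0.05, 2.7]))  # [3, 1, 2, 4]
```

Because each coordinate becomes a permutation of 1..n, points are spread evenly along both axes, which is exactly what reduces label overlap and what distorts distances.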
4.3) Click the “BootstrapView” button …
This button opens the DtmVic-Bootstrap-Stability
windows.
4.3.1 Click “LoadData”. In this case (partial bootstrap), the file of
replicated coordinates to be opened is named “ngus_var_boot.txt”.
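In the partial bootstrap, replicated tables are typically not re-analysed: their rows and columns are projected as supplementary elements onto the axes of the original analysis, which is why a file of replicated coordinates is all that is needed here. The sketch below shows one common way of drawing a replicate of a contingency table, multinomial resampling that keeps the grand total n. It is an assumption-labeled illustration, not DtmVic's documented procedure, and `bootstrap_table` is a hypothetical name; the 25 replicates mirror the default chosen in step 2.7.1.

```python
import random
from collections import Counter


def bootstrap_table(table, rng):
    """Draw one replicate of a contingency table by multinomial resampling.

    The grand total n is kept; each of the n draws falls in cell (i, j)
    with probability n_ij / n.
    """
    cells = [(i, j) for i, row in enumerate(table) for j in range(len(row))]
    weights = [table[i][j] for i, j in cells]
    n = sum(weights)
    counts = Counter(rng.choices(cells, weights=weights, k=n))
    return [[counts[(i, j)] for j in range(len(row))]
            for i, row in enumerate(table)]


rng = random.Random(0)  # fixed seed so the replicates are reproducible
replicates = [bootstrap_table([[10, 5], [3, 12]], rng) for _ in range(25)]
```

Each replicate's rows and columns would then be placed on the original axes as supplementary elements (not shown), giving one cloud of replicated points per element.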
4.3.2 Click the “Confidence Areas” submenu and choose the pair of axes
to be displayed (select axes 1 and 2 [default option] to begin with).
4.3.3 In the window that then appears, which displays the dictionaries
of variables, tick the white boxes of the elements whose location is to
be assessed, and press the button “Select”.
4.3.4 Click “Confidence Ellipses” to obtain the graphical display of
the column points (or variable points) in red and of the row points
(or individuals, or observations) in blue.
This display shows, for example, that all the occupation groups
(row points) have distinct “media-contact-profiles”, except
“Skilled worker” and “Unskilled worker” on the one hand, and
“Skilled worker” and “Employees” on the other, whose confidence areas
largely overlap.
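The exact construction DtmVic uses for its ellipses is not described in this example. One common way to summarize a cloud of replicates by an ellipse, sketched below under the assumption of a bivariate normal cloud, uses the mean and covariance matrix of the replicated coordinates of one point: the ellipse axes follow the eigenvectors of the covariance matrix, and their lengths are scaled by the chi-square quantile with 2 degrees of freedom, which equals -2 ln(1 - coverage). The function name `confidence_ellipse` and the 95% default are assumptions.

```python
import math


def confidence_ellipse(points, coverage=0.95):
    """Ellipse summarizing a 2-D cloud of bootstrap replicates.

    Returns (center, (a, b), angle): semi-axis lengths a >= b and the
    orientation of the major axis in radians. Under a bivariate normal
    assumption, the ellipse contains about `coverage` of the replicates.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Eigenvalues of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    l1, l2 = half_trace + root, half_trace - root
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)  # direction of the l1 axis
    k = -2 * math.log(1 - coverage)  # chi-square quantile, 2 degrees of freedom
    return (mx, my), (math.sqrt(k * l1), math.sqrt(k * max(l2, 0.0))), angle
```

Because it relies on the covariance of the whole cloud, such an ellipse reflects where most replicates lie and is little affected by a few outlying ones.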
4.3.5 Close the display window and, back in the blue window, press
“Convex hulls”. The ellipses are now replaced by the convex hulls of
the replicates of each point. The convex hulls take the peripheral
replicates into account, whereas the ellipses are based on the density
of the cloud of replicates; the two kinds of information are
complementary.
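DtmVic's own hull routine is not described here. As a stand-in, the convex hull of the replicates of one point can be computed with Andrew's monotone chain algorithm, a standard O(n log n) method. The function name `convex_hull` and the sample points are hypothetical.

```python
def convex_hull(points):
    """Convex hull of 2-D points (Andrew's monotone chain).

    Returns hull vertices in counter-clockwise order, starting from the
    lowest-x (then lowest-y) point; collinear boundary points are dropped.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:  # build the lower hull from left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):  # build the upper hull from right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]  # endpoints are shared, drop duplicates


print(convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0)]))
# [(0, 0), (2, 0), (2, 2), (0, 2)]
```

Unlike the ellipse, the hull is determined entirely by the most extreme replicates, which is why the two displays complement each other.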
In the context of this example, the other items of the main menu are not relevant.
General remark.
As you can see by looking at the contents of the example’s directory,
several files have been created and saved [these files are briefly
described in the memo “Help about files” in the toolbar of the main
menu]. If you want to use the buttons of the VIC paragraph of the main
menu again after closing DtmVic, click the button “Open an existing
command file” on the line “Command file”, select and open the saved
command file “Param_SCA.txt”, and close it. There is no need to click
“Execute” again. You can then continue your investigation (axis views,
graphs, maps, etc.).
Advanced users can also edit the parameter file “Param_SCA.txt”
(using the memo “Help about parameters” in the toolbar of the main
menu) to perform a new analysis with new parameter values. All the
intermediate files will then be replaced (except the file
“imp_date_time.txt”, which is kept as an archive).
End of example A.2 (SCA)