Example B01: EX_B01.Text-Responses_2 (VISURESP).
(Open questions in a sample survey: Direct analysis of individual responses)
Example B01 aims at describing directly the responses to an open-ended
question in a sample survey, without prior agglomeration, in relation to a
set of categorical variables. The survey and the responses are the same as
in example A.5 (ANALEX).
More explanation about this type of example and the corresponding
methodology can be found in the book: “Exploring Textual Data” (L. Lebart,
A. Salem, L. Berry; Kluwer Academic Publishers, 1998).
The clusters are described with characteristic
words and responses.
1) Looking at the text file.
To have a look at the data, search for the directory: DtmVic_Examples.
In this directory, open the sub-directory DtmVic_Examples_B_Texts.
In that directory, open the directory of Example B.01, named: “EX_B01.Text-Responses_Corda2”.
It is recommended to use one directory for each application, since DtmVic
produces a lot of intermediate txt files related to the application.
Text file: TDA_TEX.txt (same as that of example A.5)
Let us recall its characteristics.
It contains the free responses of 1043 individuals to three open-ended
questions.
First, the following open-ended question was asked: "What is the single
most important thing in life for you?" It was followed by the probe: "What
other things are very important to you?".
A third question (not analysed here) was also asked: “What means to you the
culture of your own country”. We refer to the previous example (example A.5)
for comments about the data format.
2) Generation of a command file (or: “parameter file”)
2.1) Click the button: “Create a Command File” of the main menu: Basic
Steps, line: “Command File”.
A window: “Choosing among some basic analyses” appears.
2.2) Click then on the button: “VISURESP analysis - (Visualization and
Clustering of responses)”, located in the section “Textual and numerical
data”. A window “Opening a text file” is displayed.
2.3) Press the button: “Open the text file”, then search for the directory:
“DtmVic_Examples_B_Texts”. In that directory, open the directory of example
B.01, named “EX_B01.Text-Responses_Corda”.
Then open the text file: “TDA_tex.txt”.
A message box then indicates that the corpus comprises 7329 lines, 1043
observations and 3 open questions.
2.4) Click on: “Select Open questions and separators”
The next window allows for the selection of open questions and the
selection of separators of words (the default separators suffice in this
example).
We will select questions 1 and 2 (which means that the two responses will be
merged). It is legitimate here to merge the two responses because question 2
is a probe for question 1.
2.5) Click directly on: “Vocabulary and counts”.
The next window presents the vocabulary (in alphabetic and frequency
orders). We must select a frequency threshold by selecting a line in the
right-hand memo (frequency order). Line number 397 [first column]
corresponds to the frequency 4 [second column]. (We took a threshold of 16
in the previous example A.5. Individual responses are lexically very poor,
so more words must be kept to avoid generating too many empty responses once
the threshold is applied.) We'll keep the 397 most frequent words. After
selecting that line, click on: “Confirm”.
The frequency appears in a message box. Reply: "OK".
Then click on: “Continue”.
A window devoted to the dictionary and data files appears.
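The effect of the frequency threshold can be tried out outside DtmVic. The
snippet below is only a minimal sketch, not DtmVic's own code: the tokenizer,
the toy responses and the function name select_vocabulary are assumptions
made for the illustration. It shows how raising the threshold shrinks the
vocabulary and increases the number of responses left without any retained
word.

```python
import re
from collections import Counter

def select_vocabulary(responses, min_freq):
    """Keep the words whose total frequency is at least min_freq."""
    # Hypothetical tokenizer: lower-case, keep runs of letters and apostrophes.
    tokens = [re.findall(r"[a-z']+", r.lower()) for r in responses]
    counts = Counter(word for toks in tokens for word in toks)
    vocabulary = {w for w, n in counts.items() if n >= min_freq}
    # A response is "empty" when none of its words survives the threshold.
    n_empty = sum(1 for toks in tokens if not any(w in vocabulary for w in toks))
    return vocabulary, n_empty

# Toy responses (invented for the example).
responses = [
    "my family and my health",
    "health, family",
    "peace of mind",
    "money",
]
for threshold in (1, 2):
    vocab, empty = select_vocabulary(responses, threshold)
    print(threshold, sorted(vocab), empty)
```

With short individual responses, even a modest threshold empties some of
them, which is why a lower threshold is used here than in example A.5.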
2.9) Click then on the button: “Continue”.
A new window devoted to the selection of active observations (rows) is
displayed. Click on the button: “All the observations will be active”.
The window: “Create a starting parameter file” is displayed.
2.10) Then click directly on: “Create a first parameter file”.
The clustering is automatic, and the number of clusters is selected by
default depending on the number of responses (30 clusters in this case).
[This number of clusters can be changed by editing the command file (or
parameter file) before the execution; the parameters to be altered belong to
the "STEP PARTI".]
A parameter file is displayed in the memo. [It can be edited by advanced
users. It also allows the same analysis to be performed again later on, if
needed.]
Important: The parameter file is saved as “Param_VIRURESP.txt” in the
current directory. If you wish, you can now exit from DtmVic and, later on,
use the button of the main menu “Open an existing command file” (Section:
“Command file”) to open directly the file “Param_VIRURESP.txt” and thus
return directly to this point of the process, before using the “Execute”
command of the main menu.
Recall that this set of commands comprises 9 steps:
ARTEX (archiving texts),
SELOX (selecting the open question),
NUMER (numerical coding of the text),
ASPAR (correspondence analysis of the [sparse] contingency table “respondents - words”),
CLAIR (brief description of factorial axes),
RECIP (clustering using a hierarchical classification of the clusters - reciprocal neighbours method),
PARTI (cut of the dendrogram produced by the previous step, and optimisation of the partition obtained),
MOTEX (cross-tabulating the partition produced by step PARTI with words: the obtained contingency table is called a lexical table),
MOCAR (characteristic words and characteristic responses for each class of the partition).
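The lexical table mentioned for step MOTEX is simply a cross-tabulation of
clusters by words. The following sketch illustrates the idea under stated
assumptions: the tokenized responses, the cluster labels and the function
name lexical_table are invented for the example and do not reproduce
DtmVic's internal data structures.

```python
from collections import Counter, defaultdict

def lexical_table(tokenized_responses, cluster_of, vocabulary):
    """Count, for each cluster, the occurrences of each retained word."""
    table = defaultdict(Counter)
    for tokens, cluster in zip(tokenized_responses, cluster_of):
        for word in tokens:
            if word in vocabulary:      # words below the threshold are ignored
                table[cluster][word] += 1
    return table

# Toy data: three responses, already tokenized (invented).
tokenized = [["family", "health"], ["health", "work"], ["family", "family", "money"]]
clusters = [1, 1, 2]                    # hypothetical cluster label of each response
vocab = {"family", "health", "work"}    # hypothetical retained vocabulary

table = lexical_table(tokenized, clusters, vocab)
for cluster in sorted(table):
    # One row of the lexical table per cluster, columns in alphabetic order.
    print(cluster, [table[cluster][w] for w in sorted(vocab)])
```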
2.11) Click: “Execute”.
This step will run the basic
computation steps present in the command file: archiving data and text,
characteristic words and responses, correspondence analysis of the lexical table,
thorough descriptions of clusters using both words and categorical variables.
7) Click the button: “Basic numerical results”
The button opens a created (and saved) html file named “imp.html” which
contains the main results of the previous basic computation steps. After
perusing these numerical results, return to the main menu. Note that this
file is also saved under another name: the name “imp.html” concatenated with
the date and time of the analysis (continental notation). That file keeps
the main numerical results as an archive, whereas the file “imp.html” is
replaced for each new analysis performed in the same directory.
This file is also saved in a simple text format, under the name “imp.txt”,
and likewise under a name including the date and time of execution.
Perusing the complete list of words highlights some errors in the original
text file (inevitable in real-sized applications): for instance, the symbol
“]” was absent from the list of separators, and creates some new “words”…
8) At this stage, we click on one of the lower buttons of the basic steps
panel (Steps: “VIC”).
9) Click the button “AxeView” ... and follow the sub-menus. In fact, two
tabs are relevant for this example: “Active variables” [= words in the case
of the analysis "VISURECA"] and “Individuals (observations)” [= respondents].
After clicking on “View” in each case, one obtains the set of principal
coordinates along each axis.
Clicking on a column header produces a ranking of all the rows according to
the values of that column. In this particular example, this is somewhat
redundant with the printed results of the step CLAIR.
Return.
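For readers who wish to see where such principal coordinates come from, here
is a minimal sketch of a textbook correspondence analysis computed through a
singular value decomposition. It assumes NumPy is available and uses a tiny
invented table; it is not the algorithm of step ASPAR, which works on a large
sparse table. The last lines mimic the ranking obtained by clicking on a
column header.

```python
import numpy as np

def correspondence_analysis(table):
    """Return eigenvalues and principal coordinates of rows and columns."""
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses (must all be > 0)
    c = P.sum(axis=0)                    # column masses (must all be > 0)
    # Standardized residuals: D_r^{-1/2} (P - r c^T) D_c^{-1/2}
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sing, Vt = np.linalg.svd(S, full_matrices=False)
    rows = (U * sing) / np.sqrt(r)[:, None]      # principal coordinates of rows
    cols = (Vt.T * sing) / np.sqrt(c)[:, None]   # principal coordinates of columns
    return sing ** 2, rows, cols

# Toy table: 4 respondents (rows) x 3 words (columns), no empty row or column.
toy = [[2, 0, 1],
       [1, 1, 0],
       [0, 2, 1],
       [0, 1, 2]]
eigenvalues, rows, cols = correspondence_analysis(toy)

# Rank the respondents along the first axis, from lowest to highest coordinate.
order = np.argsort(rows[:, 0])
print(np.round(eigenvalues, 4))
print(order)
```

The division by the row masses is the reason why empty responses have to be
avoided when the frequency threshold is chosen.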
10) Click the button “PlaneView Research”, and follow the sub-menus...
In this example, four items of the menu are relevant: “Active columns
(variables or categories)” (principal coordinates of the active words),
“Active rows (individuals, observations)” (coordinates of the respondents),
“Active columns + Active rows”, and “Active individuals (density)”. The
graphical displays of the chosen pairs of axes are then produced.
11) About the button: “BootstrapView”...
In fact, the bootstrap is implicitly performed for the analyses VISURESP and
VISURECA. No parameter needs to be specified. The replicate file
"ngus_dir_var_boot.txt" is created using the so-called “specific bootstrap”.
Using the button “BootstrapView”, we will have to load the file
"ngus_dir_var_boot.txt" and select the words whose confidence ellipses
should be drawn.
In this case, the bootstrap replicates are obtained by drawing the
respondents (or: rows, individuals, observations) with replacement.
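The resampling of respondents can be pictured as follows. This is a sketch of
the drawing-with-replacement step only, with invented data and a hypothetical
function name (bootstrap_word_counts); how DtmVic turns the replicates into
confidence ellipses, and the layout of its replicate file, are not covered.

```python
import random
from collections import Counter

def bootstrap_word_counts(tokenized_responses, n_replicates, seed=0):
    """Resample respondents with replacement and recount the words each time."""
    rng = random.Random(seed)
    n = len(tokenized_responses)
    replicates = []
    for _ in range(n_replicates):
        # Draw n respondents with replacement: some are repeated, some left out.
        sample = [tokenized_responses[rng.randrange(n)] for _ in range(n)]
        replicates.append(Counter(word for tokens in sample for word in tokens))
    return replicates

# Toy tokenized responses (invented).
tokenized = [["family", "health"], ["health"], ["work", "family"], ["money"]]
replicates = bootstrap_word_counts(tokenized, n_replicates=3)
for counts in replicates:
    print(dict(counts))
```

Each replicate gives slightly different word counts; the spread of a word
across the replicates is what a confidence region summarizes.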
12) Click on “ClusterView”
12.1 Choose the axes (1 and 2 to begin with), and “Continue”.
12.2 Click on “View”. The centroids of the 30 clusters (produced by the
Step PARTI) appear on the first principal plane.
12.3 Activate the button: “Words”, and, pointing with the mouse on a
specific cluster, press the right button of the mouse. A description of the
cluster involving its most characteristic words appears. This description is
somewhat redundant with that of the Step MOCAR, but this display exhibits
the pattern of clusters and their relative locations.
12.4 Activate the button “Texts”. Pointing with the mouse on a specific
cluster, and pressing the right button of the mouse, we can read the most
characteristic responses of the selected cluster.
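One common way to decide whether a word is characteristic of a cluster is to
compare its frequency inside the cluster with what a random draw from the
whole corpus would give, through a hypergeometric tail probability. The
sketch below illustrates that idea only: it is an assumption that this
matches the criterion computed by step MOCAR, and the function name and the
figures are invented.

```python
from math import comb

def upper_tail_probability(k, n, K, N):
    """P(X >= k) when X follows a hypergeometric law.

    N: total number of word occurrences in the corpus
    K: occurrences of the word in the corpus
    n: number of word occurrences in the cluster
    k: occurrences of the word in the cluster
    """
    total = comb(N, n)
    upper = min(K, n)
    favourable = sum(comb(K, j) * comb(N - K, n - j) for j in range(k, upper + 1))
    return favourable / total

# Hypothetical figures: a word seen 40 times in a corpus of 1000 occurrences,
# 12 of which fall in a cluster of 100 occurrences (4 would be expected).
p = upper_tail_probability(k=12, n=100, K=40, N=1000)
print(p)
```

The smaller the probability, the more the word is over-represented in the
cluster compared with the corpus as a whole.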
13) Click on “Kohonen map”
Select the type of coordinate.
13.1 Select: “Variables (columns)”: these active variables are the words in
this example.
13.2 Select a (5 x 5) map, and continue.
13.3 After clicking on two small check-boxes, press “Draw” on the menu of
the large green window entitled Kohonen map.
13.4 You can change the font size (“Font”) and dilate the obtained Kohonen
map (“Dilat.”) to make it more legible. The words appearing in the same cell
are often associated in the same responses. This property holds, to a lesser
degree, for contiguous cells.
13.5 Pressing “AxeView” and selecting one axis allows one to enrich the
display with pieces of information about a specified principal axis: large
positive coordinates in red colour, large negative coordinates in green,
with some transitional hues.
13.6 Go back to the main menu, click on “Kohonen map” and choose the item
“Observations”.
13.7 Select a (10 x 10) map, and redo the operations 13.3 to 13.5 for the
observations.
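The principle of a Kohonen map (self-organizing map) can be sketched in a few
lines. The code below is a generic, simplified version written for
illustration, not DtmVic's implementation: the learning-rate and
neighbourhood schedules, the word coordinates and the function names are all
assumptions. Words whose coordinates are close tend to fall in the same cell
or in contiguous cells of the grid.

```python
import math
import random

def best_cell(weights, point):
    """Grid cell whose prototype vector is closest to the point."""
    return min(weights, key=lambda c: sum((a - b) ** 2 for a, b in zip(weights[c], point)))

def train_som(points, side, n_epochs=50, seed=0):
    """Fit a side x side grid of prototype vectors to the points."""
    rng = random.Random(seed)
    dim = len(points[0])
    cells = [(i, j) for i in range(side) for j in range(side)]
    # Start each prototype from a randomly chosen data point.
    weights = {cell: list(rng.choice(points)) for cell in cells}
    for epoch in range(n_epochs):
        frac = epoch / n_epochs
        rate = 0.5 * (1.0 - frac)                      # decreasing learning rate
        radius = max(side / 2.0 * (1.0 - frac), 0.5)   # shrinking neighbourhood
        for p in rng.sample(points, len(points)):      # random presentation order
            winner = best_cell(weights, p)
            for cell, w in weights.items():
                grid_d2 = (cell[0] - winner[0]) ** 2 + (cell[1] - winner[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * radius ** 2))
                for d in range(dim):
                    # Move the prototype towards the point, more so near the winner.
                    w[d] += rate * h * (p[d] - w[d])
    return weights

# Hypothetical coordinates of six words on two principal axes.
word_coords = {
    "family": (0.9, 0.1), "children": (0.8, 0.2), "health": (0.1, 0.9),
    "money": (-0.8, -0.1), "job": (-0.9, -0.2), "peace": (0.0, -0.9),
}
weights = train_som(list(word_coords.values()), side=3)

# Assign each word to its cell and list the content of the occupied cells.
cells = {}
for word, xy in word_coords.items():
    cells.setdefault(best_cell(weights, xy), []).append(word)
for cell in sorted(cells):
    print(cell, cells[cell])
```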
End of example B.01 (VISURESP)