A window
“Choosing among some basic analyses”
appears.
2.2) Then click the button:
“VISUTEX – Visualization of Texts”
This button is located in the paragraph
“Textual and numerical data”.
2.3) Press the button: “Open the text file”,
then search for the directory:
“DtmVic_Examples_A_Start”.
In that directory, open the directory of example A.4, named:
“EX_A04.Text-poems”.
Open the file: “Sonnet_LowerCase.txt”.
A message box then indicates that the corpus contains 20 texts totalling 321 lines.
2.4) Click:
“Select open questions and separators”
The next window allows for the selection of open questions (not relevant
here) and of word separators (the default separators suffice
in this example).
2.5) Click directly on “Vocabulary”.
The next window presents the vocabulary (in alphabetic order and in frequency
order). We must choose a frequency threshold by selecting a line
in the right-hand side memo (frequency order). Line number 113
corresponds to the frequency 4. (This is a very small frequency,
suited to a very small corpus. This example is just an opportunity to
explore the sequence of commands, without meaningful linguistic
interpretation.)
After selecting that line, click on: “Confirm”.
2.6) Then click on: “Continue. (Create the parameter file)”
Continuing our visit, we have to “select some options”.
Click “yes”
for the bootstrap validation, and “enter”
to confirm the default number of replicates (25). Then click on "Continue".
2.7) Then click “Create a first parameter file”
A parameter file is displayed in the memo. [Advanced users can edit it.
It also makes it possible to rerun the same analysis later on, if needed.]
Important:
The parameter file is saved as “Param_VISUTEX.txt”
in the current directory.
If you wish, you can now exit from DtmVic and, later on, use the
button of the main menu “Open an existing command file”
(line: “Command file”)
to open “Param_VISUTEX.txt” directly.
Doing so brings you back to this point of the process. You can then use the
“Execute” command of the main menu.
2.8) Click on: “Execute”.
This step will run the basic computation steps present in the command file:
archiving data and text, characteristic words and responses,
correspondence analysis of the lexical table.
3) Basic numerical results
Click on the “Basic numerical results” button.
The button opens an html file named “imp.html”,
created and saved by the previous step,
which contains the main results of the basic computation
steps. After perusing these numerical results, return to the main
menu. Note that this file is also saved under another name: the name
“imp.html”
is combined with the date and time of the analysis (continental
notation). For example, “imp_08.07.09_14.45.html”
means July 8th, 2009, at 2:45 p.m. That file keeps an archive of
the main numerical results, whereas the file “imp.html”
is replaced by each new analysis performed in the same directory.
The file is also saved in a simple text format, under the name
“imp.txt”,
and likewise under a name including the date and time of execution.
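The naming convention of the archived listings can be illustrated with a short Python sketch. It is not part of DtmVic: the function names are hypothetical, and the format string is only inferred from the single example given above (day.month.year, then hour.minute).

```python
from datetime import datetime

# Assumed pattern, inferred from "imp_08.07.09_14.45.html" (July 8th, 2009, 2:45 p.m.).
STAMP_FORMAT = "%d.%m.%y_%H.%M"

def archive_name(stem: str, extension: str, when: datetime) -> str:
    """Build a name made of the stem, the date and time, and the extension."""
    return f"{stem}_{when.strftime(STAMP_FORMAT)}.{extension}"

def parse_archive_name(name: str) -> datetime:
    """Recover the date and time encoded in an archived file name."""
    stem_and_stamp, _extension = name.rsplit(".", 1)  # drop "html" or "txt"
    _stem, stamp = stem_and_stamp.split("_", 1)       # drop the "imp" prefix
    return datetime.strptime(stamp, STAMP_FORMAT)

example = archive_name("imp", "html", datetime(2009, 7, 8, 14, 45))
print(example)
print(parse_archive_name(example))
```

Such a helper can be handy for sorting the archived listings of a directory by date of analysis.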
From the step NUMER, we learn for instance that
we have 280 responses (lines), with a total number of words
(occurrences, or tokens) of 2321, involving 830 distinct words (or
types). With a frequency threshold of 3 (which means here keeping the
words with a frequency greater than three), the total number of kept words
reduces to 1384, whereas the number of distinct kept words reduces to
114. (Note a provisional difference in notation:
the minimal selected frequency 4 corresponds to the frequency 3 in
the listing, meaning equivalently that all the words appearing
more than three times are kept.)
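The counts reported by the step NUMER (lines, tokens, types, and kept words) can be reproduced on a toy corpus with the following Python sketch. It is only an illustration: the function name and the four-line corpus are invented, and words are assumed to be separated by whitespace only, which is simpler than the separators DtmVic lets you select.

```python
from collections import Counter

def vocabulary_summary(lines, threshold):
    """Count tokens and types, then keep the words with frequency > threshold."""
    # Assumption: whitespace is the only word separator.
    tokens = [word for line in lines for word in line.split()]
    frequencies = Counter(tokens)
    kept = {word: n for word, n in frequencies.items() if n > threshold}
    return {
        "lines": len(lines),
        "tokens": len(tokens),             # total number of occurrences
        "types": len(frequencies),         # number of distinct words
        "kept_tokens": sum(kept.values()),
        "kept_types": len(kept),
    }

# Invented toy corpus, not taken from the sonnets.
toy_corpus = [
    "the rose and the thorn",
    "the rose of love",
    "love and the rose",
    "and love and the night",
]
summary = vocabulary_summary(toy_corpus, threshold=3)
print(summary)
```

A threshold of 3 in this sense ("more than three times") keeps exactly the same words as a minimal frequency of 4, which is the notational difference mentioned above.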
Return.
4) Steps VIC (Visualization, Inference, Classification)
4.1) Click the button: “AxeView”
... and follow the sub-menus. In fact, only two tabs are relevant for
this example: “Active variables”
[= poems] and “observations”
[= words]. After clicking on “View”,
the user obtains the set of principal coordinates along each axis.
Clicking on a column header ranks all the rows according to the
values of that column.
As mentioned in the previous examples, the use of the AxeView menu is
justified when the data set is large, which is not the case here.
Return.
4.2) Click the button: “PlaneView Research”,
and follow the sub-menus...
In this example, only one item of the menu is relevant: “Active
columns + Rows”. This item concerns both the rows and the columns of the
contingency table (lexical table). The graphical displays of the selected pairs of axes are then
produced. Normally, the active categories (columns of the lexical
table) are printed in red, while the active words (rows) are printed
in blue.
The roles of the different buttons are straightforward, except perhaps the
button “Rank”, which is useful only for very
intricate displays, which is not the case here (see the comments in the
previous examples).
Return.
4.3) Click the button:
“BootstrapView”
This button opens the
“DtmVic: Bootstrap - Validation - Stability –
Inference” window.
4.3.1 Click on: “LoadData”.
In this case (partial bootstrap), the file of replicated coordinates to be opened
is named “ngus_par_boot1.txt”.
(The set of possible files is given by a background panel.)
4.3.2 Click on the “Confidence Areas”
submenu, and choose the pair of axes to be displayed (select axes 1 and 2 to begin with).
4.3.3 The window that appears (enlarge it if necessary) contains the list of identifiers
of the active rows and columns (the identifiers of the columns [sonnets in this case] are at
the end of the list). Tick some white boxes to select some poems and some
words, then press the button “Select”.
4.3.4 Click on: “Confidence Ellipses”
to obtain the graphical display of the chosen column points in red
and of the row points (here: words) in blue. We can see that many sonnets
occupy significant locations (several confidence ellipses do not overlap), whereas
the locations of the words are far less accurate.
4.3.5 Close the display window and, again in the blue window, press
“Convex hulls”. The ellipses are
now replaced by the convex hulls of the replicates for each point.
The convex hulls take into account the peripheral points, whereas the
ellipses are drawn using the density of the clouds of replicates. The
two pieces of information are complementary.
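The difference between the two kinds of confidence areas can be sketched in Python for the replicates of a single point. This is an illustration under stated assumptions, not DtmVic's own computation: the hull uses Andrew's monotone chain algorithm, the ellipse is derived from the covariance matrix of the replicates with an arbitrary scale factor, and the five replicate coordinates are invented.

```python
import math

def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def covariance_ellipse(points, scale=2.0):
    """Centre, semi-axes and orientation of an ellipse built from the covariance."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Eigenvalues of the 2 x 2 covariance matrix, in closed form.
    half_trace = (sxx + syy) / 2.0
    root = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    major = scale * math.sqrt(half_trace + root)   # scale is an arbitrary choice here
    minor = scale * math.sqrt(max(half_trace - root, 0.0))
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)  # direction of the major axis
    return (mx, my), major, minor, angle

# Invented replicates of one point on the plane of axes 1 and 2.
replicates = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0), (1.0, 1.0)]
hull = convex_hull(replicates)
centre, major, minor, angle = covariance_ellipse(replicates)
print(hull)
print(centre, major, minor, angle)
```

The hull depends only on the outermost replicates (the interior point is ignored), whereas the ellipse uses all of them through their mean and covariance, which is why the two displays are complementary.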
4.4) Click “ClusterView” (in this case,
the clusters are the texts themselves)
4.4.1 Choose the axes (1 and 2 to begin with), and “Continue”.
4.4.2 Click on “View”.
The locations of the 20 categories (texts) appear on the first
principal plane. Up to a possible change of sign of the
axes, the display is the same as that provided by the
“PlaneView Research” procedure.
4.4.3 Activate the button “Words”
and, pointing with the mouse at a specific category, press the right mouse button.
A description of the category, listing its most characteristic words,
appears. This description
is again redundant with that of the step MOCAR (see the files
“imp.txt” or
“imp.html” using the button
“Basic numerical results”).
But we can appreciate here the pattern of the categories and their relative
locations.
4.4.4 Activate the button “Texts”.
Pointing with the mouse at a specific category and pressing the
right mouse button, we can read the most characteristic lines
(verses) of the selected category. The concept of characteristic line
is not obviously relevant in the case of poetry. It is in fact a
particular case of the concept of “characteristic responses”,
which is extremely useful in the case of open questions.
More explanation about the corresponding methodology can be found in the
book already quoted: “Exploring
Textual Data” (L. Lebart, A. Salem, L. Berry; Kluwer Academic
Publishers, 1998).
Return.
4.5) Click
“Kohonen map”
4.5.1 Select: “variables + observations (rows + columns)”:
the active variables are the words and the texts (poems)
in this example.
4.5.2 Select a (5 x 5) map, and “continue”.
4.5.3 Press “draw”
on the menu of the large green window entitled “Kohonen
map”.
4.5.4 You can change the font size (“Font”)
and dilate the obtained Kohonen map (“Dilat.”)
to make it more legible. The words appearing in the same cell are
often associated in the same responses. This property holds, to a
lesser degree, for contiguous cells.
4.5.5 Note that we have obtained a simultaneous Kohonen
representation of rows and columns, owing to the use, as an input
file, of the coordinates from the correspondence analysis of the
lexical table.
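The principle of a Kohonen map fed with correspondence analysis coordinates can be sketched as follows. This is a minimal Python illustration, not the algorithm settings used by DtmVic: the learning rate, the neighbourhood radius, the number of epochs and the six labelled coordinates are all assumptions chosen for the example.

```python
import math
import random

def best_cell(weights, point):
    """Cell whose prototype is closest to the point (squared Euclidean distance)."""
    return min(weights, key=lambda cell: sum((a - b) ** 2 for a, b in zip(weights[cell], point)))

def train_som(points, rows=5, cols=5, epochs=200, seed=0):
    """Train a small self-organizing map; return one prototype vector per cell."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Prototypes are initialised on randomly chosen input points.
    weights = {(r, c): list(rng.choice(points)) for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        progress = epoch / epochs
        rate = 0.5 * (1.0 - progress) + 0.01                      # decreasing learning rate
        radius = max(rows, cols) / 2.0 * (1.0 - progress) + 0.5   # shrinking neighbourhood
        for p in rng.sample(points, len(points)):
            winner = best_cell(weights, p)
            for cell, w in weights.items():
                grid_dist2 = (cell[0] - winner[0]) ** 2 + (cell[1] - winner[1]) ** 2
                influence = math.exp(-grid_dist2 / (2.0 * radius ** 2))
                for k in range(dim):
                    w[k] += rate * influence * (p[k] - w[k])
    return weights

# Invented coordinates on two axes for a few words (rows) and sonnets (columns).
coordinates = {
    "rose": (0.9, 0.1), "Sonnet_1": (0.9, 0.1), "love": (0.7, 0.3),
    "night": (-0.8, 0.2), "Sonnet_2": (-0.7, 0.1), "time": (0.0, -0.9),
}
weights = train_som(list(coordinates.values()))
cells = {label: best_cell(weights, point) for label, point in coordinates.items()}
print(cells)
```

Because words and texts are placed in the same space by the correspondence analysis, both kinds of points can be assigned to cells of the same map, which is what makes the representation simultaneous.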
4.6) Click
“Seriation”
Seriation techniques, as well as block seriation techniques, are widely used by
practitioners. Seriation is based on simple permutations of the rows and columns
of the table under study. These permutations have the great practical
and cognitive advantage of showing the raw data to the user,
who can therefore forego the use of intricate
interpretation rules. They can display homogeneous
blocks of high values or, on the contrary, of small or null values.
They can also pinpoint a continuous and progressive evolution of
profiles.
An optimal property of correspondence analysis is the
following: the first axis of a correspondence analysis provides us
with a ranking of the row-points and of the column-points. That
ranking can be used to sort the rows and columns of the analysed data
table. The data table thus obtained has undergone an optimal
seriation. Seriation will be applied here to the lexical table
cross-tabulating the 20 sonnets and the selected words (words
appearing at least 4 times in the corpus).
A new window named “Reordering”
appears.
Click on the button: “Reordering the rows and the column of
a word-text table”.
The reordered table cross-tabulating
the 20 sonnets and the selected words is then displayed. It can be
seen that the first words of the reordered list of words characterize
(sometimes exclusively) the first sonnets in the reordered list of
sonnets. The last words of the same list are either absent or rarely
observed among these sonnets. However, they are frequent among the
last sonnets (right hand side of the table).
That reordered printing of the raw data is a useful tool for
communicating with practitioners, since it can be interpreted
without prior knowledge of data analysis techniques.
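The reordering of a word-text table by the first axis of a correspondence analysis can be sketched in Python as follows. This is an illustration, not DtmVic's implementation: instead of a full correspondence analysis it uses reciprocal averaging (a power iteration that converges to the scores of the first non-trivial axis), the function names and the small table are invented, and the table is assumed to have no empty row or column.

```python
def first_axis_scores(table, iterations=200):
    """Row and column scores on the first non-trivial axis, by reciprocal averaging."""
    n_rows, n_cols = len(table), len(table[0])
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(n_rows)) for j in range(n_cols)]
    total = sum(row_tot)
    col_scores = [float(j) for j in range(n_cols)]  # arbitrary non-constant start
    row_scores = [0.0] * n_rows
    for _ in range(iterations):
        # Remove the trivial (constant) solution, then rescale to unit weighted variance.
        mean = sum(c * s for c, s in zip(col_tot, col_scores)) / total
        col_scores = [s - mean for s in col_scores]
        norm = (sum(c * s * s for c, s in zip(col_tot, col_scores)) / total) ** 0.5
        col_scores = [s / norm for s in col_scores]
        # Each row score is the weighted average of the column scores, and conversely.
        row_scores = [sum(table[i][j] * col_scores[j] for j in range(n_cols)) / row_tot[i]
                      for i in range(n_rows)]
        col_scores = [sum(table[i][j] * row_scores[i] for i in range(n_rows)) / col_tot[j]
                      for j in range(n_cols)]
    return row_scores, col_scores

def seriate(table, row_labels, col_labels):
    """Sort rows and columns of the table by their scores on the first axis."""
    row_scores, col_scores = first_axis_scores(table)
    row_order = sorted(range(len(row_labels)), key=lambda i: row_scores[i])
    col_order = sorted(range(len(col_labels)), key=lambda j: col_scores[j])
    reordered = [[table[i][j] for j in col_order] for i in row_order]
    return [row_labels[i] for i in row_order], [col_labels[j] for j in col_order], reordered

# Invented word-text table whose rows and columns have been shuffled.
words = ["w2", "w0", "w3", "w1"]
texts = ["t1", "t3", "t0", "t2"]
shuffled = [
    [1, 1, 0, 3],
    [1, 0, 3, 0],
    [0, 3, 0, 1],
    [3, 0, 1, 1],
]
new_words, new_texts, reordered = seriate(shuffled, words, texts)
for label, row in zip(new_words, reordered):
    print(label, row)
```

The sign of an axis is arbitrary, so the order may come out reversed for both words and texts at once; in either case the high counts of the reordered table gather along the diagonal.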
General remark.
As you can observe when looking at the content of the example’s
directory, several files have been created and saved [these files are
briefly described in the memo “Help
about files” in the toolbar of
the main menu]. If you need to use the buttons of
the paragraph VIC of the main menu again after having closed DtmVic, just
click on the button “Open an existing command file”
from the line “command file”, select and open the
saved command file “Param_VISUTEX.txt”,
and close it. It is not necessary to click on “execute”
again. You can then continue your investigation (axes views, graphs, maps, etc.).
Advanced users can also edit the parameter file
“Param_VISUTEX.txt”
(using the memo “Help about parameters”
in the toolbar of the main menu) to perform a new analysis in which the
parameters are given new values. It is advisable to save it under a new name
(such as “Param_VISUTEX2.txt”,
for example). All the intermediate files will be replaced (except the
files “imp_date_time.txt”
and “imp_date_time.html”, which
are the only saved archives).
End of example A4 (VISUTEX)