HOW TO USE KYST, A VERY FLEXIBLE PROGRAM
TO DO MULTIDIMENSIONAL SCALING AND UNFOLDING
Joseph B. Kruskal Forrest W. Young Judith B. Seery
Bell Laboratories Thurstone Psychometric Laboratory Bell Laboratories
Murray Hill, N.J. University of North Carolina Murray Hill, N.J.
Chapel Hill, N.C.
- 2 -
KYST is an extremely flexible and portable computer program for
multidimensional scaling and unfolding. It represents a merger of M-D-SCAL 5M
and TORSCA 9, including the best features of both, as well as some new features
of interest. The name, pronounced "kissed", is formed from the initials
Kruskal, Young, Shepard, and Torgerson. This instruction manual tells not only
how to use KYST, but includes helpful information about how to get it working
at a computer center, and other useful material.
KYST includes the powerful initial configuration procedure from TORSCA as
well as the very helpful practice of rotating solutions to principal
components. KYST incorporates the great generality of M-D-SCAL 5M, as well as
M-D-SCAL's easy-to-use input procedure.
New features include improved portability, more readable printer plotting,
an easy way to transform the data prior to scaling, an easy way to introduce
weights as functions of the data, and substantial economy of computer memory by
complicated re-use of arrays.
- 3 -
CONTENTS
1. Introduction...............................................................4
2. Input-Preliminary Example .................................................5
3. What KYST Does............................................................ 6
4. Output.....................................................................9
5. Options and Control Phrases................................................9
6. Basic Control Phrases.....................................................11
7. Control Phrases for Data and Weights......................................11
8. Data and Weights Decks....................................................15
10. Control Phrases to Choose Analysis Type..................................21
11. Phrases to Control Termination and Convergence ..........................23
12. Control Phrases for Output ..............................................25
13. Some Ways to Use KYST ...................................................27
14. User Limitations and Computing Times.....................................29
15. Some Precautions: Number of Dimensions, Convergence, and Local Minima ...29
16. The Routines Which Make Up KYST..........................................31
17. Useful Information for Getting KYST Running at a Computer Center ........32
References...................................................................35
Index of Control Phrases.....................................................36
Figures: 1 .................................................5
2 ................................................18
3 ................................................34
- 4 -
1. Introduction
Multidimensional scaling is a statistical technique which is standard in
psychology and is spreading to other fields. Briefly speaking, it constructs a
configuration of points in space from information about distances between the
points. The literature on the subject includes several books and hundreds of
papers. Recent textbooks on psychometrics generally include material on this
topic, and some recent texts on statistics do also. While this instruction
manual assumes familiarity with the basic ideas of scaling, it does include
some brief exposition as a refresher, and to connect the concepts with the
program.
Unfolding can be thought of as a kind of scaling in which there are two
kinds of points (often called "subject" points and "stimulus" points), such
that the information available compares distances radiating from a single
subject point at a time to the various stimulus points. While KYST may be used
to do pure unfolding, difficulties are not unusual. For example, at the most
meaningful solution stress may reach a merely local minimum, and it is not
unusual for the program to reach a non-meaningful minimum. To be recommended
for use with KYST are many procedures intermediate between unfolding and
ordinary scaling, which use two or more sets of data, only one of which is of
the unfolding type. Even where the only direct data available are of the
unfolding type, it is generally possible and desirable to derive secondary data
of another type for supplementary use.
KYST is extremely general because of the many options it contains. In
addition to many types of metric and non-metric scaling and unfolding, many
similar useful models may be solved which do not bear separate names.
KYST is a merger of M-D-SCAL 5M and TORSCA 9, including the best features
of both, as well as several new features. The powerful procedure used by
TORSCA to obtain an initial configuration is included in KYST. While this
initial configuration works very well in the classical scaling situations, its
value in other KYST applications such as unfolding is not always clear.
We believe that it should be quite simple to get KYST running on most
computer systems where FORTRAN IV is available. Both M-D-SCAL and TORSCA are
widely used, and the main obstacle to portability in M-D-SCAL has been removed
from KYST. Useful details are provided in a later section. The considerable
printer plotting done by KYST includes several improvements, such as grid
labelling in round numbers for ease of interpretation.
In order to reduce the need for preliminary programming to manipulate the
data, easy ways to transform the data and to produce weights have been included
in KYST.
Substantial economy of memory space has been achieved by a rather
intricate re-use of arrays. Dynamic allocation of storage, not yet a portable
reality, would have been a more satisfactory solution. Unfortunately, this
means that those who wish to modify the program may find it difficult to change
array sizes safely. However, a diagram of storage use is included for those
who wish to do so.
We welcome feedback from the users of KYST, such as applications of
special interest, and novel but useful ways of using the program. We also
welcome information about any errors which may remain in the program,
especially if accompanied by sufficient information to permit tracking them
down, and still more especially if they have already been tracked down and a
correction prepared.
- 5 -
2. Input-Preliminary Example
Figure 1 shows the complete input to KYST sufficient to perform one simple
example of multidimensional scaling. See Figure 4 at the end of the paper for
another example.
Card
number Each line is one card Description
1 TORSCA Control Card
2 PRE-ITERATIONS=3 Control Card
3 DIMMAX=3,DIMMIN=1 Control Card
4 COORDINATES=ROTATE Control Card
5 CARDS Control Card
6 ITERATIONS=50 Control Card
7 DATA, REGRESSION=DESCENDING Control Card
Data 8 JOHN SMITH'S CONFUSION MATRIX Title Card
deck. 9 5 1 1 Parameter Card
Do not 10 5F5.3 Format Card
separate 11 9.3 0.57 6.1 2.0 1.5 Data
or re- 12 1.0 8.5 5.5 3.3 4.0 Data
arrange 13 2.1 5.3 7.3 3.9 1.1 Data
these 14 5.0 6.2 1.6 8.9 2.8 Data
cards. 15 1.3 4.1 1.0 2.5 8.8 Data
16 COMPUTE Control Card
17 STOP Control Card
Figure 1
Note that the card number and description are for reference purposes only.
They are not punched on the cards.
In this example, the first six cards happen to be control cards. They are
used in this case to give the values of several parameters. Card 1 indicates
that the initial configuration is to be generated using the Young-Torgerson
procedure. Card 2 states that a maximum of 3 iterations are to be used in this
procedure. Card 3 indicates that the scaling is to be done from a "dimension
maximum" of three down to a "dimension minimum" of one; that is, separate
computations will be performed successively, in 3- dimensional space, in
2-dimensional space, and in 1-dimensional space. Card 4 specifies that the
final configuration for each dimensionality should be rotated to principal
components. Card 5 indicates that the configuration which results from each
scaling should be punched out on cards. Card 6 states that no more than 50
iterations should be performed in any single scaling, even if the stress has
not yet reached its minimum value, that is, even if the configuration has not
converged to its final position by then. This is a safety precaution, to
control computing time. Whenever a scaling is terminated, regardless of
whether convergence is complete or not, sufficient output is printed to permit
you to start a new computation from where the last one left off. If desired,
cards may be punched out to make this restart easier.
Cards 7 through 15 in the example are called a data deck. They must not
be separated from each other, nor rearranged internally. However the data
deck, treated as an indivisible unit, and many control
- 6 -
cards may be freely rearranged: the order in which they occur makes no
difference. There are some restrictions however.
Card 7, which is the first card of the data deck, is also a control card,
and indeed contains the signal by which the program knows that the succeeding
cards should not be read as control cards. Any control card which has the word
DATA on it must be immediately followed by the remaining cards of a data deck.
The phrase REGRESSION = DESCENDING indicates that the regression of
distances on data values will be a "monotone descending" regression. This
means two things. First, the scaling will be nonmetric, because monotone
regression is inherently a nonmetric kind of procedure, as opposed to
polynomial regression, which is also possible in KYST. Second, because the
regression is descending, small data values will correspond to large distances
and hence to objects which are very different from one another. Thus
descending regression is appropriate for similarities, and ascending regression
is appropriate for dissimilarities.
The internal arrangement of the data deck is rigidly specified unlike the
arrangement of individual control cards . The second card of the data deck
must be a title card. Its text will be printed and punched out where
appropriate, as a title for the data.
The third card of the data deck indicates the number of rows and columns
of the matrix namely, 5 , the number of values provided for each matrix entry
namely, 1, and the number of such matrices included in the data deck namely, 1.
The fourth card of the data deck must give a format according to the rules
for FORTRAN format statements. This format will be used by the program to read
a single row of the matrix. For computing machines which handle integers
differently from floating-point numbers, a floating-point format must be used.
The succeeding cards of the data deck like cards 11 through 15 must
contain the data matrix itself, punched according to the format you have
provided. Note that each row of the matrix must begin on a new card. However,
several cards may be used for a single row.
A control card with the word COMPUTE causes actual computation to start.
Everything prior to the COMPUTE card constitutes input of data, parameter
values, and option choices. All such input relevant to a given computation
must occur before the corresponding COMPUTE card. This is one exception to the
free rearrangeability of control cards.
It is sometimes convenient to perform several different and possibly
unrelated scaling computations on a single computer run. For this reason, the
program is ready to receive new control cards and new data after it finishes
the computation initiated by a COMPUTE card. After the last computation, there
must be a STOP card to signal the end of the computer run. 3. What KYST Does
Only the briefest description of what KYST does can be given here. For a
fuller description of the
- 7 -
ideas, consult [Kruskal 1964a and 1964b] and [Young 1972] .
KYST places N points in a space of dimension LDIM so as to minimize
STRESS, which measures the "badness-of-fit" between the configuration of points
and the data. It finds the minimizing configuration by starting with some
configuration perhaps found by the classical scaling procedure of Torgerson,
and moving all the points a bit to decrease the stress, then iterating this
procedure over and over again until the stopping criterion is reached.
Typically, from 15 to 50 iterations may be required. Technically speaking,
KYST uses the iterative numerical method of gradients, the method of steepest
descent, with a step-size procedure based primarily on the angles between
successive gradients.
Let the data values be listed in any order and called DATA(1) through
DATA(MM). In John Smith's confusion matrix above, MM = 25. For a square
matrix of size N we have MM = N^2 if the diagonal entries are to be included
or MM = N*(N -1) if not. For a half matrix MM = N*(N +1)/2 or MM = N*(N-1)/2
depending on whether the diagonal entries are included or not. Each entry in
the matrix may be represented by several distinct data values. If each entry
is represented by NREPL1 data values, then MM is NREPL1 times as large as it
would otherwise be. If some of the data values are missing, MM is
appropriately reduced.
For any given configuration of N points (it doesn't make any difference
where the configuration comes from), let the distances between pairs of points
be calculated and listed in DIST(1) to DIST(MM) in corresponding positions to
those in the data list. If one matrix entry has several data values, then the
self same distance will appear in the distance list several times. If some
other matrix entry has no data values (missing data situation, for example),
then the corresponding distance never appears on the distance list.
Corresponding to diagonal entries of the matrix are distances of zero, of
course, and the distances corresponding to (i,j) and (j,i) entries of the
matrix are of course the same automatically.
Now perform a least-squares regression of DIST on DATA. For nonmetric
scaling, monotone ascending regression should be used for dissimilarities, and
monotone descending regression for similarities. For metric scaling, this
regression can be linear or it can be polynomial up to degree 4. Of course,
the regression yields the polynomial coefficients. By writing a very simple
FORTRAN subroutine, still other types of regression can be accommodated.
Let the values of the regression function be listed in DHAT(1) to
DHAT(MM). Let DBAR be the arithmetic average of the DIST values. Then:
sum (over m=1 to MM) (DIST(m) - DHAT(m))^2
STRESS = sqrt[ ------------------------------------------ ]
sum (over m=1 to MM) (DIST(m) - d-sub-0)^2
where
d-sub-0 =
0 for stress formula 1
DBAR for stress formula 2
- 8 -
It is this quantity that is minimized by moving the points of the
configuration around.
The two formulas will often yield very similar configurations. In using
KYST to do multidimensional unfolding, as explained later, the use of Formula 2
is in theory vital. However, Formula 2 sometimes has problems due to local
maxima which cannot occur with Formula 1.
The interpretation of stress values can be greatly affected by the options
used and by various parameters, such as the number of stimuli and the number of
dimensions. While actual experience with real data has been the primary source
of information so far, fortunately several papers have explored this question
by Monte Carlo techniques. We recommend particularly the references by [Klahr
1969], [Sherman 1972], [Spence 1972], [Stenson & Knoll 1969], [Wagenaar &
Padmos 1971], and [Young 1970].
It is possible to transform the data values by a fairly flexible formula,
as a pre-processing step prior to the actual scaling computation. This can be
a very useful preliminary step, both to help the main scaling procedure when
linear or polynomial regression is being used, and to help the TORSCA initial
configuration do a better job.
It is possible to weight the squared terms in STRESS. Let WW(1) to WW(MM)
be a list of positive weights corresponding to the data values. The formula
for stress now becomes:
sum (over m=1 to MM) WW(m) * (DIST(m) - DHAT(m))^2
STRESS = sqrt[ -------------------------------------------------- ]
sum (over m=1 to MM) WW(m) * (DIST(m) - d-sub-0)^2
KYST permits the weights to be given as a reasonably flexible function of
the corresponding data values, or to be read in.
An important element of generality is the ability to "split" the data into
separate sublists and perform separate regressions on each list. This
"splitting" may be used to do unfolding. When the data is split, a separate
regression and a stress is computed for each sublist by the formula above. The
overall stress is taken to be the root-mean-square of the several sublist
stresses. It is possible to split the data in several different ways. Also,
it is possible to use different types of regression for different sublists, a
fact which offers some interesting possibilities.
Normally, KYST uses a distance formula which is appropriate for Euclidean
space. However, it is capable of using a certain other kind of distance which
mathematicians usually call the L-sub-p-metric, and occasionally the Minkowski
r-metric. If points x and y have coordinates x and y , then the Minkowski
r-metric distance between them is given by:
- 9 -
distance-sub-r (x,y) = [ sum | x-sub-i - y-sub-i | ^ r ] ^ (1/r)
For r = 2.0, this is ordinary Euclidean distance. For r = 1.0, this is
the so-called city-block distance, or Manhattan metric. For r = infinity, it
can be shown that this formula yields the maximum of the absolute coordinate
differences, sometimes called the "dominance" metric which is a simple instance
of the important distance mathematicians call the supremum or "sup" metric.
For any value of r between and including 1.0 and infinity, a classical theorem
states that the Minkowski r-metric is a true metric because it satisfies the
triangle inequality. For values of r between but excluding 0 and 1.0, it is
known that the Minkowski r-metric does not satisfy the triangle inequality
although it does if one omits the outside exponent of 1 r, but this case has
nevertheless received much study. It appears that KYST should be capable of
handling values of r in this range also, though we have no actual experience.
4. Output
The program always provides printed output. By use of the control phrase
CARDS, it is possible to obtain punched card output.
The more important input cards are printed out, so it is easy to see what
data and options were used. The final configuration obtained is also printed,
and the corresponding value of the stress. (The stress is the "badness-of-fit
quantity" which is minimized by the program.) For further explanation, see the
references. Several other kinds of printed output are available on an optional
basis. The phrase PRINT=DATA will cause a detailed listing of the input data.
The phrase PRINT=DISTANCES will produce a detailed listing which includes the
data, the distances, and the fitted regression values. The phrase
PRINT=HISTORY will produce a "history" of the calculation which shows the
progress of the iterative numerical calculation.
A printer plot of stress versus dimension is automatically provided, when
appropriate. Two other types of graphical output are optional. How to obtain
a printer plot of distances and fitted distance values versus data, as well as
printer plots of the final configuration is explained in Section 12, Control
Phrases for Output. These plots are all produced on the line printer, and
require no special plotting equipment.
The only punched output provided optionally is a deck of cards called the
"configuration deck". This deck contains the coordinates of the points of the
final configuration, together with some preliminary cards, which permit it to
be used as input to provide a starting configuration in a later scaling
computation. For a description of the configuration deck, see page 20.
5. Options and Control Phrases
There are many options which can be chosen when using KYST. Despite the
large array of possibilities, however, the user is able to specify the correct
options without undue difficulty because of the system of free-format control
phrases which is used. The simple "grammar" of control phrases is described in
the next few paragraphs. The various options, and the control phrases which
invoke them, are described in subsequent sections.
- 10 -
One important principle is used throughout. Whenever several alternative
possibilities are available, one is always selected to be the "standard" or
"default" choice. In the absence of any indication from the user, the program
uses these standard choices. Thus the user is not burdened by having to
specify all possible options. He need specify only those in which he departs
from the standard choices.
Each line below contains one complete, legitimate control phrase:
DATA
DIAGONAL ABSENT
DIMMAX 4
DIMMAX=4
PLOT=CONFIGURATION=NONE
See Figure 4 at the end of the paper for other examples. In general, a
control phrase may appear anywhere on a control card. It may consist of one or
more words or numbers, separated by one or more spaces, equal signs, or commas.
The first item in a phrase must always be a word. Each item either
terminates the phrase, or else determines a special set of allowable items
which can follow it.
The program distinguishes sharply on control cards a
"number-with-decimal-point" from an "integer". The key feature is the presence
or absence of a decimal point. Where one kind of number is expected, the other
is illegal.
Optionally, equal signs and commas ("=" and ",") may be used to improve
the appearance of the control phrases. Thus a card may contain:
DIMMAX=5, DIMMIN=1
instead of the strictly equivalent:
DIMMAX 5 DIMMIN 1
The program which interprets the control phrases treats these two
characters exactly like blanks. Collectively, these three equivalent
characters are called break characters.
A single card may contain several control phrases. However each phrase
must be complete on a single card. One or more break characters must separate
distinct items (that is, words or numbers), and no break character may occur
within a single item. It may be helpful to know that although control words
may be much longer, only the first six characters are used and need be
included.
The dollar sign may be used to enter a default parameter. This device is
especially useful in the DFUNCTION and WFUNCTION options explained later, where
five parameters must be entered. Thus the phrase:
WCONSTANTS = 2.0,$,$,$,-1.0
- 11 -
assigns default values to the second, third, and fourth parameters. It should
be noted if WCONSTANTS is invoked more than once before a single COMPUTE card,
a defaulted parameter will retain the value last assigned to it.
6. Basic Control Phrases
To initiate actual computation, use the control phrase:
COMPUTE
This must be on the very last card which specifies what you want done, as
computation starts without examination of further input cards when the card
containing this phrase has been read.
After the computation initiated by the COMPUTE card is complete, the
program looks for further input. This makes it possible to perform several
different and possibly unrelated computations on one computer run. Before
taking further input, however, the program wipes the "slate clean" by
initializing all parameters and options back to their normal values. A few
very special exceptions to this rule are explained later.
Since the program always looks for further input after completing a
computation, there must be some way of indicating that the entire computer run
is to be ended. For this purpose, use the control phrase:
STOP
This must be on a card by itself.
7. Control Phrases for Data and Weights
This section describes the various ways KYST can interpret and handle the
possibly weighted data. The actual reading in of the data is accomplished via
a data deck, whose format will be described in detail in the next section.
Since the control phrases mentioned below through CUTOFF specify how the data
is to be handled when it is read in, they must appear on or before the control
card containing the phrase:
DATA
which introduces the data deck.
In general, a single data deck may contain several groups of rows of data
though in the simplest situations, a data deck will contain only one such group
. Each group may be thought of as a square matrix, or part of a square matrix.
Each group has the same size and shape as every other in the same data deck.
Normally, the different groups serve as replicates, and contain different data
relating to the same objects. However when the block diagonal option is in
effect, the different groups are not replicates but pertain to different
objects.
- 12 -
Each group in the data deck may have one of five forms:
MATRIX (standard option)
LOWERHALFMATRIX
UPPERHALFMATRIX
LOWERCORNERMATRIX
UPPERCORNERMATRIX
For the two halfmatrix forms, two variations are possible:
DIAGONAL=PRESENT (standard option)
DIAGONAL=ABSENT
MATRIX: In this case the group consists of NPART rows with each row
expected to contain NPART matrix entries. The information presented pertains
to NPART points or objects.
LOWERHALFMATRIX, UPPERHALFMATRIX: In these cases, the group consists of
NPART rows if the main DIAGONAL is PRESENT and NPART - 1 rows if the main
DIAGONAL is ABSENT. Each row is expected to contain from 1 up to NPART or
from 1 up to NPART - 1 matrix entries in a regularly varying way. The
information pertains to NPART points or objects.
LOWERCORNERMATRIX, UPPERCORNERMATRIX: In these cases, each group consists
of NROWS rows, with each row expected to contain NCOLS matrix entries. The
information pertains to NCOLS + NROWS points or objects. As the placement of
the corner in the matrix would suggest, the first several points in the
lowercorner case and the last several points in the uppercorner case correspond
to the columns. Figure 2 on pages 17-18 , in which a data value corresponding
to the (i,j) matrix entry is denoted by ij, should help make these forms clear.
The main use for the halfmatrix forms occurs when measured dissimilarity
or similarity between objects i and j is experimentally symmetric, so there is
no conceptual distinction between the (i,j) entry and the (j,i) entry. One
major use for the cornermatrix forms is in unfolding. Because the "splitting"
option see below is restricted to rows and not available for columns, the
columns must correspond to stimuli or "real" points and the rows to subjects
(or "ideal" points).
When a data deck contains more than one group, normally the different
groups pertain to the same objects, and contain replicated measurements of the
same matrix entries. Thus if a single group pertains to NPART objects, then
normally the whole data deck pertains only to N = NPART objects. However, when
the blockdiagonal option is used, each group pertains to a new set of objects,
and the several groups should be thought of as blocks down the diagonal of a
single larger data matrix. If each group pertains to NPART objects, and there
are NREPL3 groups, then the whole data deck pertains to N = NREPL3 * NPART
objects. If each group is only part of a square matrix, it is the square
matrices which are strung down the diagonal, with each group occupying its
appropriate position in its square matrix. The shape of the groups is
determined by using the above control phrases to describe the data. The block
diagonal option is controlled by the phrases:
BLOCKDIAGONAL=NO (standard option)
BLOCKDIAGONAL=YES
- 13 -
Two possible configurations using the BLOCKDIAGONAL=YES option are the
following:
BLOCKDIAGONAL=YES BLOCKDIAGONAL=YES
MATRIX LOWERCORNERMATRIX
As always, the standard option applies if no specific indication is given.
If individual data values are missing, this fact may be indicated by using
values which are algebraically less than some specific cutoff value, and
indicating this value by the phrase:
CUTOFF=number-with-decimal-point (standard value is -1.23*10E20) .
Values which are < CUTOFF value will be discarded. Since this discarding
takes place while the data is being read in, the CUTOFF must be provided on or
before the DATA card. The standard value is so large a negative number that it
is unlikely to affect anyone's data.
In addition to all the options mentioned above, it is possible to have not
one but several data values for each matrix entry of each group. This is
called internal replication. While this feature is handled by the parameter
card of the data deck rather than by a control phrase, it is mentioned here
merely for clarity.
Note again that if used, all the phrases above must appear before the data
deck. Also note that if more than one data deck is provided, only the first
one is used by the TORSCA initial configuration routine, though all are of
course used in the main scaling routine.
Following the first computation initiated by a COMPUTE card , it is
possible to use the data from the preceding computation without another copy of
the data deck. It is simply necessary, somewhere before the relevant COMPUTE
card, to use the phrase:
SAVE = DATA
The effect of this phrase is simply to retain for this computation the
data from the last computation. It is even possible to use SAVE=DATA and
follow it by another data deck. This will have the same effect as using more
than one data deck in one computation, which is explained elsewhere.
The use of SAVE=DATA requires some care. All items which the program
stores internally with the data are frozen by use of this option, and it may
not be obvious just which items these are. Everything which must be specified
before the data deck is stored with the data, and will be retained by use of a
SAVE=DATA card, regardless of other option cards accompanying the SAVE=DATA
card. Such frozen options include regression-type and shape of matrix.
Actually, data is saved without use of a SAVE=DATA card, if no new data is
provided. To add a new data deck without losing the old data from before the
last COMPUTE card, however, it is necessary to SAVE=DATA before the new data
deck.
When performing metric scaling, a preliminary transformation of the data
may be quite important. Also, regardless whether the scaling is metric or not,
such a preliminary transformation may be helpful to the classical Torgerson
scaling which is the first step in the TORSCA initial configuration option, by
making the data values more like Euclidean distances. Any transformation
desired can of course be
- 14 -
accomplished by a simple computer program specially written for the purpose,
and the transformed data then entered into KYST; also a still simpler function
subroutine DTRAN may be used with KYST as described below. However, to
encourage the use of transformations we have included the following
transformations directly within KYST:
DATA = s + t * (a + b*DATA) ^ p
In words, this transformation consists of a first stage linear
transformation using a and b, a second power stage transformation using p,
followed by a third stage linear transformation using s and t. To invoke this
transformation, place the following two cards immediately after the data deck
(except that weight generating cards to be described below are permitted to
intervene):
DCONSTANTS = a,b,p,s,t
DFUNCTION
where each of the five parameters is a number-with-decimal-point
(standard values a=0.0, b=1.0, p=1.0, s=0.0, t=1.0);
While the DCONSTANTS = a,b,p,s,t phrase must precede the DFUNCTION phrase
to be effective, it could occur anywhere before DFUNCTION and need not be
placed as shown. Recall that a dollar sign may be used instead of a number to
indicate the default value. KYST uses a function subroutine called DTRAN to
accomplish the transformation. If the user provides his own FORTRAN function
DTRAN(X), DFUNCTION will invoke its use.
Weights associated with the data values are sometimes useful, to reflect
the varying measurement errors or for other reasons. How they are used by KYST
is indicated in an earlier section. Weights may either be computed as a
function of the associated data values, or read in from a deck. If no weights
are provided by either method, the program supplies the default weight of 1.0
for every data value. The weights are effective both during the initial
configuration calculation, and during the main scaling computation. It is
dangerous to use weights of 0.0, since they may lead to division by zero in the
monotone regression fitting algorithm FITM. Negative weights have not been
considered. The effect of a zero weight can be obtained by causing the
corresponding data value to be missing see CUTOFF above, or approximated by
using a very small positive weight.
It is possible, directly within KYST, to compute the weights as any
function of the data of this type:
WEIGHT = s+t*(a + b*DATA) ^ p
In words, this transformation consists of three stages, a first linear
stage using a and b, a second power stage using p, and a third linear stage
using s and t. To invoke this transformation, place the
- 15 -
following two cards immediately following the data deck or immediately
following the DFUNCTION card:
WCONSTANTS = a,b,p,s,t
WFUNCTION
where each of the five parameters is a number-with-decimal point
(standard values a=0.0, b=1.0, p=1.0, s=0.0, t=1.0);
If both WFUNCTION and DFUNCTION are used, it is vital to place them on
separate cards, and important to remember that the corresponding operations are
invoked in the same order in which the cards appear. Thus if DFUNCTION appears
first, the weights will be computed from the transformed data values. While
the WCONSTANTS = a,b,p,s,t phrase must precede the WFUNCTION phrase to be
effective, it could occur anywhere before WFUNCTION and need not be placed as
shown. Recall the possible use of for default values. KYST uses a function
subroutine WTRAN(X) to accomplish the transformation. The user may supply his
own version, and invoke its use.
It is also possible to read in weights from a deck. To introduce the deck
the control phrase:
WEIGHTS
is used. When a weights deck is used, it must immediately follow the data
deck, with no intervening cards except possibly for DCONSTANTS and DFUNCTION .
Generally speaking, the arrangement and format of the weights deck must be
exactly the same as that of the data deck, and one weight must appear for every
data value, even those which are artificial and indicate missing data. For
details, see the section on deck formats.
8. Data and Weights Decks
There are three types of input decks: data decks, weights decks, and
initial configuration decks. Only the data deck is necessary.
Discussion of the initial configuration deck will be deferred to the next
section.
The data deck consists of both non-control cards and one control card
which introduces it. Except for the initial card, the cards of the data deck
are not control cards. Similarly, each of the other kinds of decks consists of
one initial card which introduces it, and subsequent cards which are not
control cards.
The first card of a data deck is a control card containing DATA.
Naturally other control phrases may appear on this card also. It is often
convenient to include on this card phrases such as LOWERCORNERMATRIX,
UPPERCORNERMATRIX, and CUTOFF = 0.5 which are likely to be permanently attached
to this data. The second card is a title card, which acts as a label for the
data. The title card may contain any text at all. The third card of the data
deck is a parameter card. It should contain either three or four integer
parameters, placed so that they end in columns 3, 6, 9, and l2.
- 16 -
Before describing the arrangement of parameters, let us recall that in
general the data deck consists of several "groups" (see section 7) of rows.
Each group has a certain number of rows and columns the same for every group in
a single data deck . Unless one of the cornermatrix options is in effect, the
number of rows is the same as the number of columns. In addition to the
replication possible by using several groups, internal replication is possible:
each matrix entry from a single group may be represented by several adjacent
data values. The number of them is constant throughout the data deck, and is
called the number of replications.
Unless one of the cornermatrix options is in effect, the parameter card
should contain three parameters, as follows:
1) the number of rows and columns in each group (NPART);
2) the number of replications (NREPL1);
3) the number of groups (NREPL3) .
If one of the cornermatrix options is in effect, the parameter card should
contain four parameters, as follows:
1) the number of rows in each group (NROWS);
2) the number of columns in each group (NCOLS);
3) the number of replications (NREPL1);
4) the number of groups (NREPL3).
The fourth card of the data deck should contain a format, constructed
according to the rules for FORTRAN format statements. For computers which
distinguish between floating-point numbers and integers internally (and many
do) , it is necessary to use format conversion characters like E and F which
correspond to floating-point numbers. The format should be appropriate for a
single row of a single group of the data deck, since KYST uses a single FORTRAN
READ statement with this format to read a single row. If you need information
about FORTRAN formats or READ statements, consult a FORTRAN manual or a
programmer.
Following the format card are the data values themselves. A new row
should always start on a new card. Aside from this, these cards may be punched
in any way you desire, consonant with the format you have provided. Where a
row is too long to fit on a single card, it may be placed on as many cards as
necessary.
More than one data deck may be used. When this is done, the different
decks are related to one another more or less as the different groups within a
single data deck. The different data decks pertain to the same set of objects
as one another. However different data decks may use different types of
regression, and may have different shapes, different numbers of replications,
and different formats. Only the first data deck is used by the TORSCA initial
configuration routine, though all are of course used during the main scaling.
- 17 -
The first card of a weights deck is a control card containing WEIGHTS.
The subsequent cards contain the weights themselves, which should not be
negative nor zero. The weights decks is required immediately to follow the
corresponding data deck without intervening cards except possibly DCONSTANTS
and DFUNCTION cards . KYST reads the weights in the same way and with the same
format it used for reading the preceding data. Thus the cards containing the
weights should be identical in arrangement and format to the cards containing
the data values. Where a data value is indicated as being missing because it
falls below the value of CUTOFF, a corresponding value must nevertheless appear
in the weights deck, even though it will not be used. A blank value can
generally be used in this case. Aside from weights corresponding to missing
data values, weight values of zero are dangerous, as explained elsewhere.
When more than one data deck is used to supply data for a single
computation, every data deck should be given the same weights treatment.
Either no data deck should be weighted, or every data deck should be
accompanied by a weights deck, or every data deck should use computed weights.
The weights decks, WCONSTANTS and/or WFUNCTION cards should be interlaced with
the data decks.
Examples of Data Decks
Entry for (i,j) position in matrix has value ij. Actual data should
consist of floating-point numbers.
DATA=MATRIX
title card
4 1 1
format card
11 12 13 14
21 22 23 24
31 32 33 34
41 42 43 44
DIAGONAL=PRESENT
DATA,LOWERHALFMATRIX
title card
4 1 1
format card
11
21 22
31 32 33
41 42 43 44
- 18 -
DIAGONAL=ABSENT
DATA,LOWERHALFMATRIX
title card
4 1 1
format card
21
31 32
41 42 43
DIAGONAL=PRESENT
DATA,UPPERHALFMATRIX
title card
4 1 1
format card
11 12 13 14
22 23 24
33 34
44
DIAGONAL=ABSENT
DATA,UPPERHALFMATRIX
title card
4 1 1
format card
12 13 14
23 24
34
DATA,LOWERCORNERMATRIX
title card
3 2 1 1
format card
31 32
41 42
51 52
DATA,MATRIX
title card
2 3 2
format card
11 11 11 12 12 12
21 21 21 22 22 22
11 11 11 12 12 12
21 21 21 22 22 22
BLOCKDIAGONAL=YES
DATA,MATRIX
title card
2 1 3
format card
11 12
21 22
33 34
43 44
55 56
65 66
Figure 2
- 19 -
9. Control Phrases and Deck for Initial Configuration
There are five ways in which the initial configuration may be produced:
1) the TORSCA initial configuration procedure may be used;
2) a random initial configuration may be used;
3) the so-called "ARBITRARY" initial configuration may be used;
4) a configuration may be read in from a deck;
5) the solution from a preceding computation may be saved for use as the
starting configuration.
Note that when a single COMPUTE card is used to initiate scaling in
several different dimensionalities, the choice above applies only to the first
scaling, which is in the highest dimensionality. For each subsequent scaling
the solution from the last previous scaling in a higher dimensionality is used
with the last one or more coordinates dropped off. If rotation to principal
coordinates is being used under the standard option COORDINATES = ROTATE , this
procedure usually provides a very good starting configuration.
We list here the five corresponding control phrases:
TORSCA (standard option)
RANDOM=integer (standard value for integer is 0)
ARBITRARY
CONFIGURATION
SAVE=CONFIGURATION
Further explanation follows.
Under the control phrase TORSCA, KYST first uses the classical Torgerson
scaling technique to derive a configuration. If the number of points N < 60,
the program then proceeds to use the quasi- nonmetric Young method of improving
this configuration. The control phrase:
PRE-ITERATIONS=integer (standard value is 1)
determines the maximum number of such steps. If N > 60, however, memory space
does not permit the Young procedure to be used. If more than one data deck is
used, as described elsewhere, the TORSCA initial configuration calculation will
use only the data from the first deck. In some cases, the initial
configuration obtained from the TORSCA procedure is already quite good.
If RANDOM=integer is used, the program generates its own starting
configuration by picking pseudo- random points from an approximately normal
multivariate distribution. Actually, each coordinate is obtained by applying
the logistic transformation to a uniform random variable, obtained from a
moderate-quality portable generator. This feature is especially valuable when
you wish to generate many solutions starting from different configurations, in
order to guard against local minimum solutions. Use
- 20 -
of RANDOM=integer in each of a successive set of computations during a single
computer run will produce new and independent starting configurations.
However, the pseudo-random number generator naturally produces the same
sequence of numbers in each computer run. In order to permit different
starting configurations in different runs, the program obtains and discards
several random numbers before generating the initial configuration. The
integer which follows RANDOM specifies how many random numbers should be
discarded. By altering this integer from one run to the next, you can make the
program use different starting configurations on different runs.
The control phrases ARBITRARY is included primarily for historical
compatibility. If it is used, the starting configuration prior to being
standardized is given by:
r-th coordinate of the i-th point =
0.01*i if r congruent to i modulo dimensionality,
0 otherwise.
The control phrase:
CONFIGURATION
introduces a deck containing a configuration. The subsequent cards of this
deck are not control cards, and must have the following precise arrangement
which parallels that of the data deck, described in the previous section. The
second card of the configuration deck, which immediately follows the
CONFIGURATION control card, is a title card, which serves as a label for the
configuration. It may contain any suitable text, and is printed out where
appropriate. The third card is a parameter card. It should contain two
integer parameters, placed so that they end in columns 3 and 6. The first
parameter indicates the number of points in the configuration. The second
parameter indicates the number of dimensions, that is the number of coordinates
in each point. The fourth card is a format card, containing a FORTRAN format
statement suitable for reading all the coordinates of a single point. The
subsequent cards of the configuration deck contain the configuration itself.
These must be arranged point by point, with all coordinates for a single point
together. The coordinates for each point must begin on a new card, though as
many cards as are needed may be used for each point.
Following the first entire computation that is, all the scalings initiated
by a single COMPUTE card, use of the phrase:
SAVE=CONFIGURATION
will retain as the starting configuration for the next computation the last
solution from the preceding computation. This is sensible only if the
dimensionality of the last solution is at least as great as the dimensionality
of the new computation.
Using either of the control phrases CONFIGURATION or SAVE=CONFIGURATION it
is possible to provide a configuration which does not have enough points to
serve as an entire starting configuration. If no points are provided, one may
consider that an initial configuration with zero points has been provided. By
default enough additional points are provided arbitrarily. The phrase
RANDOM=integer can be used to provide the remaining points randomly to complete
the initial configuration. Thus it is possible
- 21 -
to use the phrase SAVE=CONFIGURATION or a configuration deck, followed by
either ARBITRARY or RANDOM=integer.
10. Control Phrases to Choose Analysis Type
A single COMPUTE card can initiate several computations, all using the
same data and same options, in several spaces of different dimension. The
first computation is done in the maximum dimension DIMMAX. Then the dimension
drops by the dimension difference DIMDIF, and another computation is done.
This is repeated until finally the computation is done in the minimum dimension
DIMMIN, which completes the computation caused by the present COMPUTE card.
The phrases:
DIMMAX=integer (standard value is 2);
DIMMIN=integer (standard value is 2);
DIMDIF=integer (standard value is 1);
control the dimensions actually used.
The kind of distance function used by KYST is controlled by the phrase:
R=number-with-decimal-point (standard value is 2.0).
For R=2.0, this yields the ordinary Euclidean distance, while for R=1.0,
this yields the city-block metric. For these two special values, the necessary
powers and roots are computed in a relatively rapid way, but for other values
the general purpose power operations are used, which behind the scenes require
a logarithm and an exponential to be calculated for each power operation. Thus
for general values of R, the computation will be perceptibly slower.
Two different stress formulas are available, as indicated earlier. The
phrases:
SFORMl (standard option)
SFORM2
choose the one to be used. As explained in detail above, formula 2 uses a
variance-like expression involving DBAR in the denominator, while formula 1
uses 0 in place of DBAR.
As explained in detail in [Kruskal l964a and l964b], there are two ways of
treating ties among the data values when performing a monotone regression.
Consider the fitted regression values corresponding to a group of equal data
values. Either no restriction is placed on the size of these fitted values
with respect to one another the "primary" approach , or these fitted values may
be required to be equal the "secondary" approach. The following control
phrases:
PRIMARY (standard option)
SECONDARY
- 22 -
control this option. Where there are few ties, it hardly makes any difference
which is used. Where there are a great many ties, as when the data values
represent coarse category judgements, it does make a difference, and the
context must be considered in making the choice.
The regression type may be either monotone or polynomial. The former
makes the scaling nonmetric, the latter metric. The monotone regression may be
either ascending suitable for dissimilarities or descending suitable for
similarities. The polynomial regression may be of any degree up to quartic
degree 4 , with or without constant term. The coefficient values are a
by-product of the regression. These phrases:
REGRESSION=ASCENDING (standard option)
REGRESSION=DESCENDING
REGRESSION=POLYNOMAL=integer degree of polynomial
REGRESSION=CONSTANT (standard option)
REGRESSION=NOCONSTANT
REGRESSION=MULTIVARIATE=integer number of variates
which control regression type, must be given before the data deck. The integer
in REGRESSION=POLYNOMIAL=integer gives the degree of the polynomial, which
should be no more than four. It is possible to include or exclude the constant
term when fitting polynomials by use of REGRESSION=CONSTANT and
REGRESSION=NOCONSTANT. The last phrase, REGRESSION=MULTIVARIATE= integer, is
needed only in special cases, as indicated below.
Regression type must be indicated before the data deck or on the card
which introduces the data deck , because it is stored away with the data. When
more than one data deck is used, a different type of regression may be chosen
for each data deck, if desired. The former restriction helps permit the latter
freedom. There are situations where it may be helpful to use monotone and
straight-line regression together thus performing something intermediate
between nonmetric and metric scaling .
Some useful types of regression may be quite easily substituted for
polynomial regression, by writing a short function subroutine to replace the
very simple program REGR. If REGR(X,K) provides the function value f-sub-k (x),
then KYST will perform the standard multivariate linear regression procedure
and pick out the best linear combination of the functions f-sub-k. The phrase
REGRESSION=MULTIVARIATE=integer should be used to indicate how many functions
are included. Because a very simple method is used to solve the simultaneous
linear equations, it is desirable that the functions f-sub-k should not be too
terribly far from orthogonal. Not more than five functions f-sub-k are
permitted.
The entire list of data values may be split into sublists, with a separate
regression performed on each sublist. There are several ways in which the data
can be split. Each row of every data deck can be a sublist, each group of rows
(see section 8) can be a sublist, each data deck can be a sublist. Where there
is more than one data decks, different decks may be split differently. These
phrases:
SPLIT=BYROWS
SPLIT=BYGROUPS
SPLIT=BYDECKS (standard option)
SPLIT=NOMORE
- 23 -
which control the splitting, must occur before the data decks to which they
pertain, because they affect how the data is stored away. SPLIT=BYROWS,
SPLIT=BYGROUPS, or SPLIT=BYDECKS, make each row, each group of rows, or each
data deck a separate sublist respectively. SPLIT=NOMORE is relevant only when
several data decks are used. It causes all subsequent data decks to be joined
into a single sublist, until another SPLIT card. If it is given before the
first data deck and no more splitting instructions occur, there will be only
one list. There is an additional feature which is useful when there are
several data decks, and you wish to join them into two or three sublists. Even
though splitting has been totally suspended by an earlier use of SPLIT=NOMORE,
a subsequent use of SPLIT=NOMORE will cause a split where it occurs, though
splitting will continue suspended. KYST can handle up to l00 sublists.
The phrase:
FIX=integer (standard value is 0),
fixes the first several points of the configuration, so that they do not move
during the main scaling computation. In other words, the program treats these
points as given, and not subject to adjustment. This option is sensible only
if the points of the initial configuration which have been fixed are sensible.
Thus normally this option would be used only if points are read in from a
configuration deck, or if the initial configuration has been created by the
TORSCA initial configuration routine. Note that the FIX option has no effect
on how the TORSCA initial configuration routine operates, but only affects the
main scaling procedure. The integer indicates how many points are fixed.
If the integer is 0, then the phrase has no effect. Only the initial
points of the configuration can be fixed, that is, the points numbered from 1
up to the value of the integer. The means used internally to accomplish this
fixing is to set to zero all the coordinates of the gradient vector which
correspond to fixed points, just before the program starts to make use of the
gradient vector. If the user wants these values to appear without change as
the coordinate values in the final configuration, he should use the phrase
COORDINATES=AS-IS. Coordinate options are explained fully later. Otherwise
the fixed points, along with the rest of the configuration will be subject to
the usual standardization and rotation, and the resultant coordinate values
will be different than those fixed.
Note that the number of points in the configuration deck is independent of
the number of points fixed by this new phrase, and that both of these are
independent of the number N of stimuli in the data deck. Thus it would be
possible and perhaps even sensible to use a data deck which pertains to 30
stimuli that is, points, read in a starting configuration which pertains to
only 10 stimuli, and FIX only the first 7 of these points.
11. Phrases to Control Termination and Convergence
Normally computation runs through enough iterations, stopping when the
program judges that it has reached the minimum stress value. To control
computation time in case convergence is unduly slow, however, the program will
terminate computation after 50 iterations and yield normal output even if a
minimum has not yet been reached. The reason for termination is always
indicated. To change the maximum number of iterations, use:
- 24 -
ITERATIONS=integer (standard value is 50) .
More iterations, say l00, may be useful to assure complete convergence,
though the further change after 50 iterations if often so small as to be
negligible. Fewer iterations say 25 or less may be useful where a great many
sets of data are to be analyzed and computer time is to be conserved.
Of course, the program has no way of distinguishing between a local
minimum and a global minimum. That is, any minimum it reaches may be merely
the smallest value in some local region a local minimum rather than the
smallest value everywhere a global minimum . This difficulty is shared by all
procedures for minimizing functions of many variables except when very special
conditions hold, such as convexity. For the more usual ways of using KYST, the
TORSCA initial configuration option should reduce the chance of reaching a
merely local minimum. For further information about this question, consult the
growing literature, including papers such as [Spence l972] . If you suspect
that a solution may not be the global minimum, one standard technique is to
repeat the scaling several times, starting from different initial
configurations as explained in [Kruskal l964b]. The use of RANDOM=integer and
SAVE=DATA makes this easy to do. As explained in the reference, if the same
solution up to such immaterial changes as rotation and reflection occurs
several times and has the least stress by a considerable amount of all
solutions found, while only a few other solutions occur, which are all worse
and all quite different from one another, then we may have considerable
practical confidence in having found the true minimum solution.
At a minimum configuration whether global or merely local the gradient
vector is 0 that is, all the partial derivatives of the stress are 0. One test
the program uses for determining whether it has reached a local minimum is
whether the length of the gradient is small enough. The phrase:
SFGRMN=number-with-decimal-point (standard value is 0.0).
provides a criterion value. SFGRMN stands for scale factor of the gradient
minimum. When the length more properly, the scale factor of the gradient is
less than or equal to this criterion, the program decides that a local minimum
has been reached. Due to the fact that computer arithmetic is approximate, it
is very unusual for the gradient to become precisely 0. Thus with the standard
value in effect, this test is seldom satisfied.
In practice, primary reliance is placed on how fast the stress is
decreasing. If the ratio between present and previous stress is close enough
to 1.0, and if simultaneously the weighted average of past values of this ratio
is close enough to 1.0, then the program decides that a local minimum has been
reached. The control phrase:
SRATST= number-with-decimal-point (standard value is 0.999)
provides the criterion value. SRATST means stress ratio stop. If both the
stress ratio and the weighted average Stress ratio are between this value and
l.0, then the criterion is met. A more rigorous criterion is achieved by using
a value like 0.9999 which is closer to l.0, while a more generous criterion
comes from a value like 0.99 further from 1.0. If 1.0 itself is used, the
criterion becomes virtually unachievable.
Even if a minimum value of stress has not yet been achieved, it is
sometimes desirable to stop computing when the stress has become sufficiently
small. The control phrase:
- 25 -
STRMIN=number-with-decimal-point (standard value is 0.01)
provides the criterion value. STRMIN means stress minimum. Usually when
stress gets smaller than this (that is, below 0.01), only negligible further
movement of the configuration occurs in achieving a minimum. For nonmetric
scaling, if the stress becomes small enough, in practice it turns out that zero
stress is always achievable. However, it may take many iterations of almost
infinitesimal movement to achieve it.
During computation, the cosine of the angle between successive gradients
plays an important role in several ways. The exponentially weighted average of
past cosine values is kept up to date, and a similar average of their absolute
values. The control phrases:
COSAVW=number-with-decimal-point (standard value is 0.66);
ACSAVW=number-with-decimal-point (standard value is 0.66);
may be used to alter the weights used in forming these averages.
Whenever computation is terminated, the reason is always indicated by a
remark such as MINIMUM WAS ACHIEVED, MAXIMUM NUMBER OF ITERATIONS WERE USED,
SATISFACTORY STRESS WAS REACHED, or ZERO STRESS WAS REACHED
12. Control Phrases for Output
Normally no punched card output of any kind is produced. Sometimes
however it is useful to get the final configuration out on a configuration
deck, either for use as a starting configuration in further scaling, or for
other analysis. These two phrases control the punchout option:
NOCARDS (standard option)
CARDS
There are three kinds of optional printed output in addition to the basic
printed output which is always supplied. The phrases:
PRINT=NODATA (standard option)
PRINT=DATA
control whether the data and weights are printed out at the beginning of the
computation. This printout is sometimes useful for reference, or if KYST seems
to misunderstand your input data. The phrases:
PRINT=HISTORY (standard option)
PRINT=NOHISTORY
control whether the program prints certain information on each operation. This
"historical" record of the computation is useful in seeing how most of the
stress reduction takes place at first, and also permits someone experienced in
the art to judge such things as how close to the minimum configuration the
- 26 -
program has actually gotten. The phrases
PRINT=NODISTANCES (standard option)
PRINT=DISTANCES
control whether a rather voluminous terminal printout takes place. This
printout shows the data values in order of size in each sublist, if splitting
has been used , and with each data value shows the indices of the two points it
pertains to, the distance between the two points, and the corresponding
regression value.
There are also three kinds of graphical output produced on the printer.
If the data is analyzed in more than one dimension, a plot of stress versus
dimension is produced. The two other types of plots are optional, governed by
control phrases. The phrases
PLOT=SCATTER=NONE
PLOT=SCATTER=SOME (standard option)
PLOT=SCATTER=ALL
control whether a scatter plot of the distances DIST and the fitted distances
DHAT versus the data DATA is generated. If the option NONE is used, no scatter
plotting is done. If the data is split into more than one sublist, then will
give a separate plot for each sublist, as well as a plot for the entire data.
SOME is similar, but plots only the first five sublists. If the data is not
split, SOME and ALL both have the same effect: a single plot of distances and
regression values versus data values.
The phrases
PLOT=CONFIGURATION=NONE
PLOT=CONFIGURATION=SOME
PLOT=CONFIGURATION=ALL (standard option)
control plotting of the final configuration. The use of ALL produces plots of
all LDIM*(LDIM -1) 2 pairs of dimensions of the configuration, where LDIM is
the dimensionality of the configuration. The option SOME is similar except
that each dimension is plotted against only one other dimension. However, if
LDIM is odd, dimension l is plotted twice, once with dimension 2, and once
with dimension 3. If NONE is used, no configuration plotting is done.
Several phrases pertain to the form of the configuration:
COORDINATES=AS-IS
COORDINATES=STANDARDIZE
COORDINATES=ROTATE (standard option)
Under the STANDARDIZE option, the configuration has its centroid set to
the origin and the mean square distance of the points from the centroid set to
1 after every iteration. The option ROTATE, beside causing the above
standardization on each iteration, also causes the final configuration to be
rotated so that its principal components lie along the coordinate axes. Where
the scaling is to be done in several different dimensionalities, this has the
desirable effect of causing the least important coordinates to be dropped when
going to the next lower dimensionality. The option AS-IS suppresses both the
standardization and the rotation, and may be particularly useful when the
initial configuration is read in from a configuration deck, particularly if the
FIX option is used.
- 27 -
13. Some Ways to Use KYST
For nonmetric scaling of the most familiar type, the program may be used
as in the preliminary example, with the following simple changes used where
appropriate. The phrases TORSCA and COORDINATES=ROTATE are standard options
and may be omitted. The PRE-ITERATIONS phrase will generally be omitted, since
the standard option of l pre-iteration usually suffices. If the data are
dissimilarities, the phrase REGRESSION=ASCENDING should be substituted for the
corresponding phrase in the example. If the data form a lower or
upperhalfmatrix, the phrases LOWERHALFMATRIX and UPPERHALFMATRIX should be
used, with DIAGONAL=ABSENT if the diagonal values are missing. For metric
scaling, the control phrase REGRESSION=POLYNOMIAL=degree should be included
before the data deck accompanied by REGRESSION=NOCONSTANT if desired. Also,
consider the possibility of transforming the data preliminary to the scaling,
by the use of DCONSTANTS=a,b,p,s,t and DFUNCTION. If the data are transformed
in connection with nonmetric scaling to improve the operation of the TORSCA
initial configuration routine , note that the choice of the regression as
ASCENDING or DESCENDING should be based on the form of the data after
transformation.
To do unfolding, the rankings provided by each subject or judge should
form one row of the data matrix, and the stimuli or objects judged should
correspond to the columns. It is necessary to use either LOWERCORNERMATRIX or
UPPERCORNERMATRIX before the data.
If larger distances from the subjects ideal point should match larger data
values, the phrase REGRESSION=ASCENDING should be used, while if a reverse
relationship between distances and data values is appropriate, the phrase
REGRESSION=DESCENDING should be used. It is also possible to use KYST for
nonmetric unfolding in which the subjects have negative ideal points instead of
positive ideal points, that is, most disliked central points instead of most
liked central points. In fact, the program has no knowledge of which kind of
ideal point you are postulating, nor does it care. The only important fact for
the program is whether distances from the ideal point, regardless of type,
should have an ascending or descending relationship to the data.
When an unfolding is to be done nonmetrically, it is in theory vital to
SPLIT=BYROWS and use SFORM2, as otherwise a meaningless zero-stress solution
may be possible. This solution is one example of what is referred to as
degeneracy.
In practice, however, the situation is not so clear, particularly if a
good initial configuration is used. If the unfolding is metric, most often it
will be preferable to SPLIT=BYROWS, but where rankings are reasonably
comparable among subjects and the data is scanty, it can be sensible to include
the rankings of all subjects in a single regression.
Suppose several subjects have provided direct judged dissimilarities of a
single set of objects. Suppose either a) that the values provided by each
subject form a separate data deck, or alternatively b) that there is only one
data deck and that the values provided by each subject form a group of rows in
the deck. There are at least four meaningful levels of analysis possible,
based on two independent variables:
- 28 -
C1: A single configuration is shared by all subjects;
CS: A separate configuration is formed for each subject;
R1: A single regression is shared for all subjects;
RS: A separate regression is done for each subject.
Case C1R1 does not allow for any differences between subjects at all, and
treats the different subjects merely as replications. Each of C1RS and CSR1
allows for differences between subjects; though in quite distinct ways. CSRS
is conceptually and actually equivalent to performing an entirely separate
scaling for each subject.
R1 RS
a SPLIT=NOMORE a SPLIT=BYDECKS
C1 b SPLIT=BYDECKS b SPLIT=BYGROUPS
BLOCKDIAGONAL=NO BLOCKDIAGONAL=NO
a Not possible a Not possible
CS b SPLIT=BYDECKS b SPLIT=BYGROUPS
BLOCKDIAGONAL=YES BLOCKDIAGONAL=YES
This table shows how to accomplish the various possibilities. Note that
these four cases correspond to McGee's four cells [McGee, 1968]. However he
includes a kind of distance between the separate configurations in his
badness-of-fit measure, which gives his cases corresponding to our CS cases a
somewhat different meaning than ours.
In an unfolding procedure using only linear (or polynomial) regression,
KYST places no restriction on the coefficients of the regression function.
Thus over the region containing the data values the regression functions can be
ascending, descending, constant, or (in the case of polynomials of degree >= 2)
ascending in part and descending in part. One way to handle situations where
this freedom is undesirable is discussed below. One useful possibility that
comes with this freedom is that a subject's "ideal" point may be a negative
ideal, his most disliked possible stimulus, just as easily as a positive ideal,
and the recovered shape of the regression line indicates which it is. One
unfortunate concomitant is that where a subject's ideal point (positive or
negative) truly belongs outside or on the edge of the region containing the
real stimulus points, it is easy for the program to misplace it on the
diametrically opposite position and make it of opposite type.
Sometimes it is desirable to do a scaling or an unfolding using linear (or
polynomial) regression, but it is necessary to assure that the regression
function is essentially monotone over the region containing the data values.
While KYST cannot manage quite this, it can approximate it. The trick is to
feed in the selfsame data twice, that is, use duplicate copies of the data
deck. In front of one copy the phrase REGRESSION=POLYNOMIAL=integer is used.
In front of the other copy the phrase REGRESSION=ASCENDING or
REGRESSION=DESCENDING is used. Thus the overall stress reflects both deviation
from the desired polynomial fit and deviation from the appropriate
monotonicity.
- 29 -
14. User Limitations and Computing Times
As presently written, KYST has just a few primary limitations on the size
of the problems it can handle. Up to 100 objects (that is, points) may be
used. Scaling can be done up to six dimensions (and down to one dimension).
Up to 1800 data values may be used. In this regard, it is not necessary to
count data values which are formally present, but discarded because they are
below the CUTOFF value. (However, note that each row is read in completely,
before the data values in it are discarded. Thus if the number of data values
including those below the cutoff exceeds 1800, it is possible though unlikely
that the available storage space will be overrun and trouble result. This can
happen only if the data values below CUTOFF are concentrated near the end of
the data deck, so that at some point the number of genuine data values in prior
rows, plus the total number of data values in the current row, exceed 1800.)
If the data are split into separate sublists, up to 100 sublists can be
accommodated. For metric scaling, the regression can be linear, or polynomial
of up to fourth degree.
Computer running time will of course vary widely with the data, the
options chosen, and the computer. However, we believe that the running time
associated with a single COMPUTE card should be roughly proportional to (number
of data values) times (total number of iterations, summed over all
dimensionalities)
For one reasonable choice of options, the constant of proportionality on
the Honeywell 6000 series is approximately three milliseconds. It must be
noted, however, that this is a very rough estimate, and includes overhead CPU
time which is not proportional to the length of the computation.
15. Some Precautions: Number of Dimensions, Convergence, and Local Minima
If there are not very many objects, it is impossible to recover many
dimensions (regardless of whether many dimensions are in fact operative). This
is because the number of parameters estimated needs to be considerably less
than the number of observations (if the observations are subject to statistical
fluctuation), a general principle of wide application. Blind use of the
program can easily extract more dimensions than could possibly be meaningful.
Valuable information on the accuracy of recovery may be found in the
papers by [Young 1970] and [Sherman 1972], pertaining to the most common type
of scaling, namely, nonmetric scaling based on a halfmatrix without diagonal
and only a single replication. While it is difficult to give hard and fast
rules, the papers above and practical experience suggest the bare minimum
number of objects for wise use under the circumstances quoted:
for a 1-dimensional solution, 5 objects;
for a 2-dimensional solution, 9 objects;
for a 3-dimensional solution, 13 objects.
Furthermore, if the number of objects is relatively small, each additional
object will increase the accuracy of reconstruction quite a bit.
KYST finds the configuration of minimum stress by using an iterative
procedure (of the "steepest
- 30 -
descent" type) which starts with an initial configuration and keeps modifying
it until, hopefully, it converges to the configuration having minimum stress.
Two questions should always be asked about the solution yielded by such a
procedure: (1) Is convergence complete? In other words, did the procedure
finish converging to a minimum, or did it stop halfway? If convergence is not
complete, you may not want to even look at the solution, but rather put it back
in for further iteration. (2) Was the right minimum reached? In other words,
did the procedure converge to a "local minimum" configuration, from which no
small modification can decrease the stress, but at which the stress is greater
than the true minimum value? If you fear the solution is only a local minimum,
you may want to perform further scalings using other starting configurations to
check this possibility.
Strictly speaking, convergence to an exact minimum is only complete if the
partial derivatives of the stress are all exactly equal to 0. Because a
computer does not represent numbers with perfect precision, this does not
happen very often. However, in most practical situations it is possible to
come so very close to the minimum towards which the procedure is converging
that the difference can have no practical importance. In the vast majority of
cases, moreover, other considerations (such as the statistical fluctuations in
the data, or the fact that the model is not perfectly appropriate) mean that
the achievable precision is far greater than necessary.
Thus the practical question is whether convergence has proceeded far
enough to be practically complete. The program uses several tests to determine
this (such as how fast the stress has been decreasing from iteration to
iteration, and the size of the gradient), and declares that a minimum has been
reached if it is satisfied. Since rather severe criteria are used, this remark
is seldom in error. However, the procedure can terminate itself for other
reasons, and it is these that must be remembered. If the maximum number of
iterations is reached (normally 50, but easily altered by the user) or if the
stress becomes satisfactorily small (normally 0.01, but easily altered by the
user), or for several other reasons, the procedure will terminate with a
suitable remark.
The reason for termination is always printed. You should always look at
this comment. Even where the procedure does not state that a local minimum has
been reached, it may often happen that the solution reached is practically
indistinguishable from the local minimum that would be reached after a few more
iterations. In particular, where the stress is very small, this is generally
the case. The "history of the calculation" which may be printed out will often
be helpful if convergence is in doubt.
With regard to local minima, there is no feasible way to be mathematically
certain that you have found that local minimum which is in fact the true global
minimum. (While use of the TORSCA initial configuration reduces the chance of
reaching a local minimum in the sort of situations in which it has been tested,
KYST offers so many possible modes of use that no universal reassurances can be
given. In particular, unfolding is a relatively unexplored area.) However, by
using a variety of different random starting configurations to repeatedly scale
the same data, it is often possible to obtain practical certainty. For if a
variety of different starting configurations yield essentially the same local
minimum, with perhaps an occasional exception which yields a substantially
worse local minimum, then there is little to worry about. One easy way to
provide a variety of different starting configurations is to scale the same
data repeatedly, using SAVE=DATA and RANDOM=integer, which will provide a new
starting configuration on successive uses on the same computer run .
In this connection, it is well to remember that rotation and reflection
leave a configuration essentially unchanged. Thus to decide whether the
solutions obtained as suggested above are essentially
- 31 -
the same, you cannot merely compare coordinates. One method you can use is to
generate the matrix of distances, since these are unaffected by rotation and
reflection. However, a still simpler method which is often adequate is to
compare the stress values. If these values agree to three significant figures,
the local minima having them are unlikely to be different. In the case of
two-dimensional solutions, you can plot the solutions using the control phase
PLOT= CONFIGURATION=ALL, trace them onto tracing paper, rotate or reflect the
solutions by physical motions of the paper, and compare solutions by laying one
sheet on top of another.
Note that the likelihood of reaching a local minimum is known by
experience to be particularly severe for one-dimensional solutions, or if the
Minkowski r-metric is used with r near 1 or infinity, though for ordinary
scaling the TORSCA initial configuration greatly alleviates the situation. The
one-dimensional case is bad because points cannot easily move past each other
during the iterative computation if they lie on a line. The extreme values of
r cause trouble because the gradient is a discontinuous function of the
configuration if r = 1 or if r = infinity.
16. The Routines Which Make Up KYST
KYST is made up of a main program and 28 subroutines, all written in
FORTRAN IV. The routines contain many comment cards which should prove helpful
to the user who wishes to modify the program.
These are the routines:
KYST (Main routine) Multidimensional scaling.
BLOCK DATA Contains tables of control.
CCACT Control cards action. Reads and interprets control cards.
DATAPR Data print. Prints data and/or distances.
DTRAN Data transformation. Used with DFUNCTION option.
WTRAN Weights transformation. Used with WFUNCTION option.
FITM Fit monotone. Performs least- squares monotone regression.
FITP Fit polynomial. Performs least-squares polynomial regression.
REGR Regression functions. Provides values of monomials (functions
regressed over).
EQSOLV Equation solver. Solves up to five simultaneous linear equations
by simple equations.
INICON Computes initial configuration using Young-Torgerson
CONFIG Obtains a metric multidimensional scaling solution.
FITMS Fit monotone. Performs least-squares monotone regression for
initial configuration.
NEWSTP New step. Calculates size of step to be taken along gradient.
PLOT Generates one-page printer plot of a matrix Y versus a vector X.
SERCH Searches for next item to be processed
XITEM Put X vector elements into the current print line.
RM1POW Calculates the (R-1)th power.
RPOWER Calculates the Rth power.
RROOT Calculates the Rth root.
*We thank Dr. Peter Businger for providing this modern, efficient program which
uses the deflation technique.
- 32 -
RUNIFV Returns random uniform variable between 0 and 1. Inefficient
but portable.
SGEV | Eigenvector package. Returns
BSEC1 | the k algebraically largest
DFLT | eigenvalues and eigenvectors
NRMLZ | of an NxN real symmetric
NVIT1 | matrix given its upper trian-
SBK | gular part.
SORT Sorting routine. Simple but efficient in-core sorting routine,
using Shellsort technique.
17. Useful Information for Getting KYST Running at a Computer Center
All routines for KYST are written in FORTRAN IV. While considerable effort
has been made to improve the portability of KYST, and while its predecessor
routines M-D-SCAL and TORSCA are widely used, it is not possible to avoid all
difficulties. We describe here the limitations on portability and the
preparation which is needed to install KYST, to the best of our knowledge.
Some computer systems assign different amounts of space to real variables
and integer variables. As KYST equivalencies certain real and integer arrays
to each other, it will not work on such computer systems without appropriate
action. Since KYST uses quite elaborate equivalencing of arrays and labelled
common as shown in Fig. 3 at the end of this section , it is virtually a
necessity to handle this .problem by some systematic method which maintains all
the relative placements of arrays as shown in the figure. (These complex
arrangements achieve substantial economy of space and efficient transmission of
information among subroutines.)
It was not possible to avoid machine- and system-dependent constants in
KYST. However, they are all grouped together in a data statement in the BLOCK
DATA subroutine, and described here. It is vital that you set these seven
values appropriately for your computing system! There are three machine-
dependent constants (contained in a common block called ACCUR):
PRECSN = relative precision of floating-point numbers
(present value is 1.5*10E-8);
XMAG = least possible strictly-positive number
(present value is 1.0*10E-38);
XMAXN = greatest possible floating-point number, or the greatest
possible absolute value of a negative floating-point number,
whichever is smaller (present value is 1.0*10E38).
PRECSN should and XMAG may exceed the true value slightly, while XMAXN may
be slightly less. The present values reflect a 36-bit machine word in which
floating-point numbers use eight bits for exponent and 28 bits for mantissa and
sign. There are four system-dependent constants (contained in a common block
called MDSCL1): LREAD = input file number (present value is five) LPRINT =
printer or output file number (present value is six); LPUNCH = card punch
output file number (present value is seven); LSCRAT = file number for small
amount of scratch space, used only by CCACT (present value is eight).
The word size requirements for KYST are not fully explored. We believe
that positive integers need not exceed 216 - 1, and that negative integers are
far more limited. Integers receive complex handling in three circumstances
(where they serve as code numbers in CCACT, where three items per word
- 33 -
are manipulated in the IJ array, and in the random number generator): in the
first two circumstances integers do not exceed 215 - 1, in the last 215 - 1.
As for the floating-point numbers, we believe that the accuracy required by
KYST is quite low, but we have no concrete evidence.
The approximate storage space required on two different computer systems
is shown below, where the Honeywell 6000 system is the one in use at the Murray
Hill Computer Center of Bell Laboratories, and the IBM 370/165 at the Computer
Center of the University of North Carolina.
Number of machine-words (Number of bytes) required for Honeywell 6000,
IBM 370/165
All real and integer arrays 14,003
Program of KYST itself, and non-array data 11,689
Subtotal (Storage needed for the program) 25,692
Library subroutines, buffers, etc., loaded
with KYST package by the monitor system 7,300
TOTAL 32,992
- 34 -
Common Storage Areas
All the labelled common blocks are listed below as they appear in each
routine which uses them. Integer variables are marked with *, floating point
variables with ., and equivalenced variables with =.
The lengths of all arrays are indicated.
1. ACCUR: PRECSN., XMAG., XMAXN.
Used in: MAIN, BLOCK DATA, INICON, PLOT, SERCH, SGEV.
2. MDSCL1: LREAD*, LPRINT*, LPUNCH*, LSCRAT*.
Used in: MAIN, BLOCK DATA, CCACT, DATAPR, FITM, FITMS, INICON, PLOT, SERCH.
3. METRIC: RTYPE*, RMl., RECR., RR.
Used in: MAIN, RPOWER, RROOT, RM1POW.
4. PLTCHR: PTID 100., ITEM 101.
Used in: MAIN, BLOCK DATA, PLOT, XITEM.
5. KYST1: DATA 1800.,WW 1800.,IJ 1800*,X 600.
Used in: MAIN, CONFIG, DATAPR, FITM, FITP, INICON.
6. MDSCL3:
As used in BLOCK DATA: MTAB l3*, TAB1 64*, TAB2 68*, TAB3 64*, TAB4 76*,
TAB5 72*.
As used in CCACT: MTAB 13*, TAB 344* = ATAB 344.
7. MDSCL2:
As used in MAIN: LDIMX*, LDIMN*, LDIMD*, CUTOFF., STRMIN., SFGRMN., COSAVW.,
ACSAVW., IFIRST*, MATSWP*, SRATST., LSCH*, LPUNSW*, LSPL*, LRANDC*, LDAPRT*,
LDIPRT*, LREG*, LHIPRT*, NUDASW*, LPOLSP*, LCONSW*, LNFIXZ*, DCON1., DCON2.,
DCON3., DCON4., DCON5., WCON1., WCON2., WCON3., WCON4., WCON5., NOITIN*,
LPLCON*, LPLSCT*, LCOORS*.
As used in BLOCK DATA: LPAR 42* = PAR 42.
As used in DTRAN: DUMMY 28., A., B., P., S., T., WCON1., WCON2., WCON3.,
WCON4., WCON5., DUM 4.
As used in WTRAN: DUMMY 28., DCON1., DCON2., DCON3., DCON4., DCON5., A., B.,
P., S.,T., DUM 4.
8. KYST2: diagram on next page
Figure 3
- 35 -
REFERENCES
Coombs, C. (1964) A Theory of Data. New York: John Wiley & Sons, 1964.
Chapters 5, 6, and 7, pages 80-180.
Klahr, D. (1969) A Monte Carlo investigation of the statistical significance
of Kruskal's nonmetric scaling procedure. Psychometrika, 1969, 34, 319-330.
Kruskal, J. B. (1964a) Multidimensional scaling by optimizing goodness of fit
to a nonmetric hypothesis. Psychometrika, 1964, 29, 1-27.
Kruskal, J. B. (1964b) Nonmetric multidimensional scaling: A numerical method.
Psychometrika, 1964, 29, 115-129.
McGee, V. E. (1968) Multidimensional scaling of N-sets of similarity measures.
A nonmetric individual differences approach. Multivariate Behavioral Research,
1968, 3, 233-248.
Shepard, R. N. (1962) The analysis of proximities: Multidimensional scaling
with an unknown distance function. Psychometrika, 1962, 21, 125-139, 219-246.
Sherman, C. R. (1972) Nonmetric multidimensional scaling: A Monte Carlo study
of the basic parameters. Psychometrika, 1972, 51, in press.
Spence, I. A., & Ogilvie, J. C. A table of expected stress values for random
rankings in nonmetric multi-dimensional scaling. Multivariate Behavioral
Research, in press.
Spence, I. A. (1972) A Monte Cario evaluation of three nonmetric
multidimensional scaling algorithms. Psychometrika, 1972, 31, in press.
Stenson, H. H., & Knoll, R. L. (1969) Goodness of fit for random rankings in
Kruskal's nonmetric scaling procedure. Psychological Bulletin, 1969, 11,
122-126.
Wagenaar, W. A., & Padmos, P. (1971) Quantitative interpretation of stress in
Kruskal's multidimensional scaling technique, British Journal of Mathematical
and Statistical Psychology, 1971, 24, 10l-110.
Young, F. W. (1970) Nonmetric multidimensional scaling: recovery of metric
information. Psychometrika, 1970, 35, 455- 473.
Young, F. W. (1972) A model for polynomial conjoint analysis algorithms. In R.
N. Shepard, A. K. Romney, & S. B. Nerlove (Eds.), Multidimensional scaling:
Theory and applications in the behavioral sciences. New York Seminar Press,
1972 Vol 1, pages 9-102
- 36 -
INDEX OF CONTROL PHRASES
Phrases preceded by "*" are default options.
Numbers in parentheses are default values.
Number(s) at end of line are page number(s).
Word or Phrase & Page Number(s)
ABSENT 12
* ACSAVW=decimal number (.66) 25
ARBITRARY 19,20
ASCENDING 21
AS-IS 26
* BLOCKDIAGONAL=NO 12,13
=YES 12,13
BYDECKS 22
BYGROUPS 22
BYROWS 22
CARDS 25
COMPUTE 6,11
CONFIGURATION 19,20,26
CONSTANT 22
* COORDINATES=AS-IS 26
=STANDARDIZE 26
* =ROTATE 26
* COSAVW=decimal number (.66) 24
* CUTOFF=decimal number -1.23*10E20 13
* DCONSTANTS=five decimal numbers (0.0, 1.0, 1.0, 0.0, 1.0) 14
DESCENDING 22
DFUNCTION 14
* DIAGONAL=ABSENT 12
* =PRESENT l2
* DIMMAX=integer (2) 21
* DIMDIF=integer (1) 21
* DIMMIN=integer (2) 21
DISTANCES 26
* FIX=integer (0) 23
HISTORY 25
* ITERATIONS=integer (50) 23,24
LOWERCORNERMATRIX 12,13
LOWERHALFMATRIX 12,13
* MATRIX 12
MULTIVARIATE 22
NO 12,13
* NOCARDS 25
NOCONSTANT 22
NODATA 25
NODISTANCES 26
NOHISTORY 25
NOMORE 22,23
NONE 30
* PLOT=CONFIGURATION
* =ALL 26
=SOME 26
=NONE 26
* PLOT=SCATTER
=ALL 26
* =SOME 26
=NONE 26
POLYNOMIAL 22
PRESENT 12
- 37 -
* PRE-ITERATIONS=integer (1) 19
* PRIMARY 21
* PRINT=DATA 25
* =NODATA 25
=DISTANCES 26
* =NODISTANCES 25
* =HISTORY 25
=NOHISTORY 25
* R=decimal number (2.0) 21
* RANDOM=integer (0) 19,20
* REGRESSION
* =ASCENDING 22
=DESCENDING 22
=POLYNOMIAL=integer 22
* =CONSTANT 22
=NOCONSTANT 22
=MULTIVARIATE=integer 22
ROTATE 26
SAVE=CONFIGURATION 19,20
=DATA 13
SCATTER 26
SECONDARY 21
* SFGRMN=decimal number (0.0) 24
* SFORM1 22
SFORM2 22
SOME 26
* SPLIT
* =BYDECKS 22,23
=BYGROUPS 22,23
=BYROWS 22,23
=NOMORE 22,23
* SRATST=decimal number (.999) 24
STANDARDIZE 26
STOP 11
* STRMIN=decimal number (.01) 24,25
* TORSCA 8,19
UPPERHALFMATRIX 12,13
UPPERCORNERMATRIX l2,13
* WCONSTANTS=five decimal numbers (0.0, 1.0, 1.0, 0.0, 1.0) 15
WFUNCTION 15
YES 12,13