How to Use the ADDTREE/P Program for Fitting Additive Trees
copyright 19811995 by James E. Corter
All rights reserved.
ADDTREE/P is a program in the PASCAL language for fitting additive
trees to proximity data, using a variant (Corter, 1982) of the
algorithm described by Sattath and Tversky (1977). Differences
between the original Sattath & Tversky algorithm and the variant used
here are described more fully in Corter (1992).
The input data to ADDTREE/P may be either similarity or distance data.
A listing of parameters and their estimates, drawing of the tree graph,
and summary statistics are reported. Various other output options are
available. A help file accompanies this source file. See it for more
information on using the program.
ADDTREE/P was written by James E. Corter. Questions or comments
regarding the program may be addressed to him at:
INTERNET: jec34@columbia.edu
Box 41, Teachers College, Columbia University, New York, NY 10027
NOTICE REGARDING RETRIEVAL AND USE OF ADDTREE/P AND ALL ASSOCIATED
MATERIAL:
By retrieving or using any of the files in this directory
pertaining to ADDTREE/P, you are agreeing to the following
terms and conditions:
1. All the material you retrieved is and will remain the property
of the author, James E. Corter. It is provided only for notfor
profit research and teaching purposes. The material to be supplied
is not to be published or redistributed, with or without modifi
cation, or resold, or used for any business or commercial purpose
without the express written permission of the author.
2. THIS MATERIAL IS BEING PROVIDED "AS IS", WITHOUT ANY EXPRESS OR
IMPLIED WARRANTY. IN PARTICULAR, BUT WITHOUT LIMITATION, NEITHER THE
AUTHOR NOR AT&T MAKE ANY REPRESENTATION OR WARRANTY OF ANY KIND
CONCERNING THE MERCHANTABILITY OF THIS MATERIAL, ITS FITNESS FOR ANY
PARTICULAR PURPOSE, ITS FREEDOM FROM INFRINGEMENT OF ANY PATENT OR
COPYRIGHT, ITS ACCURACY, OR PROVISION OF TECHNICAL SUPPORT.
(revised 4/95)
How to Use the ADDTREE/P Program for Fitting Additive Trees
James E. Corter
Teachers College, Columbia Uinversity
Box 41, Teachers College
Columbia University
New York, NY 10027
(INTERNET: jec34@columbia.edu)
ADDTREE/P copyright 19811995 by James E. Corter
All rights reserved.
==================================================================
CONTENTS:
I. Description of input file format
II. Example input and output files
III. Differences from the ADDTREE program
IV. Installing ADDTREE/P on your system
V. References
==================================================================
I. Description of Input File Format
Input files should contain the following lines:
*****************************************************************
(1) first line: a list of analysis option commands, zero or more
of the following keywords. Note that lowercase letters may be
used, and that all keywords may be abbreviated to 3 characters.
DATA Requests printing of the raw data matrix.
TRANSFORMED Requests printing of the data, after transformation
into distancelike numbers.
MODELDISTANCES Requests printing of the model (tree) distances.
RESIDUALS Requests printing of the residuals matrix.
HEIGHTS Requests printing of the distance of each node from the
root.
NOTREE Suppresses printing of the tree graph.
NONUMBER Suppresses numbering of objects in the tree graph.
SPECIFYORDER If this is included, line 1 of the input file must
be followed by a line containing a list of object numbers. The
leaves of the tree graph are then ordered to conform as closely as
possible to this list. (NOTE: Care must be taken not to omit or
duplicate numbers in this list. Also, note that not all orderings
of the leaves are possible, since the tree structure imposes
constraints on the possible orderings.)
NOSUBTRACTCONSTANT The default assumption is intervalscale data;
this implies complete freedom in choosing an additive constant.
Therefore the primary approach is to either add OR subtract an
additive constant to exactly satisfy the triangle inequality:
d(i,j) + d(i,k) <= d(j,k) for all i,j,k, AND d(i,j) + d(i,k) =
d(j,k) for some i,j,k. But if "nosubtractconstant" is specified,
strict inequality is allowed, i.e. if d(i,j) + d(i,k) > d(j,k)
holds for all i,j,k in the data, no constant is subtracted.
MINVARROOT The default rooting is to combine the last few
remaining clusters into the root node; if "minvarroot" is
specified, the program will search for the root that minimizes the
variance of the distances from the root to the leaves.
ROOTAT Specify where the root is to be placed. If this option is
requested, this keyword must be followed by two node numbers, e.g.
"rootat 12 13 ". Note that the two nodes specified must be
contiguous in the tree structure. A previous run will generally be
necessary to determine the correct node numbers.
LINESIZE Sets the length of lines in the output file. The current
default value is 80 characters (so that output may be conveniently
read on a video terminal). Example: "linesize 66 ".
Note that any of the keywords mentioned above may be abbreviated to
three (or more) characters; also, capital or lowercase letters are
ok. Keywords may be separated by either blanks or commas.
However, when a keyword above specifies that a parameter is to be
read in following the keyword (e.g. ("rootat 12 13 "), it is best
to follow the numerical parameter with a (" ") instead of
a (",") since some PASCALs will object to finding a comma
while attempting to read a number.
If the first line is blank, the standard analysis is done
and the default output is obtained (estimates, tree graph, and
statistics).
*****************************************************************
(1.5) (optional line: see SPECIFYORDER above)
*****************************************************************
(2) second line: a comment line for labeling of the output (up to
80 characters in length).
*****************************************************************
(3) third line: the following parameters (separated by commas or
blanks):
(integer)
SIMILARITIES or DISSIMILARITIES (specifies similarity or
distance (dissimilarities) data)
FULL or LOWERHALF (specifies shape of matrix)
(integer; width of OUTPUT data fields)
(integer; number of
digits after the decimal place in the OUTPUT matrices)
NOTE: the last two parameters here refer only to the OUTPUT. Also
note that you should try to leave at least two spaces between
numbers: e.g. if your data is numbers between 0 and 99, and you
want to see one digit after the decimal place, you should specify
6 and 1 for these parameters. This is to leave room for possible
minus signs and the decimal point itself.
*****************************************************************
(4) fourth, etc. lines: the data matrix itself
NOTE: some Pascals require real format, that is, the data must have
a decimal point, and digits both before and after (e.g. 45.0, not
45 or 45. or .45) NOTE: the data matrix can be in a variety of
formats, as long as the order of entries corresponds to the order
of entries in a (full or lowerhalf) matrix. So for example, the
input data file might contain only one number to a line, as long as
the order of the lines corresponds to the order of entries in a
lowerhalf (or full) matrix. However, one thing that must NOT occur
is trailing blanks on a line, since the program then expects
another number on that line.
*****************************************************************
(5) (optional) lines immediately following the data: a list of
object names, one to a line, maximum length 20 characters.
II. Example Input and Output Files
EXAMPLE INPUT FILE:
nonumber,residuals,minvarroot
numbers: abstract concept (Shepard, Kilpatric, & Cunningham; 1975)
10 dis low 7 1
421
584 284
709 346 354
684 646 059 413
804 588 671 429 409
788 758 421 300 388 396
909 630 796 592 742 400 417
821 791 367 804 246 671 350
400
850 625 808 263 683 592 296
459 392
zero
one
two
three
four
five
six
seven
eight
nine
EXAMPLE OUTPUT FILE:
addtree analysis (ADDTREE/P version 1.5):
numbers: abstract concept (Shepard, Kilpatric, & Cunningham; 1975)
( 0.0 needed for positivity of distances )
303.0 added to exactly satisfy triangle inequality
var. of dis. from root to leaves =61715.5041
node 11 score=43795.738
node 15 score=61831.331
node 16 score=116529.784
node 14 score=116529.784
changeroot( 17 => 11)
node 1 score=147886.744
node 2 score=147886.744
node 15 score=61831.331
node 18 score=61715.504
var. of dis. from root to leaves =43795.7375
node length children
1 454.6 zero
2 269.4 one
3 185.6 two
4 235.6 three
5 176.4 four
6 327.4 five
7 264.1 six
8 375.6 seven
9 325.3 eight
10 330.4 nine
11 127.5 1 2
12 103.2 3 5
13 53.9 4 10
14 81.8 6 8
15 97.5 12 9
16 37.9 13 7
17 53.1 15 18
18 77.4 16 14
19 0.0 11 17
 zero

  one

  two
 
   four
  
   eight
 
  three
 
   nine
  
  six

  five

 seven
stress formula 1 = 0.0926
stress formula 2 = 0.4958
r(monotonic) squared=0.7542
rsquared (p.v.a.f.)=0.6249
residual distances:
0.0
134.5 249.4
28.1 206.0 134.2
25.2 121.9 0.0 66.0
14.8 45.7 101.0 4.7 151.7
76.4 231.5 41.7 49.4 65.5 12.2
42.0 51.8 177.9 110.2 133.1 0.0 39.3
66.1 221.2 55.9 279.3 55.9 64.6 149.2 254.6
18.2 21.7 225.0 0.0 109.3 63.6 49.4 117.6 227.4
III. Differences from the ADDTREE Program
ADDTREE/P is based on the algorithm used by the ADDTREE program
written by S. Sattath, Hebrew University (Sattath & Tversky, 1977).
Differences exist in the method of resolving ties in the nearest
neighbor count matrix and in the choosing of a root for the tree.
The tiebreaking rule used in ADDTREE/P results in slightly better
solutions in a small percentage of cases. Whenever a tie occurs
in the choosing of pairs of objects to be joined (that is, if
objects y and u are both nearest neighbors of x), or when the
nearestneighbor relationship is not reciprocal (e.g. x is y's
nearest neighbor, but u is x's), the preferable grouping is chosen
by what amounts to a check of the additive inequality for the
quadruple x,y,u,V (where V refers to all the objects in the tree
not = x,y,u).
One earlier version of the current program attempted to find the
"deepest" rooting of the tree; in recent versions the root is
created out of last few nodes remaining from the clustering
process. This tends to produce balanced trees, requires no extra
processing time, and is a relatively nonmetric criterion. Also
available as options are (2) the root that minimizes the variance
of the distances from the root to the leaves of the tree, and (3)
any arbitrary userspecified root. Since no single criterion will
choose the most satisfactory rooting for every application, this
variety of options seems desirable.
For additional description of differences from the earlier
ADDTREE program, see Corter (1982).
IV. Installing ADDTREE/P on your System
The ADDTREE/P and EXTREE programs included have been
compiled and tested on a number of systems using various
PASCAL compilers. On IBM PC's, Turbo Pascal is recommended.
Previous versions of the programs have been successfully
compiled on UNIX systems and on IBM mainframes (using the IBM
interactive PASCAL compiler). Because the PASCAL language as
originally defined does not contain specifications for certain I/O
and file definition operations, statements pertaining to these
operations may vary from compiler to compiler.
V. REFERENCES
Corter, J.E. (1982). ADDTREE/P: A PASCAL program for fitting
additive trees based on Sattath & Tversky's ADDTREE program.
Behavior Research Methods and Instrumentation, 14, 353354.
Corter, J.E. (1992). An orderN**3 combinatoric algorithm for
fitting additive trees. Paper presented at annual meeting of the
Classification Society of North America, East Lansing MI.
Sattath, S., & Tversky, A. (1977). Additive similarity trees.
Psychometrika, 42, 319345.