-*-Text-*-

	       Installation Notes for Liar version 4.9


Liar, the CScheme compiler, is available for the following computers:

	Sun 3
	HP 9000 series 300 (except model 310)

These are 68020-based machines.  Ports for 68000/68010 machines and
the Vax will be available in the future.

For bug reports send computer mail to

    BUG-LIAR@ZURICH.AI.MIT.EDU (on the Arpanet/Internet)

or US Snail to

    Scheme Team
    c/o Prof. Hal Abelson
    545 Technology Sq. rm 410
    Cambridge MA 02139

* The compiler is distributed as four compressed tar files, as
follows:

** "dist6.2.1-tar.Z" is release 6.2.1 of CScheme.  This is required
for using the compiler.  It is installed in the usual way except for
one small change to the microcode needed to support compiled code.
This tar file contains about 5.1 Mbyte of data when unloaded.

** "liar4.9b-tar.Z" contains the binary files for the compiler.  This
includes a ".bin" file (SCode binary, for the interpreter) and a
".com" file (native code compiler output) for each source file in the
compiler.  It also contains a few other files used to construct the
compiler from the binary files.  This tar file contains about 3 Mbyte
of data when unloaded.

** "liar4.9s-tar.Z" contains the source files for the compiler.  It
also includes a TAGS table.  This tar file contains about 1.2 Mbyte of
data when unloaded.

** "liar4.9d-tar.Z" contains some debugging files.  There is one
".binf" file corresponding to each ".com" file in the compiler.  Given
both of these files, the compiler can generate a symbolic assembly
language listing of the compiled code.  In future releases, these
debugging files will also support debugging tools for parsing the
stack and examining compiled code environment structures.  This tar
file contains about 4.5 Mbyte of data when unloaded.

* Installation of the compiler.  Installation requires about 17-20
Mbyte of disk space.  This is conservative and could be reduced with
some knowledge of what is needed and what is not.

** The first step in installation is building CScheme.  Follow the
instructions included in the release, except that the file
"makefiles/sun" or "makefiles/hp200" (as appropriate) must be edited
as follows.  Look for the following lines in that file:

    # Compiled code interface files.
    # These defaults are just stubs.

    CSRC  = compiler.c
    CFILE = compiler.oo
    D_CFILE = compiler.do
    F_CFILE = compiler.fo
    CFLAG =
    GC_HEAD_FILES= gccode.h

Edit these lines to read as follows:

    # Compiled code interface files.

    CSRC  = cmp68020.s
    CFILE = cmp68020.o
    D_CFILE = cmp68020.o
    F_CFILE = cmp68020.o
    CFLAG = -DCMPGCFILE=\"cmp68kgc.h\"
    GC_HEAD_FILES= gccode.h cmp68kgc.h

    .s.o: ; as -o $*.o $*.s

After this is done, connect to the microcode subdirectory and execute
the following:

    cp cmp68020.s-<sys> cmp68020.s

where <sys> is "sun" if you are running on a Sun 3, or "hp" if you are
running on an HP 9000 series 300.  NOTE: the file "cmp68020.s-src" is
the source file from which the other two were built.  It was processed
by m4 on an HP machine to create "cmp68020.s-hp", then that file was
processed by a custom conversion program (courtesy of the
butterfly-lisp hackers at BBN) to produce "cmp68020.s-sun".

Once these changes have been made, finish the installation process in
the normal way.

**** Note that on Sun workstations, assembling "cmp68020.s" will
produce the following messages, which are reported as errors but are
harmless:

as: error (cmp68020.s:1432): Unqualified forward reference
as: error (cmp68020.s:1435): Unqualified forward reference
as: error (cmp68020.s:1444): Unqualified forward reference

Also, on older versions of Sun software (before release 3.4) you may
not be able to assemble this file at all.  For that case, we have
included the file "cmp68020.o-sun" which is the output of the
assembler on a 3.4 system.  Copy that file to "cmp68020.o" and touch
it to make sure it is newer than the source file.

** The next step in installation is unloading the Liar tar files.  The
tar files may be unloaded wherever you like.  When unloaded, they will
create a directory "liar4.9" under the directory to which you are
connected.

Note that only "liar4.9b-tar.Z" need be unloaded in order to perform
the rest of the installation.

In what follows, let $LIAR stand for the name of the directory in
which the compiler is loaded, and let $SCHEME stand for the name of
the directory in which the interpreter is loaded.

** After having unloaded the files, and after CScheme has been built
and installed, do the following:

    cd $SCHEME
    mv $LIAR/runtime/* runtime
    mv $LIAR/sf/* sf
    cd runtime
    scheme -fasl cmp-runmd.bin < $LIAR/etc/mkrun.scm

This transfers a number of compiled files to the Scheme runtime system
directory, and constructs a new version of the runtime system, named
"scheme.com", which is partially compiled.  After this has been done,
you may discard all of the ".com" files in the runtime system
directory.  If you want the new runtime system to be the default,
rename it to "scheme.bin".

**** Note: because this is a beta release, the compiled runtime system
"scheme.com" is likely to have bugs.  If you intend to use it by
default, we suggest you retain the original (interpreted) runtime
system "scheme.bin" by renaming it to something else.

** Next, do the following:

    scheme -constant 510 -heap 500 -band $SCHEME/runtime/scheme.com

This starts up the Scheme interpreter with a large constant space and
heap, using the partially compiled runtime system.  After the
interpreter has started, type the following expression at it:

    (begin (%cd "$LIAR")
	   (load "machines/bobcat/make" system-global-environment)
	   (disk-save "$SCHEME/runtime/compiler.com"))

It will load two files, then ask the question "Load compiled?".  Type
Y, which means to build the compiler using compiled code.  If you type
N, the compiler will be run interpretively, which is about a factor of
10 slower than the compiled version.

After you answer the question, it will load and evaluate approximately
100 files.  This will take several minutes.  When it is done, you are
returned to the interpreter.  At this point, a new band will have been
created, called "$SCHEME/runtime/compiler.com", which contains the
compiler.  All the other files in the $LIAR directory may be
discarded, if you wish, since only "compiler.com" is needed to run the
compiler.

* Using the compiler.

** Loading.  The compiler band, "compiler.com", is used by starting
Scheme and specifying that file using the "-band" option.  You must
also use the "-constant" option to specify a constant space of at
least 510, and a "-heap" of at least 500 is recommended.  For medium
to large compilations, a heap size of 700 or more may be needed; at
MIT we typically use 1000 to be safe.

Alternatively, the switch "-compiler" specifies constant 510, heap
500, and the compiler band.

** Memory usage.  Note that the total memory used by Scheme in this
configuration is substantial!  With a heap of 1000 and a constant
space of 510, the memory used is (* 4 (+ 510 (* 2 1000))) Kbyte, or
about 10 Mbyte.  For many computers this is a ridiculous figure, and
Scheme will die a slow death due to paging.  Using a heap of 500
reduces this to about 6 Mbyte, but that is still quite a lot.

For machines with small memories, using the `bchscheme' version of the
microcode will be helpful.  This program, which is made by connecting
to "$SCHEME/microcode" and typing "make bchscheme", does its garbage
collection to a disk file, thus requiring only one heap in the virtual
address space.  This reduces the overall memory requirements for the
above examples to 6 Mbyte and 4 Mbyte, respectively.  The savings of 4
and 2 Mbyte (respectively) are allocated in the file system rather
than in virtual memory.

This may seem like a complicated way of doing virtual memory
management, but in fact it performs significantly better than paging
on machines with small amounts of RAM.  This is because the GC
algorithm uses the disk much more efficiently than the paging system
will be able to.
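
As a rough check on the figures in this section, the following Python
sketch evaluates the same formula for both versions of the microcode.
The function name is made up for illustration, and reading the factor
of 4 as bytes per word (with sizes counted in blocks of 1024 words) is
an assumption; these notes give only the formula and the rounded
totals.

```python
# Sketch only: reproduces the arithmetic (* 4 (+ constant (* heaps heap))).
# Assumption: sizes are in blocks of 1024 words and a word is 4 bytes,
# so the result is in Kbyte.
BYTES_PER_WORD = 4


def scheme_memory_kbytes(constant, heap, heaps=2):
    """Approximate memory use: heaps=2 for scheme, heaps=1 for bchscheme."""
    return BYTES_PER_WORD * (constant + heaps * heap)


for heap in (1000, 500):
    two_heaps = scheme_memory_kbytes(510, heap)
    one_heap = scheme_memory_kbytes(510, heap, heaps=1)
    print("heap %4d: scheme %5d Kbyte, bchscheme %5d Kbyte, saved %4d Kbyte"
          % (heap, two_heaps, one_heap, two_heaps - one_heap))
```

The difference between the two columns is the second heap, which
bchscheme keeps in a disk file instead of in the address space.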

** Compilation.  The following global definitions are available for
calling the compiler:


(COMPILE-BIN-FILE FILENAME #!OPTIONAL OUTPUT-FILENAME)

Compiles a binary SCode file, producing a native code file.  FILENAME
should refer to a file which is the output of the SF program (see
"$SCHEME/documentation/user.txt" for a description of SF).  The type
of the input file defaults to ".bin".

OUTPUT-FILENAME, if given, is where to put the output file.  If no
output filename is given, the output filename defaults to the input
filename, except with type ".com".  If it is a directory specification
(on unix, this means if it has a trailing "/"), then the output
filename defaults as usual, except that it goes in that directory.

This is similar to the operation of SF.  Also, like SF, the input
filename may be a list of filenames, in which case they are all
compiled in order.
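
The filename defaulting rules for COMPILE-BIN-FILE can be summarized
in a short Python sketch.  This is an illustration of the rules as
stated here, not the compiler's own code (which is written in Scheme).
The function names are hypothetical, and returning an explicit
non-directory output name unchanged is an assumption, since these
notes do not say whether a type is defaulted in that case.

```python
import posixpath  # the notes describe the unix directory convention


def input_filename(filename):
    """The type of the input file defaults to ".bin"."""
    _, ext = posixpath.splitext(filename)
    return filename if ext else filename + ".bin"


def output_filename(filename, output=None):
    """Apply the output defaulting rules described above."""
    directory, name = posixpath.split(input_filename(filename))
    default_name = posixpath.splitext(name)[0] + ".com"
    if output is None:
        # No output given: same place as the input, with type ".com".
        return posixpath.join(directory, default_name)
    if output.endswith("/"):
        # Directory specification: default name, placed in that directory.
        return posixpath.join(output, default_name)
    # Assumption: an explicit file name is used exactly as given.
    return output


print(output_filename("foo"))
print(output_filename("src/foo.bin", "obj/"))
```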


(COMPILE-PROCEDURE PROCEDURE)

Compiles a compound procedure, given as its argument, and returns a
new procedure which is the compiled form.  This does not perform side
effects on the environment, so if one wished to compile MAP, for
example, and install the compiled form, it would be necessary to say

    (set! map (compile-procedure map))


(COMPILER:WRITE-LAP-FILE FILENAME)

This procedure generates a "LAP" disassembly file (LAP stands for Lisp
Assembly Program, a traditional name for assembly language written in
a list notation) from the output of COMPILE-BIN-FILE.  If FILENAME is
"foo", then it looks for "foo.com" and disassembles that, producing a
file "foo.lap".  If, in addition, the file "foo.binf" exists, it will
use that information to produce a disassembly which contains symbolic
names for all of the labels.  This second form is extremely useful for
debugging.


(COMPILE-DIRECTORY DIRECTORY #!OPTIONAL OUTPUT-DIRECTORY FORCE?)

Finds all of the ".bin" files in DIRECTORY whose corresponding ".com"
files either do not exist or are older, and recompiles them.
OUTPUT-DIRECTORY, if given, specifies a different directory to look in
for the ".com" files.  FORCE?, if given and not #F, means recompile
even if the output files appear up to date.
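
The selection rule used by COMPILE-DIRECTORY is the familiar
"make"-style timestamp comparison.  The Python sketch below shows one
way to express it; the function name is hypothetical, and comparing
file modification times is an assumption about what "older" means
here.  It only lists the files that would be recompiled and does not
invoke the compiler.

```python
import os


def bin_files_to_recompile(directory, output_directory=None, force=False):
    """List the ".bin" files whose ".com" file is missing or older."""
    if output_directory is None:
        output_directory = directory
    selected = []
    for name in sorted(os.listdir(directory)):
        root, ext = os.path.splitext(name)
        if ext != ".bin":
            continue
        bin_path = os.path.join(directory, name)
        com_path = os.path.join(output_directory, root + ".com")
        if (force
                or not os.path.exists(com_path)
                or os.path.getmtime(com_path) < os.path.getmtime(bin_path)):
            selected.append(bin_path)
    return selected
```

Calling it with force=True corresponds to giving FORCE? as a non-#F
value: every ".bin" file is selected regardless of timestamps.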

* Debugging compiled code.  At present the debugging tools are
practically nonexistent.  What follows is a description of the lowest
level support, which is clumsy to use but which is adequate if you
have a moderate understanding of the compiled code.  This is one of
the prices of beta test!  Before release we will have user-level
debugging tools.

There are two basic kinds of errors: fatal and non-fatal.  Fatal
errors are things like segmentation violations and bus errors, and
when these occur the only method of debugging is to use an assembly
language debugger such as `adb' or `gdb'.  Debugging these errors is
complicated and will not be described here.

** Non-fatal errors can be debugged from Scheme.  Here is the method:
the file "$LIAR/etc/stackp.bin" contains a simple stack parser that
will allow you to display the Scheme stack, and refer to any of the
items in the stack by offset number.  Loading this file (into the
global environment, for example), defines two useful procedures:

(RCD FILENAME) writes a file containing a description of the current
stack.  When an error has occurred, the current stack contains the
continuation of the error, which is the information you want to see.
Each line of the file contains an offset number and the printed
representation of an object (the latter is truncated to fit on one
line).

(RCR OFFSET) returns the object corresponding to OFFSET from the
current stack.  Thus, after using RCD to see the stack, RCR will get
you pointers to any of the objects.

Given these procedures, you can look at the compiled code stack
frames, and possibly (with some skill) figure out what is happening.

** Compiled code object manipulators.  Another set of useful
procedures, built into the runtime system and defined in the file
"$SCHEME/runtime/ustruc.scm", will allow you to manipulate various
compiled code objects:

(COMPILED-PROCEDURE-ENTRY PROCEDURE) returns the entry point of the
compiled procedure PROCEDURE.  This entry point is an object whose
type is COMPILED-EXPRESSION.

(COMPILED-CODE-ADDRESS? OBJECT) is true of both COMPILED-EXPRESSION
objects and COMPILER-RETURN-ADDRESS objects.

(COMPILED-CODE-ADDRESS->BLOCK COMPILED-CODE-ADDRESS) returns the
compiled code block to which that address refers.  The procedure
COMPILED-CODE-BLOCK/DEBUGGING-INFO will tell you the name of the
".binf" file corresponding to that compiled code block, if the
compiled code was generated by COMPILE-BIN-FILE.

(COMPILED-CODE-ADDRESS->OFFSET COMPILED-CODE-ADDRESS) returns the
offset, in bytes, of that address from the beginning of the compiled
code block.  NOTE: this offset is the SAME offset as that shown in the
disassembly listing!  Thus, given any compiled code address, you can
figure out both which file it corresponds to and which label in the
disassembly file it points at.  This is the basic information you need
to understand the stack.

There are several other procedures defined for manipulating these
objects -- see the source code for details.  What follows is a brief
description of the object formats to aid debugging.

** Compiled Code Blocks.  Compiled code blocks are "partially marked"
vectors.  The first part of a compiled code block is "non-marked",
which means that the GC copies it but does not look through it for
pointers.  This part is used to hold the compiled code.  The second
part is "marked", and contains constants that are referred to by the
compiled code.  These constants are ordinary Scheme objects and must
be traced by the GC in the usual way.

The disassembly listing shows the compiled code block in the same
format that it is laid out in memory, with offsets in bytes from the
beginning of the block.  The header of the block is 8 bytes, so the
disassembly listing starts at offset 8.  The code and constants
sections are displayed separately, in slightly different formats.

** Procedure Entry Points.  The entry point of a procedure can be
found in the LAP file by looking for a label with the same name as the
procedure, concatenated with some positive integer.  Unnamed lambda
expressions will be lambda-<n> for some <n>.  Closed procedures (i.e.
those procedures which have an external representation) have two entry
points, whose labels differ only in the concatenated integer.  The
first entry point is responsible for checking the number of arguments,
and transfers control to the second entry point if that is correct.

** Stack Frames.  The normal stack frame for a closed procedure is
constructed by pushing the return address, then all the arguments
right to left, then the procedure.  If the procedure has internal
definitions, then these are pushed on the stack on top of that in some
unspecified order.  Internal procedures, when invoked, may either
extend the closure's frame or create new frames.  The rules for this
are complicated and far beyond the scope of this document.  However,
two special types of stack pointers may be used when the closure's
frame is extended.

The first of these is a "static link".  This is a pointer into the
stack which is used when a sub-frame needs to refer to bindings in
some parent frame, but the compiler was unable to determine where that
parent frame was at compile time.  The other type is a "dynamic link",
which points to where the return address for the current procedure is
located in the stack.  Because of tail recursion considerations, the
compiler cannot always determine this at compile time, and in those
cases dynamic links are used.  The dynamic link is normally kept in
register A4, and pushed and popped off the stack at appropriate times.
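
To make the basic closed-procedure frame described at the start of
this section concrete, here is a small Python model of the push order.
It is only a picture of the layout: the function name and the
placeholder strings are invented for illustration, and internal
definitions, static links and dynamic links are not modeled.

```python
def closed_procedure_frame(return_address, arguments, procedure):
    """Return the frame as a list, in push order (last element = top)."""
    stack = []
    stack.append(return_address)          # pushed first
    for argument in reversed(arguments):  # arguments, right to left
        stack.append(argument)
    stack.append(procedure)               # pushed last, so it is on top
    return stack


frame = closed_procedure_frame("<return address>", ["a", "b", "c"], "<procedure f>")
for item in reversed(frame):              # print from the top of the stack down
    print(item)
```

Reading from the top of the stack down, one sees the procedure, then
the arguments from first to last, then the return address.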

Note that internal procedures evaluate and push their arguments in a
completely unspecified order.  Thus if your program depends on the
fact that the interpreter evaluates arguments from right to left, you
might be screwed, since the compiler chooses whatever order seems most
efficient or convenient.
