The following are caveats and technical addenda that I am providing
to help other machine learning teachers use the GP-Othello assignment
smoothly.  -Eric

Note from one student:
    The training on CS machines in the department took a very long
time. We were using "nohup" command to run the processes
overnight. The processes were getting killed before they were able to
generate any results. As we have figured out later, the process of
drawing the best individual could not have been handled together with
the "nohup" command. We were able to get the results only after we
turned off the drawing option in the "ini" file.
Turn off the number-of-good-runs parameter (so the system does only one
run) and the complexity penalty parameter in the GP implementation.

The next few emails are important addenda for running this
assignment smoothly:

Subject: GP Othello assignment


The TA from last semester and I are making sure the Othello
and GP source are still available online.

Below is the email that got folks started on the GP system in F97.  It
gives a little assignment to play with the GP system -- YOU ARE EXEMPT
from it, but more than welcome to look at its details in order to learn
about the GP system.

Note that we hope to add your Othello writeups to the Othello


ML Class,

As promised in class, here is your short assignment to familiarize yourself
with the Java GP implementation we will use for Project Othello.  It will
only take you a couple hours at the most.

We strongly urge you to do this before Tuesday, since it will also
introduce and further enforce genetic programming concepts for the midterm,
and will help you understand our presentation of Project Othello in class
on Tuesday.

0. Read about GA and GP in the text, and the in-class handout of 3 papers
(all stapled together as one).

1. Follow the directions in /u/niform/eeskin/TA/dist/GP/README-1.txt (on cs
machines) to install and test the implementation.  On AcIS machines, look
in ~ee67/GP/

This will get you started on a system that tries to regress on x**4 + x**3
+ x**2 + x.  That is, it is evaluating the fitness of individual function
trees in the population by how well they predict the values of the above
equation for 20 values of x (i.e., 20 fitness/training cases).  The fitness
measure is not squared error but, more simply, the sum of the absolute
errors over these 20 cases.  This is the "classic" GP symbolic regression

2. Add as a new primitive the constant "2", which can now appear as a
terminal in hypotheses (i.e., function trees).  Note, only the files in the
SymbReg directory need to be modified for this assignment.

3. Change the target function to 2*(x**2) + 4.

4. Recompile and run it.

5. Try (only) 4-6 runs with variations on at least two parameters, such as
the population size, frequency of crossover, number of fitness cases (which
cannot be changed in the .ini file), etc.  You may want to decrease the
GoodRuns parameter (number of runs before it stops) down to only 1 run.

6. Email the resulting function trees (text only) and fitnesses thereof to
email: evs at cs dot columbia dot edu.

7. Is anything wrong with this classic paradigm?  In particular, do you see
it testing for generalization performance?  What would be needed to make a
fair assessment of resulting hypotheses?  Do not try to implement this.
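
For reference, the fitness measure described under step 1 can be sketched
in plain Java as follows.  This is only an illustration and is not taken
from the gpjpp or SymbReg sources: the class name, the use of
DoubleUnaryOperator as a stand-in for an evolved function tree, and the
choice of 20 evenly spaced x values in [-1, 1] are all assumptions.

```java
import java.util.function.DoubleUnaryOperator;

/** Sketch of the symbolic-regression fitness: sum of absolute errors. */
class SumAbsErrorFitness {
    // Assumption: n evenly spaced x values in [lo, hi]; the real SymbReg
    // example may pick its fitness cases differently.
    static double[] fitnessCases(int n, double lo, double hi) {
        double[] xs = new double[n];
        for (int i = 0; i < n; i++) {
            xs[i] = lo + (hi - lo) * i / (n - 1);
        }
        return xs;
    }

    // Lower is better; 0 means the candidate matches the target on every case.
    static double fitness(DoubleUnaryOperator candidate,
                          DoubleUnaryOperator target, double[] xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += Math.abs(candidate.applyAsDouble(x) - target.applyAsDouble(x));
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] xs = fitnessCases(20, -1.0, 1.0);
        // Target from step 1 and the replacement target from step 3.
        DoubleUnaryOperator original = x -> x * x * x * x + x * x * x + x * x + x;
        DoubleUnaryOperator step3 = x -> 2 * (x * x) + 4;
        // Hand-written stand-in for an evolved tree (hypothetical).
        DoubleUnaryOperator guess = x -> x * x + x;
        System.out.println("guess vs. original target: " + fitness(guess, original, xs));
        System.out.println("guess vs. step-3 target:   " + fitness(guess, step3, xs));
    }
}
```

Note that the same 20 cases are used both to drive evolution and to report
the final fitness, which is the setup that question 7 asks about.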

Have fun,

Subject: GP assignment ADDENDUM

S98 ML folks,

A few more items re the Othello project:

1. Have you received the photocopy handout with 3 GP articles?  I
think CVN kept it and handles distributing it.  If not, let me know
and I will photocopy it and bring it to CVN ASAP.

2. One student said:
    The training on CS machines in the department took a very long
time. We were using "nohup" command to run the processes
overnight. The processes were getting killed before they were able to
generate any results. As we have figured out later, the process of
drawing the best individual could not have been handled together with
the "nohup" command. We were able to get the results only after we
turned off the drawing option in the "ini" file.
3. turn off number-of-good-runs so it only does one run
4. turn off complexity penalty, probably, unless you want to
experiment with it

5. Below is a message from last semester.


ML Folks,

Please read this entire message before starting your Othello work.

Here are two more hypothesis representation alternatives to help inspire
ideas.  The second comes from Chris (as well as a GA Othello paper we have).
If you have any additional ideas that you are not doing yourself,
please share them with everyone over email.

* more fundamental, simple primitives, e.g., x/y coordinates of the piece
just placed to get to this new board configuration -- could GP use this and
automatically build the concepts of "edge", "corner",
"one-away-from-corner", or other useful concepts we haven't thought of?
(Get rid of (some) other primitives for this experiment.)

* perform a shallow search, e.g., minimax, and apply the tree at the
end-points of this search.  That is, resulting player uses the
hypothesis/tree to evaluate endpoints of a shallow search to determine
which is the best current move.  This of course requires a fair amount of
programming, and results in an even lengthier fitness evaluation -- other
changes would be required to keep the fitness measure fast enough.
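
The shallow-search idea in the second bullet can be sketched as a
depth-limited minimax that calls the evolved tree only at the endpoints.
This is a rough sketch under stated assumptions, not code from the Othello
or gpjpp sources: the Game interface, the toy game, and the toy evaluator
are hypothetical, and Othello pass moves are ignored.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.ToDoubleFunction;

/** Sketch: depth-limited minimax that scores endpoints with an evaluator. */
class ShallowSearch {
    /** Hypothetical minimal game interface (not part of the course code). */
    interface Game<S> {
        List<S> successors(S state);   // empty list when the game is over
    }

    // The evaluator plays the role of the evolved tree; it always scores a
    // state from the maximizing player's point of view.
    static <S> double minimax(Game<S> game, S state, int depth,
                              boolean maximizing, ToDoubleFunction<S> evaluator) {
        List<S> next = game.successors(state);
        if (depth == 0 || next.isEmpty()) {
            return evaluator.applyAsDouble(state);   // endpoint of the search
        }
        double best = maximizing ? Double.NEGATIVE_INFINITY : Double.POSITIVE_INFINITY;
        for (S child : next) {
            double v = minimax(game, child, depth - 1, !maximizing, evaluator);
            best = maximizing ? Math.max(best, v) : Math.min(best, v);
        }
        return best;
    }

    /** Returns the successor with the best minimax value, or null if none. */
    static <S> S bestMove(Game<S> game, S state, int depth,
                          ToDoubleFunction<S> evaluator) {
        S best = null;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (S child : game.successors(state)) {
            double v = minimax(game, child, depth - 1, false, evaluator);
            if (best == null || v > bestValue) {
                best = child;
                bestValue = v;
            }
        }
        return best;
    }

    // Toy game for illustration: state n < 8 has children 2n and 2n+1.
    static final Game<Integer> TOY =
        n -> n < 8 ? Arrays.asList(2 * n, 2 * n + 1) : Collections.<Integer>emptyList();
    // Toy stand-in for an evolved evaluation tree.
    static final ToDoubleFunction<Integer> TOY_EVAL = n -> n % 5;

    public static void main(String[] args) {
        System.out.println("best move from 1 at depth 2: " + bestMove(TOY, 1, 2, TOY_EVAL));
    }
}
```

Every extra ply multiplies the number of evaluator calls per move, which is
why the fitness evaluation becomes so much slower with this representation.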


If you are on a cs account, you need to use it judiciously.  It is very
easy to clog up the systems with our long processes.  This always creates
instant negative vibes from all the cs users.  We are at risk as a class
because of computational hog-ness of this project.

Therefore, please limit yourself to one or two runs going at a time.  You
can write shell scripts to sequentially do an array of runs with certain
variations.  fyi, you can use nohup to start a process that keeps going
even if you log out.

It is better to not do runs on the department's "cluster" machines, such as
age, ground, shadow and a couple others.  Also, try to use machines that
have less going on and fewer people logged in.

It is good to use the machines in SRL.

There is a cluster of HPs that are fast, but I don't know if they have java
or java compilers or JIT.  They are called:

        donner blitzen cupid dancer prancer comet dasher vixen

Note that one or two of these are permanently down.  If you discover these
do/don't have java, please email everyone.

btw, (only) if you have time, it is better to do multiple runs (different
random seeds) of the same experiment, to examine average performance.
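
A minimal sketch of summarizing several runs that differ only in their
random seed is given below.  The experiment here is a hypothetical
stand-in that returns a made-up "best fitness" for a seed; how the seed is
actually passed to the GP system is not shown and would need to be checked
in its documentation.

```java
import java.util.Random;
import java.util.function.LongToDoubleFunction;

/** Sketch: run one experiment under several seeds and summarize results. */
class SeedAverager {
    // Runs the experiment once per seed and collects the results in order.
    static double[] runAll(LongToDoubleFunction experiment, long[] seeds) {
        double[] results = new double[seeds.length];
        for (int i = 0; i < seeds.length; i++) {
            results[i] = experiment.applyAsDouble(seeds[i]);
        }
        return results;
    }

    static double mean(double[] v) {
        double sum = 0.0;
        for (double x : v) {
            sum += x;
        }
        return sum / v.length;
    }

    // Sample standard deviation (divides by n - 1); needs at least 2 values.
    static double sampleStdDev(double[] v) {
        double m = mean(v);
        double ss = 0.0;
        for (double x : v) {
            ss += (x - m) * (x - m);
        }
        return Math.sqrt(ss / (v.length - 1));
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for "best fitness of one GP run with this seed".
        LongToDoubleFunction fakeRun = seed -> new Random(seed).nextDouble();
        double[] results = runAll(fakeRun, new long[] {1L, 2L, 3L, 4L, 5L});
        System.out.println("mean = " + mean(results)
            + ", std dev = " + sampleStdDev(results));
    }
}
```

Reporting the spread along with the mean makes it easier to tell whether a
parameter change helped or whether the difference is just seed-to-seed noise.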

Finally, unless absolutely necessary, please send cs system
questions/issues to me and Eleazar *before* sending email to crf, so we
can screen questions -- they are overloaded, and our place on cs machines
is touch and go...


Subject: JAVA GP

fyi, Here is a version of the README for the gp system -- the
one in the system directory may be more up to date...


In order to install the Genetic Programming package into your
directory, you must copy all of the files into a directory in
your account from the directory "/u/niform/eeskin/TA/dist/GP/"

You must also copy all of the subdirectories to this directory, so
use the "-R" flag to copy everything.

This will give you a copy of the genetic programming package.  Note
there are several subdirectories within the package.  "gpjpp"
contains the source files.  "docs" contains the javadoc documentation
for the source files.  The other directories correspond to example

In order to compile the GP package, you must first include the
compiler in your path.  To do this add to your path on the cs cluster:


In order for java to find the relevant source files, add the following
paths to your CLASSPATH variable:


later you will also add:


Now that you have set these variables, it is time to compile.
Enter the directory "GP/gpjpp" and type "javac *.java"
This should take a while, but you should receive no errors.

Then enter the directory "GP/SymReg" and type "javac *.java"
Now the Symbolic Regression package should be ready to run.  In order
to execute it, you must type:

"java SymReg"

This should be all you need to get the GP implementation up and
running in your directory.

There are several files in the SymReg directory that are of interest.
One is the source file where the Symbolic Regression is
defined.  Note this is the ONLY source file that has to do with
Symbolic regression.  You will not need much more to use the GP with

There is also a file called SymReg.ini .  This file has a lot of
parameters for the GP.  A complete set of possible parameters is
contained in the documentation for the GPVariables class.  Note you can
tweak these parameters pretty easily to experiment with the GP.

There is also a file that gives statistics
on a run of the program.  A file called SymReg.det can be created if
you set the PrintDetails option to true in the SymReg.ini file.  This
will give you plenty of details for each generation including best
and worst individual.

After a run is over, a gif file of the tree of the best individual is
created.  Note, to keep the GP from crashing you must set an
appropriate DISPLAY variable.
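
As a side note, a current Java runtime can report whether a display is
available before any drawing is attempted.  The sketch below is not part
of the GP package and uses GraphicsEnvironment.isHeadless(), which exists
only in Java versions newer than the one this README was written for; the
class and method names are made up for illustration.

```java
import java.awt.GraphicsEnvironment;

/** Sketch: check for a usable display before trying to draw anything. */
class DisplayCheck {
    // True when the JVM believes a display is available.  On X11 systems
    // this typically depends on the DISPLAY variable being set.
    static boolean canDraw() {
        return !GraphicsEnvironment.isHeadless();
    }

    public static void main(String[] args) {
        if (canDraw()) {
            System.out.println("Display available: drawing the best tree should work.");
        } else {
            System.out.println("No display: turn off the drawing option or set DISPLAY.");
        }
    }
}
```

Running such a check from the same shell (for example under nohup) before
starting a long run shows whether the drawing step at the end is likely to
fail.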

Note also that the documentation is in html.

email: evs at cs dot columbia dot edu