Machine Learning: The Othello Project
A Streamlined GP Approach
by Arie Addi, arie@mtispan.att.com
April 4, 2000

Introduction

For this project, we were asked to examine genetic programming (GP) approaches to playing the game Othello. Since this is an ongoing project, I will avoid repeating its general concepts, which can be found at http://www.cs.columbia.edu/~evs/ml/othello.html. I chose to improve the training performance of the original GP implementation for Othello by changing the primitives and performing variations on the learning method.
 

Approach

After spending considerable time reading past papers on this project, I found Juno Suk's project to be the most comprehensive and engaging. When I attempted to make similar modifications, however, I found that training required an enormous amount of time. I therefore looked for ways to improve training time and performance. I consider this an extension of the Ockham's razor principle: if a simpler answer exists and is effective, why not use it? So the question I wanted to answer was: how simple can I make the hypothesis representation before performance begins to degrade?

Analysis and Observation of the game Othello
After playing several games with EDGAR, the AI Othello player, I came to several conclusions.

Playing Observations

1) The number of opponent pieces flipped at each move did not necessarily improve my final score. On several occasions I chose the move that flipped the fewest pieces, which actually left the opponent with fewer good moves. The intention was that toward the end of the game I would have the more valuable positions available to me, namely the edge and corner locations. This observation was reinforced by reviewing the functions produced by other GP approaches: the more successful ones did not contain the white or black primitives.

2) Owning the edge and corner positions of the board made the difference. This is probably the most obvious observation players arrive at after a few games. It also makes the pieces nearest the corners and edges the least valuable, since playing them lets the opponent reach those edge locations.
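These two observations suggest a simple positional weighting for comparing candidate moves. The weights below are illustrative assumptions of my own, not values taken from the project code:

```java
public class PositionalScore {
    // Illustrative (assumed) weights: corners are best, squares adjacent
    // to a corner are penalized because they hand that corner to the
    // opponent, and the remaining edge squares are good.
    static int weight(int r, int c) {
        boolean edgeRow = (r == 0 || r == 7), edgeCol = (c == 0 || c == 7);
        boolean nearRow = (r <= 1 || r >= 6), nearCol = (c <= 1 || c >= 6);
        if (edgeRow && edgeCol) return 10;   // corner
        if (nearRow && nearCol) return -5;   // adjacent to a corner
        if (edgeRow || edgeCol) return 3;    // other edge squares
        return 0;                            // interior
    }

    // Sum of weights over the squares a player occupies.
    // board[r][c]: 0 = empty, 1 = black, 2 = white.
    static int score(int[][] board, int player) {
        int total = 0;
        for (int r = 0; r < 8; r++)
            for (int c = 0; c < 8; c++)
                if (board[r][c] == player) total += weight(r, c);
        return total;
    }
}
```

A player comparing candidate moves would apply each move to a copy of the board and keep the move yielding the highest score.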

Training Observations
3) The thirteen original primitives were really dragging out the training. After waiting a day and a half for results, it seemed the hypothesis space was simply too large.

4) Increasing the population size had a dramatic effect on the wait time. When I set a population size of 500, three days were not enough to get results. The analysis in Daby Sow's paper demonstrates diminishing returns for population sizes greater than 100.

Strategy Used

Implementation Issues

I found that the GPPlayer was assigned a different color depending on whether it was playing against the random player or against EDGAR. I found this confusing, so I updated all the Java files to ensure that GPPlayer would always play the same color.

In addition, I added another command line argument to OthelloCache to specify whether the GP function would play EDGAR or a random player.

Updating all the primitives was a painstaking process, as they are repeated a number of times across four Java files.

 
black_corners 2
black_near_corners 6
black_edges 30
8 and 3 (constants) 2
Table 1: Streamlined Primitives Used
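For reference, here is a plausible sketch of what the three board-counting primitives compute. The actual GPOthello implementations (spread across the four Java files mentioned above) may differ in detail, so treat the region definitions as assumptions:

```java
import java.util.function.BiPredicate;

public class Primitives {
    static boolean isCorner(int r, int c) {
        return (r == 0 || r == 7) && (c == 0 || c == 7);
    }
    // Squares orthogonally or diagonally adjacent to a corner.
    static boolean isNearCorner(int r, int c) {
        return !isCorner(r, c) && (r <= 1 || r >= 6) && (c <= 1 || c >= 6);
    }
    // Edge squares that are neither corners nor adjacent to one.
    static boolean isEdge(int r, int c) {
        return (r == 0 || r == 7 || c == 0 || c == 7)
                && !isCorner(r, c) && !isNearCorner(r, c);
    }

    // Count black pieces (board value 1) on squares in a region.
    static int countBlack(int[][] board, BiPredicate<Integer, Integer> region) {
        int n = 0;
        for (int r = 0; r < 8; r++)
            for (int c = 0; c < 8; c++)
                if (board[r][c] == 1 && region.test(r, c)) n++;
        return n;
    }

    static int blackCorners(int[][] b)     { return countBlack(b, Primitives::isCorner); }
    static int blackNearCorners(int[][] b) { return countBlack(b, Primitives::isNearCorner); }
    static int blackEdges(int[][] b)       { return countBlack(b, Primitives::isEdge); }
}
```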

Results

The mean results from each training session are shown below. Click on the links to view the actual output files.
Note: all training was done against random players; in both the baseline testing and the streamlined runs, the GP function is always the black player.

Set 1: Baseline Testing
In the first test set I took the original configuration and used it as my baseline. This includes
all thirteen primitives and a population size of 500.

GP trained against random (set 1)
-------------------------
Best GP function:                 black

Mean random player:      45 wins, 5 losses

One Run Against EDGAR(white)      EDGAR 48  GP 16



Set 2: Streamlined Primitives, Population 50
GP trained against random (set 2)
-------------------------
Best GP function:   ( - ( *   black_corners   black_edges )   black_near_corners )

Mean random player:      49 wins, 1 loss

One Run Against EDGAR(white)      EDGAR 55  GP 9


Set 3: Streamlined Primitives, Population 100

GP trained against random (set 3)
-------------------------
Best GP function:   ( -  black_corners  black_near_corners )

Mean random player:      44 wins, 6 losses

One Run Against EDGAR(white)      EDGAR 55  GP 9
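The evolved functions shown above are prefix s-expressions over the primitives. A minimal evaluator of my own, useful for checking what a given function scores for given primitive values (the real GPOthello evaluator is not reproduced here), might look like:

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Map;

public class GpEval {
    // Evaluate a prefix expression such as
    // "( - ( * black_corners black_edges ) black_near_corners )"
    // given the primitives' values for the current board.
    static double eval(String expr, Map<String, Double> env) {
        Deque<String> toks = new ArrayDeque<>(Arrays.asList(expr.trim().split("\\s+")));
        return parse(toks, env);
    }

    private static double parse(Deque<String> toks, Map<String, Double> env) {
        String t = toks.pop();
        if (t.equals("(")) {
            String op = toks.pop();
            double a = parse(toks, env);
            double b = parse(toks, env);
            toks.pop();                               // consume ")"
            switch (op) {
                case "+": return a + b;
                case "-": return a - b;
                case "*": return a * b;
                case "/": return b == 0 ? 0 : a / b;  // protected division, standard in GP
                default:  throw new IllegalArgumentException("unknown operator: " + op);
            }
        }
        Double v = env.get(t);
        return v != null ? v : Double.parseDouble(t); // constants such as 8 and 3
    }
}
```

For example, on a board with two black corners, five black edge pieces and one black near-corner piece, Set 2's function evaluates to 2*5 - 1 = 9. As noted under Problems below, the trained functions came back reversed, so the direction in which such scores are compared matters.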
 
Set Name   Nbr of Primitives   Population Size   RunTime (secs)   Secs Per Generation   % Won Against Rndm   GP vs EDGAR
Set 1      13                  500               11847            226                   90                   16 vs 48
Set 2      9                   50                1648             31.6                  98                   9 vs 55
Set 3      9                   100               3353             62                    88                   9 vs 55
Table 2: Training Results Summary

Problems

The GP functions returned after training appeared to always be reversed; that is, they returned a larger negative value for the better moves. I resolved this by feeding GPOthelloCache the function with its operands flipped (quite appropriate for Othello). For example, ( - 3 black_corners ) is flipped to ( - black_corners 3 ). I suspect the problem is associated with the modification of GPOthello that made GPTester the black player rather than white.
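The flip can be sketched as a recursive operand swap over the prefix expression; this is my reconstruction from the one example given, not the project's actual code. Swapping the operands of a subtraction negates its value, which is what corrects the reversed scores, while swapping a commutative operator's operands changes nothing:

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

public class FlipFunction {
    // "( - 3 black_corners )"  ->  "( - black_corners 3 )"
    static String flip(String expr) {
        Deque<String> toks = new ArrayDeque<>(Arrays.asList(expr.trim().split("\\s+")));
        return flipNode(toks);
    }

    private static String flipNode(Deque<String> toks) {
        String t = toks.pop();
        if (!t.equals("(")) return t;   // terminal: primitive or constant
        String op = toks.pop();
        String a = flipNode(toks);
        String b = flipNode(toks);
        toks.pop();                     // consume ")"
        return "( " + op + " " + b + " " + a + " )";  // operands swapped
    }
}
```

Note that when subtractions are nested inside one another, a full recursive swap is no longer a pure negation of the whole expression, so the exact scope at which the project applied the flip is an assumption here.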

Conclusions

This project proved interesting and engaging, although frustrating due to the long training times. I was able to demonstrate that the hypothesis can be represented more simply and still produce effective game play; the run time was dramatically reduced while the essential primitives for hypothesis representation were retained. Unfortunately, there was no improvement in play against EDGAR. I was surprised that the GP rarely produced functions containing all three Othello-specific primitives; I had expected it to, since these were essentially the features I relied on when playing against EDGAR myself. I was also surprised that increasing the population from 50 to 100 actually reduced performance against random players. The results may have been tainted by the problem mentioned above, where the GP was actually producing functions that gave the worst possible results for the black player.

There was a large effort in updating the primitives due to the number of places they appear. If I were to continue this project, I would first address the issue of reversed functions. I would also like to add two more primitives:
a position value, close_to_edge, which should discourage playing on the ring of squares just inside the edge; and move_nbr, which tells the GP how far into the game the player is. I would be interested to see how these primitives affect the GP.

From my exposure to GP in class and in this project, I find that the promise of a generic learner is a distant dream. The hypothesis space needs to be reduced to be effective, and to do this the problem must first be analyzed to decide what the effective primitives would be. This makes the global minimum easier for the GP to discover; otherwise the training time is enormous and the GP returns only local minima.

Source Files