Ilia Malkovitch
and

Dmitriy Stolyarov

starring in

GP-Othello


 

Introduction

In running some preliminary experiments with the original GP Othello implementation, we discovered that the genetic programming approach tends to overfit heavily to the player against which it is trained. Training against the Random player yields an individual that beats it but generally fails miserably against Edgar. The reverse is also true: an individual trained against Edgar beats it consistently but often loses to Random.

The logical conclusion, and the avenue we chose to pursue in our experimentation, is to develop a method that evolves more general players: ones that do well against a broader range of strategies instead of overfitting to a single opponent.


Performance Modifications

To pursue our genetic programming experiments effectively, we had to speed up the execution of the Othello GP package. We attempted several things along those lines. First, we noticed a glaring "// Very inefficient bubble sort" comment in OthelloPlayer.java. We valiantly proceeded to replace it with an ultra-efficient quicksort. Unfortunately, this produced only a very slight increase in performance, which makes sense in hindsight: an Othello position offers at most a few dozen legal moves, so the choice of sorting algorithm is hardly the bottleneck.
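As a rough illustration of the kind of change involved (the ScoredMove type below is a hypothetical stand-in for the package's actual move representation, and we show Java's library sort rather than a hand-rolled quicksort):

import java.util.Arrays;
import java.util.Comparator;

// Hypothetical stand-in for the package's move representation.
class ScoredMove {
    int position;   // board square of the candidate move
    double score;   // value assigned to the move by the evaluation tree

    ScoredMove(int position, double score) {
        this.position = position;
        this.score = score;
    }
}

class MoveSorter {
    // Replaces the O(n^2) bubble sort with the library's O(n log n) sort.
    static void sortByScore(ScoredMove[] moves) {
        Arrays.sort(moves, Comparator.comparingDouble((ScoredMove m) -> m.score));
    }
}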

Next, we noticed that training against Edgar takes significantly longer than training against simpler players (e.g. Random). After careful observation we discovered that the Edgar class was being re-instantiated every time it played a game against an individual. By modifying the code to avoid this, we achieved a great increase in speed, almost 6-fold, as the following experiment shows:

Population 200, training 5 games against Edgar:
without modification: 350 seconds
with modification: 60 seconds

This allows us to run tests against Edgar far more efficiently, and generally makes doing so much less annoying.
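The fix amounts to constructing Edgar once and reusing the instance across games. A minimal sketch of the idea follows, assuming Edgar can safely be reused between games; everything except the Edgar name is illustrative:

// 'Edgar' stands in for the package's Edgar class; this stub exists only
// so the sketch compiles on its own.
class Edgar {
    Edgar() {
        // In the real package, construction is the expensive part.
    }
}

class EdgarCache {
    private static Edgar cached = null;

    // Lazily construct Edgar once and reuse the same instance for every
    // game, instead of paying the setup cost for each individual.
    static Edgar getEdgar() {
        if (cached == null) {
            cached = new Edgar();
        }
        return cached;
    }
}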


Evaluation Modifications

For our evaluations we used three main opposing players: the Random player (which makes random moves), the Edgar player, and a Greedy player. The Greedy player simply makes the move that immediately gives it the most pieces of its own color (i.e. if it plays white, its function tree is simply "white").
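To make the Greedy strategy concrete, here is a minimal sketch; the Board interface is a hypothetical stand-in for the package's actual board API:

import java.util.List;

// Hypothetical board interface; the real package's board API differs.
interface Board {
    List<Integer> legalMoves(int color);      // squares playable by 'color'
    Board copyAndPlay(int move, int color);   // resulting board after 'move'
    int countPieces(int color);               // pieces of 'color' on the board
}

class GreedyPlayer {
    // Pick the legal move that immediately maximizes our own piece count.
    static int bestMove(Board board, int color) {
        int best = -1;
        int bestCount = -1;
        for (int move : board.legalMoves(color)) {
            int count = board.copyAndPlay(move, color).countPieces(color);
            if (count > bestCount) {
                bestCount = count;
                best = move;
            }
        }
        return best; // -1 means no legal move (a pass)
    }
}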


Variations on Learning Method

The primary problem with the genetic programming learning method is determining the fitness measure for an individual. There is no way to determine an "absolute" fitness for an individual; it can only be determined relative to a particular opponent. In our experiments we frequently observed the genetic algorithm overfitting to the player against which it was trained.

Therefore, we needed to devise a method of training that produces Othello players whose play generalizes, remaining competitive against a variety of opponents. The ideal approach, of course, would be to train our genetic algorithm against a competent human opponent who varies their play, so that the algorithm learns to play diversely. The next best thing is training against several different automated players that use different strategies, pushing the resulting individuals toward a generalized strategy that plays well against all of them.

The goal of this training is to produce individuals able to play competitively against the full set of test players. If an individual evolves that consistently beats these varied players, we can assume the strategy it has learned is a generally good Othello strategy rather than one overfit to any particular opponent.


Multi-Training

After several preliminary tests, we settled on the following setup. Each individual in the population plays several games, whose results are then combined into the overall fitness. A logical extension of the original 5-game model would be to simply split the games evenly among the three players we have: 2 games each against Random, Greedy (described above), and Edgar. However, because Greedy and Edgar are deterministic while Random is *very* random, we settled on the following schedule (sketched in code after the list):
1 game against Edgar
1 game against Greedy
4 games against Random
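A sketch of this schedule, with Player and playGame as hypothetical stand-ins for the package's actual API:

// 'Player' and 'playGame' are hypothetical stand-ins for the package's API.
interface Player { }

class MultiTrainer {
    // Returns the six per-game raw fitnesses: [Edgar, Greedy, Random x 4].
    static double[] evaluate(Player individual, Player edgar,
                             Player greedy, Player random) {
        double[] raw = new double[6];
        raw[0] = playGame(individual, edgar);           // 1 game vs. Edgar
        raw[1] = playGame(individual, greedy);          // 1 game vs. Greedy
        for (int i = 0; i < 4; i++) {
            raw[2 + i] = playGame(individual, random);  // 4 games vs. Random
        }
        return raw;
    }

    // Stand-in: plays one full game and returns the raw fitness for it
    // (number of opponent pieces remaining; lower is better).
    static double playGame(Player individual, Player opponent) {
        return 0; // the real package plays out an entire game here
    }
}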

After testing the players evolved with this multi-training, we saw that they still did not have a general strategy. Our primary goal was to evolve players that win against all three opponents, but after these tests the players were able to beat at most two of the three.


Modified Standard Fitness - Variance

After thinking about this problem for some time, we concluded that the poor performance stems from a flaw in the standard fitness evaluation. A player with an apparently "good" standard fitness could play extremely well against Edgar but badly against Greedy. The overall fitness would look good, but in reality such a player does not have a good general strategy.

Our key to solving this problem was to penalize players that overspecialize to a particular opponent. After taking the raw fitness of the 6 games, we computed the variance of the fitness across the three opponent types and used it as a modifier to produce a new standard fitness for each individual. To compute this variance, we used the raw fitness from the game against Edgar, the raw fitness from the game against Greedy, and the average raw fitness from the four games against Random. The new standard fitness for each individual thus became

modStdF = stdF + variance*5

This gives individuals that overspecialize to one opponent and do badly against another a higher (i.e. worse, since lower fitness is better here) fitness, eliminating them as viable choices for selection.
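A minimal sketch of this computation, under the assumption that the standard fitness is the total raw fitness (opponent pieces remaining) over all six games:

class VarianceFitness {
    // rawEdgar, rawGreedy: raw fitness from the single game against each;
    // randomGames: raw fitness from each of the four games against Random.
    static double modifiedFitness(double rawEdgar, double rawGreedy,
                                  double[] randomGames) {
        double randomAvg = 0;
        for (double g : randomGames) randomAvg += g;
        randomAvg /= randomGames.length;

        // Standard fitness: total raw fitness over all six games
        // (an assumption; the package may combine the games differently).
        double stdF = rawEdgar + rawGreedy;
        for (double g : randomGames) stdF += g;

        // Variance over the three per-opponent raw fitnesses.
        double mean = (rawEdgar + rawGreedy + randomAvg) / 3.0;
        double variance = (sq(rawEdgar - mean) + sq(rawGreedy - mean)
                + sq(randomAvg - mean)) / 3.0;

        // Penalize uneven performance across opponents; since lower
        // fitness is better, the variance term acts as a penalty.
        return stdF + variance * 5;
    }

    private static double sq(double x) { return x * x; }
}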

The top players evolved using this method turned out to be those with low variances (as low as 2) and low unmodified fitness measures (number of opponent pieces remaining). Note that the variance-modified fitness generally comes out significantly higher than the original fitness, because the variance term contributes heavily to it.


Results

Using the modified standard fitness method, we evolved many players with good modified standard fitness that beat both Edgar and Greedy and won most of their games against the Random player. This indicates that they learned some aspects of Othello strategy that are independent of the opponents they were trained against.

Here are two of the best evolved individuals (fitness values are variance-modified):

Fitness 137.44
Edgar: WIN (43 to 21)
Random: WIN (14 out of 20)
Greedy: WIN (41 to 23)

( * ( - ( + ( / ( / black_edges black ) black ) black ) ( - black_near_corners black )) ( / ( / ( * ( - ( + ( * ( / black 5 ) ( / black white_edges )) black ) ( / ( + black_corners black_corners ) white_corners )) ( * black black_corners )) black ) ( + black_near_corners black )))


Fitness 121.94
Edgar: WIN (40 to 24)
Random: WIN (13 out of 20)
Greedy: WIN (46 to 18)

( - ( / ( * black_corners ( / ( * ( * black_corners white_near_corners ) ( / white_near_corners white )) ( * ( * ( - ( / white_corners white_corners ) ( / white_corners white )) ( + ( * ( * white black_corners ) ( + black white_corners )) ( + white white ))) ( / ( / ( + black_edges black_edges ) ( - black 5 )) black )))) white_near_corners ) ( - white_edges 5 ))


Conclusions

The purpose of our experiments was to improve the genetic programming method so that it evolves general Othello-playing strategies that perform well independently of the details of the training. The assumption underlying our experiments is that if an individual evolves that consistently beats several players employing different strategies, then it has actually learned something non-trivial about the game of Othello, rather than merely overfitting to the players used in training.

The individuals evolved using the variance-modified fitness method satisfied these conditions, decisively defeating Edgar, Greedy, and Random. Unfortunately, it is very difficult to determine whether the learned strategies actually generalize to arbitrary opponents. One simple test was to have the creators (presumably employing intelligent Othello strategies) play against the evolved players; the creators were defeated soundly. This is not a very precise measure, but it does show that a certain amount of "intelligence" was evolved using GP Othello.