Ilya Mayzus
An important issue in Genetic Programming is the choice of terminals used to create the tree. My initial motivation for this project was to explore how this choice will affect the performance of trees that were used to evaluate the moves in a game Othello. In particular, I wanted to start with terminals that provide the best heuristics and proceed towards more general ones. In the process, I also wanted to analyze the evolved players to see whether their use of terminals matched my assumptions.
As I made progress with creating successful players, I became more interested in maximizing the potential of existing terminals by altering the training. This led me to experiment with different combinations of players used in the fitness function (RandomPlayer and Edgar). I was especially interested in finding a way to influence strategy of the evolved players, while keeping the training time to a modest value. It turned out that changing a fitness function can be at least as effective in evolving players as is changing the input terminals.
As a starting point, I took the observations made by Juno Suk on the limitations of the current terminals, in particular the tendency of white and black terminals to dominate at the end of the game as the number of pieces increases. I decided that an easy way to fix this is to provide the GP with only the number of pieces that will be flipped as a result of the move, rather than total number of pieces that will exist on the resulting board. The downside was that to do this for opponent's pieces requires performing a shallow minimax search to select the best possible gain (for opponent) the opponent can make. I performed a similar substitution with the number of corners, leaving only the number of corners that will be gained by a particular move (either 0 or 1).
I also introduced the terminals reflecting the position of the new move relative to the center of the board as col_from_center and row_from_center (for example, columns of the board are assigned values as 4 3 2 1 1 2 3 4 and similar for rows). In addition, I renamed all provided terminals so that they refer to me/opponent rather than black/white, to avoid restriction of who the current player is being trained on.
Here's a summary of the terminals that I used:
me_change number of extra pieces the player will gain with this move (this means number flipped + 1) me_corners_gain number of corners (0 or 1) that will be gained with this move opponent_change similar to the above; to use these terminals, opponent_corners_gain the OthelloGPTester needs to perform a shallow minimax search to calculate the maximum possible value that the opponent can potentially gain with the next move (choose worst scenario) col_from_center for 8 by 8 board, this gives possible values of 4 3 2 1 1 2 3 4, where the central two fields are 1 row_from_center me_near_corners these terminals were originally provided as opponent_near_corners white_corners, black_corners, me_edges white_edges, black_edges opponent_edges-------------------------------------------------------------------------
1) using 5 (4 variable, 1 static) terminals:
me_change, me_corners_gain, opponent_change, opponent_corners_gain, and 10.
The two opponent's terminals required shallow minimax search and were
computationally expensive. Training was performed by playing 1 game
against Edgar
a) population of 200, ComplexityAffectsFitness off, 10 generations.
b) population of 200, ComplexityAffectsFitness on, 20 generations.
c) polulation of 400, ComplexityAffectsFitness off, 20 generations.
2) using 7 (4 variable, 3 static) terminals:
me_change, me_corners_gain, col_from_center, row_from_center, 0, 5, 10
All opponent's terminals were removed. Training was performed by
playing 10 games against RandomPlayer.
Population of 200, ComplexityAffectsFitness off, 50 generations.
3) using 11 (8 variable, 3 static) terminals:
me_change, me_corners_gain, col_from_center, row_from_center,
me_near_conners, opponent_near_corners, me_edges, opponent_edges, 0, 5, 10
Training was performed by playing 10 games against RandomPlayer.
Population of 200, ComplexityAffectsFitness off, 50 generations.
4) using 9 terminals, 8 variable, 1 static:
[same as in 3rd, but without 0 and 5]
me_change, me_corners_gain, col_from_center, row_from_center,
me_near_conners, opponent_near_corners, me_edges, opponent_edges, 10
Training performed by playing 1 game against Edgar and 2 against RandomPlayer.
Population of 200, ComplexityAffectsFitness off, 19 generations (interrupted run for 50 generations).
5) same terminals as in 4th
Training performed by playing 1 game against Edgar and 9 against RandomPlayer.
Population of 200, ComplexityAffectsFitness off, 50 generations.
1) Not surprisingly, the successful players were trying to maximize
the positive values, while minimizing the opponent's, such as:
( + ( - me_change opponent_change ) ( * me_change me_corners_gain )) ( - ( + me_corners_gain me_change ) opponent_change )
When ComplexityAffectsFitness was turned on, practically all best players
in each generation became:
( - me_change opponent_change ) ( / me_change opponent_corners_gain )
The trees became much more convoluted with bigger population (400) and
ComplexityAffectsFitness turned off. However, the evolved players, while
capable of beating Edgar, were nothing spectacular, considering the time
penalty that was paid to compute the minimax values for opponent. The best
player beat Edgar with a score of 53 to 10, and also did well against
RandomPlayer, winning 449/500.
Of course, this tree had an advantage against both Edgar and RandomPlayer
in that it could look ahead what kind of move the opponent could make.
2) Without information about opponent, but with additional information
about position of the move on the board, the GP still was able to
evolve an effective player (with fitness 90):
( + ( + ( * ( * me_change 10 ) ( + me_corners_gain 10 )) ( + me_corners_gain col_from_center )) ( + ( + me_change col_from_center ) me_corners_gain ))
This player beat Edgar with a score of 29 (35-29), even though it was
trained only against RandomPlayer.
Against RandomPlayer, it had a record of 393/500. For comparison, Edgar's
record is 477/500.
However, no other evolved player approached the same success,
casts some doubts on whether this approach is consistently successful.
3) The results were very similar to those in 2nd step, perhaps
even less glamorous.
The best player (with fitness 70) lost to Edgar with a score of 36 (28-36) and posted a record against RandomPlayer 355/500.
( / ( - ( + ( + me_edges me_change ) ( / me_corners_gain row_from_center )) ( / 0 ( / col_from_center 5 ))) ( * ( - ( + ( / ( * me_change 5 ) ( / opponent_near_corners me_change )) ( + ( * ( * me_edges me_edges ) ( + row_from_center me_change )) ( * ( + 0 me_edges ) ( / opponent_edges me_change )))) ( + ( * ( * ( + me_change 0 ) ( / me_near_corners me_edges )) ( * ( * ( * me_edges opponent_near_corners ) ( - ( - row_from_center col_from_center ) 5 )) ( / opponent_edges row_from_center ))) ( / ( / 0 ( / col_from_center 5 )) ( - opponent_edges me_edges )))) ( / opponent_edges row_from_center )))
One intriguing feature of the best players produced in this run is that
they tend to play very strong at the beginning of the game and often suffocate
the opponent early on (notice proliferation of scores where opponent had a 0).
However, if the opponent survives until the middle,
they lose more easily. Also, the strategy seemed to be geared towards the
center of the board. I would speculate that this is a side effect of
training against RandomPlayers, where the fitness is measured by least
number of pieces allowed. If trained against a player with a distinct
strategy (such as Edgar) or when measuring fitness differently (which
I have not tried), the resulting strategy might be different.
4) With Edgar's score playing a large role in the total fitness,
many best players were also the ones that beat Edgar. Also, many players
were evolved in the early generations that were able to beat Edgar with
a large margin (3, 14, and even 0).
The best player shut down Edgar (37-0) and had a fitness of 16:
( + ( + row_from_center opponent_edges ) ( * ( + me_change me_edges ) ( * me_edges ( + me_edges me_corners_gain ))))
However, this performance against Edgar did not generalize
against RandomPlayer, posting 361/500, which is similar to results
of 3rd step.
5) One surprising result of increasing number of games played against
RandomPlayer in the evaluation function is that players capable of
beating Edgar with a wide margin did not appear until
later generations, with a score of 0 appearing only somewhere 84% into
the run of 50-generations (compare to 19 total generations in step 4).
The player with best fitness among those able to beat Edgar
(with a score of 19; total fitness of 172) posted 423/500 when playing
against RandomPlayer:
( - ( - me_corners_gain ( / ( / 10 me_edges ) ( + 10 opponent_near_corners ))) ( - me_near_corners me_edges ))
The player with absolute best fitness of 126 lost to Edgar with
a score of 38 and posted a similar result of 424/500:
( + ( - ( - me_corners_gain ( / ( / 10 me_edges ) ( + 10 opponent_near_corners ))) ( - me_near_corners me_edges )) me_change )
Interestingly, these trees are very similar, which points to the fact
that the fitness measure was quite noisy even with 10 games.
This lack of correlation between fitness and quality of the player poised some difficulty of finding the really good players. By trial and error, I found one that had a fitness of
183 (which is worse than either of the above two) and was able to narrowly
beat Edgar with a score of 30 (34-30), but it posted an impressive
450/500 when tested against RandomPlayer.
( / me_near_corners ( / ( * me_near_corners ( * ( / me_near_corners me_edges ) me_change )) ( - me_edges ( * ( - row_from_center ( - me_corners_gain me_edges )) ( - 10 ( * me_corners_gain me_change ))))))
In comparison, Edgar's record is 477/500.
Interestingly, this player practically never completely shuts down the RandomPlayer (with a score of 0), but does very well overall. This is in sharp contrast to players developed in step 3. The main difference seems to be inclusion of playing against Edgar in the fitness function.
Run1, a) | .det file | .ini file | .stc file | GP file |   |   |   |
Run1, b) | .det file | .ini file | .stc file | GP file | best player |   |   |
Run1, c) | .det file | .ini file | .stc file | GP file | best player |   |   |
Run2 | .det file | .ini file | '.stc file | GP file | best player | log |   |
Run3 | .det file | .ini file | .stc file | GP file | best player | log |   |
Run4 | .det file | .ini file | .stc file | GP file | best player | log | runs summary |
Run5 | .det file | .ini file | .stc file | GP file | best player | log | runs summary |
# | Run | Player string representation | Player gif representation | Fitness (in the run) | Record against Edgar | Record/log against Random | Watch it play against Edgar | Watch it play against Random | Play against it | |
---|---|---|---|---|---|---|---|---|---|---|
0 |   | Edgar |   |   |   | 477/500 |   |   | [---] | |
1 | 1 | [---] | [---] | 10 | won 53-10 | 449/500 | [---] | [---] | [---] |   |
2 | 2 | [---] | [---] | 90 | won 35-29 | 393/500 | [---] | [---] | [---] |   |
3 | 3 | [---] | [---] | 70 | lost 28-36 | 355/500 | [---] | [---] | [---] |   |
4 | 4 | [---] | [---] | 16 | won 37-0 | 361/500 | [---] | [---] | [---] |   |
5a | 5 | [---] | [---] | 172 | won 45-19 | 423/500 | [---] | [---] | [---] |   |
5b | 5 | [---] | [---] | 126 | lost 26-38 | 424/500 | [---] | [---] | [---] |   |
5c | 5 | [---] | [---] | 182 | won 34-30 | 450/500 | [---] | [---] | [---] |   |
One of the biggest problems encountered during this assignment was inconsistent performance by RandomPlayers. The problem had to do with creating a new Random object in the evalMove() function, with the hope that it will give a different random number because it is by default seeded with the current time. In practice, not enough time elapsed between calls to change the seed and affect the Random sufficiently.
A quick fix was to use Math.random() (OthelloRandomPlayer). Another possibility was to make Random a class variable, instantiate it in the constructor and then use it to get a new value (OthelloConsistentRandomPlayer). The problem with the latter is that when many games are played in succession (such as during testing against RandomPlayer), there was a significant probability that a Random will be constructed with the same seed, so that even though the moves themselves will be random, the actual sequence will be the same. Unfortunately, even Math.random() does not solve the problem sufficiently. For example, in the logs of playing 500 games against RandomPlayer, there are sometimes the same scores in succession. In fact, the experience of watching evolved players play against RandomPlayer may be different from what would be expected based on their performance during batch testing.
Another problem was that OthelloEdgar() needed another constructor when called from inside an applet from Netscape in order to successfully read in FinalWeights. This problem was solved by writing another constructor. Alternative solution was to always use appletviewer.
There is a minor bug in the implementation of tokenizing sequence of original OthelloGPPlayer (it couldn't handle )( in succession). This problem was solved in MyPlayer.
Finally, the average fitness in calculateStatistics in GP/GPPopulation.java
(from original gpjpp package) is calculated incorrectly (it always
increases) because sumFitness variable is not reset to 0. This is easy
enough to fix.
During this project, I proceeded from using strong heuristics, such as looking two moves ahead, to more general but less computationally expensive, such as position of the move. As it turned out, it is possible to compensate the loss of available information with careful selection of fitness measure. The evolved players from the latest stage were able to successfully defeat Edgar and also have respectable performance against Random player.
It appeared that the choice of the fitness function may be more influential to the strategy of evolved player than the initial set of available terminals. When training against a combination of RandomPlayers and Edgar, the evolved players avoided over-fitting against Edgar, at the same time seeme to have learned from him, adopting the strategy of to trying to capture corners and maximize edges. This was not possible when training just against RandomPlayers, since the fitness dictated trying to minimize number of pieces lost to opponent, implicitly leading to over-focusing on opponent's pieces and consequently to the center of the board (since this is where the game starts). When training only against Edgar, there was little correlation between ability to defeat Edgar and ability to perform well against RandomPlayers.
It would be interesting to go a step further and purposefully evolve players with distinctly different strategies and then using the fitness function to evolve players that would be able to combine them.