Leonid Portnoy
lp178@columbia.edu
After running the program several times with the existing primitives, I noticed that it mostly generated strategies based on counting how many edges of its color the board has. The reason is simple: edges are harder to flip to the other color, so if you hold an edge of some size, you have an advantage. In fact, the other primitives used, such as black and white corners, are also based on the fact that it is harder (or impossible) to flip pieces of one color in a corner.
This illustrates a more general approach to evaluating an Othello board, which is:
"See how many pieces on the board have a low probability of being flipped later in the game by the opponent."
Now, since this is an experiment in automatic machine learning and genetic programming, we are not allowed to explicitly define such notions for a problem, since we are trying to be as general as possible. Rather, we want the machine, or the genetic algorithm, to automatically learn this notion of pieces having a low probability of being flipped later in the game. Explicitly defining a terminal (primitive) such as "black_corners" or "black_edges" is, in a way, cheating: we are using our human knowledge and understanding of the specific problem domain (Othello) to define primitives (black_edges, etc.) that we know in advance are a good factor in the game's strategy. However, in a general problem we might not know or intuitively observe such factors, and thus would not be able to define such primitives. In those cases the genetic algorithm must learn them on its own.
I wanted to see how the genetic programming algorithm would perform if we were to remove these primitives which come from our knowledge of the problem domain, and see if it would deduce the concepts behind these primitives on its own.
For this reason, I decided to remove the "black/white_edges", "black/white_corners", and "black/white_near_corners" primitives. In their place I added a single new primitive:
"next_board's_value"
This primitive is evaluated by making a random move (corresponding to a random Othello player) and seeing how many pieces of the color we are playing remain after that move. In other words, we simulate an opponent's move by making it randomly, and then note how many pieces of our color survive it. This 'lookahead' strategy is used in many games (minimax in chess, tic-tac-toe, etc.) and is therefore general enough for us to use without crossing the boundary of problem specificity.
The motivation behind this strategy is that the algorithm should eventually be able to learn which board positions survive the opponent's attack better, and induce concepts such as edges and corners from that (e.g. boards with edges and corners retain more of their pieces after an opponent's moves than boards without them).
Also, the primitives representing the number of black and white pieces were kept, since they represent the immediate (explicit) measure of success and are thus general enough to apply (with modifications) to other types of problems.
Finally, another very general type of primitive was added: the x and y location of the piece just placed on the board. This primitive represents a possible move that we want to make, a concept that is easily generalizable to other kinds of games and problems. Therefore, it does not cross the 'problem-specific' boundary and make our implementation dependent on this particular problem (Othello). These primitives needed to be added because otherwise there would be no way to differentiate (in the absence of concepts such as edges and corners) between piece positions on the board.
Of course, building such substantially more complex concepts from simple primitives will take quite a few generations, or perhaps they will not be discovered at all. It was interesting to see if they would be, and if so, how many generations it would take and what form the learned concepts would have.
I modified the code to include these new terminals, removed the old ones, and trained the algorithm to play the black pieces. The code for handling the next_board's_value primitive, inside GPOthello.java's evaluate() function, was essentially the following:
....
case 'Z':  /* next_board's_value */
    // Look one ply ahead: pick a uniformly random legal black move.
    // random.nextInt(n) gives a uniform index directly, avoiding the
    // abs() fixup (which fails for Integer.MIN_VALUE).
    Vector possibleMoves = board.possibleMoves(OthelloBoard.BLACK);
    Move m = (Move) possibleMoves.elementAt(random.nextInt(possibleMoves.size()));
    // Play the move on a copy so the real board is untouched.
    OthelloBoard b = board.copy();
    b.makeMove(OthelloBoard.BLACK, m.col(), m.row());
    b.updateBoardStatistics();
    ret = b.num_white;  // white's piece count on the simulated board
    break;
The code for handling other primitives was trivial (i.e. just passing x,y coordinates of last move to the evaluate() function and returning them whenever we encountered the appropriate primitives).
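As an illustration of how trivial this handling is, the terminals amount to a lookup inside evaluate(). The following is a self-contained sketch; the case labels ('X', 'Y', 'B', 'W') and parameter names are my own and not necessarily those used in GPOthello.java:

```java
// Illustrative sketch of terminal evaluation with the new primitive set.
// Case labels and identifiers are hypothetical, not the package's actual code.
public class TerminalSketch {
    static int evaluate(char terminal, int lastX, int lastY,
                        int numBlack, int numWhite) {
        switch (terminal) {
            case 'X': return lastX;    // column of the piece just placed
            case 'Y': return lastY;    // row of the piece just placed
            case 'B': return numBlack; // our piece count (we play black)
            case 'W': return numWhite; // opponent's piece count
            default:  return 0;        // unknown terminal
        }
    }

    public static void main(String[] args) {
        // The caller simply passes the coordinates of the move under
        // consideration along with the board's piece counts.
        System.out.println(evaluate('Y', 2, 6, 20, 18)); // prints 6
    }
}
```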
Coding went smoothly; the genetic programming package was probably designed with ease of extension in mind. There was, however, a slight problem with saving a run midway through its execution (and all the information associated with it) to a checkpoint file. When this feature was used (setting the CheckpointGens variable to a value greater than 0 in the .ini file), Java generated an error on attempting to load the file (when restoring a run), related to registering classes within Java's environment. This is probably due to a bug in the genetic programming package. Otherwise, no major problems were encountered.
Results from test run 1
The players in this run were evaluated by playing 5 games against a Random player, and there were 31 generations, with a population of 50.
The resulting gif image of the best hypothesis is:
Its fitness is 10.
Results from run 2
For this run, I increased the number of generations to 41 and the population size to 75.
Two samples from the results are:
Its fitness is 8.
Its fitness is 3.
Results from run 3
See the conclusions section for a description of this run.
This is the .stc file for run 3
Fitness 0
Fitness 0
Results from run 4
See the conclusions section for a description of this run.
This is the .stc file for run 4
Its fitness is 0 (against Edgar).
The goal of my project was not to produce the best possible Othello player, but rather to see how (and whether) the genetic programming algorithm would learn the concepts needed to perform better than random at Othello from scratch (or at least from very primitive and general concepts, such as the position of the move being considered). Afterwards, we can also look at the results and perhaps discover concepts that we did not, or would not, think of ourselves.
As can be seen from the result of test run 1, the heuristic that player used was "10 / (white + y)". In other words, the value of the board was inversely proportional to the sum of the number of white pieces (the opponent's, since it was trained to play black) and the y coordinate. It makes a great deal of sense that the value of a board should be inversely proportional to the number of your opponent's pieces, and so the genetic programming algorithm was successful in that regard. But why should it be inversely proportional to the y coordinate? One possible answer is that this is an attempt by the algorithm to arrive at a concept of edges. The closer we place our piece to the upper edge, the more valuable the resulting board, since it will be harder to flip that piece over later in the game. And, of course, a smaller value of y corresponds to a piece closer to the upper edge.
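In code form, the evolved heuristic amounts to something like the following. This is a sketch: the GP tree operates on the raw terminals, the identifier names are mine, and I use floating point for clarity (the tree itself may use integer division):

```java
// Sketch of the run-1 heuristic: value = 10 / (white + y), where
// 'white' is the opponent's piece count and 'y' the row of the move
// just made (0 = top edge). Names are illustrative, not the GP tree's.
public class Run1Heuristic {
    static double boardValue(int white, int y) {
        return 10.0 / (white + y);
    }

    public static void main(String[] args) {
        // Fewer opponent pieces and a move nearer the top edge both
        // raise the board's value.
        System.out.println(boardValue(10, 0)); // prints 1.0
        System.out.println(boardValue(10, 7)); // ~0.588
    }
}
```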
After this, I did a second run, which also showed some interesting results. The first image shows a heuristic that does essentially the same thing as in the previous example: it is "black - y", so it favors a large number of black (its own) pieces and small values of y (close to the upper edge). However, the second image, with even better fitness than the first, shows the heuristic "y / white", which is now proportional to the y value. This seems to contradict the previous examples, but it is really saying the same thing with respect to the other (bottom) edge.
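Sketched the same way (identifier names mine, floating point for clarity), the two run-2 heuristics make this edge symmetry explicit:

```java
// Sketch of the two run-2 heuristics. On an 8x8 board, y = 0 is the
// top edge and y = 7 the bottom edge; names are illustrative.
public class Run2Heuristics {
    // "black - y": more of our own pieces, and moves near the top edge.
    static double first(int black, int y) {
        return black - y;
    }

    // "y / white": moves near the bottom edge, and fewer opponent
    // pieces -- the same edge idea, mirrored to the other side.
    static double second(int white, int y) {
        return (double) y / white;
    }

    public static void main(String[] args) {
        System.out.println(first(20, 0));  // prints 20.0
        System.out.println(second(10, 7)); // prints 0.7
    }
}
```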
By this time one thing had become very obvious: almost none of the hypotheses generated by the algorithm used the primitive "next_board's_value". What does this mean? Most probably, the number of pieces that survive a single move by the opponent is not really relevant to how the game concludes. The reason might be that the numerous moves following the current one completely invalidate this primitive's prediction. For example, if it says the value of the board is high because many pieces remain after the opponent's move, those pieces might be flipped shortly afterwards, and the prediction does not help us win the game. In effect, the prediction becomes like a random one, and therefore it was excluded from the best hypotheses produced, which were better than random.
Evaluating this primitive was also very time-consuming, and it wound up unused. Because of this, I decided to remove it for the rest of the experiments and see how the algorithm would do with just the "x", "y", "white", and "black" primitives (as well as "10").
Then came one crucial realization: the ComplexityAffectsFitness parameter was set to true in the .ini file! This meant that only small trees could achieve anything close to good fitness. But in our case it was very probable that large trees would be needed, because we are combining very general and very simple primitives; to build the concept of an edge or corner, the algorithm would probably need to combine several layers of these small primitives. So I set that parameter to false and continued running the tests. And since evaluation was now faster, I tested each hypothesis over 10 games instead of 5.
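For reference, the change amounts to a couple of lines in the .ini file. This is a sketch: only the parameter names appear in the text above, and the exact key/value syntax is an assumption:

```
; Sketch of the relevant .ini settings (syntax assumed; parameter
; names are taken from the discussion above).
ComplexityAffectsFitness = false  ; let large trees compete on raw fitness
CheckpointGens = 0                ; checkpointing off, since restoring was buggy
```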
The results were great. As can be seen from the images for run 3, the hypotheses are quite complex, which is to be expected when representing complex concepts with simple primitives, and the fitness of both of them is 0! This means that they almost never lost (out of 10 games) to the randomized player.
Next, I trained the genetic program against Edgar. Upon conclusion of the first such run, most individuals in the population had a fitness of around 200-250; however, some individuals got really impressive results. The best individual is depicted in the image for run 4, in the results section above. The heuristic shown there received a fitness of 0 when playing against Edgar!
That result seemed interesting, and I decided to double-check it by running that heuristic (function) through the OthelloCache program, where it played against Edgar. The result can be seen here: The total annihilation of Edgar
Yes, the name means what it says :) That function won the game with a score of 47 to Edgar's 0.
I tried to analyze that function, but it seems to require some time to make sense of. It is really a rather complex combination of very simple primitive functions, and one can only guess what concept it represents; it will probably take some time to figure that out. However, it is very likely that humans would not come up with such an obscure function, at least not without a considerable amount of thought.
However, what we can infer from this result is that it is indeed possible for a genetic programming algorithm to take very primitive functions and terminals, which are general enough to apply to a wide range of problems, and produce a good solution for a particular problem. The drawback is that this solution might be very complex (and therefore we have to allow for such complexity, which I did not in the first two runs), and we might not be able to see the concepts embedded within the solution right away. But given enough time, it is likely that we will also be able to learn something from that result.
Code
Here's the modified GPOthello.java
This is the modified OthelloGPTester.java
Here's the .ini file used to produce the final (winning) hypothesis