ML HW#4: The Effects of MiniMax on Genetic Programming Algorithms for Othello

Jeffrey Leong
Jason Greenberg
Philip Kim


[Overview Description | Implementation | Results | List of Results | Shortcomings/Bugs | Individual Results List]

Overview Description

For our Genetic Programming project, we decided to study the effects of using the MiniMax algorithm as an alternative to the current evaluation function used to choose the GP player's next move. The motivation behind this decision is simple: currently, an evolved GPOthelloPlayer chooses its "next move" based on the current board alone. That is, the GPOthelloPlayer can only pick whichever move yields the maximum immediate score. However, especially in a game like Othello, the move that looks best now may turn out to be terrible two moves down the road. For example, if the GP player places a piece in a square adjacent to a corner, that might yield a high immediate score; but the opponent will almost certainly take the corner on the following move, exposing the GP player's choice as a bad one. This situation would be rectified if the GP player could look several moves into the future and evaluate which move is better in the long term.

After the first few trials, it became apparent that letting MiniMax run with an unbounded or fairly high depth was taking too long; our first few trials took well over 20 hours and failed to finish. Therefore, as an optimization, Alpha-Beta pruning was implemented to speed up the evaluation of moves. Alpha-Beta pruning simply removes branches that lead to paths that would never be taken.

Implementation

OthelloMiniMaxPlayer extends the class OthelloGPPlayer by overriding the method evalMove(). In our implementation, evalMove() calls the recursive methods minNode() and maxNode(), which use the minimax algorithm to return an evaluation score for the move passed to evalMove(). The minimax algorithm uses OthelloGPPlayer's evaluate() method to find the fitness of each leaf node of the minimax tree, passing it a GP player string in the constructor. If either maxNode() or minNode() detects an end of game, evalMove() returns a score of Infinity for a winning configuration and -Infinity for a losing configuration, relative to the specified color.
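
To make the recursion concrete, here is a minimal Java sketch of the scheme just described. The names minNode(), maxNode(), evaluate(), and evalMove() come from the text above; the Board interface, its methods, and the 0/1 color encoding are hypothetical stand-ins for the actual GP Othello framework, which we do not reproduce here.

    // Minimal sketch of the minimax evaluation described above. Board and
    // the 0/1 color encoding are hypothetical stand-ins for the framework.
    import java.util.List;

    interface Board {
        List<Board> successors(int color);  // boards reachable by one legal move
        boolean gameOver();
        boolean wins(int color);
    }

    class MiniMaxSketch {
        private final int color;     // the color this player plays (0 or 1, assumed)
        private final int maxDepth;  // the depth/threshold of the search

        MiniMaxSketch(int color, int maxDepth) {
            this.color = color;
            this.maxDepth = maxDepth;
        }

        // Stand-in for OthelloGPPlayer.evaluate(): scores one board with the
        // evolved GP expression. A constant here; real in the framework.
        double evaluate(Board b) { return 0.0; }

        // evalMove(): score the board produced by one candidate move. After
        // we move, the opponent moves, so the search starts at a min node.
        double evalMove(Board afterMove) {
            return minNode(afterMove, 1);
        }

        // Opponent's turn: back up the minimum over its replies.
        double minNode(Board b, int depth) {
            if (b.gameOver())
                return b.wins(color) ? Double.POSITIVE_INFINITY
                                     : Double.NEGATIVE_INFINITY;
            if (depth >= maxDepth) return evaluate(b);  // leaf: GP fitness
            double best = Double.POSITIVE_INFINITY;
            for (Board next : b.successors(1 - color))
                best = Math.min(best, maxNode(next, depth + 1));
            return best;
        }

        // Our turn: back up the maximum over our moves.
        double maxNode(Board b, int depth) {
            if (b.gameOver())
                return b.wins(color) ? Double.POSITIVE_INFINITY
                                     : Double.NEGATIVE_INFINITY;
            if (depth >= maxDepth) return evaluate(b);
            double best = Double.NEGATIVE_INFINITY;
            for (Board next : b.successors(color))
                best = Math.max(best, minNode(next, depth + 1));
            return best;
        }
    }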

Alpha-Beta pruning enhances the efficiency of the minimax algorithm. This is accomplished by passing low and high bounds (alpha and beta) down to descendant nodes and updating those bounds as the depth-first search backs up evaluation values. As a min node searches its descendants, it continues the search only while the backed-up value remains greater than the alpha value; similarly, a max node continues searching its descendants only while the backed-up value remains less than the beta value. If a descendant's value short-circuits further queries of the other descendants, it is simply backed up to the parent. This eliminates unnecessary searching, because the parent would discard the short-circuited value anyhow. The improvements in efficiency were tremendous and readily observable.
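
A sketch of the same minNode()/maxNode() pair extended with alpha-beta bounds, under the same assumptions as the previous sketch (the Board type and color encoding are hypothetical). The cutoff tests mirror the rules just stated: a min node stops as soon as its backed-up value drops to alpha or below, and a max node stops as soon as its value reaches beta or above.

    // Alpha-beta variants of the methods from the sketch above.
    double minNode(Board b, int depth, double alpha, double beta) {
        if (b.gameOver())
            return b.wins(color) ? Double.POSITIVE_INFINITY
                                 : Double.NEGATIVE_INFINITY;
        if (depth >= maxDepth) return evaluate(b);
        double best = Double.POSITIVE_INFINITY;
        for (Board next : b.successors(1 - color)) {
            best = Math.min(best, maxNode(next, depth + 1, alpha, beta));
            if (best <= alpha) return best;  // parent max node will never pick this branch
            beta = Math.min(beta, best);     // tighten the bound for the remaining siblings
        }
        return best;
    }

    double maxNode(Board b, int depth, double alpha, double beta) {
        if (b.gameOver())
            return b.wins(color) ? Double.POSITIVE_INFINITY
                                 : Double.NEGATIVE_INFINITY;
        if (depth >= maxDepth) return evaluate(b);
        double best = Double.NEGATIVE_INFINITY;
        for (Board next : b.successors(color)) {
            best = Math.max(best, minNode(next, depth + 1, alpha, beta));
            if (best >= beta) return best;   // parent min node will never pick this branch
            alpha = Math.max(alpha, best);   // tighten the bound for the remaining siblings
        }
        return best;
    }

    // The top-level call starts with the widest possible window:
    // minNode(afterMove, 1, Double.NEGATIVE_INFINITY, Double.POSITIVE_INFINITY)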

During both the evolution process and the actual game playing, a MiniMax tree of a given depth/threshold is created for every move: that is, for each move the opponent makes, a corresponding tree to depth n is built. It might have been more practical to build a single tree at the start of the game and then continue to refer to it, but our method does not keep a persistent tree structure at all. That alternative would have taken more time to code, but in the long run it might have saved system resources.
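
The resulting per-move usage pattern might look like the following sketch. bestMove() is a hypothetical driver written against the earlier sketch, not a method from the framework; the point is that nothing is cached between moves.

    // A fresh bounded search is run for every candidate move; no tree is
    // kept between moves. bestMove() is a hypothetical name.
    Board bestMove(Board current) {
        Board best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Board candidate : current.successors(color)) {
            double score = evalMove(candidate);  // rebuilds the depth-n search from scratch
            if (score > bestScore) {
                bestScore = score;
                best = candidate;
            }
        }
        return best;
    }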

Results

(Note: all results can be found in the attached results files.)

Four separate experiments were conducted: our MiniMax players played against Edgar; our MiniMax players played against the standard GP players; our MiniMax players played against 50 Random Players; and the standard GP players played against 50 Random Players.

Since the MiniMax search is intended as an improvement upon the standard genetic programming player, we should expect that, if it is actually useful, a MiniMax player would both beat the matching GP player evolution and beat out an equal or greater number of random players than the GP player does. It turns out that all 3 evolutions of our MiniMax players do, in fact, beat the corresponding GP players. Since the GP player is deterministic, it plays the same way each time, so any player pitted against it will get the same result on every trial; therefore, beating the player once is enough, and there is no need to run repeat trials against it.

The scores for the 3 evolved GP players against the 50 Random Players were 0, 41, and 48; the scores for the corresponding 3 MiniMax-evolved players against the same set of Random Players were 4, 42, and 50, a better result in each pairing. Therefore, since our MiniMax players beat both the corresponding evolved GP players and scored higher against the random players, the MiniMax algorithm does, in fact, enhance the abilities of the GP players in Othello.

As for performance against Edgar: since Edgar is also a deterministic player, running multiple trials against Edgar would again just reproduce the same results. As it turns out, only one of our MiniMax players was able to beat Edgar. However, since most human players are unable to beat Edgar, this is not an entirely unexpected result.

List of Results

The results of our project for the various trials are listed below. Please note that the individual results files are named as follows:

results_[player name]vs.[player name]_[machine it was run on]

The machines correspond to the following player configurations:

Helsinki - MiniMax player, depth 1, NO alpha-beta
Athens   - MiniMax player, depth 1
Paris    - MiniMax player, depth 2

MiniMax Player vs. Edgar:
    Edgar 4, MiniMax Player 1

MiniMax Player vs. GP Player:
    MiniMax Player 3, GP Player 0

MiniMax Player vs. Random Players:
    MiniMax Player 42, Random Players 8
    Random Players 42, MiniMax Player 4
    MiniMax Player 50, Random Players 0

GP Player vs. Random Players:
    GP Player 41, Random Players 9
    Random Players 50, GP Player 0
    GP Player 48, Random Players 2

Shortcomings/Bugs

As it turns out, running our program with anything greater than a depth of 3 takes a phenomenal amount of time. This is because, for each possible next move, the program generates the set of possible resulting boards, calculates their values, and then expands each of those boards for the following move, and so on, so the number of boards grows exponentially with depth. Since we did not simply build one large tree, the program is limited in this way. Alpha-Beta pruning was implemented in part to raise the maximum practical depth.
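
To put rough numbers on this: assuming an average branching factor of about 10 legal moves per Othello position (a typical figure for the game, not one we measured), an unpruned depth-d search evaluates on the order of 10^d boards per move, so roughly 1,000 boards at depth 3 but 100,000 at depth 5, which helps explain why depths beyond 3 became impractical.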

In theory, utilizing minimax to a deeper depth should improve the performance of any player. However, when we applied a greater depth to a few of the evolved players, this was not the case; in fact, the players seemed to do worse. We did not investigate this much further, but we suspect that the move evaluation method and the evalFitness calculation are not tuned for looking ahead into future moves with minimax. This would be an interesting topic to pursue in further investigations.