Genetic Programming and Othello

Thomas Bellin (twb6) and Greg Gasperin (gsg13)

 

1. Abstract

 

The intent of our addition to the genetic programming Othello player was to allow the evolved programs to place an arbitrary weight on each of the features they evaluate.  The motivation behind this is that features do not necessarily play an equal role in the eventual determination of the next best move; treating them equally is, in essence, the assumption built into the programs evolved by the original system.  By adding weights to the features being accounted for, the contribution of each feature can be scaled by the tree, giving it more expressive power.

 

 

2. Implementation

 

The difficulty in this task lay mainly in the representation of the feature weights.  Initial ideas included maintaining a separate tree of weights to apply to the main tree, or an array of weights to associate with each feature.  There are several problems with these two representations.  First, the association between weights and features is ambiguous: how are the evolved weights to be applied to the features in our tree?  Second, even once a method for applying weights was determined, each feature would still be assigned the same weight throughout the tree.  In certain situations it might be beneficial to apply one weight to a feature appearing in one part of the tree and a different weight to that same feature appearing at another location in the tree.  Not allowing this places unwanted restrictions on the representational power of our programs.

 

The final solution to this problem was twofold.  First, the weights of the features are primarily determined by the tree structure itself.  This is accomplished through a binary encoding that is built up as we move down the tree: moving to the left subtree from a node concatenates a “0” onto the bit string representing the inverse of the weight, while moving to the right adds a “1”.  For example, in the following tree:

 

        +
       / \
      *   Z
     / \
    X   Y

If the value of the bit encoding at the “+” node is “010”, then the encoding of node “Z” would be “0101”, or 5, the encoding at node “*” would be “0100”, and those at nodes “X” and “Y” would be “01000” and “01001” (with integer values of 8 and 9), respectively.  These encodings could be used in any way desired, but we took the weight of each feature to be the reciprocal of the integer value represented (or 1 if the encoding evaluated to zero).  In the above example, the weights for “X”, “Y”, and “Z” would be 1/8, 1/9, and 1/5, respectively.
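
As a concrete illustration of this scheme, here is a minimal sketch of how the encodings and reciprocal weights could be computed with a recursive walk over the tree.  This is our own illustrative code, not the project source; the names (EncodingSketch, Node, assignWeights) are hypothetical.

class EncodingSketch {

    // A minimal expression-tree node; terminals have null children.
    static class Node {
        String label;
        Node left, right;
        Node(String label, Node left, Node right) {
            this.label = label;
            this.left = left;
            this.right = right;
        }
    }

    // Append "0" when descending to a left child and "1" for a right child;
    // the weight of a node is the reciprocal of its encoding's integer value
    // (or 1 if the encoding evaluates to zero).
    static void assignWeights(Node n, String bits) {
        if (n == null) return;
        int value = bits.isEmpty() ? 0 : Integer.parseInt(bits, 2);
        double weight = (value == 0) ? 1.0 : 1.0 / value;
        System.out.println(n.label + ": " + bits + " -> weight " + weight);
        assignWeights(n.left, bits + "0");
        assignWeights(n.right, bits + "1");
    }

    public static void main(String[] args) {
        // The example tree from the text, with "+" given the encoding "010":
        Node tree = new Node("+",
                new Node("*", new Node("X", null, null), new Node("Y", null, null)),
                new Node("Z", null, null));
        assignWeights(tree, "010");
        // Prints X -> 01000 (weight 1/8), Y -> 01001 (1/9), Z -> 0101 (1/5).
    }
}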

 

This representation still had limitations, however.  The biggest limitation is that two features being operated on by an internal node (+, -, %, *, etc.) could only differ in weight by a small amount: if the weight of the left node was 1/n, then the weight of the right node would necessarily be 1/(n+1).  To fix this problem, we introduced the idea of no-op encoding nodes.  A no-op encoding node has only one child and accomplishes nothing other than altering the bit encoding.  The two no-op nodes we used initially were “B1” and “B0”, which append a 1 and a 0, respectively, to the end of the current bit string.  This could easily be expanded to include more no-ops (e.g., B01, B11, etc.), which we did for the last experiment.  The following example illustrates the benefits of using no-op nodes:

 

 

 

[Figure: the same tree as above, with B0/B1 no-op nodes inserted in its left subtree, lengthening the encodings of “X” and “Y”; “Z” is unchanged.]

Given the same initial encoding of “010” at the top “+” node, the new bit values for “X”, “Y”, and “Z” are “010011” (or 19), “01001110” (or 78), and “0101” (still 5).  The weights of 1/19, 1/78, and 1/5 vary much more than those assigned without the aid of no-op nodes.
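
A similarly hedged sketch of the no-op idea (again our own hypothetical code, not the project source): we assume here that passing through a B0 or B1 node simply appends its bit to the encoding accumulated so far, so a chain of no-ops above one operand lengthens its encoding.  Since each extra bit roughly doubles the integer value, each no-op roughly halves that operand's weight relative to its sibling.

class NoOpSketch {

    // Append the bit named by each no-op ("B1" adds 1, anything else adds 0)
    // to the encoding accumulated on the way down to a feature.
    static String throughNoOps(String bits, String... noOps) {
        StringBuilder sb = new StringBuilder(bits);
        for (String op : noOps) {
            sb.append(op.equals("B1") ? "1" : "0");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Without no-ops, siblings under one operator get encodings n and n+1,
        // so their weights differ only slightly (e.g. 1/4 vs 1/5).
        String plain = "0100";                                     // 4 -> weight 1/4
        // A chain of three B1 no-ops pushes the encoding to 39 -> weight 1/39.
        String chained = throughNoOps("0100", "B1", "B1", "B1");   // "0100111"
        System.out.println(plain + " vs " + chained);
    }
}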

 

Although the representation just discussed does not accomplish our initial goal of applying arbitrary (i.e. any real-valued) weights to the features, it does provide a reasonable approximation, given the constraints of the problem.

 

 

3. Experiments

 

3.1 Pre-Experiment Hypotheses

 

Going into the experimentation phase of the project, we had several expectations regarding what the resulting trees would look like and how they would behave.  One of these was that chains of successive no-op nodes would develop.  The reasoning behind this was that, if the weights were being used effectively, many situations would arise in which one contributor to a two-parameter operation (+, -, %, *, etc.) would need to have much more influence than the other.  The only way to achieve this is through the use of multiple no-op nodes, which give the parameter hanging beneath them a longer bit encoding, and thus a much lower weight (the reciprocal of the encoding's integer value).  This reasoning also holds for larger subexpressions.  For instance, in the addition of two multiplications, the result of one of the multiplications might be of greater importance, leading to the development of a no-op chain above the less important operation.

 

When using the tree structure as part of the weight (in addition to the no-op nodes), we expected that in general the more important computations would be found at the left-hand side of the tree, and closer to the top.  The reasoning behind this was that as you move down the tree, the nodes to the left attach a zero onto the bit encoding while those to the right attach a one.  In general, therefore, nodes on the left of the tree have higher weights than nodes at the same level on the right.  Nodes higher up in the tree have a higher weight because fewer bits have been added to their encoding.  In the above example of a tree using no-op nodes, this was not the case.  However, this was due to the nodes on the left being at a greater depth than the one on the right, and because the no-ops were used exclusively on the left side of the tree.  We didn’t expect this to happen in practice, since we believed that the distribution of no-ops would be more even than this.

 

 

3.2 Overview of Different Experiments

 

The different scenarios we tested are as follows:

·         Original features, no weights

·         Original features, weights

·         New features (based on the paper of Chris Porcelli and Patrick Pan), no weights

·         New features, weights

·         New features, weights determined solely by no-ops and not position, and using extra no-ops (specifically B00, B01, B10, and B11)

 

 

3.3 Results

 

3.3.1 Original Features, No Weights

 

Unfortunately, some of the best results we received in our testing were for the original, unmodified setup.  The fitness values ranged from 1.00 to 2.40.

 

 

3.3.2 Original Features, Weights

 

For this test, the fitness values ranged from 2.02 to 7.26.  This is not terrible, but worse than we expected.  We attributed this drop in performance relative to the original, unweighted programs to the wide range of feature values.  As discussed in the paper of Juno Suk, the number of white and black pieces can be considerably higher than the number of pieces in the corners, the number one away from the corners, etc.  This results in the generally lower-valued features being overshadowed by the two higher-valued ones.  This is an obstacle for our weighting scheme to overcome: it would have to develop chains of no-op nodes to compensate for the higher values of the white-piece-count and black-piece-count features.  Although chains do develop (as we initially hypothesized), it is unlikely that they would develop for every occurrence of these two features.  At this point, we hypothesized that better results would be obtained if weights were applied to features that were initially in the same value range.  This is what prompted us to use the features discussed in the paper written by Chris Porcelli and Patrick Pan.
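
As rough arithmetic (our own illustration, not project code): each appended bit roughly doubles a feature's integer encoding and therefore roughly halves its weight, so the no-op chain needed to pull a high-valued feature down to the range of a low-valued one only needs about log2 of the value ratio in extra bits.  The example values below are hypothetical.

class NoOpChainEstimate {

    // Number of single-bit no-ops needed so that a feature whose raw value is
    // `ratio` times larger than another contributes on roughly the same scale.
    static int noOpsNeeded(double ratio) {
        return (int) Math.ceil(Math.log(ratio) / Math.log(2));
    }

    public static void main(String[] args) {
        // e.g. ~30 white pieces vs. ~2 corner pieces: a 15x ratio calls for a
        // chain of about 4 no-op nodes above every piece-count occurrence.
        System.out.println(noOpsNeeded(30.0 / 2.0));   // prints 4
    }
}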

 

The most encouraging aspect of this run was that the weights were clearly being used.  There are numerous occurrences of the B0 and B1 no-ops, and even some directly above terminals.  This, as discussed earlier, is the only way for one parameter of a multiplication, division, etc. to receive a weight significantly different from that of the other parameter of the operation.

 

 

3.3.3 New Features, No Weights

 

To have a basis for comparison, we trained a player on the new features without using our weighting scheme.  The no-op nodes still appear in the tree, but remain unused.  The results we obtained were not as good as those claimed in the paper we based these changes on (they ranged from 3.83 to 7.60).  This could be due to the presence of the no-op nodes, which supply the program with a number of extra, unnecessary crossover points; perhaps this interfered with the speed of evolution of our programs.

 

 

3.3.4 New Features, Weights

 

The results of this test ranged from 5.45 to 6.98.  This is a slight decrease in performance from the unweighted results with the new features.  One possible reason could be that there are considerably more new features than there were old ones, making the possible combinations of them too numerous for the weights to be effective.  In order for weights to take effect, certain relationships among the features need to be learned, and there are simply too many possible relationships for this to happen.

 

 

3.3.5 New Features, Non-Positional Weights, Extra No-Ops

 

The results of this test ranged from 7.41 to 9.25, with one wacky tree having a fitness of 69.20.  The generally poorer performance of this test in comparison to the previous two suggests that positional weights may have helped in determining some relationships among features.  We had originally expected the removal of positional weights to improve performance, because it would remove one more level of possible combinations, reducing the difficulty of determining relational weights given such a high number of features.

 

 

4. Conclusions Drawn From Experiment Results

 

Our experiments using weights weren't as smashing a success as we had hoped.  The results garnered from using weights were comparable to those we got without them.  We also built a tournament generator which played a variety of trees against each other.  The results of these tournaments can be seen under the Round-Robin Tournament section below.  In general, the results from these tournaments showed that no one type of player had any particular dominance over any other.

 

While we found this frustrating, it prompted us to try to determine why the weighted trees failed to outperform their unweighted counterparts.

 

4.1 Over-representation With Weights

 

It's possible that the original tree representation (without the weights) could already represent the same sort of information that the weights provided.  Although we considered this, it seems unlikely given the prevalence of weights in the trees.  The GP finds some use in the weights, or else they would be "bred" out of the tree as useless (or even harmful).  Still, it's possible that the use of the weights merely "streamlined" or simplified the representation without providing any extra computational power.

 

4.2 Increasing Hypothesis Space

 

One of the main drawbacks of adding nodes to the tree representation is that it expands the hypothesis space for the GP.  This makes it less likely that useful individuals will be randomly generated, and slows the process.  However, we felt that the addition of the weights would be beneficial precisely because it expands the hypothesis space: the weights allow the GP to represent complex relations between the features of the board.

 

4.3 Training Too Simplistic

 

It's possible that we overestimated the complexity of the game of Othello.  Our weighting mechanism, while inspired by the comments of previous students, did not seem to be particularly beneficial.  Every configuration we trained (with and without weights of various sizes and complexities, using the new or old representation) was able to produce individuals that could beat Edgar.

 

The deterministic behavior of Edgar and the weakness of the random players may simply not have required the descriptive power provided by the added weights.  With more time, we would have liked to apply our round-robin tournament to training new players.  However, the time required to run these tests and the need to get started on the final project made this impossible.

 

4.4 New Hypotheses

 

After studying our experiment results, we came up with new hypotheses about the effectiveness of using weights in genetic programming.  Since the results with weighted features, for both the new and old feature representations, were only slightly worse than the original, unweighted versions, we assumed that each representation had both good and bad aspects in relation to its use of weights.

 

The original representation had unequally contributing feature values, making it necessary for the weights to counterbalance this initial bias and thus making weight-learning slower.  On the other hand, the smaller number of features was probably beneficial to the weight-learning process: it made for fewer possible combinations, and thus sped up the weight evolution to a degree.

 

The second feature representation used considerably more features, allowing for many more combinations of features and thus slowing down weight evolution to a degree.  However, the approximately equal contribution of the features in this representation (i.e. no feature has a significantly higher value than another) eliminated initial biases and thus sped up weight evolution.

 

Based on these hypotheses, it would seem that the ideal feature representation to apply weight learning to would be one in which the number of features was relatively low, and the contribution of these features was approximately equal.  Some ways in which this could be done would be to eliminate some of the features in the new representation (for example, the A’s and B’s are probably not very important), or to make the regions represented by each letter larger.

 

 

5. The Round-Robin Tournament

 

After we had generated a multitude of individuals with various attributes, it became clear that playing against Edgar and random players was not enough to distinguish our weighted players from their unweighted counterparts.  We decided that, rather than discard our previous tests, we should compare these individuals directly against each other in a tournament.  Our hope was that our weighted individuals would outperform their more "simplistic" counterparts in head-to-head competition.

 

5.1 Setting It Up

 

Setting up the competition proved to be a challenge, as we had seven different types of trees.  Some used the original representation (white, black, white-corners…) and others used the new one (aa, AA, bb, BB…).  Some used weights, and some did not.  Still others used variations on our basic weighting model.

 

We set up seven brackets (each representing a different type of individual) with four or five individuals per bracket (for a total of 33 individuals) and we had them go at it.  Over and over again.  Players were disqualified when they failed to win more than 60 percent of their games.  The results of the tournament can be found attached.
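
A rough sketch of the tournament loop might look like the following (hypothetical code with our own names; playGame stands in for the actual Othello game engine, which is not reproduced here):

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class RoundRobinSketch {

    static final Random rng = new Random();

    // Stand-in for the real game engine: returns true if p1 beats p2.
    static boolean playGame(int p1, int p2) {
        return rng.nextBoolean();   // placeholder result only
    }

    public static void main(String[] args) {
        // 33 players drawn from seven brackets of four or five individuals.
        List<Integer> players = new ArrayList<>();
        for (int i = 0; i < 33; i++) players.add(i);

        // Repeated rounds against randomly chosen opponents; anyone who fails
        // to win more than 60 percent of their games is disqualified.
        for (int round = 0; players.size() > 1 && round < 10; round++) {
            List<Integer> survivors = new ArrayList<>();
            for (int p : players) {
                int wins = 0, games = 10;
                for (int g = 0; g < games; g++) {
                    int opp;
                    do {
                        opp = players.get(rng.nextInt(players.size()));
                    } while (opp == p);
                    if (playGame(p, opp)) wins++;
                }
                if (wins > 0.6 * games) survivors.add(p);
            }
            players = survivors;
            System.out.println("Survivors after round " + round + ": " + players);
        }
    }
}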

 

5.2 Non-deterministic Competition

 

One of the interesting things about the way the competition is set up is that, although the players are deterministic, the results of the tournament are not.  Players play both white and black, and their opponents are randomly chosen.  Because different players each have a different "approach," no single player dominates all of the others, and they all have weaknesses.  So, depending on its "schedule," a player may be champion of the tournament or out in the first round.

 

Even so, after a few rounds it's easy to pick out the stars.  Some players consistently survive into the later rounds, and some consistently appear in the top five.  These are the stars of the GP Othello league.  For example, in the three examples we provide, player #4, player #9, player #30, and player #31 all consistently perform well, while #1, #19, #22, and #5 all consistently perform badly.  The tournament plays out much like a sports tournament: the stars rise to the top, but it's possible for a couple of unknowns to win it all.

 

5.3 Other Tournament Types

 

The tournament can also be set up to select opponents exclusively from the same bracket as the player or from all other brackets except the player's.  The results of some tests using these features have also been included.
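
The bracket-restricted modes could be sketched as a small change to how an opponent is drawn (again hypothetical code; bracketOf is assumed to map a player to its bracket, and the resulting pool is assumed to be non-empty):

import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;

class OpponentSelectionSketch {

    enum Mode { ANY, SAME_BRACKET, OTHER_BRACKETS }

    // Hypothetical mapping from a player number to one of the seven brackets.
    static int bracketOf(int player) {
        return player % 7;
    }

    // Draw a random opponent for `player` according to the selection mode.
    static int chooseOpponent(int player, List<Integer> field, Mode mode, Random rng) {
        List<Integer> pool = field.stream()
                .filter(p -> p != player)
                .filter(p -> mode == Mode.ANY
                        || (mode == Mode.SAME_BRACKET && bracketOf(p) == bracketOf(player))
                        || (mode == Mode.OTHER_BRACKETS && bracketOf(p) != bracketOf(player)))
                .collect(Collectors.toList());
        return pool.get(rng.nextInt(pool.size()));
    }
}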

 

The results from playing only players in the same bracket are of particular interest because the winners of those competitions (each bracket has a winner) are not always the "stars" of the original competition.  These players may have developed strategies that resist similar players, but that are weak against other types.

 

The results from playing outside of the bracket are similarly interesting.  Of particular note is the emergence of player #18 and player #11 who previously had mediocre performance.  These clearly are players who are powerful against other types of players, but do not perform as well against their own kind.

 

5.4 Fitness Ratings

 

Fitness ratings garnered from playing against Edgar and two random players do not, overall, match the results generated in the Round-Robin Tournament.  The best players' fitness values range from 1.5 to 5.0, and so do the worst players'.  There was no general trend.

 

This leads us to believe that playing against Edgar, even with the addition of random players, is not a good training system for Othello players.  It certainly produces players that can beat Edgar, but there's no correlation between success at beating Edgar and success in the tournament.   Success in the tournament, because it's non-deterministic, relies on learning something about the game of Othello (rather than learning something about a particular opponent).

 

5.5 Conclusions From Round-Robin Tournament

 

The results of the Round-Robin Tournament generally reinforced the results of our original experiments.  While we hoped that the results might be different, this is consistent with the hypothesis that the training situation wasn't rigorous enough to produce benefits from weights.

 

In more difficult and complex situations (e.g. having to compete against randomly selected players) without the deterministic constraint, the weights may have played a stronger role.  Also, in a more complex environment, the more simplistic types of players might have had greater difficulty.  We would have liked to continue our investigation into this effect, but we must move on to work on our final project.