Hypothesis and Motivation
How would we use genetic programming to have the computer learn to play Othello well? Neural Networks, Decision Trees, and Candidate Elimination lean more toward "molding" hypotheses out of a hypothesis space, but genetic algorithms take a different approach and require different thinking. We decided to have the computer learn from an already smart computer program, Edgar. The motivation here is that you have to learn from someone smart to get smart, so learn from Edgar and evolve, mate, have children, do whatever it takes to beat Edgar. We believe that if our algorithm can beat Edgar, or at least acquire a decent fitness against Edgar, our learned algorithm could most likely beat the random players as well. So, Edgar will be one initial learning condition.
Secondly, we'd like to create a scenario that generates function trees (or dudes, as George calls them) faster but still produces positive results. Our hypothesis here is to use a smaller population but run it for more generations. We're trying to mimic the concept of evolution: the longer the population exists (in terms of generations), the smarter the individual dudes will be. Along with that rationale, we wanted to allow for more awkward growth, because letting generation after generation go by without any random phenomenon would leave the population evolving in a pattern-like state, mating and conducting crossovers in the same fashion every time. Rather than that, why not throw a wrench in the system from time to time? In other words, we wanted to see how the population would grow, and by allowing mutations to occur a bit less than half the time, the population becomes much more interesting and diverse.
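The small-population, many-generation loop with occasional mutation can be sketched roughly as follows. This is a hypothetical outline, not the actual GPOthello code: the population size, generation count, 40% mutation rate, and the stand-in "crossover" and "mutation" on plain numbers are all illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Sketch of a small-population, many-generation GP loop with occasional
// mutation. Individuals are plain doubles standing in for fitness here;
// the real GPOthello individuals are function trees.
public class TribeSketch {
    static final Random RNG = new Random(42);

    // Mutate "a bit less than half the time": each child has, e.g., a 40%
    // chance of being perturbed instead of kept as a plain crossover result.
    static double evolveChild(double parentA, double parentB, double mutationRate) {
        double child = (parentA + parentB) / 2.0;  // stand-in for crossover
        if (RNG.nextDouble() < mutationRate) {
            child += RNG.nextGaussian();           // stand-in for subtree mutation
        }
        return child;
    }

    public static void main(String[] args) {
        int popSize = 20;          // small population ...
        int generations = 200;     // ... but many generations
        double mutationRate = 0.4; // the "wrench in the system"

        List<Double> pop = new ArrayList<>();
        for (int i = 0; i < popSize; i++) pop.add(RNG.nextDouble() * 100);

        for (int g = 0; g < generations; g++) {
            List<Double> next = new ArrayList<>();
            for (int i = 0; i < popSize; i++) {
                double a = pop.get(RNG.nextInt(popSize));
                double b = pop.get(RNG.nextInt(popSize));
                next.add(evolveChild(a, b, mutationRate));
            }
            pop = next;
        }
        System.out.println("final population size: " + pop.size());
    }
}
```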
A third approach we decided to adopt is demetic grouping: playing with populations that are separated from one another. With only one large population, dudes that are less fit will most likely never get a chance to mate, and they get lost in evolution. We wanted to save those dudes from extinction by separating the population into five groups, each with its own relatively fit dude. One of the dudes might not be fit in its current generation, but it might have the potential to become something better after crossover. For example, say we have two populations and two relatively fit individuals (see left).
Dude A has the potential to become better, and maybe by migrating him to the right population he would produce lower-fitness offspring (the lower the better, according to the GP program). We wanted to see what would happen if migration occurred 90% of the time. We believe that more migration will unlock "potential" for these dudes (kind of like people migrating to the United States).
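The demetic setup can be sketched as an island model. This is an illustrative sketch, not gpjpp/GPOthello code: the deme count, deme size, ring-shaped migration path, and fitness-as-a-number representation are all assumptions; lower fitness is better, as in the GP program.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Sketch of demetic (island-model) grouping: the population is split into
// five demes, and each generation the fittest individual of a deme migrates
// to the next deme with high probability. Individuals are plain doubles
// standing in for function trees; lower values mean better fitness.
public class DemeSketch {
    static final Random RNG = new Random(7);

    // With probability migrationRate, move the fittest (lowest-fitness)
    // dude of each deme into the neighboring deme.
    static void migrate(List<List<Double>> demes, double migrationRate) {
        int n = demes.size();
        for (int d = 0; d < n; d++) {
            if (RNG.nextDouble() < migrationRate) {
                List<Double> source = demes.get(d);
                Double best = Collections.min(source);
                source.remove(best);
                demes.get((d + 1) % n).add(best);  // migrate to the next deme
            }
        }
    }

    public static void main(String[] args) {
        List<List<Double>> demes = new ArrayList<>();
        for (int d = 0; d < 5; d++) {
            List<Double> deme = new ArrayList<>();
            for (int i = 0; i < 10; i++) deme.add(RNG.nextDouble() * 100);
            demes.add(deme);
        }
        migrate(demes, 0.9);  // the "Immigrants" runs used heavy migration
        int total = 0;
        for (List<Double> deme : demes) total += deme.size();
        System.out.println("total dudes after migration: " + total);
    }
}
```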
To increase the number of our experimental runs, we decided to make the two environments described above (small pop / more gen / some mutation, and demetic grouping / high migration) more situational, utilizing the three different selection types offered by the GPOthello.java program - Probability, Tournament, and Greedy Selection. This gives us a total of six distinct situational runs, enough to conduct a proper analysis of which had a better effect at achieving the smarts to beat Edgar. For those who don't know, roughly: probability selection picks parents at random, weighted by fitness; tournament selection pits a small random sample of individuals against each other and keeps the fittest; greedy selection heavily favors the very fittest individuals in the population.
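A rough sketch of the three selection schemes, assuming (as in the GP program) that lower fitness is better. The weighting details below are illustrative, not gpjpp's exact code:

```java
import java.util.List;
import java.util.Random;

// Sketch of the three selection schemes. Individuals are represented only
// by their fitness values; lower fitness is better.
public class SelectionSketch {
    static final Random RNG = new Random(13);

    // Probability (fitness-proportionate) selection: pick index i with
    // probability proportional to a selection weight (here 1/(1+f),
    // since lower fitness is better).
    static int probabilitySelect(List<Double> fitness) {
        double total = 0;
        for (double f : fitness) total += 1.0 / (1.0 + f);
        double r = RNG.nextDouble() * total;
        for (int i = 0; i < fitness.size(); i++) {
            r -= 1.0 / (1.0 + fitness.get(i));
            if (r <= 0) return i;
        }
        return fitness.size() - 1;
    }

    // Tournament selection: sample k individuals at random, keep the best.
    static int tournamentSelect(List<Double> fitness, int k) {
        int best = RNG.nextInt(fitness.size());
        for (int i = 1; i < k; i++) {
            int challenger = RNG.nextInt(fitness.size());
            if (fitness.get(challenger) < fitness.get(best)) best = challenger;
        }
        return best;
    }

    // Greedy selection: always take the single fittest individual.
    static int greedySelect(List<Double> fitness) {
        int best = 0;
        for (int i = 1; i < fitness.size(); i++)
            if (fitness.get(i) < fitness.get(best)) best = i;
        return best;
    }

    public static void main(String[] args) {
        List<Double> fitness = List.of(30.0, 5.0, 12.0, 48.0);
        System.out.println("greedy picks index " + greedySelect(fitness));
    }
}
```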
Approach and Modifications
For some initial modifications, we decided to change the random number generator for the OthelloRandomPlayer algorithm. After inserting print statements, we saw that Random.nextDouble() did not do what it was supposed to, so we replaced it with the equivalent Java call Math.random() to compensate. We also changed the OthelloCache.java code so that the GPPlayer would always be BLACK while Edgar or the RandomPlayer plays WHITE; this way we wouldn't have to constantly switch the GPPlayer between the two colors.
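The substitution looks roughly like this. This is a hypothetical reconstruction of a random-move picker, not the actual OthelloRandomPlayer source; one common cause of the symptom described above is constructing a fresh Random object on every call, whereas Math.random() uses a single shared generator.

```java
// Hypothetical reconstruction of the random-move fix described above.
public class RandomMoveSketch {
    // Pick a random index into a list of legal moves.
    static int pickMove(int numLegalMoves) {
        // Original style:   new java.util.Random().nextDouble() * numLegalMoves
        // Replacement used: Math.random() * numLegalMoves
        return (int) (Math.random() * numLegalMoves);
    }

    public static void main(String[] args) {
        System.out.println("chose move " + pickMove(10));  // some value in [0, 10)
    }
}
```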
To pursue our initial motivation, we edited the GPOthello.ini file to configure the experiment for our specific environments:
SMALL POPULATION, MORE GENERATIONS, MUTATION (nicknamed Tribe):
For the above settings, there are 3 runs of each, each with a different selection-type value: 0, 1, or 2. These are the selection types described above. In addition, all 6 runs share these settings: number of games = 10; termination fitness = 1; good runs = 5.
We collected a few interesting results, but they were not what we hoped for. This motivated us to produce better results, and we continued our experiment later with an altered motivation (discussed below). We tested our function trees against Edgar, playing one game each against that program, and also against random players, playing 50 games each. We generated about 35 total trees. The greedy-selection Immigrants and Tribe runs make up the bulk of the trees, because they ran the fastest at the Mudd NT lab. The tournament-selection Immigrants and Tribe runs were done on CUNIX, while the probability-selection Immigrants and Tribe runs were also done in the NT labs. Since CUNIX was mad slow, not too many trees came out of there.
Analysis of our results shows that the dudes that fared badly against Edgar did relatively well against random players, which was half the effect we wanted to achieve. But we wanted to beat Edgar as well. There is ONE dude that beat Edgar, but he barely beat him, with a score (white/black) of 31/33. He won by only two extra pieces. And as you see below (highlighted in blue), that program won only 7 out of 50 games against random players, and did so consistently. What we learned here is that doing well against Edgar doesn't necessarily mean doing well against random players, though perhaps that is intuitive, since we wouldn't be doing this if we could generate good Othello players randomly. One explanation we came up with for this discrepancy is that we overlearned Edgar's strategy, leaving no room for any other type. By playing random players with a strategy meant to beat Edgar, we left our Edgar-creaming function tree unprepared to face randomness (these cases are pointed out in YELLOW in the tables below).
Some other things we found interesting: there wasn't much of a difference between the selection types. Regardless of how the population was chosen, things turned out very similar for probability, greedy, and tournament selection. With the Immigrants (the demetically grouped individuals), we saw that the trees were smaller than those from the small-population Tribe runs. One positive was that we were able to watch fitness improve as the generations evolved. Please check out the evolution of the only dude that beat Edgar.
Post Experiment Success Story - Modified Hypothesis
Here is probably where we made an important move toward beating Edgar and the random players at the same time. Thinking back on overlearning Edgar, the tree right above doesn't really learn Edgar too well, since the final score is about even (31/33). Still, this tree is the smartest one of the bunch, and we began to think of other ways to improve it. Our new hypothesis: since this tree has already learned so much, why couldn't it be the skeleton of something that CAN work against both random players and Edgar? So we decided to keep the tree structure constant and randomly choose either 1) the leaves/primitives or 2) the operators. Our rationale is that the tree has already provided enough information to do damage to Edgar, but it perhaps needs a little fine tuning. The hypothesis space has already been trimmed down, and now we just need to refine it; that is where either the leaves are altered, or the internal nodes are altered.
Here is the tree above in prefix format: - / black_near_corners + white_near_corners white_edges black_edges. In other words:
operator operator primitive operator primitive primitive primitive (3 ops, 4 prims)
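That prefix string can be parsed with a tiny recursive evaluator; reading it back, the tree is (black_near_corners / (white_near_corners + white_edges)) - black_edges. The sketch below is illustrative: the primitive counts are made up, and the zero-protected division is a common GP convention rather than something taken from GPOthello.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Evaluate a GP function tree given in prefix form, e.g.
//   - / black_near_corners + white_near_corners white_edges black_edges
// which parses as
//   (black_near_corners / (white_near_corners + white_edges)) - black_edges
public class PrefixEval {
    static double eval(Iterator<String> tokens, Map<String, Double> env) {
        String t = tokens.next();
        switch (t) {
            case "+": return eval(tokens, env) + eval(tokens, env);
            case "-": return eval(tokens, env) - eval(tokens, env);
            case "*": return eval(tokens, env) * eval(tokens, env);
            case "/": {
                double a = eval(tokens, env), b = eval(tokens, env);
                return b == 0 ? 1.0 : a / b;  // protected division, a common GP convention
            }
            default:  return env.get(t);      // a primitive such as black_edges
        }
    }

    public static void main(String[] args) {
        Map<String, Double> env = new HashMap<>();
        env.put("black_near_corners", 4.0);   // illustrative counts, not real game data
        env.put("white_near_corners", 1.0);
        env.put("white_edges", 3.0);
        env.put("black_edges", 2.0);

        String tree = "- / black_near_corners + white_near_corners white_edges black_edges";
        double score = eval(Arrays.asList(tree.split(" ")).iterator(), env);
        System.out.println("tree value = " + score);  // 4/(1+3) - 2 = -1.0
    }
}
```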
First we decided to fix the operators at their preset values and randomize the primitives. Then we tried the opposite. For both cases, we ran a Perl script that George wrote to generate semi-random function trees, which were then put to the test against Edgar. WE HAD GREAT RESULTS, and we came up with one final tree that BEAT EDGAR and BEAT RANDOM PLAYERS. I welcome you to take a look at our algorithm vs. Edgar and our algorithm vs. a random player. There were some problems with the applet aspect of it, and for some reason it didn't load in Netscape. The results are below:
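George's script was in Perl; the idea it implements can be sketched in Java like this. The primitive vocabulary and the RP/RO naming below are assumptions for illustration, not taken from the actual script:

```java
import java.util.Random;

// Sketch of the "fix the structure, randomize the contents" idea: keep the
// shape  op op prim op prim prim prim  from the best tree, and fill either
// the operator slots or the primitive slots at random.
public class SemiRandomTrees {
    static final String[] OPS   = { "+", "-", "*", "/" };
    static final String[] PRIMS = {               // assumed primitive vocabulary
        "white", "black", "white_edges", "black_edges",
        "white_corners", "black_corners",
        "white_near_corners", "black_near_corners"
    };
    static final Random RNG = new Random();

    // The fixed shape of the best tree: 'o' = operator slot, 'p' = primitive slot.
    static final char[] SHAPE = { 'o', 'o', 'p', 'o', 'p', 'p', 'p' };

    static final String[] ORIGINAL = { "-", "/", "black_near_corners", "+",
                                       "white_near_corners", "white_edges", "black_edges" };

    // randomizePrimitives == true  -> "RP" run: keep original ops, random prims
    // randomizePrimitives == false -> "RO" run: keep original prims, random ops
    static String generate(boolean randomizePrimitives) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < SHAPE.length; i++) {
            String token;
            if (SHAPE[i] == 'p') {
                token = randomizePrimitives ? PRIMS[RNG.nextInt(PRIMS.length)] : ORIGINAL[i];
            } else {
                token = randomizePrimitives ? ORIGINAL[i] : OPS[RNG.nextInt(OPS.length)];
            }
            sb.append(token);
            if (i < SHAPE.length - 1) sb.append(" ");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("RP candidate: " + generate(true));
        System.out.println("RO candidate: " + generate(false));
    }
}
```

Each generated candidate would then be played against Edgar, exactly as the original trees were.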
RP = Random Primitives, RO = Random Operators
I welcome you to try these trees. The best two trees above, in blue, are the ones that do highly well against both Edgar and random players. Edgar losing to one tree above by a score of 23/41 pretty much guarantees a win over Edgar any day, and on average this tree consistently wins about 41 games out of 50! We ran it a few more times to make sure, but that was the final verdict. I end this experiment with this set of very good modifications to the original tree that beat Edgar. You are welcome to look at the data that we have collected.