A Framework for ALife-Motivated Learning Experiments


Jean-Denis Greze
Greg Estren
Matt Demanett

Spring 1999, AIS, Eric Seagull

Introduction

The motivation for our project was to develop a general environment into which different learning algorithms and domains could be plugged and tested. When dealing with learning algorithms, computer scientists often work with abstract domains and representations, and it is frequently necessary to write visual interfaces that represent these domains well. Instead of writing one representation for every problem that must be solved, we aimed to provide a general environment that can easily be adapted to specific problems. The goal is to provide enough infrastructure (setup files, sample classes, etc.) to make it easy for people to implement their different kinds of learning algorithms. In particular, our simulation provides an environment for autonomous learning experiments.

We then use this environment to solve some interesting problems in Artificial Life (A-Life), the study of self-reproducing agents.

Problem

The problem consists, first, of building a general environment driver that facilitates adding agents and actions, allows actions to be logged, supports communication between agents, and accommodates the implementation of different learning algorithms.

The second part of the project involves using this environment to solve more specific problems, as well as providing a visual testing ground for different AI algorithms.

Approach

Our approach consists of two main areas: the design of a world-driver and the use of this driver to apply specific AI learning algorithms to specific Artificial Life domains.

World-Driver

The idea was to build a completely abstract interface that would run a world of agents and actions. The world is represented as a two-dimensional grid. Agents must extend an AbstractAgent class that outlines all of the basic characteristics an agent must have, and forces anyone using our driver to add their own methods to build a "plan" (the list of moves for that turn). A similar mechanism is used for Actions: whoever uses our driver must extend our basic Action class and specialize it. The Driver provides a logging system for the actions of every agent every turn, and it also provides a GUI that displays the grid and its agents moving. The Driver is fairly general: one may plug in any AI algorithm and any type of Agent or Action. We even have a one-way infinite tape (PositionLess Agents), which would let us encode a Turing Machine in our Driver.
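
As a rough illustration of this design (a sketch only; we assume Java here, and the names and signatures are illustrative rather than the original code), the abstract classes look something like this:

    // Sketch of the driver's abstract interface; class names follow the text,
    // but the signatures are assumptions.
    import java.util.List;

    abstract class Action {
        // Each concrete Action knows how to apply itself to the world.
        abstract void enact(World world);
    }

    abstract class AbstractAgent {
        protected int row, col;  // position on the 2-D grid

        // Each agent must produce its "plan" (list of moves) for the turn.
        abstract List<Action> plan(World world);
    }

    class World {
        // grid, agents, logging, and GUI details omitted in this sketch
    }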

In attempting to build a general engine for Artificial Life domains, we identified two main problem areas: updating the world and resolving conflicts between different agents.

Updating was a problem because a computer cannot truly do more than one thing at once; through multitasking it can only give the appearance of simultaneity by alternating rapidly between tasks. This affected our world because we needed all of the world’s agents’ actions to happen at the same time. Much like the real world, where two things can happen simultaneously, we had to make all actions for one turn "seem" instantaneous and synchronized. We got around this by using two main update rounds. The first builds up "plans" of what each agent wants to do, while the second actually enacts the changes. With some careful handling of the different agents’ Actions, we were able to make the world seem instantaneous: once all the moves were calculated, they would happen at once.
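
Building on the classes sketched above, a minimal version of this two-round update might look like the following (again an illustrative sketch, not our exact code):

    import java.util.ArrayList;
    import java.util.List;

    class TurnDriver {
        void step(World world, List<AbstractAgent> agents) {
            // Round 1: collect every agent's plan without changing the world.
            List<Action> plans = new ArrayList<>();
            for (AbstractAgent agent : agents) {
                plans.addAll(agent.plan(world));
            }
            // Round 2: enact all plans against the same start-of-turn state,
            // so the whole turn appears instantaneous and synchronized.
            for (Action action : plans) {
                action.enact(world);
            }
        }
    }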

The biggest problem in designing the world was coming up with a consistent system for resolving conflicts. Conflicts result when two agents perform actions that are incompatible, either with each other or with respect to the same third-party agent. If you imagine a world made up of Rocks (which can be picked up) and RockPicker agents, you can see the problem when two RockPickers that are not in the same grid space attempt to pick up the same Rock. The question is: who picks up the Rock? Another difficulty is that before one can even think about resolving conflicts, the conflicts have to be identified. Early on, we thought that picking a random agent (from the list of conflicting agents) and letting it carry out its action while stopping all of the others’ was a good solution. After some thought, though, we decided that this idea might not be consistent with the agents’ learning. How were they to understand that, at random, when they try to do X or Y, the action is not completed?

Our three main problems with regard to conflicts were then: finding conflicts, resolving conflicts, and doing both in a consistent manner that the different agents’ learning algorithms could represent. After some thought, we decided to leave this up to whoever is using our engine. Conflicts depend heavily on the particular agents and their actions (the domain as a whole), so we force (using abstract classes) whoever uses our program to implement their own conflict-resolution mechanism. We let them choose whether conflict resolution happens at the level of the World-Driver or at the level of the agents themselves. We did this by leaving hooks (methods …) that may (or may not, if there are no conflicts to resolve) be extended for particular domains.
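
A sketch of what such a hook might look like (the class and method names here are assumptions for illustration, not the original interface):

    import java.util.List;

    abstract class ConflictResolver {
        // Group the turn's planned actions into incompatible subsets.
        abstract List<List<Action>> findConflicts(List<Action> plans, World world);

        // Reduce each conflicting subset to the actions that will actually run.
        abstract List<Action> resolve(List<List<Action>> conflicts);
    }

A domain with no possible conflicts can simply return empty lists from these methods.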

AI Algorithms and Artificial Life Domains

The goal of this section was to use our engine to implement some AI algorithms in Artificial Life domains. After reading about Artificial Life, however, we found that we were doing something a bit closer to building domain-specific agents than true artificial life.

Our first world was very general. It involved BigHills (think anthill), DoomAnts (think ant with a brain), WeirdFood (think food), and FoodPlants (think trees that make food when there isn’t enough around). The principle is pretty simple: DoomAnts roam around and gather food. They act, however, according to a genemap of 29 bits (0s and 1s). The genemap can tell them anything from "pick up food whenever you can" to "eat food whenever you can." A DoomAnt’s basic actions are picking up food, eating food, moving, and attacking. DoomAnts’ genes also regulate how much food they may carry, along with their general behavior type. The BigHills represent the genetic algorithm that makes DoomAnts. After a certain number of turns (depending on a DoomAnt’s genes), each DoomAnt goes back to its own BigHill. Once there, the BigHill evaluates that DoomAnt’s ability by seeing how much health it has and how much food it brought back, and ranks that DoomAnt’s genes accordingly. At this point, a genetic algorithm is used to decide what the new DoomAnt should be (when a DoomAnt comes back home, it is killed). We used a genetic algorithm close to the one detailed on pages 619-621 of Russell and Norvig: we find the four best DoomAnts currently active, cross their genes over at a random point, perform three random mutations, and put the new DoomAnt into the world. Here we were influenced somewhat by John Koza (of Stanford University; see Jean-Denis Greze’s presentation; he also has a book on Genetic Programming) and by the general concepts of Genetic Programming/Algorithms from the different AI courses and textbooks we have read over time. This world could be run with different "factions" of ants; these factions could fight if their genes dictated so.
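
The genetic step itself (single-point crossover of two parents followed by three random bit flips on the 29-bit genemap) can be sketched as follows; the class and method names are illustrative, not the original code:

    import java.util.Random;

    class GeneMapBreeder {
        private static final int GENE_LENGTH = 29;
        private final Random random = new Random();

        // Cross two parent genemaps at a random point, then flip three bits.
        boolean[] breed(boolean[] parentA, boolean[] parentB) {
            boolean[] child = new boolean[GENE_LENGTH];
            int crossover = random.nextInt(GENE_LENGTH);
            for (int i = 0; i < GENE_LENGTH; i++) {
                child[i] = (i < crossover) ? parentA[i] : parentB[i];
            }
            for (int m = 0; m < 3; m++) {
                int bit = random.nextInt(GENE_LENGTH);
                child[bit] = !child[bit];  // random mutation
            }
            return child;
        }
    }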

Our second world was much simpler than the first, but it still used a genetic algorithm (though a different one). The domain is fairly simple: agents move around according to their genes, and they have knowledge about the position of food in the world. Over time (and with movement), these agents lose life, and when they have no life left, they simply die. The agents’ genes represent little bits of a program that tell the agent where to move in relation to its knowledge of the world (in this case, only food). Agents would simply roam around. Most would die, but when any agent reached the same square as the food, its gene pool would become more significant in the construction of the next generation of agents. The goal was to build agents that discover that a straight line is the shortest path between two points (in this case their starting point and the food).
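
One way to make a successful agent's genes "more significant" in the next generation is fitness-proportional (roulette-wheel) selection; the following sketch shows that general idea, though it is an assumption rather than the exact mechanism we used:

    import java.util.List;
    import java.util.Random;

    class RouletteSelector {
        private final Random random = new Random();

        // Pick a parent index with probability proportional to its fitness.
        int select(List<Double> fitnesses) {
            double total = 0.0;
            for (double f : fitnesses) total += f;
            double pick = random.nextDouble() * total;
            double running = 0.0;
            for (int i = 0; i < fitnesses.size(); i++) {
                running += fitnesses.get(i);
                if (running >= pick) return i;
            }
            return fitnesses.size() - 1;  // fallback for rounding error
        }
    }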

Our final problem was a mix (in complexity) between our first and second worlds. The domain was similar to the first: ants roaming around trying to get to food. The difference is that ants could only move and eat; no attack or pickup actions were encoded. The learning was itself completely different. Instead of a genetic-algorithm approach, we used decision trees. The algorithm is the one detailed on pages 531-540 of Russell and Norvig. Now, instead of simply acting according to a genome, the agents accumulate knowledge about the world through random experiments. They then build DTs that represent the decisions and states that led to eating food (which is considered the goal). The hope was that, as time went on, each ant would amass more good examples, learn better DTs, and perform better in the domain (gain more food per given number of moves).
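
The accumulation of training examples can be sketched as follows: each random experiment records the ant's perceived state together with whether it led to eating food, and this set of examples is handed to the decision-tree learner. The attribute names here are hypothetical, for illustration only:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;

    class Example {
        final Map<String, String> attributes;  // e.g. "foodAdjacent" -> "yes"
        final boolean ateFood;                 // the goal classification

        Example(Map<String, String> attributes, boolean ateFood) {
            this.attributes = attributes;
            this.ateFood = ateFood;
        }
    }

    class ExperienceLog {
        private final List<Example> examples = new ArrayList<>();

        void record(Map<String, String> state, boolean ateFood) {
            examples.add(new Example(state, ateFood));
        }

        // The full set of examples is fed to the DT learner at each rebuild.
        List<Example> trainingSet() {
            return examples;
        }
    }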

Results

The first question that needs to be answered is whether the problems we tested in our World-Driver were indeed Artificial Life. After reading John Koza’s paper, we would say that no, the problems we tested were not truly Artificial Life. They were in the sense that we had self-reproducing agents, but they were not in the sense that we did not let these agents learn self-reproduction by themselves. Our project might be better described as domain-specific agent learning. Apart from this, we believe that our primary goal, to make a general Driver for Artificial Life problems, was a success. We were able to encode three different learning strategies, as well as handle conflict resolution successfully. Finally, it is clear that our infrastructure could be used for experiments such as those detailed by Koza. In reality, we have built a World-Driver that can do Artificial Life experiments, and a whole lot more.

The first experiment went fairly well. The ants learned, but they would cycle through the different gene possibilities. In a way, it seems that none of the different gene combinations were very different; the only consistent point is that the ants would pick up food when they saw it. The explanation we came up with is that the domain was not balanced: the actions (Attack, Move, Pickup, and Eat) and their results were not defined in a stable way. As a result, there may not have been a clear-cut best answer to the problem, and thus no clear winner in terms of genes. The results of this experiment greatly changed the approach we took for the next two: we decided to restrict ourselves to simple domains in order to be able to monitor the learning of individuals.

Our second experiment, in its very restricted environment, was quite a success. After 20-30 generations, the ants learned to consistently go near the food, with fairly good rates of success. It did take a little while for the first ant to find the food, but once it did and the genetic algorithm kicked in, we had a whole colony of food-seeking ants going around.

The third experiment came out a bit odd. Remember, the difference here is that the ants did not have perfect knowledge of their world; rather, they would amass knowledge based on their state and thus build up a decision tree of what to do in a given situation. In doing so, however, we encountered two big problems. First, the positions of the food do not follow a pattern. Second, decision trees are discrete and tend to learn discrete combinations of moves as having good or bad values. The result is that even though an ant might learn a successful way to get food in one situation, this behavior (DT) might not adapt well to another situation. In most cases, we noticed that the learned knowledge did not transfer to other situations at all.

Conclusions

First, we realized how important it is to build a stable domain when trying to apply Artificial Intelligence algorithms. Especially when developing arbitrary testing domains (such as the Ant/Food world), it is important to analyze all the parameters beforehand. Otherwise, a lot of "tweaking" time is needed to get the algorithms to work correctly.

Secondly, we identified a problem with genetic algorithms. Unlike neural networks and many other AI algorithms, genetic algorithms are not guaranteed to settle into a local or absolute maximum/minimum; they can go on forever without finding a solution. This either means that there isn't a best solution for a given domain, or that the random genetic operators perturb the individuals so much that a "best" breed is never found. In this regard, it would have been interesting to keep a log of ALL the genes in order to see which ones occurred more often than others.

Thirdly, we identified a problem with decision trees and large state spaces. Decision trees tend to specialize fairly quickly, but when dealing with a large space, that specialization no longer helps: not all situations are encountered, so it can take a long time for significant knowledge to be acquired. Finally, DTs do not seem to work well in highly unorganized environments; the learning for individuals seemed to be skewed towards specific situations rather than the general environment.

This project went a long way towards telling us that half of the successes with AI start with failures. By coding a DT algorithm and two genetic algorithms, and trying out five environments (though only three worked and were submitted), we learned a lot about the different environments, the algorithms, and their implementations. If we were to do the whole thing again, we would probably use a different approach in developing the test environments and their respective AI algorithms.

Future

In the future, we would use incremental development of the test environments, slowly adding features and making sure that the current set of features is stable. This would allow us to monitor the learning as the world got more complex, and would make it easier to identify failures in learning.

We would also like to improve our general World-Driver. We still need to implement pictures for the different agents (right now they must be geometrical shapes; you can't plug in your favorite JPEG or GIF).

Finally, we would like to use our environment's communication feature, which allows agents to share knowledge by communicating classes, etc. This was originally going to be part of this project, but after reading up on communication, we realized that it involved more resources than we had available. Perhaps we will continue with this in our next AI course, or just for fun this summer.