Effect of gpjpp configuration parameters on GP's efficiency


Introduction

gpjpp is a Genetic Programming package used to generate function trees for various Genetic Programming application. The GPOthello package uses gpjpp to generate function trees used to evaluate the state of the board in an Othello game. The fitness of each individual tree is evaluated based on the performance of an Othello player using the tree to guide its movement selection.

The details of the evolutionary process can be controlled by several configuration parameters. The effect of each parameter on the population fitness is explored in an attempt to discover values that optimize the evolutionary process. The parameters examined during the tests are listed below.

Parameters

The description of the configuration parameters examined is copied from the gpjpp documentation.

CreationProbability
Percent probability that a creation operation will occur while evolving a new generation.
CreationType
The strategy used to create generation 0. The possible values are :
GPVARIABLE = 0
GPGROW = 1
GPRAMPEDHALF = 2
GPRAMPEDVARIABLE = 3
GPRAMPEDGROW = 4

     * A tree creation type that is like GPGROW but starts with
     * a tree depth of two and increases the depth by one for each
     * tree until it reaches MaximumDepthForCreation, at which point
     * it cycles back to two. Note that if TestDiversity is enabled
     * many of the shallow trees are duplicates, so the distribution
     * is skewed toward deeper trees than one might expect.
     *
     * GPVARIABLE : A tree creation type that makes every node but those on the
     * bottom level a function and that builds every tree to
     * MaximumDepthForCreation.

     * If CreationType equals GPRAMPEDVARIABLE, all individuals are
     * created using the GPVARIABLE strategy and the depth of
     * successive individuals is ramped from minTreeDepth to 
     * MaximumDepthForCreation.
     *
     * If CreationType equals GPRAMPEDGROW, all individuals are 
     * created using the GPGROW strategy and the depth of successive
     * individuals is ramped.
     *
     * GRAMPEDHALF : A tree creation type that builds alternating trees using
     * the GPRAMPEDVARIABLE and GPRAMPEDGROW methods. The default
     * method.
     *
     * If CreationType equals GPGROW, all individuals are created
     * using the GPGROW strategy with depth MaximumDepthForCreation.
     *
     * If CreationType equals GPVARIABLE, all individuals are created
     * using the GPVARIABLE strategy with depth MaximumDepthForCreation.

CrossoverProbability
Percent probability that a crossover operation will occur while evolving a new generation.
MaximumComplexity
The largest complexity (number of nodes) allowed in any individual GP, including the main branch and all ADFs.
MaximumDepthForCrossover
The largest depth allowed for trees after crossover.
NumberOfGenerations
The maximum number of generations in a run.
PopulationSize
The number of individuals (GP instances) in a population.
SwapMutationProbability
Percent probability that swap mutation will occur for each individual added to a new generation.

Procedure

The configuration parameters are defined in a configuration file which was manually modified for each run of the GPOthello application. The results of each run were stored in flat text files which were manually examined for emerging patterns.

The gpjpp package provides measurements of the best, average and worst complexity, depth and variety of the population for each generation. Each of the three measures characterizing the population makeup was expected to affect the run time of the GP, which is also provided by the package. The goal was to investigate the effect of the configuration parameters on the GP's efficiency via the parameters' effect on the the 3 measures. The motivation for the exercise was the attempt to define an evolutionary process which produces the best Othello players (lowest fitness) in the least amount of time.

The most straightforward indication of a variable's effect on the fitness of the individuals generated by the GP was the number of runs (Runs) required to reach the target number of 'good runs' (GoodRuns). A good run is one where the GP generates an individual with fitness below the acceptance threshold (TerminationFitness) within a given number of generations (NumberOfGenerations). With the default configuration for this test, Runs was defined as the number of runs required for the GP to generate an individual with fitness < 50 within 30 generations, in 3 different runs. Runs was taken from a line in gpjpp's output after each run. Example : "Run number 12 (good runs 1)"

However, the most interesting measure of the GP's efficiency is the total run time. The effect of a variable on the runtime of the GP was examined based on the average run time for each generation. The total run time for each evolutionary process was calculated by summing the run times of all the runs and is the primary measure of a parameter value's desirability. A small tcl script was used to parse the GP's output and generate the efficiency indicators. The depth, variety and complexity of the generations produced during each GP test were selectively averaged and are mentioned in a separate column.

 
#!/usr/local/bin/tclsh
puts [format "%10s %23s %20s" Runs "Run time/generation" "Total run time"]
foreach fname $argv {
    set buf [split [exec grep Run [lindex $fname 0]] \n]
    set run_T 0
    set generation_T 0
    set Runs 0
    foreach line $buf {
        if {[string first "Run time" $line] == 0} {
            set run_t [lindex $line 2]
            set generation_t [lindex $line 4]
            set run_T [expr $run_T + $run_t]
            set generation_T [expr $generation_T + $generation_t]
        } else {
            incr Runs
        }
    }
    puts [format "%10s %23s %20s" $Runs [expr $generation_T / $Runs] $run_T]
}
Sample Run :
chi:OthelloProj/othello> ./process.tcl pop2.run pop20.run pop200.run 
      Runs     Run time/generation       Total run time
       161                  2.6459              12373.1
        15                 12.8513              5637.42
         5                 132.318              15457.8
All the run times are in seconds.

To save on CPU time, some tests were terminated before completing 3 good runs. Each test was allowed at least as much run time as the best test before it. Prematurely terminated tests are indicated as 'failed'. The number of good runs at the termination time is indicated in parentheses. For example, if a test was terminated when only one good run was completed, the word failed(1) will follow the test's statistics.

To study the effect of each variable separately, each parameter was given 3 different values while keeping the rest of the parameters unchanged. To minimize the run time, best value for the parameter was kept for subsequent runs. The initial values for all the parameters during the testing are listed below.

# primary GP settings
PopulationSize            = 200
NumberOfGenerations       = 30
CrossoverProbability      = 90
CreationProbability       = 0
CreationType              = 2
MaximumDepthForCreation   = 6
MaximumDepthForCrossover  = 17
MaximumComplexity         = 100
SelectionType             = 0

# secondary GP settings
TournamentSize            = 7
SwapMutationProbability   = 0
ShrinkMutationProbability = 0
AddBestToNewPopulation    = false
SteadyState               = false
DemeticGrouping           = false
DemeSize                  = 100
DemeticMigProbability     = 100

# reporting options
PrintDetails              = true
PrintPopulation           = true
PrintExpression           = true
PrintTree                 = false
TreeFontSize              = 12

# run testing settings
TerminationFitness        = 50
GoodRuns                  = 3
UseADFs                   = false
TestDiversity             = true
ComplexityAffectsFitness  = true
CheckpointGens            = 0

The tests were performed on a 10 CPU Sparc station.

Manufacturer        : Sun (Sun Microsystems)
System Model        : Ultra Enterprise
Number of CPUs      : 10
CPU Type            : sparc
App Architecture    : sparc
Kernel Architecture : sun4u
OS Name             : SunOS
OS Version          : 5.6
Kernel Version      : SunOS Release 5.6 Version Generic_105181-17 [UNIX(R) System V Release 4.0]

Code modifications

Both the gpjpp and GPOthello code were left intact except for the following bug fixes :

GPPopulation.java : calculateStatistics()
The average fitness calculation was incorrect. Initialized sumFitness to 0 (line 1357).
OthelloRandomPlayer.java : constructor
The random number generator was producing deterministic values. Replaced random.nextDouble() call with Math.random()
The output format was also slightly modified to facilitate the examination of the data.

Results

The unlabeled column before the results tables contains the values of the parameter appearing in bold.

PopulationSize
      Runs     Run time/generation       Total run time        Avg. Variety
  2    161                  2.6459              12373.1            0.710484
 20     15                 12.8513              5637.42		   0.769451
200      5                 132.318              15457.8		   0.851538
The population size was found to directly influence the GP's efficiency. As expected, a large population size provides a greater variety of individuals. making it more probable that the evolutionary process will produce a good Othello player in fewer runs. On the other hand, the run time also increases with population size and the most efficient GP was the one with only 20 individuals. The population size was kept at 20 for all subsequent tests.

CrossoverProbability
      Runs     Run time/generation       Total run time    Avg. Variety
 80          16                 12.3175              6122.09        0.801816 failed(0)
 90          15                 12.8513              5637.42	    0.808726
100          16                 12.3731              5844.61        0.770557 failed(1)
The effect of the crossover probability was difficult to predict. The variable was expected to affect the population variety, as more crossovers would produce a larger number of modified offspring. However, the results suggest that the effect of the parameter on the average variety of the population are minimal. The results also suggest that the variable affects the GP's efficiency, but the reason is unclear. Crossover probabilities of 80 and 100 failed to produce 3 good runs withing 16 total runs, so the crossover probability was kept at 90 for all subsequent tests.

CreationProbability
      Runs     Run time/generation       Total run time  Avg. Variety
 0      15                 12.8513              5637.42	     0.769451
10       9                 11.6889               3262.9       0.80597 failed(0)
20       4                   13.09              1021.88	     0.909615
30       7                 11.8829              2586.87      0.825521 failed(0)

A nonzero creation probability was expected to have a positive effect on the GP's efficiency, since a certain number of unique individuals would be added to the pool in each generation, thereby increasing the variety. An exceedingly large creation probability however was expected to pollute the gene pool and increase the fitness measure (decrease the average fitness). The results suggest that the effect is quite significant and verify the prediction. A creation probability of 20 produced 3 good runs in a total of 4, while probabilities of 10 and 30 did not produce a single good run in double the runs. The change in the run time per generation was found to be fairly insignificant. The creation probability was kept at 20 for all subsequent tests.

CreationType
      Runs     Run time/generation       Total run time    Avg. Depth
 0      18                 13.1933              6974.45	      3.08201
 1      34                 15.5653              16276.4	      5.37759
 2       4                   13.09              1021.88	       3.6141
 3       6                  11.004              2409.62	      2.89803 failed(0)
 4      27                 14.0474              11437.9	      3.43366

The different methods of generating generation 0 affect the GP's efficiency via their effect on the average tree depth. Very shallow trees were expected to be too simplistic to produce good results and deeper trees were expected to require a large number of generations to produce fit individuals. Indeed, the method for creating the first generation that produced the best results was the default method of GPRAMPEDHALF (2). This method results in trees of moderate depth which appears to positively affect the GP's efficiency. A surprising result was that the trees created using the GPVARIABLE(3) strategy were the ones with the lowest average depth. According to the documentation, all the trees were supposed to start with a depth equal to MaximumDepthForCreation, which had been kept to the default value of 6. An examination of the depth for generation 0 suggested that the GPVARIABLE strategy did not function as advertised, with tree depths varying from 2 to 6. The creation type was kept at 2 for all subsequent tests.

MaximumDepthForCreation
      Runs     Run time/generation       Total run time    Avg. Depth
 4      25                  13.472              9896.57	      2.58933
 6      11                 13.9182              4280.09	      3.16306
 8      26                 13.8404              10722.5	      3.34219

The effect of the maximum depth for creation on the GP's effectiveness verified the value of moderately deep function trees. The test with a default value of 6 was repeated, as the run time of 1021 sec previously produced appeared to be suprizingly short in comparison the the run time of the other tests. The GPRAMPEDHALF strategy with MaximumDepthForCreation of 6 once more again produced the best results, but in 11 Runs, as compared to the 4 previously required. The maximum depth for creation was kept at 6 for all subsequent tests.

SwapMutationProbability
      Runs     Run time/generation       Total run time     Avg. Variety
 0      11                 13.9182              4280.09	        0.808726
10      16                 13.4156              6147.35         0.877669
20      16                 13.4844              5887.06		0.912329
30      20                 12.7405              7768.52		0.907355

A nonzero swap mutation probability was expected to improve the GP's efficiency by increasing the population variety. The results suggest that the variety was indeed increased, but with negative effects. The most likely explanation for the adverse effect of the swap mutation is the probable pollution of useful subtrees. As a result, the best individuals of a generation would be less likely to pass the attributes that enhanced their fitness to their offspring. The difference in the run times was relatively small, suggesting that the positive effects of a large variety partly offset the negative effects of the gene pool pollution. The swap mutation probability was kept at 0 for all subsequent tests.

MaximumDepthForCrossover
      Runs     Run time/generation       Total run time     Avg. Depth
 7      14                  13.065              5121.71        3.16087
10      13                 13.7154              5101.14        3.42466
13      24                 13.3417              9450.86	       3.18463
17      13                 13.5862              4877.93	       3.16306
20      11                 13.9182              4280.09	       3.19279

The maximum depth of individuals after crossover was expected to affect the GP's efficiency similarly to the way the creation strategy did, via its effect on the average depth of the function trees. All the tested values resulted in populations with average depths very close to the average depths of the trees produced with the default value of 17. The large increase in the run time with the value of 13 is puzzling and most likely insignificant. Since the effect of the maximum depth for crossover was relatively small, the default value of 17 was kept for the remaining test.

MaximumComplexity Runs Run time/generation Total run time Avg. Complexity 20 12 13.9325 4314.64 0.788871 40 13 13.7431 4974.15 0.792818 60 13 13.4354 4780.75 0.779014 100 13 13.5862 4877.93 0.808726

The elimination of individuals exceeding an upper complexity limit was expected to have a positive effect, as long as the complexity of the remaining individuals was not exceedingly low. The average complexity of the individuals was already relatively low, so that the algorithm started excluding individuals only after the limit was set below 50. The results suggest that the effect was relatively insignificant, with a minor improvement when the limit was set to 20.

Conclusions

The tested parameters affecting the GP's efficiency can be broken in three categories, based on their effect on the average depth, complexity and variety of the resulting function trees. The parameters affecting the depth of the trees were the creation method for the first generation, the maximum depth for creation and the maximum depth of individuals for crossover. The parameter affecting the complexity of the trees was the limit on the number of nodes. The parameters affecting the variety of the trees were the population size, the creation probability and the swap mutation probability. The only parameter that did not seem to fit in any of the categories was the crossover probability. The suggested value for each parameter and the effect of the parameters on the relevant population attributes and on the GP's efficiency are tabulated below.

Parameter Affected attribute Effect on attribute Effect on efficiency Suggested Value
PopulationSize Variety Strong Strong 20
CrossoverProbability ? ? Strong 90
CreationProbability Variety Strong Strong 20
CreationType Depth Strong Strong 2
MaximumDepthForCreation Depth Moderate Strong 6
SwapMutationProbability Variety Strong Moderate 0
MaximumDepthForCrossover Depth Moderate Minimal 20
MaximumComplexity Complexity Moderate Minimal 20

Christopher Akritides
Last modified: Thu Apr 6 14:26:57 EDT 2000