
PGA, the Parallel Genetic Algorithms testbed -- version 2.7
-----------------------------------------------------------

Posted by Peter Ross, peter@aisb.ed.ac.uk.
Current version developed/maintained by Peter Ross, peter@aisb.ed.ac.uk 
Original version by Geoffrey H. Ballinger, geoff@ed.ac.uk


NEW SINCE 2.6

Briefly:
         - better timetabling; you can choose various methods of 
           mutating a timetable, by choosing the event to move
           and the slot to move it to according to various algorithms.
           In particular, -ett:r+t5 is good: random choice of event,
           then tournament selection of size 5 to find a good slot to
           move it to. Use this with -m1.0 -a for decent results.
         - -No<n> and -Na<n> are like -NO<n> and -NA<n> but also save
           the chromosomes at the end.
         - when non-interactive, convergence also counts when deciding
           whether to stop.
         - can specify data file other than weights/rrdata/ttdata by
           using -Fdatafile flag
         - fixed a silly bug in timetabling which meant that the `days'
           was really `slots per day'
         - added pga-batch, a simple shell script to let you do lots of
           runs non-interactively and summarise the results
         - can set crossover type to "none", so no crossover happens.
           This lets you investigate the benefit of crossover a bit
           more.
         - even more code tidying.

NEW SINCE 2.5

Briefly: 
         - simple timetabling, using a penalty function: problem is
           specified in a separate file using a reasonably flexible
           language to specify constraints etc. This can solve real
           non-trivial problems.
         - can be run noninteractively; stop when limit (set by -l)
           reached or when the fittest in one or all populations passes
           a given threshold
         - pauses when all populations appear to have converged.
         - another new problem, mcbK: alleles lie in range 0..K, fitness is
           length of maximum contiguous block of identical alleles. This
           means there are K+1 distinct maxima to be found. Good for
           basic explorations with spatial selection.
         - in the spatial selection options, you can dump a rectangular
           map showing where the fittest chromosomes are, to *.map. Up to
           26 distinct highly-fit chromosomes are mapped explicitly. Thus
           you can explore how fit chromosomes take over territory and
           shape each others' territory.
         - Gray code option for the function optimisation tasks
         - stand-alone decode program. In the function optimisation
           tasks a chromosome is a (0/1-encoded) sequence of real
           numbers. decode lets you recover those real numbers, for use
           with external plotting programs so you can see how the
           solutions move in Cartesian space.
         - interactive control of number of generations between prompts
           (should have had this years ago, blush).
         - can always save the chromosomes, even if you didn't mention
           a file.
         - report file also gives the random number seed, whether from
           user or clock, so runs can be replicated precisely.
         - more informative command-line prompts.
         - some minor bugs fixed.
         - minor code tidying and shuffling between files.

NEW SINCE 2.4

Briefly: - proper Royal Road (as in Holland's challenge); 
         - two kinds of tournament selection;
         - spatially-structured reproduction, on a 2-D toroidal grid,
           either generational or one at a time;
         - mutation is now per-bit, meaning of adaptive has changed;
         - can seed the random number generator;
         - minor mod to format of *.chr file;
         - crossover can produce two children or just one (default is one);
         - source split into several files.

WHAT IT IS

PGA is a simple testbed for basic explorations in genetic algorithms.
Command line arguments control a range of parameters, and there are a
number of built-in problems for the GA to solve. The current set
consists of:
  - maximise the number of bits set in a chromosome
  - De Jong's functions DJ1, DJ2, DJ3, DJ5
  - binary F6, used by Schaffer et al
  - maximise the length of a sequence of equal alleles; alleles
    drawn from a set of user-specified size (set sizes 2..10)
  - a crude 1-d knapsack problem; you specify a target and a set of
    numbers in an external file, GA tries to find a subset that sums
    as closely as possible to the target
  - the `royal road' function(s), as defined in Holland's challenge.
    The various parameters can be specified in an external file.
  - the max-contiguous-block problem: the user specifies how many
    distinct values an allele can have, and fitness is the size
    of the longest sequence of equal-valued alleles. So the maxima
    are non-overlapping.
  - simple timetabling, in terms of (fixed-size) slots and events.
    Constraints are specified in a separate file, and describe
    which events must not clash, event orderings, which events are
    to be fixed in place, which slots an event must not occupy,
    which events should be kept apart, and the penalties for violations.
and it's easy to add your own problems (see below). Chromosomes are
represented as character arrays, so you are not (quite) stuck with
bit-string problem encodings.
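
As a rough illustration of the max-contiguous-block problem listed
above, the fitness could be computed along the lines of the sketch
below. This is not the code from eval.c: the function name
longest_block and its signature are assumptions made for the example,
and it assumes one allele per char, as described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not taken from eval.c: length of the longest
   run of identical alleles in a chromosome held as a char array. */
int longest_block(const char *chromo, size_t len)
{
    size_t i;
    int best = 1, run = 1;

    assert(chromo != NULL);
    if (len == 0)
        return 0;               /* empty chromosome has no block */
    for (i = 1; i < len; i++) {
        if (chromo[i] == chromo[i - 1])
            run++;              /* current block grows */
        else
            run = 1;            /* a different allele starts a new block */
        if (run > best)
            best = run;
    }
    return best;
}
```

For example, longest_block("0011122", 7) gives 3, from the block of
three 1s.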

PGA allows, among other things:
  - multiple populations, with/without periodic migration (the `island' model)
  - various reproduction strategies
     - generational
     - GENITOR-like
     - spatially-structured generational (the `cellular' model)
     - spatially-structured GENITOR-like ( " )
  - various selection strategies:
     - rank-based as in GENITOR
     - roulette-wheel (fitprop)
     - tournament, with control of tournament size
     - `marriage tournament', with control of tournament size
  - one-point, two-point and uniform crossover
  - choice of whether crossover produces one or two children
  - control of population size, chromosome length (independently
    of the problem!), the usual rate parameters.
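
To make the idea of tournament selection with a controllable size
concrete, here is a minimal sketch of the textbook version: draw K
members at random and keep the fittest. Whether PGA's `tnK' works
exactly like this is an assumption (see select.c for the real thing),
and the function name and signature are invented for the example. The
`marriage tournament' variant is not shown.

```c
#include <assert.h>
#include <stdlib.h>

/* Textbook tournament selection (an assumption about what `tnK' does):
   draw k members uniformly at random, with replacement, and return
   the index of the fittest one drawn. Higher fitness is better. */
int tournament_select(const double *fitness, int popsize, int k)
{
    int i, best, challenger;

    assert(fitness != NULL && popsize > 0 && k > 0);
    best = rand() % popsize;    /* slight modulo bias; fine for a sketch */
    for (i = 1; i < k; i++) {
        challenger = rand() % popsize;
        if (fitness[challenger] > fitness[best])
            best = challenger;  /* fitter challenger takes over */
    }
    return best;
}
```

A larger k makes it more likely that the fittest members are drawn, so
selection pressure rises with the tournament size.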

The command-line options are summarised by the `-h' flag:

PGA: parallel genetic algorithm testbed, version 2.7
   -P<n>    Set number of populations. (5)
   -p<n>    Set number of chromosomes per population. (50)
   -n<n>    Set chromosome length. (32)
   -l<n>    Set # of generations per stage. (100)
   -i<n>    Set reporting interval in generations. (10)
   -M<n>    Interval between migrations. (10)
   -m<n>    Set bit mutation rate. (0.02)
   -c<n>    Set crossover rate (only for `gen'). (0.6)
   -b<n>    Set selection bias. (1.5)
   -a       Adaptive mutation flag (FALSE)
   -t       Twins: crossover produces pairs (FALSE)
   -g       In function optimisation, when decoding treat
              bit pattern as Gray code (FALSE)
   -C<op>   Set crossover operator. (two)
   -s<op>   Set selection operator. (rank)
   -r<op>   Set reproduction operator. (one)
   -e<fn>   Set evaluation function. (max)
   -S<n>    Seed the random number generator. (from clock)
   -NO<n>   Non-interactive, stop when One reaches <n>.
   -No<n>   Like -NO<n> but also save final chromosomes.
   -NA<n>   Non-interactive, stop when All reach <n>.
   -Na<n>   Like -NA<n> but also save final chromosomes.
   -F<file> Use problem datafile <file> instead of default.
   -h       Display this information.
   <file>   Also log output in <file>. (none)

   Crossover operators ... one, two, uniform, none.
   Selection operators ... rank, fitprop,
                           tnK (K=integer > 1),
                           tmK (K=integer > 1).
   Reproduction operators ... one, gen,
                              ssoneN (N=integer > 0),
                              ssgenN (N=integer > 0).
   Evaluation functions ... max, dj1, dj2, dj3,
                            dj5, bf6, knap, tt,
                            tt:E+S (E=r/w/tM; S=r/f/tN),
                            mcbK (K in 1..9), rr.
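
The -g flag above treats the bit pattern as a Gray code when decoding.
As a hedged illustration of what that involves, the sketch below turns
a 0/1-encoded string, read as a reflected Gray code with the most
significant bit first, into an integer. The function name is
hypothetical, the bit order is an assumption, and the step that scales
the integer into a real number is not shown because it depends on the
problem.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: interpret nbits chars ('0'/'1', most significant
   first) as a reflected Gray code and return the integer it encodes. */
unsigned long gray_to_ulong(const char *bits, size_t nbits)
{
    unsigned long value = 0;
    int prev = 0;               /* previous decoded binary bit */
    size_t i;

    assert(bits != NULL && nbits <= sizeof(unsigned long) * CHAR_BIT);
    for (i = 0; i < nbits; i++) {
        /* binary bit i = binary bit i-1 XOR Gray bit i */
        int bit = prev ^ (bits[i] == '1');
        value = (value << 1) | (unsigned long) bit;
        prev = bit;
    }
    return value;
}
```

For example, gray_to_ulong("110", 3) gives 4. With a Gray code,
neighbouring integers differ in a single bit, so one mutation can move
a value to an adjacent one.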

The output is curses-based, with optional output to file for later
plotting or analysis using a tool such as (g)awk. The screen layout
looks like this:

.................................................................
 (A)gain, (Q)uit, (C)ontinue:  
           Populations: 5            Chromosomes per pop: 50
                                       Chromosome length: 32
 Generations per stage: 100                 Reproduction: one
    Reporting interval: 10                Crossover type: uniform, twins
    Migration interval: 10                Crossover rate: n/a
         Eval function: knap               Mutation rate: 0.0200
             Selection: rank
        Selection bias: 1.50                  Generation: 1000
                                      Evaluations so far: 5000
              Pop.......Average..........Best.(max = 1.0)
              0   =       0.3333333        0.3333333
              1           0.3366667        0.5000000
              2   =       0.3333333        0.3333333
              3           0.4876190        1.0000000
              4           0.2609524        0.3333333
.................................................................

The `A' option restarts with new randomly-chosen chromosomes. The `C'
option continues for a further number of generations, as determined by
the `-l' flag (`generations per stage' in the above display). The `='
opposite populations 0 and 2 shows that they appear to have converged,
because the average fitness and best fitness are equal. If you have
specified output to file too, then you also get the option of saving the
chromosomes to a file called filename.chr. If you've chosen a spatial
selection option, you can save a map of the grid of chromosomes to a
file called filename.map.
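
The convergence marker described above can be sketched as follows. How
PGA itself performs the test is not shown here, so treat this as an
assumption: the average equals the best exactly when every member has
the best fitness, and the sketch checks that directly to avoid
rounding in a floating-point sum. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical check: returns 1 if every member of the population has
   the same fitness as the best member (so average == best), else 0. */
int appears_converged(const double *fitness, int popsize)
{
    int i;
    double best;

    assert(fitness != NULL && popsize > 0);
    best = fitness[0];
    for (i = 1; i < popsize; i++)
        if (fitness[i] > best)
            best = fitness[i];
    for (i = 0; i < popsize; i++)
        if (fitness[i] < best)
            return 0;           /* someone is below the best */
    return 1;
}
```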

INSTALLING IT

There is a very simple Makefile, which doesn't even install it for you!
Source consists of several C files and header files (listed below),
written in K&R C. The code was developed on a Sun-4. The curses usage is
pretty simple, so it should be easy to adapt it to your own system. I
don't have access to a range of machines, so I haven't provided lots of
system-dependent switches, although it works on various SunOS, Solaris
and HP-UX systems.
This distribution contains the following files:
    COPYING      ... GNU public license
    QUESTIONS    ... some practical questions for you to investigate
    README       ... this file, you're reading it
    analyse.tex  ... LaTeX: hints on analysing a set of chromosomes
    graph1.awk   ... example of how to use (g)awk to plot output
    graph2.awk   ... fancier example, plot each population on same graph
    graph3.awk   ... plot average and best of one population on same graph
    src/         ... the source, with Makefile:
                       Makefile
                       README
                       cross.c   ... crossover operators
                       eval.c    ... the problems
                       help.c    ... help text for -h
                       init.c    ... initialising populations
                       main.c    ... the main bit
                       mutate.c  ... mutation operators
                       pga.h     ... the structures etc
                       reprod.c  ... reproduction operators
                       screen.h  ... rows and columns for screen layout
                       select.c  ... selection operators
                       version.h ... version number and history
                       decode.c  ... stand-alone chromosome decoder
                       clique.pl ... separate simple Prolog program which
                                     finds maximal cliques in graphs.
                                     Useful for those interested in
                                     timetabling applications.
    pga.tex      ... LaTeX document describing it
    pga.man      ... man page for pga
    pga-batch    ... simple shell script for repeated runs
    decode.man   ... man page for the stand-alone chromosome decoder
    rrdata       ... sample parameter file for Royal Road
    weights      ... example weights file for crude knapsack problem
    ttdata       ... example timetabling data file (copy of one below)
    ttdata.ael   ... same as previous file
    ttdata.edai93 .. larger timetabling problem
    ttdata.edai94 .. even tougher one

ADDING NEW PROBLEMS

To add a new problem to pga:
  - create a new eval_whatever function, alongside the others in eval.c;
    put any problem-specific auxiliary functions there too (eg
    read_weights_file, used in the knapsack problem, is beside eval_knap).
    Remember that eval_whatever has to increment evals each time it is
    called.
  - declare eval_whatever extern at top of file main.c
  - declare any problem-specific globals and other data-reading
    functions at top of file main.c
  - go to procedure handle(), case 'e', add the branch which sets
    eval to be eval_whatever, and sets maxfitness (a string) and
    eval_name (a string) appropriately
  - in main.c main(), just after handle() is called, you may need to add any
    problem-specific setup stuff that cannot be done until all arguments
    are processed (eg see the bit which says if(eval == eval_knap).. )
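
As a hedged starting point for the first step, a new evaluation
function might look like the sketch below. The problem (counting
adjacent alleles that differ) is invented for the example. The
signature, the ANSI-style definition and the name geno_size are
assumptions, so copy the conventions of the existing eval_* functions
in eval.c. The two globals are stand-ins so that the fragment compiles
on its own; in pga they already exist and must not be redefined.

```c
#include <assert.h>

/* Stand-ins for globals owned by the real program. `evals' is named in
   the text above; `geno_size' is a hypothetical name for the
   chromosome length. */
long evals = 0;
int geno_size = 8;

/* Hypothetical problem: reward chromosomes whose adjacent alleles
   differ. The signature is an assumption, not taken from eval.c. */
double eval_alternate(const char *genotype)
{
    int i, score = 0;

    assert(genotype != 0);
    for (i = 1; i < geno_size; i++)
        if (genotype[i] != genotype[i - 1])
            score++;
    evals++;                    /* every evaluation must be counted */
    return (double) score;
}
```

With geno_size set to 8, the chromosome "01010101" scores 7, which is
the value you would give as maxfitness for this made-up problem.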


PROBLEMS, QUESTIONS

PGA has been used for teaching for several years now, and has been
used as a starting point by a fair number of people for their own
projects. So it's reasonably reliable. However, if you find bugs, or have
useful contributions to make, Tell Me!

Peter Ross
Dept of AI
University of Edinburgh
80 South Bridge
Edinburgh EH1 1HN

peter@aisb.ed.ac.uk





