PGAPy: Python Wrapper for PGAPack Parallel Genetic Algorithm Library

Author: Ralf Schlatterbeck <rsc@runtux.com>

News: This version wraps the Differential Evolution method (that's quite an old method but is newly implemented in pgapack).

PGAPy is a wrapper for PGAPack, the parallel genetic algorithm library (see PGAPack Readme), a powerfull genetic algorithm library by D. Levine, Mathematics and Computer Science Division Argonne National Laboratory. The library is written in C. PGAPy wraps this library for use with Python. The original PGAPack library is already quite old but is one of the most complete and accurate (and fast, although this is not my major concern when wrapping it to python) genetic algorithm implementations out there with a lot of bells and whistles for experimentation. It also has shown a remarkably small number of bugs over the years. It supports parallel execution via the message passing interface MPI in addition to a normal "serial" version. That's why I wanted to use it in Python, too.

There currently is not much documentation for PGAPy. You really, absolutely need to read the documentation that comes with PGAPack – and of course you need the PGAPack library. The PGAPack user guide is now shipped together with PGAPy. It is installed together with some examples in share/pgapy, wherever the Python installer decides to place this in the final installation (usually /usr/local/share on Linux).

The original PGAPack library can still be downloaded from the PGAPack ftp site, it is written in ANSI C and therefore should run on most platforms. Note that this version is not very actively maintained. I've started a PGAPack fork on github where I've ported the library to the latest version of the MPI standard and have fixed some minor inconsistencies in the documentation. I've also implemented some new features (notably enhancements in selection schemes and a new replacement strategy called restricted tournament replacement)

I have tested pgapy on Linux only and I'll currently not provide Windows versions. You also can find my PGAPack fork on github this repository has the three upstream releases as versions in git and contains some updates concerning support of newer MPI versions and documentation updates. I've also included patches in the git repository of the Debian maintainer of the package, Dirk Eddelbuettel.

To get you started, I've included some very simple examples in examples, e.g., one-max.py implements the "Maxbit" example similar to one in the PGAPack documentation. The examples were inspired by the book "Genetic Algorithms in Python" but are written from scratch and don't include any code from the book. The examples illustrates several points:

Naming conventions in PGAPy

When you extend PGAPy – remember not all functions of PGAPack are wrapped yet and you may need additional functions – you should stick to my naming conventions when making changes. The following naming conventions were used for the wrapper:

Constructor Parameters

PGApack has a lot of PGASet and PGAGet functions for setting parameters. These are reflected in constructor parameters on the one hand and in read-only properties of a PGA object on the other hand. The following table gives an overview of all the original PGApack names and the names of the python wrapper. For the PGApack name I've only listed the PGASet function, in many cases there is a corresponding PGAGet function. If a corresponding read-only property exists for a constructor parameter this is indicated in the "Prop" column. In some cases properties are missing because no corresponding PGAGet function is implemented in PGApack, in other cases returning a numeric value that has a symbolic constant in PGApy doesn't make much sense. The properties have the same name as the constructor parameter. There is one Property that is no constructor parameter, namely the GA_iter property that returns the current GA generation. In the type column I'm listing the Python type. If the type is followed by a number, more than one item of that type is specified (a sequence in Python). Some entries contain "sym", these are integer values with a symbolic constant, the value "msym" indicates that several values denoted by a list of symbolic constants can be given. A special case are the PGASetRealInitRange, PGASetRealInitPercent, PGASetIntegerInitRange functions. These take two values for each allele of the gene. In python this is a sequence of 2-tuples. Note that this means that you can have different ranges of allowed values for each allele.

The first two (mandatory) constructor parameters are the type of the gene (this takes a Python type, e.g., bool for a binary genome or int for an integer genome) and the length. Note that the string_length is implicitly set with the length parameter. The string_length is also available as the length of the PGA object using the Python built-in len function.

PGApack name Constructor parameter Type Prop
PGASetCrossoverProb crossover_prob float yes
PGASetCrossoverType crossover_type sym no
PGASetFitnessCmaxValue fitness_cmax float yes
PGASetFitnessType fitness_type sym no
PGAGetGAIterValue GA_iter int yes
PGASetIntegerInitPermute integer_init_permute int2 no
PGASetIntegerInitRange init   no
PGASetMaxFitnessRank max_fitness_rank float yes
PGASetMaxGAIterValue max_GA_iter int yes
PGASetMaxNoChangeValue max_no_change int no
PGASetMaxSimilarityValue max_similarity int no
PGASetMutationAndCrossoverFlag mutation_and_crossover int yes
PGASetMutationBoundedFlag mutation_bounded int yes
PGASetMutationIntegerValue mutation_value int yes
PGASetMutationOrCrossoverFlag mutation_or_crossover int yes
PGASetMutationProb mutation_prob float yes
PGASetMutationRealValue mutation_value float yes
PGASetMutationType mutation_type sym no
PGASetNoDuplicatesFlag no_duplicates int no
PGASetNumReplaceValue num_replace int yes
PGASetPopSize pop_size int yes
PGASetPopReplaceType pop_replace_type sym no
PGASetPrintFrequencyValue print_frequency int yes
PGASetPrintOptions print_options msym no
PGASetPTournamentProb p_tournament_prob float yes
PGASetRandomizeSelect randomize_select int yes
PGASetRandomSeed random_seed int yes
PGASetRealInitRange init   no
PGASetRealInitPercent init_percent   no
PGASetRestartFlag restart int yes
PGASetRestartFrequencyValue restart_frequency int yes
PGASetRTRWindowSize rtr_window_size int yes
PGASetSelectType select_type sym no
PGASetStoppingRuleType stopping_rule_types msym no
PGASetStringLength string_length int yes
PGASetTournamentSize tournament_size int yes
PGASetTournamentWithReplacement tournament_with_replacement int yes
PGASetTruncationProportion truncation_proportion float yes
PGASetUniformCrossoverProb uniform_crossover_prob float yes

PGA Object Methods

The following are the methods that can be used during the run of the genetic search. The run method is used to start the search. This can be used, to, e.g., set an allele during hill-climbing in a custom endofgen method. Note that some methods only apply to certain gene types, e.g. the encode_int_ methods can only be used on binary alleles (they encode an integer value as a binary or gray code representation into the gene). Other methods take or return different types depending on the type of gene, e.g. get_allele or set_allele, they call different backend functions depending on the gene type. With the set_random_seed method, the random number generator can be re-seeded. It is usually best to seed the generator once at (before) the beginning by specifying random_seed in the constructor. For further details consult the user guide.

Method Parameters Return
check_stopping_conditions   True if stop should occur
encode_int_as_binary p, pop, frm, to, val None
encode_int_as_gray_code p, pop, frm, to, val None
encode_real_as_binary p, pop, frm, to l, u, val None
encode_real_as_gray_code p, pop, frm, to l, u, val None
euclidian_distance p1, pop1 p2, pop2 float
fitness pop None
get_allele p, pop, index allele value
get_best_index pop index of best string
get_evaluation p, pop evaluation of p (float)
get_evaluation_up_to_date p, pop True if up-to-date
get_fitness p, pop fitness of p (float)
get_int_from_binary p, pop, frm, to int
get_int_from_gray_code p, pop, frm, to int
get_iteration   deprecated, use GA_iter
get_real_from_binary p, pop, frm, to, l, u float
get_real_from_gray_code p, pop, frm, to, l, u float
random01   float between 0 and 1
random_flip probability 0 or 1
random_gaussian mean, stddev float
random_interval l, r int between l, r
random_uniform l, r float between l, r
run   None
select_next_index pop index selected individual
set_allele p, pop, i, value None
set_evaluation p, pop, value None
set_evaluation_up_to_date p, pop, status None
set_random_seed seed None (use constructor!)

User-Methods

PGApack has the concept of user functions. These allow customization of different areas of a genetic algorihm. In Python they are implemented as methods that can be changed in a derived class. One of the methods that must be implemented in a derived class is the evaluate function (although technically it is not a user function in PGApack). It interprets the gene and returns an evaluation value. PGApack computes a fitness from the raw evaluation value. For some methods an up-call into the PGA class is possible, for some methods this is not possible (and in most cases not reasonable). Note that for the stop_cond method, the standard check for stopping conditions can be called with:

self.check_stopping_conditions()

The following table lists the overridable methods with their parameters (for the function signature the first parameter self is omitted). Note that in PGApack there are additional user functions that are needed for user-defined data types which are currently not exposed in Python. In the function signatures p denotes the index of the individual and pop denotes the population. If more than one individual is specified (e.g., for crossover) these can be followed by a number. For crossover c1 and c2 denote the destination individuals (children). The propability for the mutation method is a floating-point value between 0 and 1. Remember to count the number of mutations that happen, and return that value for the mutation method!

Method Call Signature Return Value Up-Call
check_duplicate p1, pop1, p2, pop2 True if dupe no
stop_cond   True to stop no
crossover p1, p2, p_pop, c1, c2, c_pop None no
endofgen   None no
evaluate p, pop float no
gene_difference p1, pop1, p2, pop2 float no
initstring p, pop None no
mutation p, pop, propability #mutations no
pre_eval pop None no
print_string file, p, pop None yes

Constants

The following PGApack constants are available:

Constant Description
PGA_CROSSOVER_ONEPT One-point Crossover
PGA_CROSSOVER_TWOPT Two-point Crossover
PGA_CROSSOVER_UNIFORM Uniform Crossover
PGA_FITNESSMIN_CMAX Map fitness by subtracting worst
PGA_FITNESSMIN_RECIPROCAL Map fitness via reciprocal
PGA_FITNESS_NORMAL Linear normalization of fitness
PGA_FITNESS_RANKING Linear fitness ranking
PGA_FITNESS_RAW Identity fitness function
PGA_MUTATION_CONSTANT Mutation by adding/subtracting constant
PGA_MUTATION_GAUSSIAN Mutation by selecting from Gaussian distribution
PGA_MUTATION_PERMUTE Mutation swaps two random genes
PGA_MUTATION_RANGE Replace gene with uniform selection from init range
PGA_MUTATION_UNIFORM Mutation uniform from interval
PGA_NEWPOP Symbolic constant for new population
PGA_OLDPOP Symbolic constant for old population
PGA_POPREPL_BEST Population replacement best strings
PGA_POPREPL_PAIRWISE_BEST Compare same index in old and new population
PGA_POPREPL_RANDOM_NOREP Population replacement random no replacement
PGA_POPREPL_RANDOM_REP Population replacement random with replacement
PGA_POPREPL_RTR Restricted Tournament Replacement
PGA_REPORT_AVERAGE Report average evaluation
PGA_REPORT_HAMMING Report hamming distance
PGA_REPORT_OFFLINE Report offline
PGA_REPORT_ONLINE Report online
PGA_REPORT_STRING Report the string
PGA_REPORT_WORST Report the worst evaluation
PGA_SELECT_LINEAR Return individuals in population order
PGA_SELECT_PROPORTIONAL Fitness-proportional selection
PGA_SELECT_PTOURNAMENT Binary probabilistic tournament selection
PGA_SELECT_SUS Stochastic universal selection
PGA_SELECT_TOURNAMENT Tournament selection
PGA_SELECT_TRUNCATION Truncation selection
PGA_STOP_MAXITER Stop on max iterations
PGA_STOP_NOCHANGE Stop on max number of generations no change
PGA_STOP_TOOSIMILAR Stop when individuals too similar

Missing Features

As already mentioned, not all functions and constants of PGAPack are wrapped yet – still for many applications the given set should be enough. If you need additional functions, you may want to wrap these and send me a patch.

Another feature of PGAPack is currently not implemented in the wrapper, the usage of custom datatypes. With PGAPack you can define your own datatypes complete with their custom implementations of the genetic algorithm functionality like crossover, mutation, etc. I don't expect problems implementing these, though.

Reporting Bugs

Please use the Sourceforge Bug Tracker or the Github Bug Tracker and

Resources

Project information and download from Sourceforge main page

or checkout from Github

or directly install via pypi.

Changes

Version 0.8: Bugfix in real mutation

Version 0.7: Major changes in wrapping

Version 0.6: Major changes in wrapping

Version 0.5: Bug-fix release

Version 0.4: Bundle PGAPack

Version 0.3: Feature enhancements, Bug fixes

Port to Python3, Python2 is still supported, license change.

Version 0.2: Feature enhancements, Bug fixes

64 bit support, more PGAPack functions and attributes wrapped, Readme-update: Sourceforge logo, Changes chapter.

Version 0.1: Initial freshmeat announcement

PGAPy is a wrapper for PGAPack, the parallel genetic algorithm library, a powerful genetic algorithm library. PGAPy wraps this library for use with Python. Pgapack is one of the most complete and accurate genetic algorithm implementations out there with a lot of features for experimentation.