Sunday, February 13, 2011

StringEvolver: Evolving strings with a genetic algorithm

This is my first attempt at experimenting with a genetic algorithm. The following is an application that evolves a string given a destination to converge to.

Here is an example run:
> StringEvolver -m 0.25 -s 0.6 -c 2000 -e 0.1 --fitness=hamming --ctype=one -t 0.3 "Charles Darwin was right!"

Evolution Destination: Charles Darwin was right!
Mutation Rate: 25%
Crossover Rate: 60%
Truncation Rate: 30%
Chromosomes / population: 2000
Elitism / population: 200 (0.1%)
Fitness Calculator: Hamming Distance
Crossover Type: One Point

    1: L<eI`Y@ D#raF'JKJ7DUAg"GR
    2: L<eI`Y@ D#raF'JKJ7DUAg"GR
    3: chNOLe"5]={Iiyrgk*,rEpE/!
    4: L<eI`Y@ D#ra nR$C1SrYqha+
    5: L<eI`Y@ D#ra nR$C1SrYqha+
    6: lpa5`Y@ D#ra nR$C8y.iqrt5
    7: C]Y&hJ1 %aRwi< aaB,r_Zh68
    8: 1s[r.ts %aGwi< _a"=rjXyN!
    9: chNOlFsyD#rwi< }aB,r_Zh68
   10: chNO9e" D#rwi< }aB=rPgh6!
   11: chNO9e" D#rwi< }aB=rPgh6!
   12: 1s[r.Ks D#ra n}way rAght!
   13: C}hplFs D#rwi.}way rPgh6!
   14: C'aclos Da{]inrwas r.ghx!
   15: C'aclos Da{]inrwas r.ghx!
   16: ChNrlFs Da>win}was  `gha!
   17: ChNrlFs Da>winlwas r;gh6!
   18: Ch[rlFs Da>wi< was rAght!
   19: Chacle@ D#rwin was rjght!
   20: Chacle@ D#rwin was rjght!
   21: ChNrles Darwin}was rAght!
   22: ChNrles Darwin}was rAght!
   23: Charles Darwin}was r`ght!
   24: Charles Darwin}was r`ght!
   25: Charles Darwin was rAght!
   26: Charles Darwin was rAght!
   27: Charles Darwin was rAght!
   28: Charles Darwin was rAght!
   29: Charles Darwin was rAght!
   30: Charles Darwin was right!

Found in 30 generations
Time Taken: 00:00:00.4730271


You can download the source from github:

Command Line arguments

-m, --mutation=VALUE
A value between 0-1

The mutation rate determines the probability of how often a selected chromosome mutates. Mutation is very important in a genetic algorithm because it helps in keeping diversity in the population, thus avoid local minima which can slow or even halt further evolution.

The application currently uses a mutation operator called Single Point Mutation. For our current application, it works by randomly selecting a point in the string and switching the character at that point to another random character.

As an example, say the chromosome to be mutated currently contains the string "aBcdE". The single point mutation operator then randomly chooses position 2 (0-based) and replaces the character 'c' with the randomly chosen character 'W'.

This way, the chromosome "aBcdE" has now been mutated to "aBWdE".

-s, --crossover=VALUE
A value between 0-1

The crossover rate determines how often two selected chromosomes are sexually combined to produce two separate offspring.

For this application, there are two available crossover operations: One Point Crossover and Two Point Crossover. These operators are discussed further in the ctype argument.

-e, --elitism=VALUE
A value between 0-1

The elitism rate determines the number of the fittest solutions that are directly (untouched) transferred to the advancing population.

Elitism can help in preventing the potential loss of possibly good solutions and can lead to quicker convergence.

-c, --crcount=VALUE
A value greater than 1

The chromosome count determines the number of chromosomes that each population will have. The greater the number of chromosomes, the more time a population takes to advance to the next.

VALUE = sum, levenshtein or hamming

The fitness calculator determines which algorithm is used to calculate how fit a given chromosome is.

There are currently three fitness calculators to choose from:

The sum calculator calculates the ASCII difference between each character of the target strings.
public override double CalculateFitness(Chromosome ch)
    var distanceToTarget = Target.Select((t, i) => Math.Abs(t - ch.Value[i])).Sum();
    return 1.0 / distanceToTarget;
Levenshtein Distance
The Levenshtein distance between two strings is the minimum number of edits needed to transform one string into the other. The transformation can include insertion, deletion and substitution. This distance is also referred to as the edit distance.
public override double CalculateFitness(Chromosome ch)
    return 1.0 / LevenshteinDistance(ch.Value, Target);
Hamming Distance
The Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different. The reason Hamming distance requires the strings to be of equal length is because unlike the Levenshtein distance, the Hamming distance only allows substitutions.

As an example, the Hamming distance between "janica" and "jenixo" is 3.
public override double CalculateFitness(Chromosome ch)
    if (ch.Value.Length != Target.Length)
        return 1.0 / double.PositiveInfinity;

    var difference = 0;
    for (int i = 0; i < Target.Length; i++)
        if (ch.Value[i] != Target[i])
    return 1.0 / difference;

VALUE = one or two

The crossover type determines which crossover operator is to be used when sexually combining two chromosomes.

There are currently two available methods to choose from:

The One-Point Crossover method chooses a random point (called a locus) in the string and all the data beyond that point in either chromosome is swapped between the parent chromosomes. The resulting two chromosomes are the children.
public Tuple<Chromosome, Chromosome> Crossover(Chromosome c1, Chromosome c2)
    var locus = random.Next(0, c1.Value.Length + 1);
    string ch1 = c1.Value.Substring(0, locus) + c2.Value.Substring(locus),
           ch2 = c2.Value.Substring(0, locus) + c1.Value.Substring(locus);

    return new Tuple<Chromosome, Chromosome>(new Chromosome(ch1, fitnessCalculator), new Chromosome(ch2, fitnessCalculator));

The Two-Point Crossover method is very similar to the One-Point Crossover method but this one chooses two random points along the string rather than one. The data between the two points is swapped and the resulting chromosomes are the children.
public Tuple<Chromosome, Chromosome> Crossover(Chromosome c1, Chromosome c2)
    int locus1 = random.Next(0, c1.Value.Length),
        locus2 = random.Next(locus1, c1.Value.Length),
        distance = locus2 - locus1;

    string ch1 = c1.Value.Substring(0, locus1) + c2.Value.Substring(locus1, distance) + c1.Value.Substring(locus2),
           ch2 = c2.Value.Substring(0, locus1) + c1.Value.Substring(locus1, distance) + c2.Value.Substring(locus2);

    return new Tuple<Chromosome, Chromosome>(new Chromosome(ch1, fitnessCalculator), new Chromosome(ch2, fitnessCalculator));

-t, --truncate=VALUE
0 < VALUE <= 1

The Truncation value determines how many chromosomes of the current population should be kept and used for the selection, crossover and mutation operations. Note that before the truncation is performed, the chromosomes in the population are sorted in descending order by their fitness, so the fittest chromosomes are kept after the truncation.

A value of 1 uses all of the chromosomes and a value of 0.5 uses half of the population (the better half, of course)