Introduction to Genetic Algorithms: Intuition & Python Implementation
Inspired by Charles Darwin's theory of natural selection, genetic algorithms are a search heuristic that belong within the larger class of artificial intelligence called evolutionary algorithms.
Genetic algorithms essentially try and replicate the process of selecting the fittest solutions for reproduction in order to generate even higher quality solutions to solve the problem at hand.
In order to understand genetic algorithms we'll first discuss their intuition and then we'll look at an implementation in Python.
This article is based on notes on this course on Artificial Intelligence for Simple Games and is organized as follows:
- Intuition of Genetic Algorithms
- Python Implementation of Genetic Algorithms
1. Intuition of Genetic Algorithms
To cover the intuition of genetic algorithms we first need to review the following concepts:
- DNA
- Fitness function
- Population
- Selection
- Crossover
- Mutation
DNA
The first part of genetic algorithms is the DNA, which are a sequence of steps connected together.
DNA for genetic algorithms could be a sequence of moves or behaviors that define how the solution candidate behaves. For example, if we are trying to solve a maze we can use a genetic algorithm to figure out the sequence of steps we need to take to navigate from start to finish.
In this example, the sequence of moves that the algorithm will take to complete the maze is the DNA of the solution.
The length of the sequence is referred to as the DNA length.
Fitness Function
Simply put, the Fitness function defines how well each candidate performs in the environment.
If we go back to the grid world example and implement the DNA sequence mentioned above.
If the solution lands in the middle of the grid and not at the finish line we can measure the distance $S$, which is the distance from the finish line to the grid we landed $(x, y)$.
In order to get the distance $S$ we simply need the distance of the $S_x$and $S_y$ value.
As you can guess, the Fitness function $S$ is then calculated with the Pythagorean theorem as follows:
$$S = \sqrt {S{_x^2} + S{_y^2}}$$
Population
Up until this point we've only discussed calculating the fitness of single solution with certain DNA. Genetic algorithms, however, use an entire population of solutions. The population is the collection of all the agents that we're evaluating with the fitness function.
It's important to note that every soluiton has its own unique DNA and fitness result.
We can represent our population in a simple table that keeps track of the candidate number, the DNA, and the fitness.
At this point the population is referred to as the Initial Population and the DNA of each solution has been chosen randomly.
Selection
Selection is the process by which we select the best solution from the current population in order to create a new generation of candidates.
Selection sorts the population based on fitness, in this example we're choosing the solution with the lowest score.
The number of solutions that we want to include in this selection process is up to us - we can either select a certain number of agents, or a certain % of top agents.
Crossover
The Crossover is a step in which we take two of the solutions taken from the previous selection process and we combine their DNA.
Their combined DNA will then create a new solution which will be a part of a new population.
The reasoning for mixing the DNA of each solution candidate from the previous population is that by combining the best candidates we should theoretically get an even better population.
In order to complete the crossover step is simply to choose two solutions obtained after the selection process. There is no requirement which two agents we choose, and as you can expect, these two solutions are referred to as parents.
Mutation
Mutation is a step in which we apply random changes to the new population of offspring.
We do this step because we want the genetic algorithm searching for possibilities other than the ones already found with the crossover step.
There are two types of mutation:
- Complete mutation: we create a new solution that is completely random
- Partial mutation: we take an existing offspring and we slightly mutate the genes
It's important to note that at each step of the genetic algorithm we keep the population size the same as we want the value to stay constant.
Genetic Algorithms: Python Implementation
Let's now look at an example of implementing a genetic algorithm in Python.
If we run the environment.py
file from the course, we see the game consists of a universe that we can add planets by clicking on the image.
The goal of the game is to find the shortest distance connecting all the planets, while only going to each planet once. You may have guessed that this is the famous Travelling Salesman Problem, which we will try and solve with a genetic algorithm.
We won't go over the environment.py
in this article, although here a few of the key points:
- We're using the
pygame
library to draw the game - The
Planet
class contains information about position - A few important parts of the
Environment
include our constants, adding the planets, loading the background and rocket, and a variable for our current position - We then have a methods to draw planets on the screen, edit them, update the environment, and reset it
Now let's focus on the implementing the genetic algorithm in Python.
Importing Libraries
In order to build our genetic algorithm we need to import NumPy, as well as our Environment
class
import numpy as np
from environment import Environment
Creating our Solutions
Now we're going to build the solutions that will make up the class. Every object of the class will contain information about the DNA and the fitness.
To solve the travelling salesman problem we're going to create a population where each solution contains a route, or a sequence of steps to complete the task.
For this class the DNA will be a sequence of moves, and our fitness score will be the total distance that the solution had to travel with that DNA.
# create the solutions
class Route():
def __init__(self, dnaLength):
self.dnaLength = dnaLength
self.dna = list()
self.distance = 0
Next we'll initialize the DNA randomly, which is a requirement of genetic algorithms since the agent is new (i.e. it isn't an offspring of other solutions).
To do this we're going write a for loop as follows:
# initialize random DNA
for i in range(self.dnaLength - 1):
rnd = np.random.randint(1, self.dnaLength)
while rnd in self.dna:
rnd = np.random.randint(1, self.dnaLength)
self.dna.append(rnd)
self.dna.append(0)
The next step is to build the Crossover method, which as a reminder is the process of taking the DNA from two parents and combine them up to create a new solution. The arguments for this method will just be the DNA of the two parents:
# creating the Crossover method
def mix(self, dna1, dna2):
self.dna = dna1.copy()
for i in range(self.dnaLength - 1):
if np.random.rand() <= 0.5:
previous = self.dna[i]
inx = self.dna.index(dna2[i])
self.dna[inx] = previous
self.dna[i] = dna2[i]
Now that we have our Crossover method let's create several methods for the Random Partial Mutation, which is the type of mutation that takes the existing offspring after the Crossover step and mutates its DNA slightly.
As mentioned, we do this in order to prevent offspring from being too similar to the parents, which can possibly give us better results. There will be two partial mutations, which will occur 10% of the time.
#Random Partial Mutations 1
for i in range(self.dnaLength - 1):
if np.random.rand() <= 0.1:
previous = self.dna[i]
rnd = np.random.randint(1, self.dnaLength)
inx = self.dna.index(rnd)
self.dna[inx] = previous
self.dna[i] = rnd
# Random Partial Mutation 2
elif np.random.rand() <= 0.1:
rnd = np.random.randint(1, self.dnaLength)
prevInx = self.dna.index(rnd)
self.dna.insert(i, rnd)
if i >= prevInx:
self.dna.pop(prevInx)
else:
self.dna.pop(prevInx + 1)
That will be all we need for the Route class, so now we can initialize the main code. We won't cover all of the code in this article, instead we'll just highlight a few of the main concepts:
- The population size will be 50
- The mutation rate will be 0.1
- The number of solutions selected at each generation in the selection process will be 5
If we run the code we see that can find the shortest route between planets in under 100 generations.
Summary: Genetic Algorithms
As discussed, genetic algorithms are a search heuristic inspired by the process of natural selection which are often comprised of the following core components:
- DNA
- Fitness function
- Population
- Selection
- Crossover
- Mutation
We use these processes in order to generate high-quality solutions to optimization and search problems by relying on biologically inspired concepts of evolution.
Resources:
This post may contain affiliate links. See our policy page for more information.