Genetic Algorithm from scratch : Traveling salesman problem

Alibek Jakupov
May 26, 2019
4 min read

Updated: Nov 19, 2021

It is paradoxical, yet true, to say, that the more we know, the more ignorant we become in the absolute sense, for it is only through enlightenment that we become conscious of our limitations. Precisely one of the most gratifying results of intellectual evolution is the continuous opening up of new and greater prospects.

Nikola Tesla

Deep learning is great! Honestly, we can only admire what this technology has made with the whole industry within such a short period of time. There is no objective reason not to use Deep Learning for any task related to the classification. However, the classic usage of Deep Learning is limited by the usage of pre-built libraries in Python. However, what if we wanted to develop a simple but fully working algorithm from scratch and apply it to a concrete problem?

In this article we are going to implement such an algorithm and apply to the famous Traveling Salesman problem.

Up we go!

Briefly, Genetic Algorithms is a family of non-supervised machine learning algorithms that combine search and optimization techniques.

The main principle underlying these algorithm is quite simple:

Select The Best, Discard The Rest

Thus the goal is to simulate the process of natural selection by implementing the process of evolution of species.

You can see an excellent implementation of such techniques here.

And how to apply it to the traveling salesman problem?

First, what is Traveling Salesman problem?

Another quote from Wikipedia

The travelling salesman problem (TSP) asks the following question: "Given a list of cities and the distances between each pair of cities, what is the shortest possible route that visits each city and returns to the origin city?" It is an NP-hard problem in combinatorial optimization, important in operations research and theoretical computer science. The travelling purchaser problem and the vehicle routing problem are both generalizations of TSP. In the theory of computational complexity, the decision version of the TSP (where, given a length L, the task is to decide whether the graph has any tour shorter than L) belongs to the class of NP-complete problems. Thus, it is possible that the worst-case running time for any algorithm for the TSP increases superpolynomially (but no more than exponentially) with the number of cities.

Then we need to get acquainted with the basic principle of genetic algorithm.

To sum up the list of steps is the following:

Initialize the population. For instance a set of cities (if we are talking about traveling salesman problem) with an initial (and not optimal order).
Evaluate solution. Do we achieve our goal of reducing the distance for our salesman?
Take decision : If the solution is optimal then stop, continue otherwise.
Apply selection (choose the best species)
Apply crossover (mix the species)
Apply mutation (change species parameters).

Encoding

First of all we need to encode our chromosome. Just like in real life our gene will control the set of characteristics of a species (its individuality) where each bit will represent a particual feature.

Fitness Function

A fitness function quantifies the optimality of a solution (chromosome) so that that particular solution may be ranked against all the other solutions.

The value of this function will actually demonstrate how close the current solution to the ideal one.

Recombination

The process that determines which solutions are to be preserved and allowed to reproduce and which ones deserve to die out.

The goal of this process is to indicate good solution, and finally carry out the principle we discussed earlier : Select the Bests, discard the Rest

Crossover

It is the process in which two chromosomes (strings) combine their genetic material (bits) to produce a new offspring which possesses both their characteristics.

Here we simply pick two encoded strings (species) and combine them by interchanging some corresponding bits in their genes (as in the image below).

Elitism

Elitism is a method which copies the best chromosome to the new offspring population before crossover and mutation.

We retain a certain number of the best species at each generation.

Mutation

It is the process by which a string is deliberately changed so as to maintain diversity in the population set.

It does not necessarily mean that mutation always results in a better result at a given generation. Sometimes mutations may be bad, like in real life. But mutation is crucial to make the overall solution work better.

Here we will implement a simple yet a fully functional version of this algorithm. Okay, but how we apply this algorithm to the traveling salesman problem.

Imagine our traveler has a list of cities he needs to visit :

Paris
Lyon
Lille
Nantes

We then simple represent these cities as a list of digits, like [1, 2, 3, 4, 5].

This string represents a path from Paris to Lyon, then to Lille, from Lille to Nantes and finally from Nantes back to Paris.

We have also a cost matrix representing the hours needed to get from one city to another.

So, the fittest species will the one requiring the least possible hours to visit all the cities.

We then pick two random digits (cities) in the strings (paths) and interchange their positions in the solution in order to give rise to a new tour.

The list of parameters we need to provide as input is a population size, maximum number of generations and mutation probability (to which extent do species mutate in each generation).

Here is the implementation.

from random import randint
from random import sample
 
class Chromosome:

 def __init__(self, sequence, cost):
 self.sequence = sequence
 self.cost = cost
 
 def getSequence(self):
 return self.sequence
 
 def getCost(self):
 return self.cost
 
 def setSequence(self, sequence):
 self.sequence = sequence

 def setCost(self, cost):
 self.cost = cost
 
 def getDcolumns(self):
 return self.dColumns     

def initialize (number_cities, population_size):
    generation = []
 for i in range(population_size):
        sequence = sample(xrange(1,number_cities+1), number_cities)
        generation.append(Chromosome(sequence, 0))
 return generation

def calculate_fitness(generation, cost_matrix):
 for chromosome in generation:
        cost = 0
 for i in range(len(chromosome.getSequence())-1):
            cost += cost_matrix[chromosome.getSequence()[i]-1][chromosome.getSequence()[i+1]-1]
        cost+= cost_matrix[chromosome.getSequence()[len(chromosome.getSequence())-1]-1][chromosome.getSequence()[0]-1]
        chromosome.setCost(cost)

def selection(generation):
 return sorted(generation, key = getKey)[0]

def crossover(parent1, parent2):
    point = randint(0, len(parent1.getSequence()))
    child = Chromosome(parent1.getSequence(), 0)
 #initializing child sequence
 #for x in range(0,len(child.getSequence())):
 #   child.getSequence()[x] = None
 
 # Two random integer indices of the parent1:
    start_pos = randint(0, len(parent1.getSequence()))
    end_pos = randint(0, len(parent1.getSequence()))


 # takes the sub-route from parent one and sticks it in itself:
 # if the start position is before the end:
 if start_pos < end_pos:
 # do it in the start-->end order
 for x in range(start_pos,end_pos):
            child.getSequence()[x] = parent1.getSequence()[x] # set the values to each other
 # if the start position is after the end:
 elif start_pos > end_pos:
 # do it in the end-->start order
 for i in range(end_pos,start_pos):
            child.getSequence()[i] = parent1.getSequence()[i] # set the values to each other

 # Cycles through the parent2. And fills in the child
 # cycles through length of parent2:
 for i in range(len(parent2.getSequence())):
 # if parent2 has a city that the child doesn't have yet:
 if not parent2.getSequence()[i] in child.getSequence():
 # it puts it in the first 'None' spot and breaks out of the loop.
 for x in range(len(child.getSequence()) - 1):
 if child.getSequence()[x] == None:
                    child.getSequence()[x] = parent2.getSequence()[i]
 break
 # repeated until all the cities are in the child route
 
 return child

def generate_next(generation, cost_matrix):
    temp = []
    result = []
    size = len(generation)
 while (len(temp)!=size):
        parent1 = generation[randint(0, size-1)]
        parent2 = generation[randint(0, size-1)]
        temp.append(crossover(parent1, parent2))
    result = temp + generation
    calculate_fitness(result, cost_matrix)
    result = sorted(result, key = getKey)
    result = result[:size]
 return result

def mutate (generation):
 for chromosome in generation:
        point1 = randint(0, len(chromosome.getSequence())-1)
        point2 = randint(0, len(chromosome.getSequence())-1)
        chromosome.getSequence()[point1], chromosome.getSequence()[point2] = chromosome.getSequence()[point2], chromosome.getSequence()[point1]
 
def printGeneration (generation):
 for chromosome in generation:
        print "{0}\tcost: {1}".format(chromosome.getSequence(), chromosome.getCost())
 
def getKey(chromosome):
 return chromosome.getCost()


cities = open('cities.txt', 'r')
cost_matrix = []

for row in cities:
    cost_matrix.append([ int(x) for x in row.split(',')])

number_cities = len(cost_matrix)
generation_size = 3
generation = initialize(number_cities, generation_size)
calculate_fitness(generation, cost_matrix)
next_generation = generation
print "Initial generation, Best sequence: {0}\tScore: {1}".format(selection(next_generation).getSequence(), selection(next_generation).getCost())
for i in range (1000):
    next_generation = generate_next(next_generation, cost_matrix)
    mutate(next_generation)
    calculate_fitness(next_generation, cost_matrix)
    print "Best sequence: {0}\tScore: {1}".format(selection(next_generation).getSequence(), selection(next_generation).getCost())

And as usual, link to github.

References

D. E. Goldberg, „Genetic Algorithm In Search, Optimization And Machine Learning‟, New York: Addison –Wesley (1989)
John H. Holland „Genetic Algorithms‟, Scientific American Journal, July 1992.
Kalyanmoy Deb, „An Introduction To Genetic Algorithms‟, Sadhana, Vol. 24 Parts 4 And 5.
T. Starkweather, et al, „A Comparison Of Genetic Sequencing Operators‟, International Conference On Gas (1991)
D. Whitley, et al , „Traveling Salesman And Sequence Scheduling: Quality Solutions Using Genetic Edge Recombination‟, Handbook Of Genetic Algorithms, New York