Genetic Algorithm from scratch : Traveling salesman problem
It is paradoxical, yet true, to say, that the more we know, the more ignorant we become in the absolute sense, for it is only through enlightenment that we become conscious of our limitations. Precisely one of the most gratifying results of intellectual evolution is the continuous opening up of new and greater prospects.
Deep learning is great! Honestly, we can only admire what this technology has made with the whole industry within such a short period of time. There is no objective reason not to use Deep Learning for any task related to the classification. However, the classic usage of Deep Learning is limited by the usage of pre-built libraries in Python. However, what if we wanted to develop a simple but fully working algorithm from scratch and apply it to a concrete problem?
In this article we are going to implement such an algorithm and apply to the famous Traveling Salesman problem.
Up we go!
Briefly, Genetic Algorithms is a family of non-supervised machine learning algorithms that combine search and optimization techniques.
The main principle underlying these algorithm is quite simple:
Select The Best, Discard The Rest
Thus the goal is to simulate the process of natural selection by implementing the process of evolution of species.
You can see an excellent implementation of such techniques here.
And how to apply it to the traveling salesman problem?
First, what is Traveling Salesman problem?
Another quote from Wikipedia
The travelling salesman problem (TSP) asks the following question: "Given a list of cities and the distances between each pair of cities, what is the shortest possible route that visits each city and returns to the origin city?" It is an NP-hard problem in combinatorial optimization, important in operations research and theoretical computer science. The travelling purchaser problem and the vehicle routing problem are both generalizations of TSP. In the theory of computational complexity, the decision version of the TSP (where, given a length L, the task is to decide whether the graph has any tour shorter than L) belongs to the class of NP-complete problems. Thus, it is possible that the worst-case running time for any algorithm for the TSP increases superpolynomially (but no more than exponentially) with the number of cities.
Then we need to get acquainted with the basic principle of genetic algorithm.
To sum up the list of steps is the following:
Initialize the population. For instance a set of cities (if we are talking about traveling salesman problem) with an initial (and not optimal order).
Evaluate solution. Do we achieve our goal of reducing the distance for our salesman?
Take decision : If the solution is optimal then stop, continue otherwise.
Apply selection (choose the best species)
Apply crossover (mix the species)
Apply mutation (change species parameters).
First of all we need to encode our chromosome. Just like in real life our gene will control the set of characteristics of a species (its individuality) where each bit will represent a particual feature.
A fitness function quantifies the optimality of a solution (chromosome) so that that particular solution may be ranked against all the other solutions.
The value of this function will actually demonstrate how close the current solution to the ideal one.
The process that determines which solutions are to be preserved and allowed to reproduce and which ones deserve to die out.
The goal of this process is to indicate good solution, and finally carry out the principle we discussed earlier : Select the Bests, discard the Rest
It is the process in which two chromosomes (strings) combine their genetic material (bits) to produce a new offspring which possesses both their characteristics.
Here we simply pick two encoded strings (species) and combine them by interchanging some corresponding bits in their genes (as in the image below).
Elitism is a method which copies the best chromosome to the new offspring population before crossover and mutation.
We retain a certain number of the best species at each generation.
It is the process by which a string is deliberately changed so as to maintain diversity in the population set.
It does not necessarily mean that mutation always results in a better result at a given generation. Sometimes mutations may be bad, like in real life. But mutation is crucial to make the overall solution work better.
Here we will implement a simple yet a fully functional version of this algorithm. Okay, but how we apply this algorithm to the traveling salesman problem.
Imagine our traveler has a list of cities he needs to visit :
We then simple represent these cities as a list of digits, like [1, 2, 3, 4, 5].
This string represents a path from Paris to Lyon, then to Lille, from Lille to Nantes and finally from Nantes back to Paris.
We have also a cost matrix representing the hours needed to get from one city to another.
So, the fittest species will the one requiring the least possible hours to visit all the cities.
We then pick two random digits (cities) in the strings (paths) and interchange their positions in the solution in order to give rise to a new tour.
The list of parameters we need to provide as input is a population size, maximum number of generations and mutation probability (to which extent do species mutate in each generation).
Here is the implementation.
And as usual, link to github.
D. E. Goldberg, „Genetic Algorithm In Search, Optimization And Machine Learning‟, New York: Addison –Wesley (1989)
John H. Holland „Genetic Algorithms‟, Scientific American Journal, July 1992.
Kalyanmoy Deb, „An Introduction To Genetic Algorithms‟, Sadhana, Vol. 24 Parts 4 And 5.
T. Starkweather, et al, „A Comparison Of Genetic Sequencing Operators‟, International Conference On Gas (1991)
D. Whitley, et al , „Traveling Salesman And Sequence Scheduling: Quality Solutions Using Genetic Edge Recombination‟, Handbook Of Genetic Algorithms, New York