By Andrew Kuo
Source: Towards Data Science
A genetic algorithm (GA) is an optimization technique inspired by the process of evolution. It may be a rough analogy, but if you squint, Darwin's natural selection does loosely resemble an optimization task: one that aims to create organisms perfectly suited to reproducing in their environment.
In this article, I’ll show you how to implement a genetic algorithm in Python to “evolve” a garbage collection robot in a few hours.
The best tutorial I have come across on the principles of genetic algorithms is Melanie Mitchell's book Complexity: A Guided Tour.
In one chapter, Mitchell introduces a robot named Robby, whose only purpose in life is to pick up rubbish, and describes how to optimize Robby's control strategy using a GA. Below, I'll explain my solution to this problem and show how to implement the algorithm in Python. There are some good packages for constructing such algorithms (DEAP, for example), but in this tutorial I'll only use base Python, NumPy, and tqdm (optional).
Although this is only a toy example, GAs are used in many real-world applications. As a data scientist, I often use them for hyperparameter optimization and model selection. GAs are computationally expensive, but they allow us to explore multiple regions of the search space in parallel, and they are a good option when gradients are unavailable or difficult to compute.
A robot named Robby lives in a two-dimensional grid world full of rubbish, surrounded by four walls (as shown in the figure below). The goal of this project is to evolve an optimal control strategy that lets him pick up rubbish efficiently without bumping into walls.
Robby can only see the four cells adjacent to him (up, down, left, and right) plus the cell he occupies. Each cell can be empty, contain rubbish, or be a wall, so Robby can find himself in 3⁵ = 243 different situations. Robby can perform seven different actions: move up, down, left, or right (4), move randomly, pick up rubbish, or stay still.
Therefore, Robby's control strategy can be encoded as a "DNA" string of 243 digits between 0 and 6 (corresponding to the action Robby should take in each of the 243 possible situations).
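To make the encoding concrete, here is a minimal sketch of how a situation string maps to a gene in the DNA (the names `situations`, `situ_index`, and `action` are illustrative; the ordering up, right, down, left, current mirrors the lookup table built in the full implementation later in the article):

```python
import random
from itertools import product

# each of the 5 visible cells is 'w' (wall), 'o' (empty) or 'x' (rubbish)
cells = ['w', 'o', 'x']

# enumerate all 3**5 = 243 situations in a fixed order: up, right, down, left, current
situations = [''.join(s) for s in product(cells, repeat=5)]
situ_index = {s: i for i, s in enumerate(situations)}

# a DNA string assigns one action (a digit 0-6) to each of the 243 situations
dna = ''.join(str(random.randint(0, 6)) for _ in range(243))

# look up the action for: wall above, empty right/down, wall left, rubbish here
action = dna[situ_index['woowx']]
```

Because the enumeration order is fixed, the same index always refers to the same situation, which is what makes two DNA strings directly comparable gene by gene.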
The optimization steps of any GA are as follows:
1. Generate an initial "population" of random solutions to the problem
2. Evaluate each individual's "fitness" according to how well it solves the problem
3. Let the fittest solutions "breed" and pass their "genetic" material on to the next generation
4. Repeat steps 2 and 3 until we are left with a population of optimized solutions
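The four steps above can be sketched as a compact, generic loop. This is a toy illustration (the function names and the bit-string problem are mine, not from the article): evolve a bit string to maximise the number of ones.

```python
import random

def fitness(ind):
    # step 2: evaluate fitness (here, simply count the ones)
    return sum(ind)

def evolve(pop_size=20, genes=16, generations=30):
    # step 1: generate a random initial population
    pop = [[random.randint(0, 1) for _ in range(genes)] for _ in range(pop_size)]
    for _ in range(generations):
        # step 3: the fittest half breeds; each child gene comes from a random parent
        pop.sort(key=fitness, reverse=True)
        breeders = pop[:pop_size // 2]
        pop = [[random.choice([p1[i], p2[i]]) for i in range(genes)]
               for p1, p2 in (random.sample(breeders, 2) for _ in range(pop_size))]
    # step 4: after repeating, return the best solution found
    return max(pop, key=fitness)
```

The Robby implementation below follows exactly this skeleton, with a richer fitness function (a grid-world simulation) and the addition of mutation.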
In our task, the first generation of Robbys is initialised with random DNA strings (corresponding to random control strategies). We then simulate these robots in randomly generated grid worlds and observe their performance.
A robot's fitness depends on how much rubbish it picks up in n moves and how many times it bumps into a wall. In our example, a robot gains 10 points for every piece of rubbish picked up and loses 5 points every time it hits a wall. The robots then "mate" with probabilities tied to their fitness (i.e. robots that pick up lots of rubbish are more likely to reproduce), and a new generation is born.
There are several different ways to "mate". In Mitchell's version, she took a random splice of the parents' two DNA strands and joined them together to create a child for the next generation. In my implementation, I assign each gene randomly from one of the two parents (i.e. for each of the 243 genes, I toss a coin to decide which parent it is inherited from).
For example, here are the first 10 genes of two parents and a possible child produced using my method:
Parent 1: 1440623161
Parent 2: 2430661132
Child:    2440621161
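The per-gene "coin toss" described above can be sketched in a few lines (the function name `uniform_crossover` is illustrative; the full `Robot.mix_dna` method below does the equivalent over all 243 genes):

```python
import random

def uniform_crossover(p1_dna, p2_dna):
    # for each gene position, inherit from parent 1 or parent 2 with equal probability
    return ''.join(random.choice(pair) for pair in zip(p1_dna, p2_dna))

child = uniform_crossover('1440623161', '2430661132')
```

Note that where both parents share a gene (e.g. the '4' in position 2 above), the child inherits it with certainty, so well-established genes propagate reliably.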
Another concept from natural selection that we replicate in this algorithm is mutation. Although most of a child's genes are inherited from its parents, I also build in a small probability that a gene is mutated (i.e. assigned at random). This mutation rate gives us the ability to explore new possibilities.
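Mutation can be sketched as an independent, low-probability re-randomisation of each gene (the function name `mutate` is illustrative; the article's parameters set the rate to 0.01):

```python
import random

def mutate(dna, rate=0.01):
    # each gene is independently replaced by a random action (0-6) with probability `rate`
    genes = list(dna)
    for i in range(len(genes)):
        if random.random() < rate:
            genes[i] = str(random.randint(0, 6))
    return ''.join(genes)
```

With a 243-gene string and a rate of 0.01, each child carries on average about 2-3 mutated genes, enough to keep exploring without destroying inherited strategies.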
The first step is to import the required packages and set the parameters for the task. I have chosen these parameters as a starting point, but they are adjustable, and I encourage you to experiment with them.
""" Import package """ import numpy as np from tqdm.notebook import tqdm """ Setting parameters """ #Simulation settings pop_ Size = 200 # number of robots per generation num_ Breeders = 100 # number of robots that can mate per generation num_ Gen = 400 # total algebra iter_ per_ Sim = 100 # simulation times of garbage collection per robot moves_ per_ ITER = 200 # the number of movements the robot can do each time #Grid settings rubbish_ Prob = 0.5 # the probability of garbage in each grid grid_ Size = 10 # 0 grid size (except wall) #Evolution settings wall_ Penalty = - 5 ᦇ the fitting point deducted due to hitting the wall no_ rub_ Penalty = - 1 ᦇ points deducted for picking up rubbish in empty square rubbish_ Score = 10 ᦇ you can get points by picking up garbage mutation_ Rate = 0.01 # probability of variation
Next, we define a class for the grid-world environment. We represent each cell with the characters 'o', 'x', and 'w', corresponding to an empty cell, a cell containing rubbish, and a wall respectively.
```python
class Environment:
    """
    Class representing a grid environment full of rubbish. Each cell can be:
    'o': empty
    'x': rubbish
    'w': wall
    """
    def __init__(self, p=rubbish_prob, g_size=grid_size):
        self.p = p            # probability that a cell contains rubbish
        self.g_size = g_size  # excluding walls

        # initialise the grid and randomly allocate rubbish
        self.grid = np.random.choice(['o', 'x'], size=(self.g_size+2, self.g_size+2),
                                     p=(1 - self.p, self.p))

        # set the outer cells to walls
        self.grid[:, [0, self.g_size+1]] = 'w'
        self.grid[[0, self.g_size+1], :] = 'w'

    def show_grid(self):
        # print the grid in its current state
        print(self.grid)

    def remove_rubbish(self, i, j):
        # remove rubbish from the specified cell (i, j)
        if self.grid[i, j] == 'o':  # cell is already empty
            return False
        else:
            self.grid[i, j] = 'o'
            return True

    def get_pos_string(self, i, j):
        # return a string representing the cells "visible" to a robot in cell (i, j)
        return self.grid[i-1, j] + self.grid[i, j+1] + self.grid[i+1, j] \
               + self.grid[i, j-1] + self.grid[i, j]
```
Next, we create a class to represent our robots. This class includes methods for performing actions, calculating fitness, and generating new DNA from a pair of parent robots.
```python
class Robot:
    """
    Class representing a rubbish-collecting robot
    """
    def __init__(self, p1_dna=None, p2_dna=None, m_rate=mutation_rate,
                 w_pen=wall_penalty, nr_pen=no_rub_penalty, r_score=rubbish_score):
        self.m_rate = m_rate          # mutation rate
        self.wall_penalty = w_pen     # penalty for hitting a wall
        self.no_rub_penalty = nr_pen  # penalty for picking up rubbish in an empty cell
        self.rubbish_score = r_score  # reward for picking up rubbish
        self.p1_dna = p1_dna          # DNA of parent 1
        self.p2_dna = p2_dna          # DNA of parent 2

        # generate a dictionary to look up the gene index from a situation string
        con = ['w', 'o', 'x']  # wall, empty, rubbish
        self.situ_dict = dict()
        count = 0
        for up in con:
            for right in con:
                for down in con:
                    for left in con:
                        for pos in con:
                            self.situ_dict[up+right+down+left+pos] = count
                            count += 1

        # initialise DNA
        self.get_dna()

    def get_dna(self):
        # initialise the robot's DNA string
        if self.p1_dna is None:
            # generate random DNA if there are no parents
            self.dna = ''.join([str(x) for x in np.random.randint(7, size=243)])
        else:
            self.dna = self.mix_dna()

    def mix_dna(self):
        # generate the robot's DNA from its parents' DNA
        mix_dna = ''.join([np.random.choice([self.p1_dna, self.p2_dna])[i]
                           for i in range(243)])

        # add mutation
        for i in range(243):
            if np.random.rand() > 1 - self.m_rate:
                mix_dna = mix_dna[:i] + str(np.random.randint(7)) + mix_dna[i+1:]

        return mix_dna

    def simulate(self, n_iterations, n_moves, debug=False):
        # simulate rubbish collection
        tot_score = 0
        for it in range(n_iterations):
            self.score = 0  # fitness score
            self.envir = Environment()
            # randomly allocate a starting position
            self.i, self.j = np.random.randint(1, self.envir.g_size+1, size=2)
            if debug:
                print('before')
                print('start position:', self.i, self.j)
                self.envir.show_grid()
            for move in range(n_moves):
                self.act()
            tot_score += self.score
            if debug:
                print('after')
                print('end position:', self.i, self.j)
                self.envir.show_grid()
                print('score:', self.score)
        return tot_score / n_iterations  # average fitness over n_iterations

    def act(self):
        # perform an action based on the DNA and the robot's position
        post_str = self.envir.get_pos_string(self.i, self.j)  # current situation
        gene_idx = self.situ_dict[post_str]  # index of the relevant gene
        act_key = self.dna[gene_idx]         # read the action from the DNA
        if act_key == '5':
            # move randomly
            act_key = np.random.choice(['0', '1', '2', '3'])

        if act_key == '0':
            self.mv_up()
        elif act_key == '1':
            self.mv_right()
        elif act_key == '2':
            self.mv_down()
        elif act_key == '3':
            self.mv_left()
        elif act_key == '6':
            self.pickup()

    def mv_up(self):
        # move up
        if self.i == 1:
            self.score += self.wall_penalty
        else:
            self.i -= 1

    def mv_right(self):
        # move right
        if self.j == self.envir.g_size:
            self.score += self.wall_penalty
        else:
            self.j += 1

    def mv_down(self):
        # move down
        if self.i == self.envir.g_size:
            self.score += self.wall_penalty
        else:
            self.i += 1

    def mv_left(self):
        # move left
        if self.j == 1:
            self.score += self.wall_penalty
        else:
            self.j -= 1

    def pickup(self):
        # pick up rubbish
        success = self.envir.remove_rubbish(self.i, self.j)
        if success:
            # rubbish successfully picked up
            self.score += self.rubbish_score
        else:
            # no rubbish on the current square
            self.score += self.no_rub_penalty
```
Finally, it's time to run the genetic algorithm. In the code below we generate an initial population of robots and let natural selection run its course. I should mention that there are certainly faster ways to implement this algorithm (e.g. using parallelisation), but for this tutorial I have sacrificed speed for clarity.
```python
# initial population
pop = [Robot() for x in range(pop_size)]
results = []

# run evolution
for i in tqdm(range(num_gen)):
    scores = np.zeros(pop_size)

    # iterate through all robots
    for idx, rob in enumerate(pop):
        # run the rubbish-collection simulation and calculate fitness
        score = rob.simulate(iter_per_sim, moves_per_iter)
        scores[idx] = score

    results.append([scores.mean(), scores.max()])  # save the mean and max of each generation

    best_robot = pop[scores.argmax()]  # save the best robot

    # limit the robots that are allowed to mate
    inds = np.argpartition(scores, -num_breeders)[-num_breeders:]  # indices of the fittest robots
    subpop = []
    for idx in inds:
        subpop.append(pop[idx])
    scores = scores[inds]

    # square and normalise the fitness scores
    norm_scores = (scores - scores.min()) ** 2
    norm_scores = norm_scores / norm_scores.sum()

    # create the next generation of robots
    new_pop = []
    for child in range(pop_size):
        # select parents with probability proportional to normalised fitness
        p1, p2 = np.random.choice(subpop, p=norm_scores, size=2, replace=False)
        new_pop.append(Robot(p1.dna, p2.dna))

    pop = new_pop
```
At first, most of the robots pick up no rubbish and constantly bump into walls, but within a few generations we begin to see simple strategies emerge (such as "if on a square with rubbish, pick it up" and "if next to a wall, don't move into it"). After a few hundred generations, we are left with a generation of incredible rubbish-collecting geniuses!
The chart below shows how we were able to "evolve" a successful rubbish-collection strategy in a population of robots over 400 generations.
To assess the quality of the evolved control strategy, I manually created a benchmark strategy containing some intuitively sensible rules:
1. If rubbish is on the current square, pick it up
2. If rubbish is visible on an adjacent square, move to that square
3. If next to a wall, move in the opposite direction
4. Otherwise, move in a random direction
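The four benchmark rules above can be sketched as a single function from the five visible cells to an action (the function name is mine; it assumes the article's encoding of 'w'/'o'/'x' for cells and actions 0-3 = up/right/down/left, 5 = random move, 6 = pick up):

```python
import random

def benchmark_action(up, right, down, left, current):
    if current == 'x':
        return 6  # rule 1: rubbish on the current square -> pick it up
    neighbours = [up, right, down, left]  # ordered to match actions 0-3
    for direction, cell in enumerate(neighbours):
        if cell == 'x':
            return direction  # rule 2: move towards visible rubbish
    for direction, cell in enumerate(neighbours):
        if cell == 'w':
            return (direction + 2) % 4  # rule 3: move away from the wall
    return 5  # rule 4: otherwise move randomly
```

Because every situation maps deterministically to one action, this hand-coded policy could itself be flattened into a 243-digit DNA string and compared gene by gene with the evolved robots.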
On average, this benchmark strategy achieves a fitness of 426.9, while our final "evolved" robot achieves an average fitness of 475.9.
The cool thing about this optimization approach is that it can find counterintuitive solutions. Not only can the robots learn sensible rules that a human might design, they can also spontaneously come up with strategies that a human might never consider. One sophisticated technique that emerged was the use of "markers" to overcome short-sightedness and a lack of memory.
For example, if a robot is on a square containing rubbish and can also see rubbish on the squares to its east and west, a naive approach would be to immediately pick up the rubbish on the current square and then move to one of the neighbouring rubbish squares. The problem with this strategy is that once the robot moves (say, west), it has no way of remembering that there is also rubbish to the east. To overcome this problem, we observed our evolved robots performing the following steps:
1. Move west (leaving the rubbish on the current square as a marker)
2. Pick up the rubbish and move back east (the marker rubbish is visible)
3. Pick up the marker rubbish and continue east
4. Pick up the last piece of rubbish
Another example of counterintuitive strategies emerging from this kind of optimization is shown below. OpenAI used reinforcement learning (a more sophisticated optimization method) to teach agents to play hide and seek. At first these agents learned "human-like" strategies, but they eventually discovered novel solutions.
Genetic algorithms combine biology and computer science in a unique way. While they are not necessarily the fastest algorithms, in my opinion they are among the most beautiful.
All of the code described in this article, along with a demo notebook, can be found on my GitHub: https://github.com/andrewjkuo/robby-robot-genetic-algorithm. Thank you for reading!