# Optimizing a garbage collection strategy with a genetic algorithm

Time: 2021-6-9

By Andrew Kuo
Compiled by VK
Source: Towards Data Science

A genetic algorithm (GA) is an optimization technique that is, in essence, modeled on the evolutionary process. The analogy is rough, but if you squint, Darwin’s natural selection does resemble an optimization task whose goal is to produce organisms well suited to reproducing in their environment.

In this article, I’ll show you how to implement a genetic algorithm in Python to “evolve” a garbage collection robot in a few hours.

### Background

The best tutorial on the principles of genetic algorithms that I have come across is in Melanie Mitchell’s book Complexity: A Guided Tour.

In one chapter, Mitchell introduces a robot named Robby, whose sole purpose in life is to pick up garbage, and describes how to use a GA to optimize Robby’s control strategy. Below, I’ll explain my solution to this problem and show how to implement the algorithm in Python. There are some good packages for building this kind of algorithm (DEAP, for example), but in this tutorial I’ll use only base Python, NumPy, and tqdm (optional).

Although this is only a toy example, GAs are used in many practical applications. As a data scientist, I most often use them for hyperparameter optimization and model selection. GAs are computationally expensive, but they let us explore multiple regions of the search space in parallel, and they are a good choice when a gradient is difficult to compute.

#### Problem description

A robot named Robby lives in a two-dimensional grid world full of garbage and surrounded by four walls (as shown in the figure below). The goal of this project is to evolve an optimal control strategy that lets him pick up garbage efficiently without crashing into the walls. Robby can see only the four squares adjacent to him (up, down, left, and right) plus the square he is standing on. Each square can be in one of three states: empty, containing garbage, or a wall. Robby can therefore find himself in 3⁵ = 243 different situations. He can perform seven different actions: move up, down, left, or right (four actions), move in a random direction, pick up garbage, or stay still.

Robby’s control strategy can therefore be encoded as a “DNA” string of 243 digits between 0 and 6, one for the action Robby should take in each of the 243 possible situations.
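As a rough illustration of this encoding, the sketch below maps a five-cell view to a gene index by reading it as a base-3 number. The state order (`'w'`, `'o'`, `'x'`), the cell order (up, right, down, left, current) and the helper name `situation_index` are assumptions chosen to line up with the implementation later in this article; they are not part of Mitchell’s description.

```python
from itertools import product

# Assumed ordering of cell states: wall, empty, garbage.
STATES = ['w', 'o', 'x']

def situation_index(view):
    """Map a 5-character view (up, right, down, left, current) to 0..242."""
    idx = 0
    for cell in view:
        # Treat the view as a 5-digit number in base 3.
        idx = idx * 3 + STATES.index(cell)
    return idx

# Enumerate every possible view and compute its gene index.
all_views = [''.join(v) for v in product(STATES, repeat=5)]
indices = [situation_index(v) for v in all_views]

print(len(all_views))            # 243
print(situation_index('wwwww'))  # 0
print(situation_index('xxxxx'))  # 242
```

Some of the 243 views can never occur (Robby never stands on a wall, for instance), but keeping every combination makes the lookup trivial.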

#### Method

The optimization steps of any GA are as follows:

1. Generate a “population” of initial random solutions to the problem

2. Evaluate the “fitness” of each individual according to how well it solves the problem

3. Let the fittest solutions “breed” and pass on their “genetic” material to the next generation

4. Repeat steps 2 and 3 until we are left with a population of optimized solutions
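The four steps above can be sketched on a toy problem that is much simpler than Robby’s. In the snippet below, the objective (maximize the number of 1s in a bit string), the function names and every parameter value are assumptions picked purely for illustration; only the overall loop structure mirrors the steps listed above.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(individual):
    # Toy objective (an assumption for illustration): count the 1s.
    return individual.sum()

def evolve(pop_size=50, n_genes=20, n_gen=40, mutation_rate=0.02):
    # Step 1: random initial population of bit strings.
    pop = rng.integers(0, 2, size=(pop_size, n_genes))
    for _ in range(n_gen):
        # Step 2: evaluate the fitness of every individual.
        scores = np.array([fitness(ind) for ind in pop], dtype=float)
        # Step 3: fitter individuals are more likely to be picked as parents.
        weights = scores + 1e-9  # avoid an all-zero probability vector
        probs = weights / weights.sum()
        children = []
        for _ in range(pop_size):
            i, j = rng.choice(pop_size, size=2, replace=False, p=probs)
            mask = rng.random(n_genes) < 0.5          # coin toss per gene
            child = np.where(mask, pop[i], pop[j])
            flip = rng.random(n_genes) < mutation_rate
            child = np.where(flip, 1 - child, child)  # occasional mutation
            children.append(child)
        # Step 4: the children become the next generation and we repeat.
        pop = np.array(children)
    return pop

final_pop = evolve()
best = max(fitness(ind) for ind in final_pop)
print(best)
```

Robby’s version follows the same skeleton; the differences are that the genes are digits from 0 to 6 and that fitness comes from a simulation rather than a one-line function.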

In our task, we create a first generation of Robbys initialized with random DNA strings (corresponding to random control strategies). We then simulate these robots in randomly generated grid worlds and observe how they perform.

#### Fitness

A robot’s fitness depends on how much garbage it picks up in n moves and how many times it hits a wall. In our example, a robot gains 10 points for every piece of garbage it picks up and loses 5 points every time it hits a wall. The robots then “mate” with a probability that depends on their fitness (that is, robots that pick up a lot of garbage are more likely to reproduce), and a new generation of robots is born.
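To make the scoring concrete, here is a small worked example. The helper names `fitness` and `mating_probabilities` and the three sample robots are hypothetical; the point values come from the paragraph above, and squaring the shifted scores is the normalization used in the implementation later in this article (other weightings would work too).

```python
import numpy as np

def fitness(n_picked, n_wall_hits, rubbish_score=10, wall_penalty=-5):
    # +10 per piece of garbage collected, -5 per wall collision.
    return n_picked * rubbish_score + n_wall_hits * wall_penalty

def mating_probabilities(scores):
    # Shift so the worst robot has weight 0, square to favor the best,
    # then normalize so the weights sum to 1.
    shifted = scores - scores.min()
    weights = shifted ** 2
    return weights / weights.sum()

print(fitness(30, 4))   # 280
print(fitness(10, 0))   # 100
print(fitness(2, 20))   # -80

scores = np.array([fitness(30, 4), fitness(10, 0), fitness(2, 20)], dtype=float)
probs = mating_probabilities(scores)
print(probs.round(3).tolist())  # [0.8, 0.2, 0.0]
```

Note that this normalization divides by zero if every robot has exactly the same score, so a more defensive version would need a fallback for that case.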

#### Mating

There are several different ways to implement “mating”. In Mitchell’s version, she splices the two parents’ DNA strands at a random point and joins the pieces to create a child for the next generation. In my implementation, I assign each gene at random from one of the parents (i.e., for each of the 243 genes, I toss a coin to decide which parent’s gene is inherited).

For example, using my method, the first 10 genes of the parents and of a possible child could look like this:

``````
Parent 1: 1440623161
Parent 2: 2430661132
Child:    2440621161
``````
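The two mating schemes described above can be compared side by side. This is a minimal sketch: the function names are hypothetical, and reading Mitchell’s splice as a single random cut point is my assumption rather than something stated in her book’s description here.

```python
import numpy as np

rng = np.random.default_rng(42)

def single_point_crossover(p1, p2):
    # Cut both parents at one random position, then join the head of
    # parent 1 to the tail of parent 2 (assumed reading of "splicing").
    cut = rng.integers(1, len(p1))
    return p1[:cut] + p2[cut:]

def uniform_crossover(p1, p2):
    # Toss a coin for every gene to decide which parent it comes from.
    take_p1 = rng.random(len(p1)) < 0.5
    return ''.join(a if t else b for a, b, t in zip(p1, p2, take_p1))

parent1 = '1440623161'
parent2 = '2430661132'

child_a = single_point_crossover(parent1, parent2)
child_b = uniform_crossover(parent1, parent2)
print(child_a, child_b)
```

With a single cut, neighboring genes tend to be inherited together; with a coin toss per gene, every position is inherited independently.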

#### Mutation

Another concept from natural selection that we replicate in this algorithm is mutation. Although most of a child’s genes are inherited from its parents, I have also built in a small probability that a gene mutates (i.e., is assigned at random). This mutation rate lets us explore new possibilities.
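A mutation step along these lines might look like the sketch below. The function name `mutate` and the default rate of 0.01 are illustrative assumptions (the rate matches the parameter used later in this article).

```python
import numpy as np

rng = np.random.default_rng(7)

def mutate(dna, rate=0.01, n_actions=7):
    # Each gene is independently redrawn with probability `rate`.
    genes = list(dna)
    for i in range(len(genes)):
        if rng.random() < rate:
            # The new value is uniform over all actions, so it can
            # occasionally equal the old one.
            genes[i] = str(rng.integers(n_actions))
    return ''.join(genes)

dna = '0' * 243
mutated = mutate(dna)
n_changed = sum(a != b for a, b in zip(dna, mutated))
print(n_changed)
```

With 243 genes and a rate of 0.01, about 2.4 genes are redrawn per child on average, which keeps most of the inherited strategy intact while still injecting some novelty.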

### Python implementation

The first step is to import the required packages and set the parameters for this task. I chose these parameters as a starting point, but they can be adjusted, and I encourage you to experiment.

``````
"""
Import packages
"""
import numpy as np
from tqdm.notebook import tqdm  # outside a notebook, use: from tqdm import tqdm

"""
Set parameters
"""
# Simulation settings
pop_size = 200        # number of robots per generation
num_breeders = 100    # number of robots that can mate per generation
num_gen = 400         # total number of generations
iter_per_sim = 100    # number of garbage collection simulations per robot
moves_per_iter = 200  # number of moves a robot can make per simulation

# Grid settings
rubbish_prob = 0.5  # probability that each square contains garbage
grid_size = 10      # grid size (excluding walls)

# Evolution settings
wall_penalty = -5     # fitness points deducted for hitting a wall
no_rub_penalty = -1   # fitness points deducted for trying to pick up garbage in an empty square
rubbish_score = 10    # fitness points gained for picking up garbage
mutation_rate = 0.01  # probability of a gene mutating
``````

Next, we define a class for the grid world environment. Each cell is represented by 'o', 'x', or 'w', corresponding to an empty cell, a cell with garbage, and a wall.

``````
class Environment:
    """
    Class representing a garbage-filled grid environment. Each cell can be:
    'o': empty
    'x': garbage
    'w': wall
    """
    def __init__(self, p=rubbish_prob, g_size=grid_size):
        self.p = p            # probability that a cell contains garbage
        self.g_size = g_size  # grid size, excluding walls

        # Initialize the grid and randomly allocate garbage
        self.grid = np.random.choice(['o','x'], size=(self.g_size+2,self.g_size+2), p=(1 - self.p, self.p))

        # Set the outer squares to be walls
        self.grid[:,[0,self.g_size+1]] = 'w'
        self.grid[[0,self.g_size+1], :] = 'w'

    def show_grid(self):
        # Print the grid in its current state
        print(self.grid)

    def remove_rubbish(self,i,j):
        # Remove garbage from the specified cell (i, j)
        if self.grid[i,j] == 'o':  # the cell is already empty
            return False
        else:
            self.grid[i,j] = 'o'
            return True

    def get_pos_string(self,i,j):
        # Return a string describing the cells "visible" to a robot in cell (i, j),
        # in the order up, right, down, left, current
        return self.grid[i-1,j] + self.grid[i,j+1] + self.grid[i+1,j] + self.grid[i,j-1] + self.grid[i,j]
``````

Next, we create a class to represent our robot. This class includes methods for performing actions, calculating fitness, and generating new DNA from a pair of parent robots.

``````
class Robot:
    """
    Class representing a garbage collection robot
    """
    def __init__(self, p1_dna=None, p2_dna=None, m_rate=mutation_rate, w_pen=wall_penalty, nr_pen=no_rub_penalty, r_score=rubbish_score):
        self.m_rate = m_rate          # mutation rate
        self.wall_penalty = w_pen     # penalty for hitting a wall
        self.no_rub_penalty = nr_pen  # penalty for trying to pick up garbage in an empty square
        self.rubbish_score = r_score  # reward for picking up garbage
        self.p1_dna = p1_dna          # DNA of parent 1
        self.p2_dna = p2_dna          # DNA of parent 2

        # Build a dictionary to look up the gene index from a situation string
        con = ['w','o','x']  # wall, empty, garbage
        self.situ_dict = dict()
        count = 0
        for up in con:
            for right in con:
                for down in con:
                    for left in con:
                        for pos in con:
                            self.situ_dict[up+right+down+left+pos] = count
                            count += 1

        # Initialize DNA
        self.get_dna()

    def get_dna(self):
        # Initialize the robot's DNA string
        if self.p1_dna is None:
            # No parents: generate random DNA
            self.dna = ''.join([str(x) for x in np.random.randint(7,size=243)])
        else:
            self.dna = self.mix_dna()

    def mix_dna(self):
        # Generate the robot's DNA from its parents' DNA (coin toss per gene)
        mix_dna = ''.join([np.random.choice([self.p1_dna,self.p2_dna])[i] for i in range(243)])

        # Randomly mutate individual genes
        for i in range(243):
            if np.random.rand() > 1 - self.m_rate:
                mix_dna = mix_dna[:i] + str(np.random.randint(7)) + mix_dna[i+1:]

        return mix_dna

    def simulate(self, n_iterations, n_moves, debug=False):
        # Simulate garbage collection
        tot_score = 0
        for it in range(n_iterations):
            self.score = 0  # fitness score
            self.envir = Environment()
            self.i, self.j = np.random.randint(1,self.envir.g_size+1, size=2)  # random starting position
            if debug:
                print('before')
                print('start position:',self.i, self.j)
                self.envir.show_grid()
            for move in range(n_moves):
                self.act()
            tot_score += self.score
            if debug:
                print('after')
                print('end position:',self.i, self.j)
                self.envir.show_grid()
                print('score:',self.score)
        # Average score over all simulations (the main loop relies on this return value)
        return tot_score / n_iterations

    def act(self):
        # Perform an action based on the DNA and the robot's surroundings
        post_str = self.envir.get_pos_string(self.i, self.j)  # what the robot currently sees
        gene_idx = self.situ_dict[post_str]  # index of the gene for this situation
        act_key = self.dna[gene_idx]  # read the action from the DNA
        if act_key == '5':
            # Move in a random direction
            act_key = np.random.choice(['0','1','2','3'])

        # Action '4' (stay still) needs no handling
        if act_key == '0':
            self.mv_up()
        elif act_key == '1':
            self.mv_right()
        elif act_key == '2':
            self.mv_down()
        elif act_key == '3':
            self.mv_left()
        elif act_key == '6':
            self.pickup()

    def mv_up(self):
        # Move up
        if self.i == 1:
            self.score += self.wall_penalty
        else:
            self.i -= 1

    def mv_right(self):
        # Move right
        if self.j == self.envir.g_size:
            self.score += self.wall_penalty
        else:
            self.j += 1

    def mv_down(self):
        # Move down
        if self.i == self.envir.g_size:
            self.score += self.wall_penalty
        else:
            self.i += 1

    def mv_left(self):
        # Move left
        if self.j == 1:
            self.score += self.wall_penalty
        else:
            self.j -= 1

    def pickup(self):
        # Pick up garbage
        success = self.envir.remove_rubbish(self.i, self.j)
        if success:
            # Garbage picked up successfully
            self.score += self.rubbish_score
        else:
            # There is no garbage in the current square
            self.score += self.no_rub_penalty
``````

Finally, it’s time to run the genetic algorithm. In the following code, we generate an initial population of robots and let natural selection run its course. I should mention that there are certainly faster ways to implement this algorithm (for example, using parallelization), but for the purposes of this tutorial I have sacrificed speed for clarity.

``````
# Initial population
pop = [Robot() for x in range(pop_size)]
results = []

# Run the evolution
for i in tqdm(range(num_gen)):
    scores = np.zeros(pop_size)

    # Iterate over all robots
    for idx, rob in enumerate(pop):
        # Run the garbage collection simulation and calculate fitness
        score = rob.simulate(iter_per_sim, moves_per_iter)
        scores[idx] = score

    results.append([scores.mean(), scores.max()])  # store the mean and maximum score of each generation

    best_robot = pop[scores.argmax()]  # save the best robot

    # Limit the number of robots that can mate
    inds = np.argpartition(scores, -num_breeders)[-num_breeders:]  # indices of the top robots by fitness
    subpop = []
    for idx in inds:
        subpop.append(pop[idx])
    scores = scores[inds]

    # Square and normalize the scores to get mating probabilities
    norm_scores = (scores - scores.min()) ** 2
    norm_scores = norm_scores / norm_scores.sum()

    # Create the next generation of robots
    new_pop = []
    for child in range(pop_size):
        # Choose two distinct parents, favoring high fitness
        p1, p2 = np.random.choice(subpop, p=norm_scores, size=2, replace=False)
        new_pop.append(Robot(p1.dna, p2.dna))

    pop = new_pop
``````

At first, most robots pick up no garbage and constantly crash into walls, but after a few generations we start to see some simple strategies emerge (such as “if you are on a square with garbage, pick it up” and “if you are next to a wall, don’t move into it”). After a few hundred generations, we are left with a generation of incredible garbage collection geniuses!

### Results

The following chart shows that we can “evolve” a successful garbage collection strategy within 400 generations of the robot population. To evaluate the quality of the evolved control strategy, I manually created a benchmark strategy containing some intuitive and sensible rules:

• If there is garbage in the current square, pick it up

• If garbage is visible in an adjacent square, move to that square

• If next to a wall, move in the opposite direction

• Otherwise, move in a random direction

On average, this benchmark strategy achieves a fitness of 426.9, whereas our final “evolved” robot achieves an average fitness of 475.9.
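The benchmark rules listed above could be turned into a 243-digit DNA string along the following lines. This is my reading of the rules, not the author’s actual benchmark code: the action codes follow the `act` method shown earlier, while the function name `benchmark_action`, the order in which directions are checked and the choice of a random move as the fallback are all assumptions. It makes no claim to reproduce the 426.9 score.

```python
from itertools import product

STATES = ['w', 'o', 'x']  # wall, empty, garbage (assumed order)
# Action codes as used by the robot class: '4' means stay still.
UP, RIGHT, DOWN, LEFT, STAY, RANDOM, PICKUP = '0123456'

def benchmark_action(view):
    up, right, down, left, here = view
    # Rule 1: garbage in the current square, so pick it up.
    if here == 'x':
        return PICKUP
    # Rule 2: garbage in an adjacent square, so move onto it
    # (first match wins; the checking order is an assumption).
    for cell, move in ((up, UP), (right, RIGHT), (down, DOWN), (left, LEFT)):
        if cell == 'x':
            return move
    # Rule 3: next to a wall, so move in the opposite direction.
    for cell, away in ((up, DOWN), (right, LEFT), (down, UP), (left, RIGHT)):
        if cell == 'w':
            return away
    # Rule 4: nothing of interest nearby, so move at random.
    return RANDOM

# One gene per situation, enumerated in the same order as the gene index.
benchmark_dna = ''.join(
    benchmark_action(''.join(v)) for v in product(STATES, repeat=5)
)
print(len(benchmark_dna))  # 243
```

A string built this way could be assigned to a robot’s `dna` attribute and scored with `simulate` to compare it against the evolved population.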

### Strategy analysis

The cool thing about this optimization approach is that it can find counterintuitive solutions. The robots not only learn the sensible rules a human might design, but also spontaneously come up with strategies a human might never consider. One advanced technique that emerged is the use of “markers” to overcome myopia and a lack of memory.

For example, if a robot is on a square with garbage and can also see garbage on the squares to the east and west, a naive approach is to pick up the garbage on the current square immediately and then move to one of the neighboring squares with garbage. The problem with this strategy is that once the robot has moved (e.g., to the west), it cannot remember that there was also garbage to the east. To overcome this problem, we observed our evolved robots performing the following steps:

1. Move west (leaving the garbage in the current square as a marker)

2. Pick up the garbage and move back east (it can see the garbage left as a marker)

3. Pick up the garbage and move east

4. Pick up the last piece of garbage

Another example of a counterintuitive strategy produced by this kind of optimization is shown below. OpenAI used reinforcement learning (a more sophisticated optimization method) to teach agents to play hide and seek. The agents learn “human” strategies at first, but eventually discover novel solutions.

### Conclusion

Genetic algorithms combine biology and computer science in a unique way. Although they are not necessarily the fastest algorithms, in my opinion they are among the most beautiful.

All of the code described in this article, along with a demo notebook, can be found on my GitHub (https://github.com/andrewjkuo/robby-robot-genetic-algorithm). Thank you for reading!
