Optimizing a rubbish collection strategy with genetic algorithms


By Andrew Kuo
Source: Towards Data Science

A genetic algorithm (GA) is an optimization technique inspired by the process of evolution. The analogy may be rough, but if you squint, Darwin's natural selection really does resemble an optimization task: one that aims to produce organisms perfectly suited to reproducing in their environment.

In this article, I'll show you how to implement a genetic algorithm in Python that "evolves" a rubbish collection robot in a matter of hours.


The best tutorial on the principles of genetic algorithms I have come across is in Melanie Mitchell's book Complexity: A Guided Tour.

In one chapter, Mitchell introduces a robot named Robby, whose sole purpose in life is to pick up rubbish, and describes how to optimize Robby's control strategy with a GA. Below, I'll explain my solution to this problem and show how to implement the algorithm in Python. There are some good packages for building this kind of algorithm (DEAP, for example), but in this tutorial I'll use only base Python, NumPy, and tqdm (optional).

Although this is only a toy example, GAs are used in many real-world applications. As a data scientist, I most often use them for hyperparameter optimization and model selection. Although GAs are computationally expensive, they allow us to explore multiple regions of the search space in parallel, and they are a good option when gradients are unavailable.

Problem description

A robot named Robby lives in a two-dimensional grid world full of rubbish, surrounded by four walls (shown in the figure below). The goal of this project is to evolve an optimal control strategy that lets him pick up rubbish efficiently without bumping into walls.

Robby can only see the four cells adjacent to him (up, down, left and right) plus the cell he occupies. Each cell can be one of three things: empty, containing rubbish, or a wall. Robby can therefore find himself in 3⁵ = 243 different situations. Robby can perform seven different actions: move up, down, left or right (4), move randomly, pick up rubbish, or stay still.

Robby's control strategy can therefore be encoded as a "DNA" string of 243 digits between 0 and 6, corresponding to the action Robby should take in each of the 243 possible situations.


The optimization steps of any GA are as follows:

  1. Generate an initial "population" of random solutions to the problem

  2. Evaluate each individual's "fitness" based on how well it solves the problem

  3. Let the fittest solutions "reproduce" and pass their "genetic" material on to the next generation

  4. Repeat steps 2 and 3 until we are left with a population of optimized solutions
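Before we apply these four steps to Robby, here is a minimal, self-contained GA sketch on a standard toy problem (evolving a bit string toward all ones). This example is my own and is not part of the original article; it uses the same coin-flip crossover and per-gene mutation described later:

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

TARGET_LEN = 20

def fitness(individual):
    # step 2: fitness = number of ones in the bit string
    return sum(individual)

def breed(p1, p2, mutation_rate=0.01):
    # step 3: coin-flip crossover plus a small chance of mutation per gene
    child = [random.choice(pair) for pair in zip(p1, p2)]
    return [1 - g if random.random() < mutation_rate else g for g in child]

# step 1: random initial population
pop = [[random.randint(0, 1) for _ in range(TARGET_LEN)] for _ in range(50)]

# step 4: repeat evaluation and breeding over many generations
for _ in range(60):
    pop.sort(key=fitness, reverse=True)
    breeders = pop[:25]  # only the fittest half survive to breed
    pop = [breed(random.choice(breeders), random.choice(breeders))
           for _ in range(50)]

best = max(pop, key=fitness)
```

After a few dozen generations the population converges on (or very close to) the all-ones string, even though no individual was ever told what the target looks like.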

For our task, we create a first generation of Robbys initialized with random DNA strings (corresponding to random control strategies). We then simulate these robots in randomly generated grid worlds and observe their performance.

Fitness

A robot's fitness depends on how much rubbish it picks up in n moves and how many times it bumps into a wall. In our example, a robot earns 10 points for every piece of rubbish it picks up and loses 5 points every time it hits a wall. The robots then "mate" with probability proportional to their fitness (i.e. robots that pick up lots of rubbish are more likely to reproduce), and a new generation is born.


There are several different ways to "mate". In Mitchell's version, she randomly splices the parents' two DNA strings and joins the pieces to create a child for the next generation. In my implementation, I assign each gene randomly from one of the two parents (i.e. for each of the 243 genes, I flip a coin to decide which parent's gene to inherit).

For example, using my method, here are the first 10 genes of two parents and a possible child:

Parent 1: 1440623161
Parent 2: 2430661132
Child:    2440621161


Another concept from natural selection that this algorithm replicates is mutation. Although most of a child's genes are inherited from its parents, I also build in a small probability that a gene mutates (i.e. is assigned at random). This mutation rate gives the algorithm the ability to explore new possibilities.

Python implementation

The first step is to import the required packages and set the parameters for the task. I have chosen these parameters as a starting point, but they are adjustable and I encourage you to experiment with them.

Import packages

import numpy as np
from tqdm.notebook import tqdm

Set parameters

# Simulation settings
pop_size = 200 # number of robots per generation
num_breeders = 100 # number of robots that can mate per generation
num_gen = 400 # total number of generations
iter_per_sim = 100 # number of rubbish-collection simulations per robot
moves_per_iter = 200 # number of moves the robot can make per simulation

# Grid settings
rubbish_prob = 0.5 # probability of rubbish in each cell
grid_size = 10 # size of the grid (excluding walls)

# Evolution settings
wall_penalty = -5 # fitness points deducted for hitting a wall
no_rub_penalty = -1 # points deducted for picking up rubbish in an empty cell
rubbish_score = 10 # points awarded for picking up rubbish
mutation_rate = 0.01 # probability of a gene mutating

Next, we define a class for the grid-world environment. We represent each cell with the characters 'o', 'x' and 'w', corresponding to an empty cell, a cell containing rubbish, and a wall respectively.

class Environment:
    """
    Class for representing a grid environment full of rubbish. Each cell can be:
    'o': empty
    'x': rubbish
    'w': wall
    """
    def __init__(self, p=rubbish_prob, g_size=grid_size):
        self.p = p # probability that a cell contains rubbish
        self.g_size = g_size # excluding walls

        # initialise the grid and randomly allocate rubbish
        self.grid = np.random.choice(['o','x'], size=(self.g_size+2,self.g_size+2), p=(1 - self.p, self.p))

        # set the outer cells to be walls
        self.grid[:,[0,self.g_size+1]] = 'w'
        self.grid[[0,self.g_size+1], :] = 'w'

    def show_grid(self):
        # print the grid in its current state
        print(self.grid)

    def remove_rubbish(self, i, j):
        # remove rubbish from the specified cell (i, j)
        if self.grid[i,j] == 'o': # the cell is already empty
            return False
        else:
            self.grid[i,j] = 'o'
            return True

    def get_pos_string(self, i, j):
        # return a string representing the cells "visible" to a robot in cell (i, j)
        return self.grid[i-1,j] + self.grid[i,j+1] + self.grid[i+1,j] + self.grid[i,j-1] + self.grid[i,j]

Next, we create a class to represent our robots. This class includes methods for performing actions, calculating fitness, and generating new DNA from a pair of parent robots.

class Robot:
    """
    Class for representing a rubbish-collecting robot
    """
    def __init__(self, p1_dna=None, p2_dna=None, m_rate=mutation_rate, w_pen=wall_penalty, nr_pen=no_rub_penalty, r_score=rubbish_score):
        self.m_rate = m_rate # mutation rate
        self.wall_penalty = w_pen # penalty for hitting a wall
        self.no_rub_penalty = nr_pen # penalty for picking up rubbish in an empty cell
        self.rubbish_score = r_score # reward for picking up rubbish
        self.p1_dna = p1_dna # DNA of parent 1
        self.p2_dna = p2_dna # DNA of parent 2

        # generate a dictionary to look up the gene index from a situation string
        con = ['w','o','x'] # wall, empty, rubbish
        self.situ_dict = dict()
        count = 0
        for up in con:
            for right in con:
                for down in con:
                    for left in con:
                        for pos in con:
                            self.situ_dict[up+right+down+left+pos] = count
                            count += 1

        # initialise DNA
        self.get_dna()

    def get_dna(self):
        # initialise the robot's DNA string
        if self.p1_dna is None:
            # when there are no parents, generate DNA randomly
            self.dna = ''.join([str(x) for x in np.random.randint(7, size=243)])
        else:
            self.dna = self.mix_dna()

    def mix_dna(self):
        # generate robot DNA from the parents' DNA
        mix_dna = ''.join([np.random.choice([self.p1_dna, self.p2_dna])[i] for i in range(243)])

        # add mutation
        for i in range(243):
            if np.random.rand() > 1 - self.m_rate:
                mix_dna = mix_dna[:i] + str(np.random.randint(7)) + mix_dna[i+1:]

        return mix_dna

    def simulate(self, n_iterations, n_moves, debug=False):
        # simulate rubbish collection
        tot_score = 0
        for it in range(n_iterations):
            self.score = 0 # fitness score
            self.envir = Environment()
            self.i, self.j = np.random.randint(1, self.envir.g_size+1, size=2) # randomly allocate starting position
            if debug:
                print('start position:', self.i, self.j)
            for move in range(n_moves):
                self.act()
            tot_score += self.score
            if debug:
                print('end position:', self.i, self.j)
        return tot_score / n_iterations # average fitness score across iterations

    def act(self):
        # perform an action based on the DNA and the robot's position
        post_str = self.envir.get_pos_string(self.i, self.j) # the robot's current situation
        gene_idx = self.situ_dict[post_str] # index of the relevant gene for the current situation
        act_key = self.dna[gene_idx] # read the action from the DNA
        if act_key == '5':
            # move randomly
            act_key = np.random.choice(['0','1','2','3'])

        if act_key == '0':
            self.mv_up()
        elif act_key == '1':
            self.mv_right()
        elif act_key == '2':
            self.mv_down()
        elif act_key == '3':
            self.mv_left()
        elif act_key == '6':
            self.pickup()
        # act_key == '4' means stay still
    def mv_up(self):
        # move up
        if self.i == 1:
            self.score += self.wall_penalty
        else:
            self.i -= 1

    def mv_right(self):
        # move right
        if self.j == self.envir.g_size:
            self.score += self.wall_penalty
        else:
            self.j += 1

    def mv_down(self):
        # move down
        if self.i == self.envir.g_size:
            self.score += self.wall_penalty
        else:
            self.i += 1

    def mv_left(self):
        # move left
        if self.j == 1:
            self.score += self.wall_penalty
        else:
            self.j -= 1

    def pickup(self):
        # pick up rubbish
        success = self.envir.remove_rubbish(self.i, self.j)
        if success:
            # rubbish successfully picked up
            self.score += self.rubbish_score
        else:
            # no rubbish in the current cell
            self.score += self.no_rub_penalty

Finally, it's time to run the genetic algorithm. In the code below, we generate an initial population of robots and let natural selection run its course. I should mention that there are certainly faster ways to implement this (for example, with parallelization), but for the purposes of this tutorial I have sacrificed speed for clarity.

# initial population
pop = [Robot() for x in range(pop_size)]
results = []

# run evolution
for i in tqdm(range(num_gen)):
    scores = np.zeros(pop_size)
    # loop through all robots
    for idx, rob in enumerate(pop):
        # run rubbish collection simulation and calculate fitness
        score = rob.simulate(iter_per_sim, moves_per_iter)
        scores[idx] = score

    results.append([scores.mean(), scores.max()]) # store the mean and max scores of each generation

    best_robot = pop[scores.argmax()] # save the best robot

    # limit the robots that are allowed to mate
    inds = np.argpartition(scores, -num_breeders)[-num_breeders:] # indices of the top robots by fitness
    subpop = []
    for idx in inds:
        subpop.append(pop[idx])
    scores = scores[inds]

    # square and normalise the fitness scores
    norm_scores = (scores - scores.min()) ** 2
    norm_scores = norm_scores / norm_scores.sum()

    # create the next generation of robots
    new_pop = []
    for child in range(pop_size):
        # choose parents with probability proportional to squared fitness
        p1, p2 = np.random.choice(subpop, p=norm_scores, size=2, replace=False)
        new_pop.append(Robot(p1.dna, p2.dna))

    pop = new_pop

Although at first most robots pick up no rubbish and constantly bump into walls, within a few generations we begin to see simple strategies emerge (such as "if you're on a square with rubbish, pick it up" and "if you're next to a wall, don't walk into it"). After a few hundred generations, we are left with a whole generation of incredible rubbish-collecting geniuses!


The chart below shows that we can "evolve" a successful rubbish collection strategy in a population of robots over 400 generations.

To assess the quality of the evolved control strategy, I manually created a benchmark strategy containing some intuitively sensible rules:

  • If rubbish is in the current cell, pick it up

  • If rubbish is visible in an adjacent cell, move to that cell

  • If next to a wall, move in the opposite direction

  • Otherwise, move randomly
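As a rough sketch, the benchmark rules above could be implemented as a standalone policy function. This is my own illustration, not the article's code: the function name `benchmark_action` is hypothetical, the `situation` string follows the up/right/down/left/current ordering used by `get_pos_string`, and the returned action codes match the DNA encoding ('0' up, '1' right, '2' down, '3' left, '6' pick up):

```python
import numpy as np

def benchmark_action(situation):
    """
    Rule-based benchmark policy (illustrative sketch). `situation` is a
    5-character string of the cells up/right/down/left/current, where
    'w' = wall, 'o' = empty, 'x' = rubbish. Returns an action code:
    '0' up, '1' right, '2' down, '3' left, '6' pick up rubbish.
    """
    up, right, down, left, current = situation
    # rule 1: if rubbish is in the current cell, pick it up
    if current == 'x':
        return '6'
    # rule 2: if rubbish is visible in an adjacent cell, move to it
    for action, cell in zip('0123', (up, right, down, left)):
        if cell == 'x':
            return action
    # rules 3 and 4: move randomly, excluding directions blocked by walls
    open_moves = [a for a, cell in zip('0123', (up, right, down, left))
                  if cell != 'w']
    return np.random.choice(open_moves)
```

Note that rules 3 and 4 collapse into one step here: never selecting a walled direction is equivalent to moving away from any adjacent wall.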

On average, this benchmark strategy achieved a fitness score of 426.9, while our final "evolved" robots achieved an average fitness score of 475.9.

Strategy analysis

The cool thing about this optimization approach is that it can find counterintuitive solutions. Not only could the robots learn sensible rules that a human might design, they also spontaneously came up with strategies that a human might never consider. One sophisticated technique that emerged was the use of "markers" to overcome short-sightedness and lack of memory.

For example, if a robot is on a square containing rubbish and can also see rubbish in the squares to the east and west, a naive approach would be to pick up the rubbish in the current square immediately and then move to one of the squares with rubbish. The problem with this strategy is that once the robot has moved (say, westward), it has no way of remembering the rubbish to the east. To overcome this problem, we observed our evolved robots performing the following steps:

  1. Move west (leaving the rubbish in the current cell as a marker)

  2. Pick up the rubbish and move back east (it can see the marker rubbish)

  3. Pick up the rubbish and move east

  4. Pick up the final piece of rubbish

Another example of a counterintuitive strategy emerging from this kind of optimization is shown below. OpenAI used reinforcement learning (a more sophisticated optimization method) to teach agents to play hide-and-seek. These agents learned "human" strategies at first, but eventually discovered novel solutions.


Genetic algorithms combine biology and computer science in a unique way. While not necessarily the fastest algorithms, in my opinion they are among the most beautiful.

All of the code described in this article can be found on my GitHub, together with a demo notebook: https://github.com/andrewjkuo/robby-robot-genetic-algorithm . Thanks for reading!

Original article: https://towardsdatascience.com/optimising-a-rubbish-collection-strategy-with-genetic-algorithms-ccf1f4d56c4f
