This article covers a very interesting project I saw last year. I tried to imitate its code and write a similar project, but I have not yet finished it. Below is my translation of the original author's blog post; perhaps more people will find it interesting.
Original address: Creatures Avoiding Planks
Original author’s Twitter address: hardmaru’s Twitter
Another experiment by the original author: generating fake Chinese characters
Using neural networks to build creatures that avoid obstacles autonomously
Note: the original post links to sites that are blocked in some regions; a VPN may be required.
The translation follows:
Agents with neural network brains evolve for survival
Recently, I saw a simulation video showing evolutionary techniques being used to train agents to avoid moving obstacles. The method appears to be a variant of the NEAT algorithm. NEAT (NeuroEvolution of Augmenting Topologies) evolves both the weights and the topology of a neural network so that it can perform a specific task correctly. The project was written for the Unity 5 game engine with the goal of being integrated into a full game AI, which caught my interest.
Simple perceptron diagram (Wikipedia)
Because agents can move freely in any direction the moment they detect an obstacle, and there is no interaction between agents, the task is too simple, and using NEAT for it is overkill. A simple perceptron-like network (a single-neuron network) is enough for an agent to avoid the obstacles effectively.
Making an agent's life difficult, because life is difficult
I wanted to make things interesting, so I added a few extra conditions to make the task more difficult.
Motion is restricted to a stricter version of Reynolds steering, as outlined in this paper on creating realistic simulated behavior, rather than the freedom to move in any direction.
An agent can choose to turn left or right, but cannot go straight. Like a coasting car mid-turn, it has to keep moving in the direction it is turning. I wanted to see whether agents could develop a steering pattern that moves forward by rapidly switching between left and right turns. This makes things harder: imagine that to drive a car straight ahead, you had to keep yanking the steering wheel left and right.
Agents collide with each other. With the fitness function set up this way, they may need to develop a strategy to avoid one another; otherwise my evil side would enjoy watching them shove each other into the boards.
I found that with these limitations, an agent cannot complete the task with a single neuron. I still don't think the task calls for advanced methods such as ESP or NEAT. I wanted to use the off-the-shelf convnet.js library to build the neural network, so I chose the same super-simple network structure used in the Neural Slime Volleyball project. Here's my first attempt.
Each agent's brain consists of ten neurons, and each neuron is connected to all of the inputs and to all of the other neurons, perhaps like a small slice of a real brain. Here's a schematic of the brain:
The basic diagram of neural network — each brain has 269 weight values
This network structure is called a fully connected recurrent network. Each agent has a set of perceptual inputs that are fed into the recurrent network. Two of the ten neurons control the agent's actions; the others are "hidden units" used for computation and thinking. The output signals of all neurons are fed back into the input layer, so every neuron is fully connected to every other neuron, with a delay of one frame (~1/30 s) in between.
The first neuron in the output layer controls whether the agent turns left or right, depending on its output signal. We use the hyperbolic tangent as the activation function; because a neuron's output then lies between -1 and 1, it is natural to make a binary choice (left/right) from its sign. The second neuron controls whether the agent moves or stays still, again depending on its output signal.
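To make the wiring concrete, here is a minimal sketch of the fully connected recurrent network described above: every neuron sees all sensor inputs plus every neuron's output from the previous frame, with a tanh activation, and neurons 0 and 1 are read out as the turn and move decisions. The sizes, names, and initialization are my assumptions for illustration, not the project's actual code.

```python
import numpy as np

class TinyRecurrentBrain:
    """Sketch of a fully connected recurrent network: each neuron is
    wired to all sensor inputs and to every neuron's previous output.
    Sizes and initialization are illustrative assumptions."""

    def __init__(self, n_inputs=8, n_neurons=10, rng=None):
        rng = rng or np.random.default_rng(0)
        # weights from sensors, recurrent weights, and a bias per neuron
        self.w_in = rng.normal(0.0, 1.0, (n_neurons, n_inputs))
        self.w_rec = rng.normal(0.0, 1.0, (n_neurons, n_neurons))
        self.bias = rng.normal(0.0, 1.0, n_neurons)
        self.state = np.zeros(n_neurons)  # outputs from the last frame

    def step(self, sensors):
        # tanh keeps every output in (-1, 1), so the sign of an output
        # neuron gives a natural binary decision
        self.state = np.tanh(self.w_in @ sensors
                             + self.w_rec @ self.state + self.bias)
        turn_right = self.state[0] > 0  # neuron 0: left vs. right
        moving = self.state[1] > 0      # neuron 1: move vs. stay
        return turn_right, moving
```

The one-frame feedback delay from the article corresponds to `self.state` being read before it is overwritten on each `step` call.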
One hundred agents initialized with random weights are used for training; they run around in the simulation until a board kills them. By the end of a simulation, everyone is dead. It feels bad, but people die.
When all agents in a simulation have died, we keep the genes of the 30 agents that survived the longest for the next generation and discard the remaining 70. The best 30 genes, i.e. the weight and bias vectors of the neural connections, are crossed over and mutated to produce 70 new agents for the next generation. This process (explained in the breeding section below) continues until the best agents can skillfully evade the boards and survive for more than five minutes.
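The generation turnover described above can be sketched as simple truncation selection: the 30 longest-surviving genomes pass through unchanged, and the remaining 70 slots are filled with mutated crossovers of random survivor pairs. Representing a genome as a flat list of weights, and the specific mutation rate and noise scale, are my assumptions.

```python
import random

def next_generation(population, fitness, n_keep=30,
                    mutate_rate=0.1, sigma=0.1):
    """Truncation selection sketch: keep the n_keep fittest genomes
    unchanged and refill the rest of the population with mutated
    one-point crossovers of random survivor pairs."""
    ranked = sorted(population, key=fitness, reverse=True)
    survivors = ranked[:n_keep]
    children = []
    while len(survivors) + len(children) < len(population):
        a, b = random.sample(survivors, 2)
        cut = random.randrange(len(a))      # random crossover point
        child = a[:cut] + b[cut:]
        # small Gaussian mutations keep new genes entering the pool
        child = [w + random.gauss(0.0, sigma)
                 if random.random() < mutate_rate else w
                 for w in child]
        children.append(child)
    return survivors + children
```

Here `fitness` would be each agent's survival time; the sketch just takes any scoring function.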
Below is a running evolutionary simulation demo whose first generation is a completely random network (I reduced this demo to 50 agents instead of 100 so that it can run on most computers and even smartphones). At first, most agents don't try to survive (or give up!), but maybe one or two know what to do, perform better, and make it into the next generation to reproduce. After a few generations, you'll see the population's ability increase, and they all start avoiding the moving boards. I was pleasantly surprised to find that after 30 generations the results were already good. If you're interested, take a look at how they evolved from not knowing how to survive to this point in this training demo. Note that these results are updated every ten generations: because I want them to evolve as fast as possible, rather than capping evolution at 30 frames per second, the simulation runs as fast as the computer can, and every ten generations one run is shown as a demonstration so you can assess progress in real time, while the actual evolution happens at full speed behind the screen.
A sketch of training — how agents evolve out of ignorance
The lines around an agent are its perceptual sensors. Like visual input, they give each agent the ability to perceive the surrounding world; the closer a perceived object is, the more intense the line. Each agent can see in eight directions, with a detection range of up to 12 times its body radius. Here is how I determine the signal value: the closer the detected object, the closer the signal is to one, and if no object is detected, the signal is zero. If the object is at a distance of 6 body radii, the signal is 0.25 rather than 0.5, because I square the result to imitate a light-intensity model.
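The distance-to-signal mapping described above can be written as a one-line formula: a linear falloff from 1 at contact to 0 at 12 body radii, squared to imitate light intensity. The function name and parameters are mine; only the shape of the curve comes from the article.

```python
def sensor_signal(distance, radius, max_range_factor=12):
    """Map an object's distance to a sensor signal in [0, 1].

    Signal is 1 when the object touches the agent and 0 beyond
    max_range_factor body radii. The linear falloff is squared to
    mimic a light-intensity model, so at 6 radii (halfway) the
    signal is 0.25 rather than 0.5, matching the article."""
    max_range = max_range_factor * radius
    if distance >= max_range:
        return 0.0
    return (1.0 - distance / max_range) ** 2
```

An agent would compute one such signal per sensor direction, eight in total, and feed them into the recurrent network.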
What if Wayne Gretzky had had a car accident before he had children?
Wayne Gretzky is a famous Canadian ice hockey player
I also made a little fine-tuning to the training algorithm mentioned above.
One problem is that some of the best-performing agents may die by accident: even very capable agents can die of pure bad luck because of crowding in the simulation. A top agent from the previous generation that starts at the edge of the crowd can be accidentally pushed into a board by several other agents at the start of the simulation, and so never pass its good genes on to the next generation.
Thinking about this problem, I came up with the idea of adding a hall of fame to the evolutionary algorithm, recording the genes of the agents that achieved the best scores on record. I modified my evolution library to add this hall-of-fame feature, keeping the top 10 champions of all time. For each simulation, the hall of fame is added to the current generation, so in a sense every generation of agents must compete with the best agents ever. Imagine training with Wayne Gretzky, Doug Gilmour, and Tim Horton every hockey season!
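The hall-of-fame bookkeeping described above amounts to maintaining a fixed-size list of the best (score, genome) pairs ever seen, and seeding each new simulation with those champions. This is a minimal sketch under that reading; the function name and data layout are my assumptions.

```python
def update_hall_of_fame(hall, genome, score, size=10):
    """Keep the `size` best (score, genome) pairs ever recorded.

    Each finished simulation calls this once per agent; the returned
    list stays sorted best-first, so hall[i][1] are the champion
    genomes to inject into the next generation."""
    hall.append((score, genome))
    hall.sort(key=lambda entry: entry[0], reverse=True)
    return hall[:size]
```

Before each simulation, the ten champion genomes would simply be appended to the current generation's population, forcing every generation to compete against the all-time best.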
This technique greatly improved the outcome of evolution, because the best genes are forced back into the population. If this happened in real life it would be terrifying (though we may soon reach that point…). I thought I had discovered a great new technique in the world of evolutionary computing, a breakthrough! However, I then found that the hall-of-fame technique had already been used in neuroevolution for Go. Well, that's not good news for me.
I wanted to see whether agents would behave differently if I re-ran the evolution, so I spent a few days training different groups of agents, each for about a thousand generations. It turned out that most agents end up with similar weight patterns. To see how each agent's brain differs, I plotted the "genetic code" of the best agents as a graph; for visual effect, I combined every four weights into one color (the four RGBA channels).
Each row represents the weights of one of the most successful neural networks
You can see that the best agents are almost identical apart from small differences. This may, however, indicate a problem: they may all be stuck in a local optimum. I can imagine that in some situations agents tend to develop the same suboptimal strategy for avoiding the boards, with no new progress because of the task's difficulty. More advanced methods could address this, but I am satisfied with the recurrent networks produced by simple conventional neuroevolution. I used the best 23 genes in the demo; the final simulation is initialized with a hall of fame randomly selected from the training runs.
After about a thousand generations of training, the agents are able to avoid the obstacles. See the final result in this demo. The agents are far from perfect, but the insect-like behavior they exhibit is very interesting to me. I'm surprised that even though they are limited to full left or right turns, they can still avoid the boards. In the demo, each agent has a hand showing the direction it is turning; no hand means it is standing still. I'm also happy to see that when agents want to move in one direction, they can wiggle forward across open space; they seem to have developed a gait by switching between left and right turns.
Final simulation: otoro.net/plans
I also noticed that they tend to cluster with other agents and try to stay in the middle. I'm not sure why; perhaps in the middle of the crowd an agent can wait for other agents to die, increasing its own chance of survival. I've also considered a fitness function based on life length relative to the population average, rather than absolute life length. Perhaps in a more complex network, this could induce agents to turn evil and plot to kill other agents.
Offspring with genes from both parents
Rather than always waiting for all agents to die before creating the next generation, I thought it would make the demo more interesting to give agents the ability to pair up and produce offspring! In the future, this could form an alternative within-generation method of agent evolution, instead of fixing the population of each generation at a constant.
It works like this: if an agent survives for a random span of time (generally 30 to 60 seconds), it gains the ability to reproduce. If two such agents meet, they produce two offspring. The offspring start out smaller and grow to adult size in about 30-60 seconds. When a parent agent's "excitation clock" is reset to zero, it must wait another 30-60 seconds before producing the next offspring.
Crossover algorithm (Wikipedia)
The two offspring inherit genes from both parents through a very simple crossover method: the first child inherits the first part of its neural network weights from the first parent and the rest from the second parent, with the truncation point chosen at random, as shown in the figure above. I also add random mutations to each child so that it is unique. This random mutation is important for encouraging innovation and adding new elements to the population's gene pool.
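The mating step described above can be sketched as one-point crossover producing two complementary children, each followed by small random mutations. Treating a genome as a flat list of weights, and the mutation rate and noise scale, are assumptions on my part.

```python
import random

def mate(parent_a, parent_b, mutate_rate=0.1, sigma=0.05):
    """One-point crossover: a random cut splits both genomes, and the
    two children take complementary halves, so between them they carry
    all of both parents' genes. Each child then receives small
    Gaussian mutations to keep it unique."""
    cut = random.randrange(1, len(parent_a))
    child1 = parent_a[:cut] + parent_b[cut:]
    child2 = parent_b[:cut] + parent_a[cut:]

    def mutate(genome):
        return [w + random.gauss(0.0, sigma)
                if random.random() < mutate_rate else w
                for w in genome]

    return mutate(child1), mutate(child2)
```

In the demo this would fire when two agents whose breeding timers have elapsed meet, replacing the fixed per-generation breeding step.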
In principle, the better an agent performs, the higher its chance of survival, so it is more likely to produce offspring carrying its genes; conversely, weaker individuals die before having children. Sometimes good individuals die too early out of pure bad luck, which really is too bad for them. Such is life: life is unfair, and life is hard!
Interacting with agents: playing God
In the demo, you can click on an empty part of the screen to create a new agent and add it to the simulation. The new agent's initial genes are chosen randomly from the most recent agents that have tried to produce offspring; if no agent has ever bred, they are chosen from the original hall-of-fame group.
If you click on a board, the demo switches to the next visual mode: X-ray mode, borderless mode, a pure-creature mode without hands, and so on. I use these modes to evaluate different designs and the overall aesthetic. I think X-ray mode looks cool, and it runs faster on slower machines.
You can also give agents a strong push by clicking on them. This simulates a momentary gust of wind, pushing them away from the point you clicked.
You can also drag a line on the screen to create a new board. Depending on the direction you draw, the board moves left or right along its perpendicular. You can watch how the agents react when you release a few boards to destroy their civilization. Please use this new power wisely. I noticed that before this feature existed, users would cheer for the little creatures, but once they learned about it, they started preparing for genocide!
Design and development considerations
(this part has not been translated yet)