Learning Obstacle Avoidance by Example
To help us introduce additional forms of learning in artificial neural networks, we are going to turn to a different robot task: obstacle avoidance. Few moments are as embarrassing as having your robot run into something during the middle of a demonstration, so obstacle avoidance is usually near the top of the list of "things to get right". There are many approaches to obstacle avoidance depending on the sensors your robot has, the specifics of its drive train (e.g. walking versus wheels) and even the size of the robot. But our goal in this section is to use a neural network to learn how to avoid obstacles.
Our robot is going to need some new sensors so that it can actually "see" the obstacles it is to avoid. A good place to start is to add some infrared (IR) and sonar range sensors. The picture below shows our setup:
The robot now has three IR sensors underneath the lower platform (circled in red) and four sonar sensors around mid-height (circled in yellow). The particular IR sensors used here (Sharp GP2D12) have a maximum range of 31 inches, while the sonar sensors (Ping) have a range of 133 inches (about 11 feet). It is often a good idea to use at least two different types of sensors when trying to measure something about the real world, because different sensors have different "blind spots" as well as different ranges and resolutions. For example, IR sensors tend to have a narrower beam than sonar, which gives them better resolution, but it also means that the beam can pass right through a small gap in an obstacle. Sonar is better at detecting objects with holes, but it can fail to return an echo from a flat surface such as a wall when approaching it at a shallow angle, since the ping reflects away from the sensor rather than back toward it. It can also fail to return an echo from softer surfaces such as a pant leg on a person. On the other hand, the longer range of sonar gives the robot a better chance of reacting to an obstacle before it is too late. Using both types of sensors gives us the best of both worlds.
The IR and sonar readings are used as the values of the input units in our artificial neural network. The input units are then fully connected to our two output units, which control the robot's motor signals as usual. The resulting network looks like the following. (Only a few of the connections are labeled for clarity.)
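To make the wiring concrete, here is a minimal sketch of such a network in Python with NumPy. (The actual controller was built with the AForge.NET library; the names, initial weight range, and normalization here are illustrative assumptions, not the demo's exact setup.)

```python
import numpy as np

# Seven range readings in, two motor signals out: a single fully
# connected layer with one bias per output unit.
N_INPUTS = 7    # left IR, middle IR, right IR, left sonar,
                # left-front sonar, right-front sonar, right sonar
N_OUTPUTS = 2   # left motor, right motor

rng = np.random.default_rng(seed=0)
W = rng.uniform(-0.1, 0.1, size=(N_OUTPUTS, N_INPUTS))  # connection weights
b = rng.uniform(-0.1, 0.1, size=N_OUTPUTS)              # output unit biases

def motor_signals(readings):
    """Map the seven sensor readings to the two motor signals."""
    return W @ readings + b
```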
As you can imagine, it would be quite a challenge to program this network's connections directly. In other words, it would be difficult to write down a set of if-then conditions mapping combinations of sonar and infrared readings into motor outputs. It is in situations like this that neural networks can really show their strength. But the question now becomes: what is the best way for the network to learn a good set of connections? Standard supervised learning is not very practical, as we would have to intervene every time the robot ran into an object and then show it the correct maneuver for avoiding the collision. Learning by trial and error (that is, reinforcement learning) could be employed, and we will return to that possibility later. On this occasion, however, we are going to try a form of guided learning instead.
Guided Learning
The idea behind guided learning is simple: someone who is already expert in the skill we are trying to learn controls our movements for us while we simply relax and experience the sensations. The hope is that our brains can then associate the sensations with the movements so that we can better execute the actions ourselves.
In the case of our robot, we will control its motion using a joystick while steering around obstacles. Meanwhile, our robot will record the corresponding sensor readings and motor control signals and use them to train its neural network. The hope is that the resulting network connections will then allow our robot to avoid future obstacles on its own. Note that our goal is not to learn a specific path around a specific arrangement of obstacles; rather, we want our robot to learn a general skill for avoiding obstacles regardless of their position.
How should we use the recorded samples to train the network? Fortunately, we have already seen the answer, since guided learning is really just a form of supervised exemplar learning in disguise. The difference is that we collect all the examples first and then train the network all at once, rather than training it one sample at a time. This process is often referred to as batch or offline learning. Since the number of recorded input-output samples could be very large, one might wonder about memory storage issues. However, with today's computers, the storage requirements for several minutes or even hours of guided training are not a problem. For example, if we sample our seven sensor readings and two motor control signals five times per second and collect the data for five minutes, we will have to store an array of 9 x 5 x 60 x 5 = 13,500 numbers. Assuming 1 byte per number, that amounts to less than 14 KB, which is almost insignificant by today's standards. As it turns out, we will only need about 60 seconds' worth of data anyway, so our storage requirements are very small.
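As a rough sketch of the recording stage, assuming hypothetical helpers read_sensors() and read_joystick() on the hardware side, the loop might look like this:

```python
from time import sleep
import numpy as np

SAMPLE_HZ = 5        # five samples per second, as above
DURATION_S = 60      # about a minute of guided driving suffices

samples = []
for _ in range(SAMPLE_HZ * DURATION_S):
    inputs = read_sensors()    # hypothetical helper: 7 range readings
    targets = read_joystick()  # hypothetical helper: 2 motor signals
    samples.append(np.concatenate([inputs, targets]))
    sleep(1.0 / SAMPLE_HZ)

data = np.array(samples)       # shape (300, 9): one row per sample
```

At five samples per second, a one-minute run yields just 300 rows of nine numbers each.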
Robot Demonstration
The video below shows the robot under joystick control by the human operator. As you can see, the operator is careful to guide the robot close to obstacles without running into them which is the behavior we want our robot to learn to do on its own. Note also that we collect our data using a simplified obstacle placement—just one obstacle lies near the robot at a time. We do the same thing when we want to teach someone a new skill: isolate different aspects of the skill so that the learner can focus on one key element at a time. As it turns out, even though our robot is trained in a simplified environment, we will see that it can then apply what it has learned to avoid complicated obstacle arrangements.
The readings from the IR and sonar sensors, as well as the two wheel speeds, are sampled five times per second during recording. We only need about 60 seconds of such training to collect enough data. Next we use the recorded input-output samples to train the neural network controller using the most excellent AForge.NET neural network package, which can be found at http://www.aforgenet.com. The process uses the same delta rule algorithm employed earlier (see details below), only this time we use offline batch learning to modify the connections. In this case it took only 10 passes through the data, also known as training epochs, before the network connections converged to their final values. And these 10 epochs took a total of only 3 milliseconds of computing time on a desktop PC.
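The delta rule update itself is simple enough to sketch. Continuing the NumPy sketches above (the demo used AForge.NET's implementation; the learning rate here is an illustrative guess), each epoch sweeps through the recorded samples and nudges the weights toward the recorded motor signals:

```python
LEARNING_RATE = 0.1   # assumed value, for illustration only

X = data[:, :7]       # recorded sensor readings
T = data[:, 7:]       # recorded motor signals (the "teacher")

for epoch in range(10):                        # 10 epochs sufficed in the demo
    for x, t in zip(X, T):
        y = W @ x + b                          # current network output
        err = t - y                            # delta rule error term
        W += LEARNING_RATE * np.outer(err, x)  # adjust connection weights
        b += LEARNING_RATE * err               # adjust output biases
```

Note that "offline" here refers to the data being collected ahead of time; the weight updates are still applied sample by sample within each pass over the recorded set.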
Once learning is complete, we place our robot under the control of the network and set it loose among a collection of newly placed obstacles. The following video shows the entire sequence from recording, to training, to autonomous obstacle avoidance:
As you can see, the robot does remarkably well at avoiding obstacles using its neural network controller. This underscores the power of using neural networks to learn complicated input-output relations rather than trying to program all the possible if-then scenarios ourselves. All we had to do was guide the robot around a few obstacles for 60 seconds, train the network with the sampled data, then let it roam on its own. It is also worth bearing in mind that while seven sensors might seem like a lot of input, it also means that at any given moment all the robot "sees" is seven numbers representing seven distance measurements. So while we watch the video and can see with our own eyes that the robot avoided "the wall" or went around "the ball", the robot is lucky to get one or two readings bouncing off these objects, and on that sparse information it has to base its decision to turn or not.
Viewing the Network Connections
After the network was trained with the recorded data, what were the resulting connection strengths between input and output units? For the demonstration shown above, the resulting connection matrix and biases had the following values, shown to two decimal places:
The first 2x7 matrix represents the connections between the seven input units and two output units, while the second 2x1 matrix holds the two biases on the output units. The connections to the left motor are on the top row and the right motor connections on the bottom row, with the inputs ordered as follows: left IR, middle IR, right IR, left sonar, left-front sonar, right-front sonar and right sonar.
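In code terms, that layout can be read out row by row. (This printout is hypothetical and reuses the names from the sketches above; the actual demo values appear in the image.)

```python
INPUT_NAMES = ["left IR", "middle IR", "right IR", "left sonar",
               "left-front sonar", "right-front sonar", "right sonar"]

# Row 0 of W drives the left motor, row 1 the right motor.
for row, bias, motor in zip(W, b, ["left motor", "right motor"]):
    for name, weight in zip(INPUT_NAMES, row):
        print(f"{name} -> {motor}: {weight:+.2f}")
    print(f"bias -> {motor}: {bias:+.2f}")
```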
Another way to visualize the connections is to use the most excellent Matrix2PNG program from the Bioinformatics department at UBC. The program represents connection strengths with different colors as shown below:
In this image, green represents positive values, red negative and black represents numbers near zero. (Note that some of the darker greens and reds look almost black in the image.)
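For readers without Matrix2PNG handy, a similar image can be sketched in Python with matplotlib. This is an alternative tool, not the one used above, and it reuses W and b from the earlier sketches:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap

# Weights plus a bias column, colored red (negative) through black
# (near zero) to green (positive), matching the convention above.
params = np.hstack([W, b.reshape(-1, 1)])
cmap = LinearSegmentedColormap.from_list("rkg", ["red", "black", "green"])
limit = np.abs(params).max()          # symmetric color scale about zero
plt.imshow(params, cmap=cmap, vmin=-limit, vmax=limit)
plt.xticks(range(8), ["L IR", "M IR", "R IR", "L sonar",
                      "LF sonar", "RF sonar", "R sonar", "bias"],
           rotation=45)
plt.yticks([0, 1], ["left motor", "right motor"])
plt.show()
```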
Let's look first at the six circles on the left of the image representing the connections between the three IR sensors and the two motors. We see that a reading on the left IR sensor activates the left motor and inhibits the right motor and vice versa for the right IR sensor, just as we would hope if we want the robot to turn away from obstacles. The middle IR sensor inhibits both motors when activated which means "slow down" if an obstacle is straight ahead.
Looking now at the connections for the sonar sensors, we see a similar pattern, though it is easier to see for the two front sonar sensors than for the two laterally pointing ones. Both left sonar sensors activate the left motor and either inhibit the right motor or activate it less strongly. The opposite pattern holds for the two right sonar sensors.
Finally, the last two connections are the biases, and their positive values give us the "all clear" behavior of our robot: when no sensors are detecting obstacles, the bias units drive both motors forward. Note that these values were not programmed in; they arose naturally through learning, since our guided training included some driving straight ahead with no nearby obstacles.
Overall, the neural network has learned a set of appropriate connections for the task at hand. What's more, the network does not care about the particular arrangement of obstacles in its path: as the video above shows, even cul-de-sacs are handled smoothly as the robot turns away from the nearest wall or obstacle at any given moment.