A few weeks ago, Pi Robot and yours truly joined the Silicon Valley ROS Users Group (SV-ROS) to help with the effort (already under way) to prepare for a challenging robot navigation contest held at the end of this year’s IROS Conference in Chicago. Having just spent the past year writing my second book on ROS, I was eager to get my hands dirty working with a real robot for a change.
On the surface, the challenge sounds rather easy, at least for a human. Given a cluttered cafeteria-style room as shown on the right, teams would have to program their robot to navigate autonomously to five specific locations, all while avoiding tables, chairs, sofas, people and other objects. Each team would be given a limited time to first map the room with their robot and make note of the five locations. Each location was designated by a marker on the ceiling which would never be visible to the robot itself but could be used by the programmers during mapping to know when the robot was at a target location.
During the test run, the five location numbers would be given in a specific order to each team and the challenge was for the robot to move autonomously (i.e. without further control from the programmers) to each of these locations in the correct sequence and as quickly as possible without running into things or getting stuck. To make the task even more challenging (seemingly impossible when we first read about it), during the interval between the mapping and test phases, furniture and other objects could be moved around, added to, or removed from the room. As if that weren’t enough, a number of people would be walking around the room during the test phase, periodically crossing the robot’s path or even standing directly in its way for up to 30 seconds.
All teams would be given the same robot, a Pioneer 3-DX (shown on the left) from Adept MobileRobots. The contest was co-sponsored by Microsoft, which provided the Kinect for Windows depth camera mounted on the vertical post attached to the back of the robot and facing directly forward. The camera was the only sensor that would be available to the programmers. There were no sonar or IR sensors and no laser scanners, so all mapping, navigation, and obstacle avoidance would have to be based on vision alone.
To understand why this task seemed nearly impossible, it helps to keep in mind two important points: (a) the robot’s camera was only 18 inches (46 cm) off the ground and (b) the most widely used automated mapping technique (SLAM) typically relies on a laser scanner with a wide field of view (e.g. 180 degrees), while the Kinect has a visual field of view of only 45 degrees. So unlike a person’s view of the room, the robot would be looking at objects from roughly the same height as someone crawling on their knees with a forward-facing cone tied to their head blocking all peripheral vision. And even if we did have a proper laser scanner, imagine what it would see when sweeping across the room: basically just a large number of narrow chair legs and table pedestals, which could be moved around anyway before the testing phase.
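To put a rough number on that "cone tied to the head", here is a back-of-the-envelope sketch in Python. It uses only the two field-of-view figures quoted above (45 and 180 degrees); the 3 meter viewing distance and the function name `visible_width` are assumptions made purely for illustration, not anything taken from the contest code.

```python
import math


def visible_width(distance_m, fov_deg):
    """Width of the strip a sensor can see at a given distance.

    Simple pinhole geometry: width = 2 * d * tan(fov / 2).
    A field of view of 180 degrees or more covers the whole half-plane
    in front of the sensor, so the width is reported as infinite.
    """
    if fov_deg >= 180.0:
        return math.inf
    return 2.0 * distance_m * math.tan(math.radians(fov_deg) / 2.0)


KINECT_FOV_DEG = 45.0   # figure quoted in this post
LASER_FOV_DEG = 180.0   # typical wide-angle laser scanner, as quoted above
DISTANCE_M = 3.0        # assumed distance to the furniture, for illustration

if __name__ == "__main__":
    width = visible_width(DISTANCE_M, KINECT_FOV_DEG)
    share = KINECT_FOV_DEG / LASER_FOV_DEG
    print("Camera sees a strip about %.2f m wide at %.1f m" % (width, DISTANCE_M))
    print("That is %.0f%% of the angle a 180 degree laser would sweep" % (share * 100))
```

In other words, at a few meters the camera only takes in a strip of the room a couple of meters wide, which is one quarter of the angle a 180 degree laser would cover in a single sweep.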
Even so, we would at least have the entire RGB video stream from the Kinect camera as well as its depth data. So what would a human do with such data while moving around the room trying to remember the five target locations?
Psychologists and biologists have known for some time that people and animals use visual landmarks to help localize themselves in a given environment and, fortunately for us, a number of robotics researchers have been working on visual mapping and localization (“Visual SLAM” or “RGBD-SLAM”) for a number of years now. The most promising of these efforts appears to be the work coming out of the IntRoLab at the Université de Sherbrooke in Quebec. In particular, Mathieu Labbé has developed a Real-Time Appearance-Based Mapping algorithm (RTAB-map) that stitches together visual features from an RGB-Depth camera like the Kinect and creates a truly amazing three-dimensional representation of rooms or other surroundings as shown in the video below:
Armed with RTAB-map, the SV-ROS team managed to borrow a Pioneer robot ahead of time and got to work programming all the control scripts, launch files, and configuration parameters needed to handle the requirements of the contest. In the meantime, Pi Robot had joined the party and was able to test the mapping procedure using his own Kinect camera. I also created a simulated cafeteria setting in Gazebo so that we could run a virtual mock-up of the contest over and over again with different navigation parameters to determine which settings resulted in the best performance. We stayed in contact via e-mail and were trying out new ideas right up to the last minute.
In the end, four of the team members (Greg Maxwell, Steve Okay, Ralph Gnauck and Girts Linde) flew to Chicago several days ahead of the event and fine-tuned the robot’s behavior even further. They even had the good fortune to meet up with Mathieu Labbé, who presented a paper on RTAB-map at the IROS conference. Of the six teams that entered the contest, only one other team reached as many waypoints as our robot (3 out of 5 locations), but our robot completed the course twice as fast, so we came out on top. The fourth waypoint was at the end of a long featureless hallway that completely confused the vision-based localization methods used by RTAB-map, and it was not a situation that we had anticipated or tested. At least we know better for next time!
Since I wrote this blog post, Mathieu Labbé has done a nice write-up of his own about the contest that you can find on his . You can also read the official press release from the SV-ROS Users Group on the ROS.org newsfeed.