A Roadmap for Pi Robot

Last Updated: Feb 27, 2012

One of the hazards of working in robotics is that it encourages ADD. There are just so many areas to explore: navigation, sensor integration, visual perception, face and speech recognition, problem solving, learning and memory, language comprehension, social communication, and so on. One could literally spend a lifetime bouncing around from one topic to another. This is beginning to happen with the Pi Robot project, especially now that ROS makes it even easier to get distracted by the "latest cool thing." So I thought I'd draw up a roadmap of sorts to give us something a little more structured to follow.

From the beginning, the primary goal behind the Pi Robot Project has been to build a robot that can autonomously navigate around a typical household or office environment while interacting with people (and pets) and learning from experience. What we have discovered over the past five years is that it takes a lot of preliminary work to put this all together. The good news is that we have stabilized on a few key ingredients: ROS for the overall software framework, Dynamixel servos for joints, a Kinect for 3D vision, and a Hokuyo laser scanner for SLAM and obstacle avoidance (though one can do something similar with the Kinect alone). Pi's arms have also grown to six servos (degrees of freedom) each. It takes six degrees of freedom to specify the position and orientation of Pi's hand or an object in space, so six servos per arm makes the problem of reaching for such an object easier to solve.

But we still have lots of work to do on the basics. Here are the major signposts we need to pass along the way to our goal. We'll use a check mark icon to indicate what we already have under control and an hourglass icon to flag what we have yet to do:

1. Motor Control

Navigation, Path Planning and SLAM (Simultaneous Localization and Mapping)
- Topological navigation using semantic labels; e.g. "Go to the living room."
Pan and tilt servo control for Pi's camera.
- Head tracking and head pointing in 3D. (See pi_head_tracking_3d_part1 and pi_head_tracking_3d_part2.)
Arm Navigation
- Both forward and inverse kinematics have been tested using David Lu's arm_kinematics ROS package.
- Test the OpenRAVE kinematics package and compare the results with those above.
- Program simple reaching tasks
- Incorporate collision avoidance while reaching (ROS arm_navigation stack)

2. Visual Feature Detection and Tracking

Tracking color features (CamShift)
Tracking "points of interest" using optical flow (Lucas-Kanade)
Face detection
Skeleton tracking (Kinect + openni_tracker ROS package)
Face detection followed by tracking. (See pi_face_tracker.)
Implement the TLD algorithm (Tracking-Learning-Detection) by Zdenek Kalal

3. Visual Object Detection and Recognition

Recognizing feature patterns as object classes (e.g. "chair", "cat", "person")
Recognizing an object as a specific instance of a class (e.g. "person"=>"Joe")

4. Speech Recognition and Speech Synthesis (See pi_speech_tutorial)

Speech recognition using CMU's Pocket Sphinx (ROS stack rharmony). Recognize basic phrases such as "go forward" or "turn left".
Recognize more complex phrases such as "bring the blue ball".
Map recognized phrases into robot actions.
Find a suitable voice for Pi Robot using the Festival TTS package.
Implement the semantic frames using the RoboFrameNet ROS stack.

5. Simple Goal Directed Actions

Create a collection of ROS action servers and clients for performing simple tasks such as "pick up the cup" or "turn to face Joe". ROS actions provide a mechanism for defining a goal and setting the task in motion, while feedback reports progress toward the goal and signals when the task has completed, timed out, or been canceled.
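To make the goal, feedback, and result cycle a little more concrete, here is a minimal sketch in plain Python. It does not use ROS or the actionlib API at all; the function `turn_to_face`, the `Status` values, and the step and timeout parameters are hypothetical names chosen only for illustration. A real action server would run asynchronously and read the robot's actual heading.

```python
from enum import Enum


class Status(Enum):
    SUCCEEDED = "succeeded"
    TIMED_OUT = "timed_out"
    CANCELED = "canceled"


def turn_to_face(goal_angle, step=10.0, max_steps=20, cancel_at=None, on_feedback=None):
    """Rotate toward goal_angle (degrees) in small steps.

    Hypothetical stand-in for an action: it reports feedback after every
    step and ends in exactly one of the three terminal states.
    """
    angle = 0.0  # assumed starting heading
    for tick in range(max_steps):
        # A cancel request (simulated by cancel_at) preempts the goal.
        if cancel_at is not None and tick == cancel_at:
            return Status.CANCELED, angle
        # Move toward the goal by at most `step` degrees.
        delta = max(-step, min(step, goal_angle - angle))
        angle += delta
        if on_feedback is not None:
            on_feedback(angle)  # progress report
        if abs(goal_angle - angle) < 1e-9:
            return Status.SUCCEEDED, angle
    # Ran out of steps before reaching the goal.
    return Status.TIMED_OUT, angle


progress = []
status, final_angle = turn_to_face(45.0, on_feedback=progress.append)
print(status.name, final_angle, progress)
```

The caller only ever sees the goal it sent, the stream of feedback values, and a single terminal status, which is the part of the pattern that carries over to real ROS actions.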

6. Action Sequences and Task Planning

We will use the ROS SMACH package for executing a series of actions aimed at solving more complicated goals. SMACH stands for "State Machine" and allows the creation of a task hierarchy that defines how a particular task can be broken down into sub-tasks. For example, the high level task "bring me a beer" might be broken down into the following subtasks: "navigate to kitchen"=>"locate fridge"=>"grasp handle"=>"open door"=>"locate beer"=>"grasp beer"=>etc. SMACH allows us to set up this chain of events and then set it in motion, while the underlying library manages the contingencies between sub-tasks.
We will use the ROS Executive Teer stack for more complex task planning.
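The "bring me a beer" chain from the SMACH description can be sketched as a tiny state machine in plain Python. This is not the SMACH API; it only mirrors the idea that each state returns an outcome and a transition table decides what runs next. The outcome labels ("succeeded", "aborted"), the terminal names ("DONE", "FAILED"), and the `failing` argument used to simulate a failed sub-task are all assumptions made for the example, and the chain stops at "grasp beer" because the original list ends in "etc."

```python
STEPS = [
    "navigate_to_kitchen",
    "locate_fridge",
    "grasp_handle",
    "open_door",
    "locate_beer",
    "grasp_beer",
]


def build_machine(failing=()):
    """Return (states, transitions) for the beer-fetching chain.

    Each state is a callable that logs its name and returns an outcome.
    States listed in `failing` report "aborted" to simulate a problem.
    """
    def make_state(name):
        def execute(log):
            log.append(name)
            return "aborted" if name in failing else "succeeded"
        return execute

    states = {name: make_state(name) for name in STEPS}
    transitions = {}
    for i, name in enumerate(STEPS):
        next_state = STEPS[i + 1] if i + 1 < len(STEPS) else "DONE"
        transitions[name] = {"succeeded": next_state, "aborted": "FAILED"}
    return states, transitions


def run_machine(states, transitions, start, log):
    """Run states until a label with no state attached (a terminal) is reached."""
    current = start
    while current in states:
        outcome = states[current](log)
        current = transitions[current][outcome]
    return current


log = []
states, transitions = build_machine()
print(run_machine(states, transitions, STEPS[0], log), log)
```

Keeping the transitions in a table, separate from the code that performs each sub-task, is what lets the same sub-tasks be rearranged into different tasks later on.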

7. Learning and Memory

Psychologists define three primary kinds of memory: procedural, episodic, and semantic. Pi Robot will need all three, but especially the first two:
- Procedural Memory: Learning to play golf is a kind of procedural memory. At first you can't even hit the ball, but with some practice, your eye-hand coordination improves to the point where you might actually par one or two holes. Once we have Pi's arm kinematics worked out, the solutions we compute for, say, reaching for an object, will apply to an ideal situation where there is no "slop" in Pi's joints, which of course there is. We can therefore insert a neural network between Pi's vision system and his arm kinematics that will then learn to compensate for these imperfections.
- Episodic Memory: If Pi is asked to retrieve a particular object that lies somewhere in your house, it will help him to remember where he last saw it. This is an example of episodic memory. Another example would be for Pi to remember the various activities he performed yesterday or last week. For example, you might ask "Did you tidy the living room yesterday?" It is tempting to think that because a robot has a computer for a brain, we could simply store all the data that comes in through its sensors. But video alone would fill even a large hard drive within a day or two. So we must be selective in what is stored.
- Semantic Memory: Knowing the capital of Kazakhstan is an example of semantic memory. The only way to know this is to have heard or read the fact at some point. Robots can operate a little differently than people in this regard thanks to the Web and data structures called semantic networks. A semantic network connects a collection of facts or concepts by links that represent the relationships between them such as "birds lay eggs". A number of projects are well under way (e.g. ConceptNet) that enable a computer program (which can be run on your robot) to query these large semantic databases in a way similar to the way we access basic facts such as "What is the capital of Kazakhstan?" Answer: Astana.
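As a rough illustration of the semantic network idea, here is a toy version in plain Python that stores facts as (subject, relation, object) links and answers simple queries. It is not the ConceptNet API, and the class name, the relation names, and the "is_a" inheritance rule (a robin lays eggs because a robin is a bird) are assumptions added for the example.

```python
from collections import defaultdict


class SemanticNetwork:
    """Toy semantic network: concepts joined by labeled links."""

    def __init__(self):
        # (subject, relation) -> set of objects
        self._links = defaultdict(set)

    def add(self, subject, relation, obj):
        self._links[(subject, relation)].add(obj)

    def query(self, subject, relation):
        """Return objects linked to subject directly or via 'is_a' ancestors."""
        results, seen, frontier = set(), set(), [subject]
        while frontier:
            node = frontier.pop()
            if node in seen:
                continue
            seen.add(node)
            results |= self._links.get((node, relation), set())
            # Follow "is_a" links so that facts about a class apply to its members.
            frontier.extend(self._links.get((node, "is_a"), set()))
        return results


net = SemanticNetwork()
net.add("bird", "lays", "eggs")
net.add("robin", "is_a", "bird")
net.add("Kazakhstan", "capital", "Astana")

print(net.query("Kazakhstan", "capital"))
print(net.query("robin", "lays"))
```

The second query is the interesting one: nothing was ever stored about robins laying eggs, yet the answer falls out of the links.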
Psychologists and machine learning experts also define many different forms of learning: supervised, reinforcement, guided, statistical, and observational.
- Supervised Learning: A good example of supervised learning is color naming; i.e. "this banana is yellow, but that one is green". For a robot to use color names the same way we do, it must be shown examples of different colors and told the correct label. (See for example the work of Kimberly Jameson.) One way to do this is to use an artificial neural network that takes color histograms as inputs and produces color names as outputs. With enough training using a human teacher, the network learns to categorize colors in a manner similar to people. I have done some preliminary work on this using a simple Perceptron neural network and it performs remarkably well.
- Reinforcement Learning: If you touch a hot stove for the first time, you will suffer pain, but then you will be unlikely to ever do it again. Conversely, if you choose a new route to work that gets you there 10 minutes faster, you'll likely choose that route again. Reinforcement learning requires an action followed by an outcome that can be scored positive or negative. A positive outcome increases the probability of repeating the action while a negative outcome reduces it.
- Guided Learning: An example of guided learning takes place when a golf instructor moves your arm through an example of a good swing. The idea is that your brain will map the proprioceptive sensations into motor commands you can produce on your own. This kind of learning is fairly easy to do on a robot since we have continuous feedback from the servos regarding their current position, speed, torque, and even temperature.
- Statistical Learning: Statistical learning is closely related to data mining. Take for example the simple question: "How many bedrooms are there in a house?" Of course, there is no single answer. In any given house, there may be one bedroom or a hundred. But if we knew the number for every house, the number of bedrooms would form a distribution with a peak somewhere around 2. Now suppose you enter someone's house for the first time. How many bedrooms should you guess it has? Statistical learning theory tells us how we can make an informed guess from the underlying distribution, but since we don't know the underlying distribution exactly, we must estimate the distribution from our experience and then make our guess from that estimate. One of the more popular methods used in machine learning for performing these calculations is Bayesian Classification. Others involve simple clustering methods. We will have much to say and do with these methods and other statistical learning techniques.
- Observational Learning and Imitation: Pi Robot's ability to mimic the arm movements of a person standing in front of him is an example of imitation. Imagine the possibilities this opens up for teaching a robot a particular task. Suppose you want Pi to stir something in a pot. Trying to program a stirring motion from scratch into Pi's various arm joints would be a difficult task. But if we simply let Pi watch us stir something, he can then mimic our actions and store them for future use. Observational learning can go one step further than imitation alone. Suppose that moving my arm a certain way results in damage to my hand. In this case, we would *not* want Pi to imitate the action but rather to avoid the action he just observed.
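To give a flavor of the supervised color-naming example above, here is a minimal multi-class perceptron in plain Python. It is only a sketch and not the network used in the preliminary work mentioned earlier. The three-bin hue histograms (fraction of pixels that look reddish, yellowish, and greenish), the training samples, and the label set are all made up for illustration.

```python
LABELS = ["red", "yellow", "green"]  # assumed label set

# (hue histogram, color name supplied by the human teacher); invented data
TRAINING = [
    ([0.1, 0.8, 0.1], "yellow"),
    ([0.0, 0.2, 0.8], "green"),
    ([0.9, 0.1, 0.0], "red"),
]


def predict(weights, hist):
    """Return the label whose weight vector scores the histogram highest."""
    x = list(hist) + [1.0]  # append a constant bias input
    scores = {
        label: sum(w * v for w, v in zip(weights[label], x)) for label in LABELS
    }
    return max(LABELS, key=lambda label: scores[label])


def train(samples, epochs=20):
    """Perceptron rule: on a mistake, pull the right label's weights toward
    the input and push the wrongly chosen label's weights away from it."""
    size = len(samples[0][0]) + 1
    weights = {label: [0.0] * size for label in LABELS}
    for _ in range(epochs):
        mistakes = 0
        for hist, label in samples:
            guess = predict(weights, hist)
            if guess != label:
                mistakes += 1
                x = list(hist) + [1.0]
                for i, v in enumerate(x):
                    weights[label][i] += v
                    weights[guess][i] -= v
        if mistakes == 0:  # every training sample is named correctly
            break
    return weights


weights = train(TRAINING)
print(predict(weights, [0.2, 0.7, 0.1]))  # a mostly yellowish patch
```

With data this clean the weights settle after a couple of passes. Real camera histograms would have more bins and far more overlap between color names, so many more labeled examples would be needed.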

8. Reasoning and Problem Solving

Reasoning and problem solving are often taken as the hallmark of human intelligence, but in fact many animal species are quite good at it. Suppose we ask Pi Robot to retrieve an item that is blocked by another object. What would it take for Pi to figure out he first needs to move the blocking object out of the way? Computer programs that can solve problems have been around since the dawn of AI (see for example ACT-R) but most assume that the problems to be solved can be given a definite formal structure (like chess), which is often not the case in real-world situations. For example, if Pi can move the blocking object anywhere he likes, where should he move it? More recently, progress has been made on more general planning and scheduling systems as well as hierarchical task networks. Needless to say, this will be one of our more difficult challenges.

9. Executive Controller: What Should I Do Next?

Finally, overriding all of Pi's behavior must be some form of executive control. Why? The funny thing about a robot is that if you turn it on and don't tell it something to do, it will just sit there! (I actually do this myself sometimes...) Ordinarily, we give a robot something to do by running a specific program aimed at carrying out a particular task such as "navigate to the dining room" or "pick up the cup" or "mimic my arm motions". But if we want our robot to simply wander about the house and perform actions on the fly, we need a way for Pi to have a set of default behavioral goals that can nonetheless be interrupted by specific commands or events. A popular mechanism in robotics for achieving this scenario is called Subsumption. The idea is to set up a hierarchy of default behaviors based on their priority. For example, if Pi has nothing better to do, his default behavior might be "roam around the house and note anything out of the ordinary". At the same time, he could be streaming the video image from his camera to a web page so that you can monitor your home while you're away and send you alerts by email when something odd is detected. A behavior with higher priority than "roam" would be "recharge batteries if running low". Another would be "escape if stuck". In fact, since "roam" would be one of the lowest priority behaviors, almost anything else would preempt it such as "Pi, please find the TV remote". Fortunately, ROS has just the right mechanisms for implementing this executive controller; namely the SMACH and Executive Teer packages that we heard about in Section 6. So we shouldn't have too much work to do once we get to this point.
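The priority idea behind these default behaviors can be sketched in a few lines of plain Python. This is a simplified priority arbiter rather than a full subsumption architecture, and it uses neither SMACH nor Executive Teer. The state keys, the 20% battery threshold, and the relative ordering of "escape", "recharge", and spoken commands are assumptions; the text above only says that all of them outrank "roam".

```python
# Behaviors listed from highest to lowest priority. Each entry pairs a name
# with a test that says whether the behavior wants control right now.
BEHAVIORS = [
    ("escape_if_stuck", lambda state: state.get("stuck", False)),
    ("recharge_batteries", lambda state: state.get("battery", 1.0) < 0.2),
    ("obey_command", lambda state: state.get("command") is not None),
    ("roam", lambda state: True),  # default: always willing to run
]


def select_behavior(state):
    """Return the highest-priority behavior whose trigger is active."""
    for name, wants_control in BEHAVIORS:
        if wants_control(state):
            return name
    return None  # unreachable while "roam" is in the list


print(select_behavior({}))
print(select_behavior({"command": "find the TV remote"}))
print(select_behavior({"command": "find the TV remote", "battery": 0.1}))
```

Calling `select_behavior` on every pass through the control loop means a higher-priority trigger preempts whatever Pi was doing, and Pi drops back to roaming as soon as nothing else is asking for control.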