Object Tracking Using Visual Filters
Biological brains excel at object tracking. Even very young infants can move their eyes and head to follow a finger or face moving in front of them. So imagine we aim our web camera at an object of interest and then move the object at random. Our task is to program our robot to move the camera to keep the object centered in the field of view.
Visual Filters and Blobs
Tracking an object implies that we can recognize at least one visual characteristic of the object from one video frame to the next. In computer vision, a good visual property to start with is color. A brightly colored ball or balloon will do nicely since it will stand out sharply against the other colors in the scene. So let's attempt to track a bright orange balloon as we move it about in front of our camera as shown in the image below:
To center the balloon in the camera's field of view, we first have to be able to locate it in the current view. To do this, we filter the current image by removing all pixels that don't match a certain level of the target color. What remains, we hope, are just those pixels that belong to our balloon. The process of filtering an image to highlight the object of interest goes beyond color. As we will see later, we can define filters even for complex objects such as the pattern of a human face. For this reason, visual object detection and recognition is often described in terms of finding the correct filter or filters for the task at hand.
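RoboRealm provides this kind of color filtering as a ready-made module, but the basic idea is simple enough to sketch directly. The snippet below is only an illustration, not RoboRealm's actual RGB filter: it assumes the frame arrives as an interleaved RGB byte array, and the threshold values are placeholders that would need tuning for a particular balloon and lighting.

// Hypothetical sketch of a "keep only strongly red pixels" filter.
// Assumes an interleaved RGB buffer (3 bytes per pixel); redMin and margin
// are placeholder thresholds that must be tuned for the target and lighting.
static class ColorFilter
{
    public static byte[] FilterRed(byte[] rgb, int redMin, int margin)
    {
        byte[] filtered = new byte[rgb.Length];
        for (int i = 0; i + 2 < rgb.Length; i += 3)
        {
            byte r = rgb[i], g = rgb[i + 1], b = rgb[i + 2];
            bool isTarget = r >= redMin && (r - g) >= margin && (r - b) >= margin;
            // Matching pixels become white, everything else black.
            byte v = isTarget ? (byte)255 : (byte)0;
            filtered[i] = filtered[i + 1] = filtered[i + 2] = v;
        }
        return filtered;
    }
}

Calling FilterRed(frame, 180, 60) on a frame like the one above would produce a mostly black image with white patches roughly where the strongly red pixels were.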
Returning to our orange balloon, the image below shows the result of filtering the original image using RoboRealm's RGB filter set to match only pixels with a high level of red:
As you can see, the majority of the pixels are removed and all we are left with are two red pixel areas or blobs, one contained within the boundary of the balloon and the other reflected from a magazine lying on the floor. Note how the redness of the cat falls below the threshold we set for the RGB filter so that, as far as the filter is concerned, the cat does not exist. This nicely illustrates a key point about perception: we tend to see what we are looking for. And while this may cause us to miss something important at times, it is the only way our visual system can selectively attend to some aspect of the world while ignoring the rest.
To eliminate the magazine pixels and isolate just the balloon, we apply some additional filters using RoboRealm. First, we use the Erode filter which whittles away at each area of the image so that small pixel patches like the magazine tend to disappear altogether. Then we use the Dilate filter to bring back some of the pixels we lost on our bigger blobs. Finally, we use the Convex Hull filter to round out the border of the balloon and reduce its raggedness. The result looks like the following:
The balloon has been isolated as a fairly round white blob which we can now easily locate and track as it moves across the field of view. To get the balloon's coordinates, we use RoboRealm's Center of Gravity module, which places a box around the remaining white pixels and returns the coordinates of the box's center point (a rough code sketch of this computation appears at the end of this section). The result can then be superimposed on the original image as shown below:
RoboRealm has nicely isolated the orange balloon even with the cat all over it. The green diagonal line in the picture is the displacement of the balloon from the center of the frame and tells us how we need to move the camera to center the balloon in the field of view. When we translate this displacement into rotations of the pan and tilt servos of our robot's head and camera (details below), we get the behavior shown in the videos below. The first video shows the view from the robot camera:
The second video shows the view from an external camera that includes both the robot and the moving balloon:
While the tracking of the balloon is reasonably good in these videos, there is a noticeable lag between changes in the movement of the target and the response from the robot. There are a number of reasons for this. First, the camera used (a DLink 920) operates wirelessly over 802.11g, so there is an irreducible delay in getting the latest image frame back to the router and over to the main computer. Second, the tracking algorithm (detailed below) depends on there being a displacement between the target and the center of the image frame. For this reason, it is impossible to track the balloon in perfect sync, since that would imply a zero displacement at all times; the only way this could happen would be for the robot to anticipate the movement of the balloon before it actually occurs. Animal and human brains can clearly do just this under certain circumstances, but predictive tracking is outside the scope of this article. Finally, the frame rate of the video camera is also a limiting factor. In the videos shown here, the frame rate was 30 fps. Better results can be obtained with a directly attached USB camera running at 90 fps.
Having said all this, we can improve the tracking speed by adjusting some parameters in the algorithm as explained below. Here are a couple of examples demonstrating faster tracking, including a number of times where the balloon is kicked into the air:
The .robo file used with RoboRealm to isolate the orange balloon can be found here: RoboRealm Track Orange Balloon.
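Incidentally, although RoboRealm's Erode and Center of Gravity modules do the real work in this pipeline, it may help to see roughly what they compute. The sketch below is a simplified illustration, not RoboRealm's implementation: a single 3×3 erosion pass over the filtered frame, followed by a bounding-box center for whatever white pixels survive. It reuses the interleaved-RGB assumption from the earlier snippet.

static class BlobTools
{
    // One pass of 3x3 erosion: a pixel stays white only if it and all eight of its
    // neighbours are white, so small speckles (like the magazine) tend to vanish.
    public static byte[] Erode(byte[] rgb, int width, int height)
    {
        byte[] eroded = new byte[rgb.Length];     // border pixels are simply left black
        for (int y = 1; y < height - 1; y++)
            for (int x = 1; x < width - 1; x++)
            {
                bool keep = true;
                for (int dy = -1; dy <= 1 && keep; dy++)
                    for (int dx = -1; dx <= 1 && keep; dx++)
                        if (rgb[((y + dy) * width + (x + dx)) * 3] == 0)
                            keep = false;
                if (keep)
                {
                    int i = (y * width + x) * 3;
                    eroded[i] = eroded[i + 1] = eroded[i + 2] = 255;
                }
            }
        return eroded;
    }

    // Bounding-box "center of gravity": put a box around the surviving white pixels
    // and return the coordinates of its center. Returns false if nothing is left.
    public static bool TryGetCog(byte[] rgb, int width, int height,
                                 out int cogX, out int cogY)
    {
        int minX = int.MaxValue, minY = int.MaxValue, maxX = -1, maxY = -1;
        for (int y = 0; y < height; y++)
            for (int x = 0; x < width; x++)
                if (rgb[(y * width + x) * 3] > 0)
                {
                    if (x < minX) minX = x;
                    if (x > maxX) maxX = x;
                    if (y < minY) minY = y;
                    if (y > maxY) maxY = y;
                }
        if (maxX < 0) { cogX = 0; cogY = 0; return false; }
        cogX = (minX + maxX) / 2;
        cogY = (minY + maxY) / 2;
        return true;
    }
}

Dilation is the mirror-image operation (a pixel becomes white if any neighbour is white), and the convex hull step smooths the blob's outline; in practice RoboRealm handles all of these steps for us.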
Computing the Motion Commands
We will now look at the details of how we map the visual coordinates of the center of gravity (COG) of the orange blob into appropriate servo commands to move the head and camera. We start with the observation that the further the balloon is from the center of the image, the faster we need to move the camera since we have a greater distance to travel to re-center the target. So the servo speeds need to be proportional to the displacement of the COG from the center of the image. The view through the camera lens is illustrated in the diagram below:
Let's begin with the horizontal component of the COG displacement and the corresponding motion of the head's panning servo. A similar analysis would apply to vertical displacements and the servo that tilts the head. Let Fx be the horizontal field of view in degrees of our camera and let Rx be the horizontal resolution in pixels. (In the videos shown above, Fx is 61° and the resolution is 320x240 pixels so that Rx is 320.) Now suppose the COG of the balloon is currently displaced horizontally by Dx pixels from the center of the image. It is easier to work with this displacement in degrees which we can compute from (Dx / Rx) · Fx. To pan the head through that angular distance in T seconds, the required servo speed Sx in degrees per second is given by:
Sx = (Dx / Rx) · Fx / T
Since the values of Rx, Fx and T can be fixed for a given situation, we see that the servo rotation speed is simply proportional to the COG displacement from the center of the image:
Sx = kx · Dx
where kx = Fx / (Rx · T)
The final detail we are missing is how to command our servos to move with a particular rotational speed in degrees per second as specified by the above equation. If M is the maximum rotational speed of our servos, and I is the control value corresponding to that maximum speed, then the control signal C required to get the servo moving at speed S is given by:
C = I · S / M
Combining this with the previous equation, we have:
Cx = kx' · Dx
where kx' = kx · I / M
Let's now look at a concrete example. For the camera used in the videos above, the horizontal field of view Fx is 61 degrees and the horizontal resolution Rx is 320 pixels. Reading the manual for the Dynamixel AX-12+ servos, we find that the maximum speed M is 114 rpm or 684 degrees/second and the maximum control signal I is 1023. Finally, suppose we want the robot to move to the target's position in ¼ of a second (250ms). Then T = 0.25. Plugging these numbers in for kx' above, we find:
Cx = 1.14 · Dx
In other words, the control signal we send to our servo is simply 1.14 times the displacement of the balloon in pixels from the center of the image. Of course, as soon as either the camera or the balloon moves, the value of Dx changes and so must our control signal Cx. Fortunately, even an inexpensive desktop PC can execute this update at least 20 times per second (once every 50ms) so that the result is fairly smooth tracking as seen in the previous videos. A similar analysis for the vertical displacement of the target would show:
Cy = 1.12 · Dy
where we have used Fy = 45 and Ry = 240. Note that the multipliers in these two control equations are based on the assumption of a ¼-second reaction time. For faster tracking, use a shorter reaction time T, which increases these multipliers. For example, to respond in 1/10 of a second (100 ms), the equations become Cx = 2.85 · Dx and Cy = 2.80 · Dy.
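If you want to experiment with a different camera or a different reaction time, the multiplier can be computed directly from the quantities defined above. Here is a small sketch; the default servo values are simply the AX-12+ figures quoted earlier, and all of the numbers should be replaced with those for your own hardware.

static class TrackingGains
{
    // k' = (F / (R * T)) * (I / M), as derived above.
    public static double ControlGain(double fovDegrees, int resolutionPixels,
                                     double reactionTimeSeconds,
                                     double maxSpeedDegPerSec = 684.0,   // AX-12+ at 114 rpm
                                     int maxControlValue = 1023)
    {
        double k = fovDegrees / (resolutionPixels * reactionTimeSeconds);
        return k * maxControlValue / maxSpeedDegPerSec;
    }
}

// ControlGain(61, 320, 0.25) is about 1.14   (horizontal, ¼-second response)
// ControlGain(45, 240, 0.25) is about 1.12   (vertical, ¼-second response)
// ControlGain(61, 320, 0.10) is about 2.85   (horizontal, 100 ms response)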
Programming the Object Tracking Thread
We are now ready to implement our tracking algorithm in code. As with all of our robot's behaviors, tracking takes place in its own thread. On each update cycle, we first query RoboRealm for the target's current horizontal and vertical coordinates in the visual field. These give us our Dx and Dy displacement values, which we then plug into our control equations to get the servo input values. The servos are then commanded to update their rotation speeds accordingly. The key lines of each update cycle are shown below:
// cogX and cogY are assumed to hold the target's displacement (Dx, Dy) from the image center.
// panSpeed and tiltSpeed are placeholder names for the values written to the servo speed registers.
int servoInputPan = (int)(kX * cogX);
int servoInputTilt = (int)(kY * cogY);
int panSpeed = (int)(Math.Abs(servoInputPan) + 1);    // speed registers take a magnitude; +1 avoids a zero value
int tiltSpeed = (int)(Math.Abs(servoInputTilt) + 1);
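The rest of the thread is mostly plumbing around those four lines. Because the details depend on how you talk to RoboRealm and to your servo controller, the sketch below uses hypothetical placeholder methods (GetCogOffsets and SetServoSpeeds are not real API calls) and simply shows the overall loop, assuming cogX and cogY arrive as the displacement of the target from the image center.

using System;
using System.Threading;

class ObjectTracker
{
    // Gains from the derivation above (¼-second reaction time).
    const double kX = 1.14;
    const double kY = 1.12;

    volatile bool running = true;

    public void TrackingLoop()
    {
        while (running)
        {
            int cogX, cogY;
            if (GetCogOffsets(out cogX, out cogY))          // Dx and Dy from the image center
            {
                int servoInputPan = (int)(kX * cogX);
                int servoInputTilt = (int)(kY * cogY);
                int panSpeed = (int)(Math.Abs(servoInputPan) + 1);
                int tiltSpeed = (int)(Math.Abs(servoInputTilt) + 1);
                SetServoSpeeds(panSpeed, servoInputPan >= 0,
                               tiltSpeed, servoInputTilt >= 0);
            }
            Thread.Sleep(50);                               // roughly 20 updates per second
        }
    }

    public void Stop() { running = false; }

    // Placeholders: replace with real calls to RoboRealm (for example, its COG variables)
    // and to whatever controller drives the pan and tilt servos.
    bool GetCogOffsets(out int dx, out int dy) { dx = 0; dy = 0; return false; }
    void SetServoSpeeds(int panSpeed, bool panRight, int tiltSpeed, bool tiltUp) { }
}

Starting the behavior is then just a matter of spinning up the thread, for example with new Thread(tracker.TrackingLoop).Start().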