14 December 2005

Robots That Can't Draw

As Popular Science notes in the issue I recently received, in its typical glossy, hype-filled, heavily illustrated way, one of the Army's major technological initiatives is to develop unmanned ground vehicles (UGVs) such as robotic trucks and tanks.

The main technological obstacle is not an inability to create a vehicle that can steer, change gears, control its speed and brake at the direction of a computer. Automatic transmissions and cruise controls are the stuff of ordinary consumer vehicles. The difference between an advanced anti-lock power braking system and a truly computer-controlled one is modest. A computer-controlled steering or braking system is technologically trivial, given the current state of engineering knowledge. In short, computer output is not the problem.

Computer input is also not really a problem. A few off the shelf videocameras connected to glorified cell phones are sufficient to provide a human operator with enough information to drive a truck or a tank remotely. (This isn't how robotic vehicles are designed, as I'll explain in a moment, but the videocameras do gather enough information to get the job done.)

The real problem is in the computer brain itself. Moreover, it isn't even really a problem with "higher level thinking". Anyone with a computer can go to a program like MapQuest and get directions for efficiently going from point A to point B, e.g. turn left at 9th and Sheridan, drive two and a half blocks to Speer, turn right and follow Speer until you pass Colfax, etc. GPS software, which tells a computer where it is right now, further enhances the process. It doesn't take much to modify the kind of algorithms used by MapQuest to take into account obstructions (or even the likelihood of obstructions), traffic, weather conditions, road grades, and the need to stop to refuel from time to time.
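To make the point concrete, here is a minimal sketch of that kind of modification, using a textbook shortest-path search (Dijkstra's algorithm). I don't know what any real mapping service runs internally, so treat this as an illustration only. The function name, the tiny street graph and the travel times in minutes are all made up for the example.

```python
import heapq


def shortest_route(graph, start, goal, blocked=frozenset(), penalty=None):
    """Dijkstra's algorithm over a street graph (hypothetical helper).

    graph:   {intersection: {neighbor: minutes}}
    blocked: set of (intersection, neighbor) segments that cannot be used
    penalty: {(intersection, neighbor): extra minutes} for traffic,
             weather, steep grades and the like
    Returns (total_minutes, list_of_intersections), or None if no route.
    """
    penalty = penalty or {}
    queue = [(0.0, start, [start])]
    best = {start: 0.0}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry; a cheaper way here was found
        for nxt, minutes in graph.get(node, {}).items():
            if (node, nxt) in blocked:
                continue  # obstruction: skip this road segment
            new_cost = cost + minutes + penalty.get((node, nxt), 0.0)
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(queue, (new_cost, nxt, path + [nxt]))
    return None


# Made-up travel times in minutes between four intersections.
streets = {
    "9th & Sheridan": {"9th & Speer": 6.0, "Colfax & Sheridan": 3.0},
    "Colfax & Sheridan": {"Colfax & Speer": 5.0},
    "9th & Speer": {"Colfax & Speer": 4.0},
    "Colfax & Speer": {},
}

clear = shortest_route(streets, "9th & Sheridan", "Colfax & Speer")
detour = shortest_route(
    streets,
    "9th & Sheridan",
    "Colfax & Speer",
    blocked={("Colfax & Sheridan", "Colfax & Speer")},
)
```

With the roads clear, the search goes up Sheridan and along Colfax. Block that stretch of Colfax and it falls back to 9th and Speer. A refueling stop could be handled the same way, as one more cost attached to a segment.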

While it isn't consumer technology, the sort of targeting technology needed to have a weapon follow an identified target as it moves and to shoot it is already quite refined in naval and air applications and would translate well to ground vehicles. Nor is friend or foe identification the principal impediment. It isn't hard to electronically tag friends (a system to do that exists now). In many situations the rules don't have to be complicated ("shoot all Russian model tanks"). And many UGV applications, like convoy and ambulance duties, don't require the vehicle to make those kinds of determinations at all.

The real impediment in the computer brain to an effective robotic vehicle is the problem of getting a computer to process visual input in a manner that rivals the human brain, something that any four-year-old child can do effortlessly and without formal instruction. This "low level thinking" ability is crucial.

Why is this a problem?

It is a problem because digital cameras and the computers attached to them think like artists do. They let you know what they actually see, rather than what they think they see. Most people have a hard time doing realistic drawing because they don't know how to do that. A typical non-artist opens his eyes and sees car, building, road and lamp post, and if asked to draw what he sees, tries to draw what he thinks a car, building, road or lamp post looks like. In contrast, an artist opens his eyes and sees colors, shapes, lines and forms, and then reproduces them as they actually appear, short-circuiting the part of his brain that wants to tell him that a particular form is a building or a car or whatever. An artist may draw one continuous line that is first an outline of a building, and morphs into the outline of another building, and morphs into an outline of an adjacent tree. An artist may shade the entire path of a single shadow over several objects at once. An artist is buried in the trees and allows the forest to emerge as they are reproduced one by one. Artists don't necessarily have to approach a drawing in any particular sequence.

Robots are like these artists, but, unlike an artist, a robot can't get beyond the trees or parts of trees to see the forest. Robots can reproduce an exact and detailed picture of what they are seeing, but can't easily decompose that picture into cars, buildings, roads and lamp posts. Given input from a videocamera, they can't tell, at real time speeds, the difference between an abstract design and an image of a real life scene with objects in it. The main reason unmanned ground vehicles are technologically problematic is that we do not yet have computers that can respond to images from a videocamera like a person who can't draw realistically. Until we come up with some combination of computer hardware power and clever software that can do what every four-year-old can do to process visual images, we have to use other approaches to help robots figure out the world around them. (I suspect, but don't know, that existing computer hardware is powerful enough and that software is the main problem.)

Ladar (a laser optical version of sonar or radar), sonar, and other technologies that measure the distance between a sensor and a distant object, and so create a three dimensional image directly rather than converting it from two dimensional input, have been the main alternative to visual processing that most autonomous robots use now. But these kinds of technologies are far more expensive than off the shelf videocameras. They also still have trouble quickly figuring out where one object ends and another begins, or sorting the objects they see into abstract categories like cars and buildings and roads and lamp posts.
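A toy sketch shows why distance alone doesn't settle where one object ends and another begins. It is not how any real ladar system works, as far as I know. The function name, the half-meter threshold and the readings are invented for the illustration. The idea is simply to cut a row of distance readings wherever two neighboring readings differ by a lot.

```python
def segment_scan(ranges, jump=0.5):
    """Split one row of distance readings (in meters) into segments.

    A new segment starts wherever adjacent readings differ by more
    than `jump`. Returns (first_index, last_index) pairs.
    Hypothetical helper with an arbitrary threshold.
    """
    if not ranges:
        return []
    segments = [[0]]
    for i in range(1, len(ranges)):
        if abs(ranges[i] - ranges[i - 1]) > jump:
            segments.append([i])      # big jump in depth: start a new segment
        else:
            segments[-1].append(i)    # similar depth: same segment
    return [(seg[0], seg[-1]) for seg in segments]


# Invented scan: a wall at about 10 m, then a parked car (readings 3-5)
# with a lamp post right beside it (readings 6-7) at about 4 m, then wall.
scan = [10.0, 10.1, 10.0, 4.0, 4.1, 4.0, 4.1, 4.2, 10.0]
pieces = segment_scan(scan)
```

The car and the lamp post come back as a single segment, readings 3 through 7, because they sit at nearly the same distance. And nothing in the output says "car" or "lamp post" at all. Telling the two apart, and naming them, is the part that still needs something like sight.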

Come up with a robot that can't draw, and you will have overcome the most important technological barrier to unmanned ground vehicles and a host of other robotics applications. We know it is possible. Kids can do it. But we don't know how to do it.
