Combining MoCap and Gesture Recognition
MoCap (Motion Capture) is a very powerful interface technology that allows the natural physical movements of a user to be translated directly into the natural movements of their avatar. However, it has definite limitations.
All versions of MoCap work in a similar way: the movements of either sensors or reflective pads placed around the body of the user are read by various types of trackers. Sometimes the trackers are cameras; sometimes they are devices detecting radio waves or electromagnetic changes. Regardless of whether radio ID tags, electromagnetic tags, or reflective pads are used, the user needs to be surrounded by the devices.
Because body movements (the very things being tracked) occlude the sensors or muddy the signal, the trackers have to be placed at every conceivable angle to try to capture every movement. In the case of camera MoCap, the bare minimum system consists of eight cameras: four above the user, arranged in a square, and four below the user in a square offset by 45 degrees.
Such a rig naturally takes up a great deal of room and takes a great deal of effort to assemble, relegating MoCap interfaces to expensive, high-end VR environments rather than home setups.
Worse, they're not very precise. General body movements are capturable, but precise movements are extremely hard to differentiate. If it doesn't have a sensor pad, MoCap cannot track it. So arm movements, swings of the hip, tossing the head back, or doing the splits: all these are captured. Twiddling the fingers, wriggling the toes, sticking out a tongue, or wiggling the nose: all of these are not.
It's not really possible to add more sensors, either. Each sensor has its own physical size to contend with. The illuminated or reflective pads are several centimetres across; radio tags are a bit bulkier still, due to the electronics within them. If you put a sensor over every joint on the hands, it would all disintegrate into a single signal-producing mess.
Gesture recognition, on the other hand, works from a single camera or a pair of cameras. If you wish to get fancy, you can use four: a stereoscopic pair somewhere in front, and a stereoscopic pair somewhere behind. These cameras track the actual movement of the scene and attempt to work out from that how the person has moved.
The advantage is that they can make out even precise movements, like finger waggling, as changes to the lit pixels the cameras record. The disadvantage is that gesture recognition senses movement, so it doesn't do well against a noisy background: it tends to lose track of which movement is the user's and which movement is everybody else's.
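The "changes to the lit pixels" idea can be illustrated with a minimal frame-differencing sketch. This is not any particular product's algorithm, just the simplest form of camera-based motion detection: compare two consecutive greyscale frames and flag every pixel that changed by more than a threshold. The function name and threshold value are illustrative assumptions.

```python
import numpy as np

def motion_mask(prev_frame, curr_frame, threshold=25):
    """Return a boolean mask of pixels that changed between two
    greyscale frames: the raw signal a gesture recogniser works from."""
    # Cast to a signed type so the subtraction can go negative.
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return diff > threshold

# A static background produces no motion; a moving bright patch does.
prev = np.zeros((120, 160), dtype=np.uint8)
curr = prev.copy()
curr[40:60, 70:90] = 200           # a 20x20 patch appears between frames
mask = motion_mask(prev, curr)
print(mask.sum())                  # 400 changed pixels
```

Every pixel outside the patch is identical between frames, which is exactly why a "noisy background", where unrelated things move, floods this mask with spurious detections.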
These two technologies have very definite advantages, and very different disadvantages. Where one has an advantage, the other has a disadvantage, and vice versa. If we could meld the two technologies together into one, we could end up with a system that has all the positive aspects of both.
There are many conceivable ways you could take the best of both technologies, and undoubtedly we will see many more failures than successes. One potential success of note is the hybrid gesture recognition system developed to control robotic devices by the Fraunhofer Institute for Manufacturing Engineering and Automation (IPA) in Stuttgart, Germany.
Researchers have created a system that uses both sensors and machine vision to allow a human operator to control a mechanical arm, regardless of how many joints that mechanical arm actually has. The user holds a single sensor in their hand, whilst the machine vision system takes stock of where the user is standing and how their body moves; specifically, how their thigh, hip, and shoulder move.
Since human arms have a known design structure, it is possible to work out, via inverse kinematics, how the arm must have moved to reach a particular position, provided you know the initial starting position and you have a sensor to track at the end of the arm. That's exactly what happens here. The algorithms are complex but, for the first time, doable in real time. A prototype of the system was demonstrated at the Sensor+Test trade fair in June 2011.
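To see what "working backwards from the tracked end point" means, here is a minimal sketch of analytic inverse kinematics for a planar two-link arm. This is a textbook simplification, not the Fraunhofer system's algorithm: real arms have more joints and work in three dimensions, and the link lengths here are arbitrary assumptions. Given only the position of the end effector (the hand-held sensor), the law of cosines recovers the joint angles that must have produced it.

```python
import math

def two_link_ik(x, y, l1=0.3, l2=0.25):
    """Recover shoulder and elbow angles (radians) of a planar
    two-link arm whose end effector sits at (x, y)."""
    d2 = x * x + y * y
    # Law of cosines: the reach distance fixes the elbow angle.
    cos_elbow = (d2 - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    if abs(cos_elbow) > 1:
        raise ValueError("target out of reach")
    elbow = math.acos(cos_elbow)
    # Shoulder angle: direction to target, minus the offset the bent
    # elbow introduces.
    shoulder = math.atan2(y, x) - math.atan2(l2 * math.sin(elbow),
                                             l1 + l2 * math.cos(elbow))
    return shoulder, elbow

# Forward-check: the recovered angles reconstruct the end point.
sh, el = two_link_ik(0.4, 0.2)
fx = 0.3 * math.cos(sh) + 0.25 * math.cos(sh + el)
fy = 0.3 * math.sin(sh) + 0.25 * math.sin(sh + el)
print(round(fx, 6), round(fy, 6))  # recovers (0.4, 0.2)
```

Even in this toy case there are two valid solutions (elbow up or elbow down), which hints at why the real-time, many-jointed version the researchers built is computationally hard.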
"The input device contains various movement sensors, also called inertial sensors," says Bernhard Kleiner of the Fraunhofer Institute, who led the project. The individual micro-electromechanical systems themselves are not expensive. What the scientists have spent time developing is how these sensors interact. "We have developed special algorithms that fuse the data of individual sensors and identify a pattern of movement. That means we can detect movements in free space."
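The flavour of "fusing the data of individual sensors" can be shown with the simplest inertial fusion technique, a complementary filter. This is a generic textbook method, not the institute's proprietary algorithms: it blends a gyroscope's angular rate (smooth but drifting when integrated) with an accelerometer's angle estimate (noisy but drift-free) into one stable track. All parameter values are illustrative assumptions.

```python
def complementary_filter(gyro_rates, accel_angles, dt=0.01, alpha=0.98):
    """Fuse gyroscope rates (deg/s) with accelerometer angle
    estimates (deg) into a single stable angle estimate."""
    angle = 0.0
    history = []
    for rate, acc in zip(gyro_rates, accel_angles):
        # Trust the integrated gyro in the short term, and let the
        # accelerometer slowly correct the long-term drift.
        angle = alpha * (angle + rate * dt) + (1 - alpha) * acc
        history.append(angle)
    return history

# A sensor held still at a 10-degree tilt: the gyro reads zero rate,
# the accelerometer reads roughly 10 degrees each sample.
angles = complementary_filter([0.0] * 200, [10.0] * 200)
print(angles[-1])  # estimate converges towards 10 degrees
```

Fusing several such MEMS sensors, each cheap and individually unreliable, is what lets a hand-held input device report its motion in free space without any external camera rig.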
It's still not able to work out fine movements such as finger wiggling, sign language, or handwriting gestures, but the approach demonstrates that, with a little lateral thinking, you can do away with the bulk and the considerable expense of a MoCap rig.