
Waldo Avatars



Waldo is a slang term for teleoperated devices. It refers to any device or implement which, when controlled remotely, reproduces the operator's movements exactly.

The term itself derives from the short story "Waldo", written in 1942 by Robert Heinlein. In the story, Waldo Farthingwaite-Jones was born a weakling, unable even to lift his head up to drink. He channelled his intellect, and his family's money, into the development of the device patented as "Waldo F. Jones' Synchronous Reduplicating Pantograph".

By means of a glove and harness, Waldo could control a powerful mechanical hand which replicated exactly the movements of his own hand and fingers.


Traditionally, the term Avatar means a representation of a powerful being who exists above, or beyond the scope of, the world. Deities and demons are usually represented by avatars in traditional accounts.

So it is for virtual worlds - a player is a power beyond the scope of the virtual world, and an avatar is the term for the graphical form they adopt within it.

Use of Avatars

The Avatar is your body within the virtual environment. It is, for all intents and purposes, you within that world. However, it is not a very animate you. With a few exceptions over the years, the avatar acts merely as a visage, conveying a sense of your presence, of your mind. It has not been able to convey your actions, your body language, your very sense of self.

Instead, it just stands there. Or sits there, or lies there. It might use pre-compiled gestures, but nothing more than that.

In order for a virtual environment to truly be an environment for us, the avatars we utilise have to be so much more than they are now. They have to become extensions of our will as well as of our selves: bodily appendages or recreations that enable fine movement, so that physical movement, or the will to move physically, is recreated in every detail in the collaborative, virtual space.

Dynamic Sequenced Avatars

Deutsche Telekom (Germany's national telecoms company) has created a system which lets your virtual avatar and physical form work together as one. On display at both its Bonn headquarters and its laboratories in Berlin, the prototype dynamic avatar system is not, at this time, compatible with any pre-existing social virtual reality system.

The system was created in collaboration with the German Fraunhofer Heinrich Hertz Institute and the Israeli Ben-Gurion University. The combined hope is to create an interaction conduit which corresponds precisely to physical bodily movements.

There is, of course, still a long way to go. The system is based on machine vision and gesture recognition technologies. As such, it fails entirely if the user is wearing any form of clothing which obstructs the key areas of the skin - gloves or bandages over the hands, or anything which covers the face.

The current concept is aimed squarely at preserving privacy: letting users interact with one another in an audio-visual environment, without revealing how they physically appear to each other.

To increase realism and boost interaction, the system attempts to interpret facial expressions and the movement of the lips, in order to sync them to the user's speech and physical muscle movements. This of course means that if you are unfortunate enough to have a condition like, say, Tourette's syndrome, the avatar will move to every facial tic, to the best of its ability.

The one thing it won't do is mimic every tic or expression precisely. That is because it does not actually work that way.

The avatar is not actually reproducing everything you do. It is comparing what you are doing to its database of known movements, and displaying the one that most closely matches.
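
The matching step described above can be sketched as a nearest-neighbour lookup. This is only an illustration of the general idea, not the system's actual method: the movement names, the three-value vectors, and the `closest_movement` helper are all hypothetical, and plain Euclidean distance is an assumed stand-in for whatever comparison the real system performs.

```python
import math

# Hypothetical database of known movements. Each entry is a short vector of
# made-up parameter values; the real system's parameters are not published here.
KNOWN_MOVEMENTS = {
    "neutral": (0.0, 0.0, 0.0),
    "smile": (0.8, 0.1, 0.0),
    "frown": (-0.6, 0.0, 0.3),
    "raised_brows": (0.0, 0.9, 0.0),
}


def closest_movement(observed, database=KNOWN_MOVEMENTS):
    """Return the name of the stored movement nearest to the observed vector."""
    return min(database, key=lambda name: math.dist(observed, database[name]))


# An observation that is close to, but not exactly, a stored movement is
# displayed as that stored movement rather than reproduced verbatim.
print(closest_movement((0.7, 0.2, 0.05)))
```

Under this kind of scheme a tic that falls between stored movements is snapped to whichever one is nearest, which is why the avatar approximates rather than copies.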

The system can recognise a set of 66 different parameters that define every known possible facial expression, most of which are based on fine muscle movements. In addition to this, it has a set of four sequences: standardised facial expressions which it will play when it thinks that is what you are doing. These are joy, sadness, surprise, and disgust. Of course, the sequences can also be manually triggered at any time.

When it comes to syncing speech to lip movements, again, it does not do this by tracking exactly what your lips are doing. Instead, it uses visemes (the visual counterparts of phonemes), moving the avatar's lips in accordance with the phoneme being spoken, as determined by voice analysis. A set of 15 visemes can represent all phonemes accurately enough to lip-read.

Finally, a set of 186 body motion parameters that define joint rotation in the arms and upper body is recognised, and used to calculate positions.
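
As a rough sketch of how a phoneme stream might be reduced to visemes, the table below groups a handful of phonemes that look alike on the lips. Everything here is assumed for illustration: the phoneme labels, the viseme names, and the `visemes_for` helper are hypothetical, and the table is far smaller than the set of 15 visemes mentioned above.

```python
# Hypothetical phoneme-to-viseme table. Several phonemes share one mouth shape,
# which is why a small viseme set can cover a much larger phoneme set.
PHONEME_TO_VISEME = {
    "P": "lips_closed", "B": "lips_closed", "M": "lips_closed",
    "F": "lip_to_teeth", "V": "lip_to_teeth",
    "AA": "open_jaw", "AE": "open_jaw",
    "UW": "rounded", "OW": "rounded",
    "IY": "spread", "EH": "spread",
}


def visemes_for(phonemes, default="neutral"):
    """Map phonemes to visemes, merging consecutive repeats of the same shape."""
    shapes = []
    for phoneme in phonemes:
        shape = PHONEME_TO_VISEME.get(phoneme, default)
        # Only emit a new mouth shape when it differs from the previous one.
        if not shapes or shapes[-1] != shape:
            shapes.append(shape)
    return shapes


print(visemes_for(["M", "AA", "M", "AA"]))
```

Unknown phonemes fall back to a neutral mouth shape in this sketch, which is one simple design choice among several possible.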

A side benefit of this is that if you jerk suddenly, the system likely will not replicate it, as it emphasises smooth body flow, not jerky motions. If you start jerking around, the avatar will flow as best it can to make such movements smooth.

Whilst the upper body and limb movements are recreatable, down to hip swinging, fine movements such as those of the fingers cause problems. The algorithms currently in use can only detect finger movements as part of whole hand movement - they are fine enough to detect specific gestures, such as the American Sign Language alphabet encoded into the system, but the avatar controls cannot monitor free-form finger movements and fine dexterity at this time.

A side effect is, of course, that haptic capabilities are not yet possible through this interface, nor are personal-level interactions between individuals that depend on such fine movement.

However, there is hope. Work is in progress to improve the level of gesture recognition of the fingers without overtaxing the processing capabilities of a mid-range home PC. Combined with the continued raising of the bar for mid-range PCs, results fine enough to allow dextrous finger movements, as is already the case with the face, are anticipated within two to three years.
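
One common way to get this kind of behaviour is exponential smoothing of each joint angle, sketched below. The source does not say which filter the system uses, so this technique, the `smooth_angles` helper, and the `alpha` parameter are all assumptions made purely for illustration.

```python
def smooth_angles(samples, alpha=0.3):
    """Exponentially smooth a sequence of joint angles (in degrees).

    alpha is an assumed tuning value in (0, 1]: lower values follow the
    input more slowly and therefore damp sudden jerks more strongly.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    previous = None
    for angle in samples:
        if previous is None:
            previous = angle  # the first sample is passed through unchanged
        else:
            previous = previous + alpha * (angle - previous)
        smoothed.append(previous)
    return smoothed


# A single-frame jerk from 10 to 50 degrees is flattened into a gentler bump.
print(smooth_angles([10, 10, 50, 10, 10], alpha=0.5))
```

The trade-off with any filter of this kind is lag: the more strongly it damps jerks, the later the avatar catches up with deliberate fast movements.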

In the meantime, tests with the avatar system are scheduled to begin within the next six months, integrating it into other applications.

The first such application will be the creation of online call centres, specifically support systems, where hands-on individuals may not be at their most presentable for face-to-face communication, or where high levels of customer irateness make a visual facade prudent, without limiting communication.

After these initial tests, there are plans to integrate with social VR spaces, or at least those willing to adopt such software. Second Life is a given; other platforms have shown varying degrees of interest.

A third approach being undertaken is to integrate the avatar system into personal mobile computation and telecommunication devices, as an interface method.

Known Issues

The system is, of course, far from perfect, even within its scope as an interface method. A prime concern at this point is its lack of compatibility with other interface methods, particularly those for disabled individuals, to whom a virtual visage is one of the greatest weapons to wield in the world of business.

If, for example, they use a speech collar, such as is currently in development to override a damaged larynx, the visemes will not function to accompany the spoken voice, as the lips are not involved in the transaction. Thus, the avatar's lips remain still even though the person is talking. This is but one example of the need to be modular - to be activated in many different ways - which this system currently lacks, and will require in order to truly push interfaces to the next level, the level they ought to be at.

Further Reading

Avatar Mimics You in Real Time
