Speech-Enabled Avatars

While a substantial amount of work has been done on developing human face avatars, avatars that are highly realistic in both animation and appearance have yet to appear. The goal of this work is to create speech-enabled avatars of faces that produce realistic facial motion from text or speech input. Such avatars can significantly enhance the user experience in a variety of applications, including mobile messaging, information kiosks, advertising, news reporting, and videoconferencing.

In this project, we present a complete framework for creating speech-enabled 2D and 3D avatars from a single image of a person. Our approach uses a generic facial motion model that represents deformations of a prototype face during speech. We have developed an HMM-based facial animation algorithm that takes into account both lexical stress and coarticulation, and that produces realistic animations of the prototype facial surface from either text or speech (see the first sketch below).

The generic facial motion model is transferred to a novel face geometry using a set of corresponding points between the generic mesh and the novel face (see the second sketch below). In the case of a 2D avatar, a single photograph of the person is used as input: we manually select a small number of features on the photograph, use them to deform the prototype surface, and then use the deformed surface to animate the photograph. In the case of a 3D avatar, we use a single stereo image of the person as input. The sparse geometry of the face is computed from this image and used to warp the prototype surface into the complete 3D surface of the person's face. This surface is etched into a glass cube using sub-surface laser engraving (SSLE) technology, and synthesized facial animation videos are projected onto the etched glass cube. Even though the etched surface is static, the projection of facial animation onto it results in a compelling experience for the viewer. We show several examples of 2D and 3D avatars driven by text and speech inputs.
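To give a feel for how text drives the prototype face, the sketch below treats coarticulation as overlapping temporal blending of per-phoneme mouth shapes ("visemes"). This is only a hypothetical illustration: the function name blend_visemes, the Gaussian influence window, and the spread parameter are assumptions, and the paper's HMM-based algorithm, which also models lexical stress, is substantially more sophisticated.

    import numpy as np

    def blend_visemes(phoneme_times, viseme_shapes, fps=30, spread=0.08):
        """Blend per-phoneme mouth shapes into per-frame shapes.

        phoneme_times -- list of (start, end) times in seconds
        viseme_shapes -- one mouth-shape vector per phoneme
        Each phoneme's influence decays smoothly around its midpoint,
        so neighboring phonemes blend into one another (coarticulation).
        """
        t_end = phoneme_times[-1][1]
        frames = []
        for t in np.arange(0.0, t_end, 1.0 / fps):
            total_w = 0.0
            shape = np.zeros_like(viseme_shapes[0])
            for (t0, t1), v in zip(phoneme_times, viseme_shapes):
                center = 0.5 * (t0 + t1)                   # phoneme midpoint
                w = np.exp(-((t - center) / spread) ** 2)  # temporal influence
                total_w += w
                shape = shape + w * v
            frames.append(shape / max(total_w, 1e-8))      # normalized blend
        return frames

A caller would obtain the phoneme intervals from a text-to-speech aligner, supply one mouth-shape vector per phoneme, and render each returned frame onto the prototype surface.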
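The transfer of the generic motion model to a novel face hinges on a sparse set of corresponding points driving a dense deformation of the prototype mesh. One standard way to realize such a warp is scattered-data interpolation with radial basis functions; the sketch below assumes a Gaussian kernel, and neither the kernel choice nor the name rbf_warp comes from the paper.

    import numpy as np

    def rbf_warp(src_pts, dst_pts, mesh_verts, sigma=30.0):
        """Deform prototype mesh vertices so the landmarks src_pts land
        on their counterparts dst_pts, interpolating the displacement
        everywhere else with Gaussian radial basis functions."""
        # Kernel matrix over the landmark points
        d = np.linalg.norm(src_pts[:, None, :] - src_pts[None, :, :], axis=-1)
        K = np.exp(-(d / sigma) ** 2)
        # Weights that reproduce the landmark displacements exactly
        # (a small ridge term keeps the solve well conditioned)
        W = np.linalg.solve(K + 1e-8 * np.eye(len(src_pts)), dst_pts - src_pts)
        # Interpolated displacement field applied to every mesh vertex
        dv = np.linalg.norm(mesh_verts[:, None, :] - src_pts[None, :, :], axis=-1)
        return mesh_verts + np.exp(-(dv / sigma) ** 2) @ W

In this sketch, dst_pts would be the manually selected photograph features for a 2D avatar (with an assumed depth) or the sparse stereo geometry for a 3D avatar, while src_pts are the matching points on the generic mesh.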

Publications

"Creating a Speech Enabled Avatar from a Single Photograph,"
D. Bitouk and S. K. Nayar,
Proceedings of IEEE Virtual Reality,
March 2008.

Interactive Text-to-Facial-Animation Avatar Demonstration

  Speech-Enabled Avatars - An Interactive Demo:

An interactive demo by Kevin Chiu and Dmitri Bitouk.

Requirements: Java 1.5 and a graphics card driver that supports OpenGL 2.0 or higher. Depending on your connection speed, the demo may take up to a minute to load.

Videos

If you are having trouble viewing these .wmv videos in your browser, please save them to your computer first (by right-clicking and choosing "Save Target As..."), and then open them.

  Virtual Reality 2008 Video:

This video contains a brief summary of our approach and examples of 2D and 3D avatars created from a single image of a person. (With narration)

  2D Avatars:

This video explains how a 2D avatar can be created from a single photograph and shows two examples of speech-enabled avatars driven by text input. (With narration)

  3D Avatars:

This video shows how our approach can be used to build volumetric displays featuring speech-enabled 3D avatars. (With narration)


Slides

Virtual Reality 2008 presentation (also available with videos as a zip file)

Related Projects

Face Swapping: Automatically Replacing Faces in Photographs

Volumetric Displays: Passive Optical Scatterers

Lighting Sensitive Display