Audio-Visual Speech Processing (Hardcover)
In recent years, researchers have begun to question the unimodal paradigm of speech processing and to explore the multimodal model. When we speak, both the visible motions of the face and the audible speech acoustics are shaped by the behavior of the vocal tract. Much work in the field now examines both auditory and visual aspects of speech processing, and speechreading is considered a psychological process of interest beyond its direct application in hearing loss and deafness. This book assembles a broad collection of the latest work on audio-visual (AV) speech processing by humans and machines. The book first treats the two main questions about human audio-visual performance: how auditory and visual signals combine to access the mental lexicon, and where in the brain this process takes place. The contributions show that AV perception can recover properties that are carried by neither modality alone. The book then turns to the production and perception of multimodal speech, and to the coordination of structures within and across the two modalities. Finally, the book presents some of the latest developments in speech processing by computers, particularly in AV speech recognition and synthesis. Work in computer-generated facial animation now goes beyond the traditional application areas of animation and games to address the challenge of applying the metaphor of face-to-face conversation to human-computer interfaces.