Face Synthesis

Creating a DIVA requires controllable models of the face and/or vocal tract. Examples of muscle-based, parametric, and kinematic face control models include work by Lee et al. (1995), Massaro and Cohen (1990), and Kuratate et al. (2005). Our approach uses the ArtiSynth toolkit (Fels et al., 2006), which includes mechanisms for implementing different models of tissue, bone, and muscle to create face models. ArtiSynth already includes sophisticated jaw, tongue, and hyoid models, which we use. In Phase 1, we combine Kuratate et al.’s principal component analysis (PCA) based face models with ArtiSynth's articulatory speech synthesis engines. This approach is similar to the techniques used in FaceGen™, Blanz and Vetter (1999), and DiPaola (2002). Phases 2 and 3 will extend this approach.
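At its core, a PCA-based face model is a linear generative model: a face configuration (for example, a flattened vector of 3D vertex or marker coordinates) is the mean shape plus a weighted sum of principal components. The sketch below illustrates only that synthesis step and its inverse projection. It is not Kuratate et al.'s implementation or part of the ArtiSynth API; the function names (`synthesize_shape`, `project_shape`) and the toy data are hypothetical, and it assumes the components are orthonormal, as they are when obtained by PCA on measured face data.

```python
from typing import List, Sequence

Vector = List[float]


def synthesize_shape(
    mean: Sequence[float],
    components: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> Vector:
    """Return mean + sum_k weights[k] * components[k] (hypothetical helper)."""
    if len(weights) != len(components):
        raise ValueError("one weight per component is required")
    shape = list(mean)
    for w, comp in zip(weights, components):
        if len(comp) != len(mean):
            raise ValueError("component length must match mean shape length")
        # Add this component's contribution, scaled by its weight.
        for i, c in enumerate(comp):
            shape[i] += w * c
    return shape


def project_shape(
    shape: Sequence[float],
    mean: Sequence[float],
    components: Sequence[Sequence[float]],
) -> Vector:
    """Recover weights w_k = c_k . (x - mean), assuming orthonormal components."""
    diff = [x - m for x, m in zip(shape, mean)]
    return [sum(c * d for c, d in zip(comp, diff)) for comp in components]


if __name__ == "__main__":
    # Toy "face" of two 2D points, flattened as [x0, y0, x1, y1].
    mean = [0.0, 0.0, 1.0, 0.0]
    # Two orthonormal modes: move point 0 horizontally, move point 1 vertically.
    components = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
    shape = synthesize_shape(mean, components, [0.5, -0.25])
    print(shape)                                    # [0.5, 0.0, 1.0, -0.25]
    print(project_shape(shape, mean, components))   # [0.5, -0.25]
```

In a full system, the weights would be the low-dimensional control parameters; in principle they could be driven over time by articulatory parameters (such as jaw position) so that the face moves in step with the synthesized speech.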

In addition to its relative simplicity, the PCA-based face model remains the only talking face model that has been validated both kinematically for measured motion input (Kuratate et al., 2005) and perceptually for linguistically relevant output (Munhall et al., 2004). Validation not only determines the communicative efficacy and accuracy of the system but also serves as a critical guide to the further development of multimedia communication systems such as those proposed here. Thus, analyses of production data and perceptual evaluations of natural and synthesized behavior are conducted during all phases of the project. Using the same type of comparison between audio, visual, and audiovisual presentations, we perceptually evaluate the measured correspondences between hand gestures and the synthesized voice and face. These results jumpstart the investigation of the coordination and communicative interaction that occur when a performer simultaneously generates vocal and hand-gestured audiovisual behavior, and of the potentially more complex conditions governing ensemble performances by multiple performers, which begin in the second year.