Audio-Driven Speech Animation with Text-Guided Expression

dc.contributor.author: Jung, Sunjin [en_US]
dc.contributor.author: Chun, Sewhan [en_US]
dc.contributor.author: Noh, Junyong [en_US]
dc.contributor.editor: Chen, Renjie [en_US]
dc.contributor.editor: Ritschel, Tobias [en_US]
dc.contributor.editor: Whiting, Emily [en_US]
dc.date.accessioned: 2024-10-13T18:04:06Z
dc.date.available: 2024-10-13T18:04:06Z
dc.date.issued: 2024
dc.description.abstract: We introduce a novel method for generating expressive speech animations of a 3D face, driven by both audio and text descriptions. Many previous approaches focused on generating facial expressions using pre-defined emotion categories. In contrast, our method is capable of generating facial expressions from text descriptions unseen during training, without limitations to specific emotion classes. Our system employs a two-stage approach. In the first stage, an auto-encoder is trained to disentangle content and expression features from facial animations. In the second stage, two transformer-based networks predict the content and expression features from audio and text inputs, respectively. These features are then passed to the decoder of the pre-trained auto-encoder, yielding the final expressive speech animation. By accommodating diverse forms of natural language, such as emotion words or detailed facial expression descriptions, our method offers an intuitive and versatile way to generate expressive speech animations. Extensive quantitative and qualitative evaluations, including a user study, demonstrate that our method can produce natural expressive speech animations that correspond to the input audio and text descriptions. [en_US]
dc.description.sectionheaders: Human and Character Animation
dc.description.seriesinformation: Pacific Graphics Conference Papers and Posters
dc.identifier.doi: 10.2312/pg.20241290
dc.identifier.isbn: 978-3-03868-250-9
dc.identifier.pages: 12 pages
dc.identifier.uri: https://doi.org/10.2312/pg.20241290
dc.identifier.uri: https://diglib.eg.org/handle/10.2312/pg20241290
dc.publisher: The Eurographics Association [en_US]
dc.rights: Attribution 4.0 International License
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject: CCS Concepts: Computing methodologies → Animation; Neural networks
dc.subject: Computing methodologies → Animation
dc.subject: Neural networks
dc.title: Audio-Driven Speech Animation with Text-Guided Expression [en_US]
Files
Original bundle
Name: pg20241290.pdf
Size: 3.83 MB
Format: Adobe Portable Document Format
Name: supplementary_material.pdf
Size: 111.63 KB
Format: Adobe Portable Document Format
Name: video.mp4
Size: 80.2 MB
Format: Video MP4