EmoHead: Emotional Talking Head via Manipulating Semantic Expression Parameters

Xuli Shen1,2, Hua Cai2, Dingding Yu2, Weilin Shen2, Qing Xu2, Xiangyang Xue1*
1Fudan University, 2UniDT
ICME 2025

Abstract

Generating emotion-specific talking head videos from audio input is an important and complex challenge for human-machine interaction. However, emotion is a highly abstract concept with ambiguous boundaries, which necessitates disentangled expression parameters for generating emotionally expressive talking head videos. In this work, we present EmoHead, which synthesizes talking head videos via semantic expression parameters. To predict expression parameters for arbitrary audio input, we apply an audio-expression module that can be specified by an emotion tag. This module aims to strengthen the correlation between audio input and expression parameters across various emotions. Furthermore, we leverage a pre-trained hyperplane to refine facial movements by probing along its vertical (normal) direction. Finally, the refined expression parameters regularize neural radiance fields and facilitate emotion-consistent generation of talking head videos. Experimental results demonstrate that semantic expression parameters lead to better reconstruction quality and controllability.
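To make the hyperplane-probing idea in the abstract concrete, the sketch below shifts an expression parameter vector along the normal direction of a pre-trained linear emotion hyperplane. It is a minimal illustration under stated assumptions, not the paper's implementation: the function name refine_expression, the flat 64-dimensional parameter vector, and the fixed step size are all hypothetical.

import numpy as np

def refine_expression(expr: np.ndarray, normal: np.ndarray, step: float = 0.1) -> np.ndarray:
    """Shift expression parameters along a hyperplane's normal direction.

    Assumes a linear emotion hyperplane with known normal vector; moving the
    parameters along the unit normal nudges them toward the target emotion.
    """
    unit_normal = normal / np.linalg.norm(normal)  # unit normal of the emotion hyperplane
    return expr + step * unit_normal               # probe along the perpendicular direction

# Toy usage with random 64-dim expression parameters and a random hyperplane normal.
rng = np.random.default_rng(0)
expr = rng.normal(size=64)
normal = rng.normal(size=64)
refined = refine_expression(expr, normal, step=0.5)

In practice, the refined parameters would then condition the neural radiance field described in the abstract; the step size controls how strongly the target emotion is expressed.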

BibTeX

@article{shen2025emohead,
  title={EmoHead: Emotional Talking Head via Manipulating Semantic Expression Parameters},
  author={Xuli Shen and Hua Cai and Dingding Yu and Weilin Shen and Qing Xu and Xiangyang Xue},
  journal={arXiv preprint arXiv:2503.19416},
  year={2025},
  url={https://arxiv.org/abs/2503.19416}
}