AI & ML New Capability

Enables reliable, training-free emotion steering in speech-generative audio models via direct manipulation of specific emotion-sensitive neurons.

March 19, 2026

Original Paper

Neuron-Level Emotion Control in Speech-Generative Large Audio-Language Models

Xiutian Zhao, Ismail Rasim Ulgen, Philipp Koehn, Björn Schuller, Berrak Sisman

arXiv · 2603.17231

The Takeaway

By identifying and intervening on emotion-sensitive neurons (ESNs) at inference time, the method achieves controllable emotional speech without the linguistic degradation (hallucinations or refusals) common in prompting-based approaches. This establishes a mechanistic path toward fine-grained control of multimodal LLMs.

From the abstract

Large audio-language models (LALMs) can produce expressive speech, yet reliable emotion control remains elusive: conversions often miss the target affect and may degrade linguistic fidelity through refusals, hallucinations, or paraphrase. We present, to our knowledge, the first neuron-level study of emotion control in speech-generative LALMs and demonstrate that compact emotion-sensitive neurons (ESNs) are causally actionable, enabling training-free emotion steering at inference time. […]
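The abstract describes two steps: locating a compact set of emotion-sensitive neurons, then intervening on them at inference with no retraining. The paper's actual selection and intervention procedures are not detailed here, but the general idea can be illustrated with a minimal, hypothetical sketch: rank neurons by how differently they activate on emotional versus neutral inputs, then amplify the top-ranked ones during generation. All function names, the activation-difference criterion, and the toy data below are illustrative assumptions, not the authors' method.

```python
# Hypothetical sketch of neuron-level emotion steering on a simplified model
# whose hidden-layer activations are plain Python lists. Illustrative only;
# the paper's ESN selection and intervention details may differ.
from statistics import mean

def find_esns(emotional_acts, neutral_acts, top_k=2):
    """Rank neurons by absolute mean-activation difference between
    emotional and neutral inputs; the top-k indices stand in for ESNs."""
    n = len(emotional_acts[0])
    diffs = [
        abs(mean(a[i] for a in emotional_acts) - mean(a[i] for a in neutral_acts))
        for i in range(n)
    ]
    return sorted(range(n), key=lambda i: diffs[i], reverse=True)[:top_k]

def steer(activations, esn_indices, scale=2.0):
    """Training-free intervention: amplify the selected neurons at inference,
    leaving all other activations untouched."""
    out = list(activations)
    for i in esn_indices:
        out[i] *= scale
    return out

# Toy data: a 4-neuron layer where neurons 1 and 3 respond to emotional speech.
emotional = [[0.1, 0.9, 0.2, 0.8], [0.0, 1.1, 0.1, 0.7]]
neutral   = [[0.1, 0.1, 0.2, 0.1], [0.0, 0.2, 0.1, 0.0]]

esns = find_esns(emotional, neutral)       # indices of the most emotion-sensitive neurons
steered = steer([0.5, 0.5, 0.5, 0.5], esns)
```

In a real LALM the analogous intervention would edit hidden activations inside the network during decoding (e.g. via forward hooks), but the principle is the same: a small, targeted edit to identified neurons rather than prompt-level instruction.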