ANR EVA Project (SLG)

Explicit Voice Attributes

Describing a voice in a few words remains a largely arbitrary task. We can speak with a “deep”, “breathy”, “bright” or “hoarse” voice, but fully characterizing a voice would require a closed set of rigorously defined attributes constituting an ontology. No such description grid exists. Machine learning applied to speech suffers from the same weakness: in most automatic processing tasks, when a speaker is modeled, abstract global representations are used without making their characteristics explicit. For instance, automatic speaker verification and identification are usually tackled with the x-vector paradigm, which describes a speaker’s voice by an embedding vector designed only to distinguish speakers. Despite their very good accuracy for speaker identification, x-vectors are usually unsuitable for detecting similarities between different voices that share common characteristics. The same observations can be made for speech generation.

We propose to carry out a comprehensive set of analyses to extract salient voice attributes that have not yet been addressed, and to use them to enrich structured representations usable for speech synthesis and voice conversion.

Partner list:

Orange
IRCAM
LPP
LIA
IRISA

Project leader: Orange

Scientific leader for LIA: Yannick Estève

Start date: 01/01/2023 — End date: 31/12/2025