ANR EVA Project

Explicit Voice Attributes

Describing a voice in a few words remains a largely arbitrary task. We can speak with a “deep”, “breathy”, “bright” or “hoarse” voice, but fully characterizing a voice would require a closed set of rigorously defined attributes constituting an ontology. No such description grid exists. Machine learning applied to speech suffers from the same weakness: in most automatic processing tasks, when a speaker is modeled, abstract global representations are used without making their characteristics explicit. For instance, automatic speaker verification and identification are usually tackled with the x-vector paradigm, which describes a speaker’s voice by an embedding vector designed only to distinguish speakers. Despite their very good accuracy for speaker identification, x-vectors are usually unsuitable for detecting similarities between different voices that share common characteristics. The same observations can be made for speech generation.

We propose to carry out a comprehensive set of analyses to extract salient voice attributes that have not yet been addressed, and to use them to enrich structured representations usable for synthesis and voice conversion.

Partner list:

  • Orange
  • IRCAM
  • LPP
  • LIA
  • IRISA

Project leader: Orange

Scientific leader for LIA: Yannick Estève

Start date: 01/01/2023 — End date: 31/12/2025
