ANR Project VoicePersonae

Speaker identity cloning and protection

With recent advances in automatic speech and language processing, humans are increasingly interacting by voice with intelligent artificial agents. The use of voice in applications is expanding rapidly, and this mode of interaction is becoming more widely accepted. Vocal systems can now deliver synthesized messages of such quality that they are difficult to distinguish from recorded human speech. They can also understand requests expressed in natural language, albeit only within their specific application framework. Furthermore, these systems frequently recognize or identify their users by their voices.

This project focuses on the concept of vocal identity, primarily as it relates to voice generation and speaker recognition. Voice generation covers all modules of a vocal interface that produce speech excerpts sounding like a given natural voice. These modules include voice synthesis and voice conversion, both of which can produce voice samples matching the vocal identity of a targeted individual. Speaker recognition is a form of vocal biometrics: it determines or verifies a person’s identity from their voice. The two technologies can conflict, either with each other or with other aspects of vocal interfaces. Voice generation aims to artificially produce speech that sounds “natural” and appears to come from a specific individual, whereas speaker recognition seeks to verify the authenticity of a vocal message and the identity of the person who produced it. For example, a speaker recognition system can be used to train a voice generation system, so that the resulting synthetic voice deceives that same biometric system. Another conflict arises when speaker recognition is used without the speaker’s knowledge. To protect against this, a “vocal anonymization” approach, drawing on both speaker recognition and voice generation, must be developed to remove a speaker’s identity from a vocal message while preserving at least its linguistic content, along with its naturalness, emotional tone, and ‘color.’ These three aspects (generation, identity recognition, and anonymization) are closely linked and must be considered together.

VoicePersonae aims to bridge the technological gap between the different aspects of the “vocal identity” concept presented above. The project proposes to (a) model “vocal identity,” (b) enhance the security and robustness of vocal biometric systems, and (c) protect users’ privacy. VoicePersonae will unify disparate approaches to multi-speaker voice generation, combining voice synthesis and transformation. To achieve this, it will leverage the latest speaker recognition technologies. It will strengthen the security and robustness of vocal biometrics by countering voice-generation attacks, building on the refined modeling of “vocal identity” developed in this project. Finally, VoicePersonae will propose the first explicit vocal anonymization solution to safeguard personal data. To stimulate research on “vocal identity,” and on the vocal anonymization task in particular, VoicePersonae will organize the first open challenge on vocal anonymization and re-identification.

Partnership List:

Project Coordinator: LIA

Scientific Lead for LIA: Jean-François BONASTRE

Start Date: 01/02/2019

End Date: 31/01/2024
