PhD defense of Mathias Quillot – 27 September 2022 – Laboratoire Informatique d’Avignon

Date: Tuesday, September 27th at 10:00 am in S5.

Title: A first step towards characterizing the information conveyed by acted voices

Abstract: Before being distributed in different countries, a work such as a video game or a film needs to be adapted. Subtitling and dubbing are two options for doing so. While subtitling is less costly to produce, dubbing better suits viewers who prefer to listen to the dialogue, usually in their native language, rather than read subtitles while hearing dialogue in another language. To dub a work, the first step is to select, from a pool of candidates, the actors whose voices will replace the original ones. This selection process is called Voice Casting and is conducted by the Artistic Director (AD), sometimes referred to as the casting director.

With the emergence of new streaming platforms such as Disney+ and Amazon Prime and the tremendous growth in the video game industry, the number of works to be distributed internationally is significantly increasing. In response to this demand, more and more actors are available in the voice market. The AD may miss out on talents that are unknown to them as it is impossible to audition all candidates. Tools for recommending and searching for actors based on automatic speech processing would assist ADs in finding new talents that would enrich the vocal diversity of works for better audience immersion.

Exploring actor recommendation involves studying the concept of “acted voice.” In multimedia works, the acted voice is expressed by professional actors; its purpose is to evoke the desired effect in the audience by giving a particular behavior to the character. Its study involves a dual complexity in terms of production and perception, which explains why the acted voice is scarcely present in speech processing literature.

Previous work has addressed the issue of voice casting by focusing on video game character voices. In these studies, voice similarity is central. Systems leverage the associations between original actors and dubbing actors to model part of the decision-making process of the operator (the AD). The task is to predict, in the form of a character similarity measure, whether the two voices provided to the system portray the same character.
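As a rough illustration of what a similarity measure between two voices could look like, the sketch below scores a pair of recordings by the cosine similarity of their embeddings. It assumes each recording has already been mapped to a fixed-length vector by some upstream model that is not described here. The function names, the 0.5 threshold, and the toy vectors are all hypothetical and are not taken from the previous work mentioned above.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_character(emb_a, emb_b, threshold=0.5):
    """Hypothetical decision rule: treat the two voices as portraying the
    same character when their similarity reaches the threshold.
    The threshold value is an assumption, not a published setting."""
    return cosine_similarity(emb_a, emb_b) >= threshold


# Toy embeddings standing in for an original voice and two dubbed candidates.
original = [1.0, 0.0]
candidate_close = [2.0, 0.0]   # same direction as the original
candidate_far = [0.0, 1.0]     # orthogonal to the original

print(same_character(original, candidate_close))
print(same_character(original, candidate_far))
```

In practice the threshold would have to be tuned on labeled pairs of original and dubbed voices, and a learned similarity function could replace the plain cosine.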

In this manuscript, we focus on character information: the set of acoustic cues in a vocal recording that characterize the portrayed character. Although previous work has shown that such information exists in the acted voice, its nature remains largely unknown. We aim to shed light on some of these unknowns by studying two questions:

What connection does the character information have with its actor?
What are the vocal markers that shape the character?

Firstly, we establish a protocol to evaluate the presence of so-called “speaker-independent” character information. Our experiments show that this information exists but is only weakly expressed in our data.

Secondly, we show in an experiment that speaker information is useful in constructing systems dedicated to characterizing the portrayed character.

Finally, we propose an experiment that involves extracting vocal markers dedicated to character portrayal from character labels and recordings.