PhD defense of Anais Chanclu – 11 December 2023

Date: Monday 11 December 2023 at 14:30 

Location: Thesis room, Hannah Arendt campus.

Title: Recognizing individuals by their voice: defining a scientific framework to ensure the reliability of voice comparison results in forensic contexts

Jury:

  • Jean-François Bonastre, Professor, Avignon Université, Laboratoire Informatique d’Avignon (Thesis supervisor)
  • Martine Adda-Decker, Research Director, Université Paris 3 Sorbonne Nouvelle and Laboratoire de Phonétique et Phonologie (Reviewer)
  • Julien Pinquier, Associate Professor, Université Toulouse III – Paul Sabatier, Institut de Recherche en Informatique de Toulouse (Reviewer)
  • Christine Meunier, Research Director, Laboratoire Parole et Langage, Aix-Marseille Université (Examiner)

Abstract: In police investigations and criminal trials, voice recordings are often collected so that they can be compared with the voices of suspects. Typically, these recordings, referred to as ‘traces’, come from phone taps, emergency service calls, or voicemail messages. Recordings of suspects, known as ‘comparison pieces’, are usually obtained by law enforcement through voice sampling. Since the traces and comparison pieces were not recorded under the same conditions, and the recording conditions of the traces are often poorly known or entirely unknown, the variability between the recordings being compared cannot be quantified. Numerous factors contribute to this variability, including audio file characteristics, linguistic content, the recording environment, and the speaker(s).

Voice comparison practices have evolved over time without conforming to a scientific framework. This has raised doubts about the reliability of expert voice analysis (as in the Trayvon Martin case) and allowed the use of fallacious practices (as in the Élodie Kulik case), potentially leading to miscarriages of justice. Today, the French Scientific Police (SNPS) and the Institute of Criminal Research of the National Gendarmerie (IRCGN) have established quality protocols to ensure that their expert analyses are grounded in the scientific literature. The goal of this thesis is to establish a scientific framework to assess the reliability of voice comparison results. To achieve this, we focus on three aspects: the influence of certain factors on voice comparison performance, human perception of a speaker’s identity, and voice characterization.

Firstly, we address the influence of certain factors on voice comparison performance. We study these factors individually and then in combination with one another. The results show that some factors have a greater impact on performance than others. However, the effect varies across speakers: the studied factors do not affect performance in the same way for all speakers.

Secondly, we study human perception of speakers. For this, we conducted a perceptual experiment in which listeners grouped recordings by speaker. To score the resulting groupings, we defined a grouping purity measure. We also compared the listeners’ results with those of an automatic voice comparison system. The results showed disparities in the speaker groupings, notably linked to the listeners’ native language. The automatic approach achieved better results than the human listeners.
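
As an illustration of what a grouping purity measure can look like, here is a minimal sketch in Python. The abstract does not give the definition used in the thesis, so this sketch assumes the standard clustering purity: the fraction of recordings that belong to the majority speaker of the group they were placed in. The function name `grouping_purity` and the sample labels are hypothetical.

```python
from collections import Counter, defaultdict


def grouping_purity(true_speakers, listener_groups):
    """Standard clustering purity (an assumption, not the thesis's definition).

    true_speakers:   the actual speaker label of each recording
    listener_groups: the group each recording was placed in by a listener
    Returns the fraction of recordings that match the majority speaker
    of their group (between 0 and 1, higher is purer).
    """
    if len(true_speakers) != len(listener_groups):
        raise ValueError("both sequences must have the same length")
    if not true_speakers:
        raise ValueError("at least one recording is required")

    # Count how often each speaker occurs inside each group.
    speakers_per_group = defaultdict(Counter)
    for speaker, group in zip(true_speakers, listener_groups):
        speakers_per_group[group][speaker] += 1

    # For each group, keep only the count of its most frequent speaker.
    majority_total = sum(
        max(counts.values()) for counts in speakers_per_group.values()
    )
    return majority_total / len(true_speakers)


# Hypothetical data: six recordings from three speakers, placed in three groups.
true_speakers = ["A", "A", "A", "B", "B", "C"]
listener_groups = [1, 1, 2, 2, 2, 3]
score = grouping_purity(true_speakers, listener_groups)
```

With this definition, placing every recording in its own group always yields a purity of 1, so a purity score is usually read together with the number of groups a listener created.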

Lastly, we turn to voice characterization. We developed a new system to detect phonation types, initially on pre-pausal vowels, and subsequently on all voiced phonemes. This new system uses PASE+ for extracting multiple parameters and a Multilayer Perceptron (MLP) for classification. We compared this system with a more traditional system based on Mel-Frequency Cepstral Coefficients (MFCC) extraction and Support Vector Machine (SVM) classification. In this comparison, the new system outperformed the traditional one. Generalizing to all voiced phonemes showed that female speakers tended to have modal voice, while male speakers tended to have non-modal voice.

In conclusion, this thesis has demonstrated that voice comparison is a complex field where results can be influenced by numerous factors. Standardizing voice comparison practices requires an in-depth understanding of these factors and their interplay. However, this thesis explored only a handful of factors, so further research is needed to standardize voice comparison practices and ensure reliable results.