PhD defense of Gaelle Laperrière – 09/09/2024 – Laboratoire Informatique d’Avignon

Date: 9 September 2024

Time: 3 PM

Place: Ada Lovelace amphitheater, CERI, Jean-Henri Fabre campus of Avignon Université.

The jury will be composed of:

Alexandre Allauzen, PR at Université Paris Dauphine-PSL, LAMSADE – Rapporteur

Benoit Favre, PR at Aix-Marseille Université, LIS – Rapporteur

Marco Dinarelli, CR at CNRS, LIG – Examiner

Nathalie Camelin, MCF at Le Mans Université, LIUM – Examiner

Philippe Langlais, PR at Université de Montréal, DIRO, RALI – Examiner

Fabrice Lefèvre, PR at Avignon Université, LIA – Examiner

Yannick Estève, PR at Avignon Université, LIA – Thesis director

Sahar Ghannay, MCF at Université Paris-Saclay, LISN, CNRS – Thesis co-supervisor

Bassam Jabaian, MCF at Avignon Université, LIA – Thesis co-supervisor

Title: Spoken Language Understanding in a multilingual context

This thesis falls within the scope of Deep Learning applied to Spoken Language Understanding. Its primary objective is to leverage existing semantically annotated speech data from high-resource languages to develop effective understanding systems for low-resource languages.

In recent years, significant advances have been made in automatic speech translation through new approaches that bring the audio and textual modalities together, the latter benefiting from vast amounts of data. By framing spoken language understanding as a translation task from a natural source language to a conceptual target language, we consider the SAMU-XLSR speech encoder, which generates a semantically enriched, language-agnostic encoding. We demonstrate the positive impact of this type of encoder in an end-to-end speech understanding neural network and closely examine its linguistic and semantic encoding capabilities. The study continues with the specialization of this enrichment, aiming to direct the encoding towards the semantic domains of the French MEDIA, Italian PortMEDIA, and Tunisian TARIC-SLU tasks. A dual specialization is proposed to preserve the encoder’s ability to generate certain semantic abstractions while limiting the loss of its cross-lingual abilities during the usual fine-tuning of the model on the final task. Our contributions helped improve the state of the art in portability across languages and domains on the MEDIA, PortMEDIA, and TARIC-SLU datasets.

The SpeechBrain project played a crucial role in implementing our experiments. We contributed to this open-source project by integrating a complete recipe for the MEDIA benchmark into its official distribution.