Alexandre Allauzen, PR at Université Paris Dauphine-PSL, LAMSADE – Rapporteur
Benoit Favre, PR at Aix-Marseille Université, LIS – Rapporteur
Marco Dinarelli, CR at CNRS, LIG – Examiner
Nathalie Camelin, MCF at Le Mans Université, LIUM – Examiner
Philippe Langlais, PR at Université de Montréal, DIRO, RALI – Examiner
Fabrice Lefèvre, PR at Avignon Université, LIA – Examiner
Yannick Estève, PR at Avignon Université, LIA – Thesis director
Sahar Ghannay, MCF at Université Paris-Saclay, LISN, CNRS – Thesis co-supervisor
Bassam Jabaian, MCF at Avignon Université, LIA – Thesis co-supervisor
Title: Spoken Language Understanding in a multilingual context
This thesis falls within the scope of Deep Learning applied to Spoken Language Understanding. Its primary objective is to leverage existing data from languages that are well resourced in semantically annotated speech in order to develop effective understanding systems for low-resource languages.
In recent years, automatic speech translation has advanced significantly through new approaches that bring the audio and textual modalities together, the latter benefiting from vast amounts of data. Framing spoken language understanding as a translation task from a natural source language to a conceptual target language, we consider the SAMU-XLSR speech encoder, which generates semantically enriched, language-agnostic encodings. We demonstrate the positive impact of this type of encoder in an end-to-end neural speech understanding network and closely examine its linguistic and semantic encoding capabilities. The study continues with a specialization of this semantic enrichment, aiming to steer the encoding towards the semantic domains of the French MEDIA, Italian PortMEDIA, and Tunisian TARIC-SLU tasks. A dual specialization is proposed to preserve the encoder’s ability to generate certain semantic abstractions while limiting the loss of its cross-lingual abilities during the usual fine-tuning of the model on the final task. Our contributions helped improve the state of the art in portability across languages and domains on the MEDIA, PortMEDIA, and TARIC-SLU datasets.
The SpeechBrain project played a crucial role in the implementation of our experiments. We contributed to this open-source project by integrating a complete recipe for the MEDIA benchmark into its official distribution.