Gaelle Laperrière Ph.D. thesis defense – 09/09/2024

3 September 2024

Date: 9 September 2024
Time: 3 PM
Place: Ada Lovelace amphitheater, CERI, Jean-Henri Fabre campus of Avignon Université.

The jury will be composed of:

Alexandre Allauzen, PR at Université Paris Dauphine-PSL, LAMSADE – Rapporteur
Benoit Favre, PR at Aix-Marseille Université, LIS – Rapporteur
Marco Dinarelli, CR at CNRS, LIG – Examiner
Nathalie Camelin, MCF at Le Mans Université, LIUM – Examiner
Philippe Langlais, PR at Université de Montréal, DIRO, RALI – Examiner
Fabrice Lefèvre, PR at Avignon Université, LIA – Examiner
Yannick Estève, PR at Avignon Université, LIA – Thesis director
Sahar Ghannay, MCF at Université Paris-Saclay, LISN, CNRS – Thesis co-supervisor
Bassam Jabaian, MCF at Avignon Université, LIA – Thesis co-supervisor

Title: Spoken Language Understanding in a multilingual context

This thesis falls within the scope of deep learning applied to spoken language understanding. Its primary objective is to leverage existing semantically annotated speech data from well-resourced languages to develop effective understanding systems for low-resourced languages. In recent years, significant advances have been made in automatic speech translation through new approaches that bring the audio and textual modalities together, the latter benefiting from vast amounts of data. By framing spoken language understanding as a translation task from a natural …
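To make that framing concrete, here is a minimal, illustrative sketch of "understanding as translation": the semantic annotation is serialized into a tagged target string that a speech-to-text model can then be trained to generate directly from audio. The tag format, example utterance, and function name are hypothetical, not the thesis's actual annotation scheme.

```python
# Illustrative sketch: casting spoken language understanding (SLU) as
# sequence generation, where the target is the transcript enriched with
# inline semantic-concept tags (hypothetical tag format and example).

def serialize_slu_target(tokens, concepts):
    """Interleave word tokens with <concept> ... </concept> tags.

    tokens:   list of words, e.g. ["book", "a", "table", "for", "two"]
    concepts: list of (start, end, label) spans over token indices (inclusive).
    """
    opens, closes = {}, {}
    for start, end, label in concepts:
        opens.setdefault(start, []).append(label)
        closes.setdefault(end, []).append(label)
    out = []
    for i, tok in enumerate(tokens):
        for label in opens.get(i, []):
            out.append(f"<{label}>")
        out.append(tok)
        for label in closes.get(i, []):
            out.append(f"</{label}>")
    return " ".join(out)

# The decoder of a speech-to-text model is then trained to emit this tagged
# string from audio, exactly like a translation target.
print(serialize_slu_target(
    ["book", "a", "table", "for", "two"],
    [(2, 2, "object"), (4, 4, "party-size")],
))
# -> "book a <object> table </object> for <party-size> two </party-size>"
```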

SLG Seminar – Tanja Schultz – 25/04/2024

22 April 2024

On Thursday 25 April at 11 AM, we will host a talk by Prof. Tanja Schultz on « Neural Signal Interpretation for Spoken Communication ». The room will be announced later. Please find below a short abstract and bio from Prof. Tanja Schultz.

Abstract: This talk presents advancements in decoding neural signals, providing further insights into the intricacies of spoken communication. Delving into both speech production and speech perception, we discuss low-latency processing of neural signals from surface EEG, stereotactic EEG, and intracranial EEG using machine learning methods. Practical implications and human-centered applications are considered, including silent speech interfaces, neuro-speech prostheses, and the detection of auditory attention and distraction in communication. This presentation aims to spark curiosity about the evolving landscape of neural signal interpretation and its impact on the future of spoken communication.

Bio: Tanja Schultz received the diploma and doctoral degrees in Informatics from the University of Karlsruhe and a Master's degree in Mathematics and Sport Sciences from Heidelberg University, both in Germany. Since 2015 she has been Professor for Cognitive Systems at the Faculty of Mathematics & Computer Science of the University of Bremen, Germany. Prior to Bremen, she spent 7 years as Professor for Cognitive Systems at KIT (2007-2015) and over 20 years as …

SLG Seminar – Antoine Caubrière – 15/03/2024

11 March 2024

The next SLG meeting will take place on 15/03/2024, from 10 AM to 11 AM. We will host Antoine Caubrière from the company Orange, who will present his recent work.

Title: Representation of Multilingual Speech through Self-Supervised Learning in an Exclusively Sub-Saharan Context

Abstract: The Orange group operates in over a dozen sub-Saharan African countries with the ambition of offering services tailored to the needs of clients in this region. To provide localized and accessible services to digitally underserved and low-literate individuals, Orange is investing in the development of voice-based conversational agents to inform and assist its clients and employees. The implementation of such a service requires, first and foremost, a technological component for speech recognition and understanding. The strong linguistic diversity of the African continent, coupled with the scarcity of annotated data, is one of the challenges in implementing speech processing technology for these languages. One potential solution is the use of self-supervised learning techniques. Leveraging this type of learning enables the training of a speech representation extractor capable of capturing rich features: a large quantity of unlabeled data is used to pre-train a model, which is then fine-tuned for specific tasks. While numerous self-supervised models are shared within the …
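As a concrete illustration of the pre-train-then-fine-tune recipe described in the abstract, here is a minimal sketch using HuggingFace Transformers. The checkpoint name is a public example; the talk's own sub-Saharan models are not assumed here.

```python
# Minimal sketch of fine-tuning a self-supervised speech model for ASR.
# The checkpoint is an illustrative public example, not the talk's model.
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

# One second of dummy 16 kHz audio standing in for a labeled utterance.
waveform = torch.randn(16000)
inputs = processor(waveform, sampling_rate=16000, return_tensors="pt")
labels = processor.tokenizer("HELLO WORLD", return_tensors="pt").input_ids

# The pretrained encoder provides the representation; the CTC loss on the
# (small) labeled set is what drives fine-tuning.
outputs = model(input_values=inputs.input_values, labels=labels)
outputs.loss.backward()  # gradients for one fine-tuning step
```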

SLG Seminar – 15/02/2024

13 February 2024

Thibault Roux will organize a debate on the subject described below: “Recent advances in technology have raised many questions and concerns about their impact on our societies. Many people are concerned about military use, mass surveillance, or disinformation. From a more global perspective, the philosopher Nick Bostrom theorizes the vulnerable world hypothesis, which predicts that science will destroy humanity. In this debate, we will question our own biases as researchers and try to answer the ethical questions raised by this hypothesis. Is science a threat to humanity? Should we stop science? Or, more seriously, can we find a solution to prevent ourselves from self-destruction?”

SLG Seminar – Ryan Whetten – 01/02/2024

25 January 2024

The next SLG meeting will take place in room S5 on Thursday, February 1st, from 12:00 PM to 1:00 PM. Ryan Whetten will present his work; a brief introduction follows.

Title: Open Implementation and Study of BEST-RQ for Speech Processing

Abstract: Self-Supervised Learning (SSL) has proven to be useful in various speech tasks. However, these methods are generally very demanding in terms of data, memory, and computational resources. Recently, Google introduced a model called BEST-RQ (BERT-based Speech pre-Training with Random-projection Quantizer). Despite BEST-RQ's great performance and simplicity, details are lacking in the original paper, and there is no official easy-to-use open-source implementation. Furthermore, BEST-RQ has not been evaluated on downstream tasks other than ASR. In this presentation, we will discuss the details of my implementation of BEST-RQ and then present results from our preliminary study on four downstream tasks. Results show that a random-projection quantizer can achieve downstream performance similar to wav2vec 2.0 while reducing training time by more than a factor of two.
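For context, here is a minimal sketch of the random-projection quantizer at the heart of BEST-RQ, as described in the abstract: a frozen random projection and a frozen random codebook turn each speech frame into a discrete target for masked prediction. Dimensions are illustrative, not the paper's.

```python
# Minimal sketch of BEST-RQ's random-projection quantizer (illustrative
# dimensions). Neither the projection nor the codebook is ever trained.
import torch

torch.manual_seed(0)
feat_dim, proj_dim, codebook_size = 80, 16, 8192

projection = torch.randn(feat_dim, proj_dim)
codebook = torch.nn.functional.normalize(
    torch.randn(codebook_size, proj_dim), dim=-1
)

def quantize(frames: torch.Tensor) -> torch.Tensor:
    """Map (time, feat_dim) speech frames to (time,) codebook indices."""
    projected = torch.nn.functional.normalize(frames @ projection, dim=-1)
    # Nearest codebook entry by cosine similarity = argmax of dot product.
    return (projected @ codebook.T).argmax(dim=-1)

# 100 frames of dummy log-mel features; the resulting indices serve as
# BERT-style classification targets at the masked positions of the encoder.
targets = quantize(torch.randn(100, feat_dim))
print(targets.shape)  # torch.Size([100])
```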

SLG Seminar – Paul Gauthier Noé – 18/01/2024

10 January 2024

On 18 January from 12 PM, we will host a talk by Dr. Paul Gauthier Noé on « Explaining probabilistic predictions … ». The presentation will be held in room S6. More details will follow.

Bio: Paul Gauthier Noé recently received a PhD in Computer Science from Avignon Université under the supervision of Prof. Jean-François Bonastre and Dr. Driss Matrouf. He worked on the international JST-ANR VoicePersonae project, and his main research interests are speaker verification, Bayesian decision theory, calibration of probabilities, and privacy in speech.

SLG Seminar – Fenna Poletiek – 12/01/2024

8 January 2024

On 12 January from 12 PM, we will host a virtual talk by Dr. Fenna Poletiek from the Institute of Psychology at Leiden University on « Language learning in the lab ». The presentation will be held in room S6.

Abstract: Language learning in the lab. Language learning skills have been considered a defining feature of humanness. In this view, language cannot be acquired by mere associative or statistical learning processes alone, the way many other skills are learned by human and nonhuman primates during development. Indeed, the high (recursive) complexity of human grammars has been shown to make them impossible to learn from exposure to language exemplars alone. Some research suggests, however, that at least some statistical learning is recruited in language acquisition (Perruchet & Pacton, 2006). And primates have been shown to mimic complex grammatical patterns after being trained on a sequence of stimulus responses (Rey et al., 2012). We performed a series of studies with artificial languages in the lab to investigate the associative and statistical learning processes that support language learning. The results thus far suggest a fine-tuned cooperation between three crucial features of the natural language learning process: first, learning proceeds ‘starting small’, with short simple sentences growing in complexity …
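As a concrete illustration of the statistical-learning cue studied in this line of work, here is a minimal sketch computing transitional probabilities between adjacent syllables over an artificial-language stream. The syllable words follow the classic materials of Saffran et al. (1996) and are illustrative, not necessarily those used in these studies.

```python
# Minimal sketch of the "statistical learning" signal: transitional
# probabilities between adjacent syllables, the classic cue for segmenting
# words out of a continuous artificial-language stream (illustrative data).
from collections import Counter

stream = "tu pi ro go la bu tu pi ro bi da ku go la bu".split()
pairs = Counter(zip(stream, stream[1:]))
firsts = Counter(stream[:-1])

def transitional_probability(a: str, b: str) -> float:
    """P(b follows a), estimated from the stream."""
    return pairs[(a, b)] / firsts[a] if firsts[a] else 0.0

# Within-word transitions are high; transitions across word boundaries low:
print(transitional_probability("tu", "pi"))  # 1.0 (always within "tupiro")
print(transitional_probability("ro", "go"))  # 0.5 (a word boundary)
```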

DAPADAF-E Project

13 December 2023

Validity of an acoustic-phonetic decoding task with respect to anatomical deficits in the paramedical assessment of speech disorders for patients treated for oral or oropharyngeal cancer …

SLG Meeting – St Germes Bengono Obiang – 21/12/2023

12 December 2023

The next SLG meeting will be held in room S1 on Thursday, December 21st, from 12:00 PM to 1:00 PM.

We will have the pleasure of hosting St Germes BENGONO OBIANG, a PhD student in speech processing focusing on tone recognition in under-resourced languages. He is supervised by Norbert TSOPZE and Paulin MELATAGIA from the University of Yaoundé 1, as well as by Jean-François BONASTRE and Tania JIMENEZ from LIA.

Abstract: Many sub-Saharan African languages are tone languages, and most are classified as low-resource languages due to the limited resources and tools available to process them. Identifying the tone associated with a syllable is therefore a key challenge for speech recognition in these languages. We propose models that automate tone recognition in continuous speech and that can easily be incorporated into a speech recognition pipeline for these languages. We investigated different neural architectures as well as several speech feature extraction algorithms (filter banks, LEAF, cepstrogram, MFCC). In the context of low-resource languages, we also evaluated wav2vec models for this task. In this work, we use a public speech recognition dataset for Yoruba. As for the results, using the …
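To ground the feature-extraction step, here is a minimal sketch of one pipeline named in the abstract: MFCC features feeding a small tone classifier over a syllable segment. The network, segment length, and three-tone inventory (Yoruba distinguishes high, mid, and low level tones) are illustrative choices, not the presented models.

```python
# Minimal sketch: MFCC features from a syllable segment, pooled over time,
# classified into tone categories (illustrative network and dimensions).
import torch
import torchaudio

N_TONES = 3  # e.g. high / mid / low, as in Yoruba

mfcc = torchaudio.transforms.MFCC(sample_rate=16000, n_mfcc=13)
classifier = torch.nn.Sequential(
    torch.nn.Linear(13, 64), torch.nn.ReLU(), torch.nn.Linear(64, N_TONES)
)

# A 200 ms dummy syllable at 16 kHz; real input would be a segmented syllable.
syllable = torch.randn(1, 3200)
features = mfcc(syllable)        # (1, 13, time)
pooled = features.mean(dim=-1)   # average over time -> (1, 13)
tone_logits = classifier(pooled) # scores for each tone class
print(tone_logits.shape)         # torch.Size([1, 3])
```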

PhD defense of Anais Chanclu – 11 December 2023

11 December 2023

Thesis defense of Anais Chanclu

Date: Monday 11 December 2023 at 14:30
Location: Thesis room, Hannah Arendt campus

Title: Recognizing individuals by their voice: defining a scientific framework to ensure the reliability of voice comparison results in forensic contexts

Jury:

Abstract: In police investigations or criminal trials, voice recordings are often collected for comparison with the voice of suspects. Typically, these recordings, referred to as ‘traces’, come from phone taps, emergency service calls, or voicemail messages. Recordings of suspects, known as ‘comparison pieces’, are usually obtained by law enforcement through voice sampling. Since the traces and comparison pieces were not recorded under the same conditions, and the recording conditions of the traces are often poorly known or entirely unknown, the variability between the recordings being compared cannot be quantified. Numerous factors come into play, including audio file characteristics, linguistic content, the recording environment, and the speaker(s). Voice comparison practices have evolved throughout history without conforming to a scientific framework. This has led to the reliability of voice expertise being questioned (as in the Trayvon Martin case) and to the use of fallacious practices (as in the Élodie Kulik case), potentially resulting in judicial errors. Nowadays, the French Scientific Police (SNPS) and the …
