SLG Seminar – Ryan Whetten – 01/02/2024

25 January 2024

The next SLG meeting will take place in room S5 on Thursday, February 1st, from 12:00 PM to 1:00 PM. Ryan Whetten will present his work; a brief introduction follows.

Open Implementation and Study of BEST-RQ for Speech Processing

Abstract: Self-Supervised Learning (SSL) has proven useful in various speech tasks. However, these methods are generally very demanding in terms of data, memory, and computational resources. Recently, Google introduced a model called BEST-RQ (BERT-based Speech pre-Training with Random-projection Quantizer). Despite BEST-RQ's strong performance and simplicity, details are lacking in the original paper, and there is no official easy-to-use open-source implementation. Furthermore, BEST-RQ has not been evaluated on downstream tasks other than ASR. In this presentation, we will discuss the details of my implementation of BEST-RQ and then present results from our preliminary study on four downstream tasks. The results show that a random-projection quantizer can achieve downstream performance similar to wav2vec 2.0 while reducing training time by more than a factor of two.
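The core of BEST-RQ, as the abstract describes, is a quantizer built from a frozen random projection and a frozen random codebook: each masked frame is trained to predict the codebook index of its quantized target. A minimal NumPy sketch of that quantizer (the dimensions and codebook size here are illustrative, not the paper's actual values):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not the paper's exact settings).
feat_dim, proj_dim, codebook_size = 80, 16, 8192

# Both the projection matrix and the codebook are randomly initialised
# and kept FROZEN throughout pre-training -- the key idea of BEST-RQ.
projection = rng.normal(size=(feat_dim, proj_dim))
codebook = rng.normal(size=(codebook_size, proj_dim))
codebook /= np.linalg.norm(codebook, axis=1, keepdims=True)

def quantize(frames: np.ndarray) -> np.ndarray:
    """Map (T, feat_dim) speech frames to (T,) discrete target indices."""
    z = frames @ projection                        # random projection
    z /= np.linalg.norm(z, axis=1, keepdims=True)  # L2-normalise
    # Nearest codebook entry (cosine similarity) gives the training target.
    return np.argmax(z @ codebook.T, axis=1)

frames = rng.normal(size=(100, feat_dim))  # stand-in for log-mel features
targets = quantize(frames)
print(targets.shape)  # (100,)
```

Because nothing in the quantizer is learned, the targets are fixed for a given utterance, which is what makes the method so much cheaper to train than approaches that learn their quantizer jointly.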

SLG Seminar – Paul Gauthier Noé – 18/01/2024

10 January 2024

On 18 January at 12:00 PM, we will host a talk by Dr. Paul Gauthier Noé on « Explaining probabilistic predictions … ». The presentation will be held in room S6. More details will follow.

Bio: Paul Gauthier Noé recently received a PhD in Computer Science from Avignon Université under the supervision of Prof. Jean-François Bonastre and Dr. Driss Matrouf. He worked on the international JST-ANR VoicePersonae project, and his main research interests are speaker verification, Bayesian decision theory, calibration of probabilities, and privacy in speech.

SLG Seminar – Fenna Poletiek – 12/01/2024

8 January 2024

On 12 January at 12:00 PM, we will host a virtual talk by Dr. Fenna Poletiek from the Institute of Psychology at Leiden University on « Language learning in the lab ». The presentation will be hosted in room S6.

Abstract: Language learning skills have been considered a defining feature of humanness. In this view, language cannot be acquired by mere associative or statistical learning processes alone, the way many other skills are learned by human and nonhuman primates during development. Indeed, the high (recursive) complexity of human grammars has been shown to make them impossible to learn from exposure to language exemplars only. Some research suggests, however, that at least some statistical learning is recruited in language acquisition (Perruchet & Pacton, 2006), and primates have been shown to mimic complex grammatical patterns after being trained on a sequence of stimulus responses (Rey et al., 2012). We performed a series of studies with artificial languages in the lab to investigate the associative and statistical learning processes that support language learning. The results thus far suggest a finely tuned cooperation between three crucial features of the natural language learning process: first, learning proceeds by 'starting small', with short simple sentences growing in complexity …
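The "statistical learning" the abstract refers to can be illustrated with the classic transitional-probability account of word segmentation: syllable pairs inside a word follow each other reliably, while pairs spanning a word boundary do not. A toy sketch with a hypothetical three-word artificial language (the words and the 0.8 threshold are illustrative choices, not from the talk):

```python
import random
from collections import Counter

random.seed(0)

# Hypothetical artificial language: three "words" concatenated into a
# continuous syllable stream with no pauses, as in lab studies.
words = [("tu", "pi"), ("go", "la", "bu"), ("da", "ro")]
stream = [syl for _ in range(200)
          for w in random.sample(words, 3) for syl in w]

# Transitional probability TP(b | a) = count(a, b) / count(a).
pair_counts = Counter(zip(stream, stream[1:]))
first_counts = Counter(stream[:-1])
tp = {(a, b): c / first_counts[a] for (a, b), c in pair_counts.items()}

# Posit a word boundary wherever the transitional probability dips.
segments, current = [], [stream[0]]
for a, b in zip(stream, stream[1:]):
    if tp[(a, b)] < 0.8:           # low TP -> likely word boundary
        segments.append("".join(current))
        current = []
    current.append(b)
segments.append("".join(current))

print(sorted(set(segments)))  # the learner recovers the three words
```

Within-word transitions here occur with probability 1.0, while boundary transitions hover well below the threshold, so the "learner" recovers the vocabulary from distributional statistics alone.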

DAPADAF-E Project

13 December 2023

Validity of a task of acoustic-phonetic decoding on anatomic deficits in paramedical assessment of speech disorders for patients treated for oral or oropharyngeal cancer

SLG Meeting – St Germes Bengono Obiang – 21/12/2023

12 December 2023

The next SLG meeting will be held in room S1 on Thursday, December 21st, from 12:00 PM to 1:00 PM. We will have the pleasure of hosting St Germes BENGONO OBIANG, a PhD student in speech processing focusing on tone recognition in under-resourced languages. He is supervised by Norbert TSOPZE and Paulin MELATAGIA from the University of Yaoundé 1, as well as by Jean-François BONASTRE and Tania JIMENEZ from LIA.

Abstract: Many sub-Saharan African languages are categorized as tone languages, and for the most part they are classified as low-resource languages due to the limited resources and tools available to process them. Identifying the tone associated with a syllable is therefore a key challenge for speech recognition in these languages. We propose models that automate the recognition of tones in continuous speech and that can easily be incorporated into a speech recognition pipeline for these languages. We have investigated different neural architectures as well as several feature extraction algorithms for speech (filter banks, LEAF, cepstrogram, MFCC). In the context of low-resource languages, we also evaluated wav2vec models for this task. In this work, we use a public speech recognition dataset on Yoruba. As for the results, …
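Of the feature frontends the abstract compares, log-mel filter-bank features are the simplest to sketch. Here is a compact, self-contained NumPy version; the window, hop, and mel-band count are common defaults, not necessarily the settings used in the study:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_filterbank(signal, sr=16000, n_fft=512, hop=160, n_mels=40):
    """Compute (n_frames, n_mels) log-mel features from a 1-D signal."""
    # Frame the signal and apply a Hann window.
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    # Triangular filters spaced evenly on the mel scale.
    mel_pts = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * mel_pts / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        lo, c, hi = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, lo:c] = (np.arange(lo, c) - lo) / max(c - lo, 1)
        fbank[m - 1, c:hi] = (hi - np.arange(c, hi)) / max(hi - c, 1)
    return np.log(power @ fbank.T + 1e-10)

sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s, 440 Hz tone
feats = log_mel_filterbank(sig)
print(feats.shape)  # (97, 40)
```

MFCCs add a discrete cosine transform on top of these log-mel energies, while learnable frontends such as LEAF replace the fixed filter shapes with trained parameters.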

PhD defense of Anais Chanclu – 11 December 2023

11 December 2023

Thesis defense of Anais Chanclu

Date: Monday 11 December 2023 at 2:30 PM
Location: Thesis room, Hannah Arendt campus
Title: Recognizing individuals by their voice: defining a scientific framework to ensure the reliability of voice comparison results in forensic contexts
Jury:

Abstract: In police investigations or criminal trials, voice recordings are often collected for comparison purposes with the voice of suspects. Typically, these recordings, referred to as 'traces', come from phone taps, emergency service calls, or voicemail messages. Recordings of suspects, known as 'comparison pieces', are usually obtained by law enforcement through voice sampling. Since the traces and comparison pieces were not recorded under the same conditions, and the recording conditions of the traces are often poorly known or entirely unknown, the variability between the recordings being compared cannot be quantified. Numerous factors come into play, including audio file characteristics, linguistic content, the recording environment, and the speaker(s). Voice comparison practices have evolved throughout history without conforming to a scientific framework. This has led to questioning the reliability of voice expertise (as in the Trayvon Martin case) and the use of fallacious practices (as in the Élodie Kulik case), potentially leading to judicial errors. Nowadays, the French Scientific Police (SNPS) and the …

PhD defense of Thibault Cordier – 13 October 2023

13 October 2023

Date: Friday, the 13th of October at 9:00 AM
Place: "salle des thèses" at Université d'Avignon, Campus Hannah Arendt (city centre)
Title: « Hierarchical Imitation and Reinforcement Learning for Multi-Domain Task-Oriented Dialogue Systems »
The defense can be followed through the live link: https://v-au.univ-avignon.fr/live

Abstract: In this PhD thesis, we study task-oriented dialogue systems, which are systems designed to assist users in completing specific tasks, such as booking a flight or ordering food. They typically rely on the reinforcement learning paradigm to model the dialogue, which allows the system to reason about the user's goals and preferences and to select actions that will lead to the desired outcome. Our focus is specifically on learning from a limited number of interactions, which is crucial due to the scarcity and costliness of human interactions. Standard reinforcement learning algorithms typically require a large amount of interaction data to achieve good performance. To address this challenge, we aim to make dialogue systems more sample-efficient in their training. We draw on two main ideas: imitation and hierarchy. Our first contribution explores the integration of imitation with reinforcement learning. We investigate how to effectively use expert demonstrations to extrapolate knowledge with minimal generalisation effort. Our second contribution focuses on …
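The abstract's first idea, combining imitation with reinforcement learning for sample efficiency, can be shown in miniature: seed a tabular Q-function from expert demonstrations so exploration starts near the expert policy instead of from scratch. Everything below (the toy chain "dialogue", the hyperparameters) is illustrative, not the thesis's actual method:

```python
import random

random.seed(0)

# Toy "dialogue" task: a 5-state chain. Action 1 advances toward task
# completion (reward on reaching the goal); action 0 restarts the dialogue.
N_STATES, GOAL = 5, 4

def step(state, action):
    """Return (next_state, reward, done)."""
    if action == 1:
        nxt = state + 1
        return (nxt, 1.0, True) if nxt == GOAL else (nxt, 0.0, False)
    return 0, 0.0, False  # restart

def q_learning(episodes, q, eps=0.2, alpha=0.5, gamma=0.9):
    """Epsilon-greedy tabular Q-learning starting from the given Q-table."""
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if random.random() < eps:
                action = random.randrange(2)
            else:
                action = max((0, 1), key=lambda a: q[state][a])
            nxt, reward, done = step(state, action)
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q

# Imitation step: the expert always advances, so the Q-table is seeded to
# prefer action 1 everywhere before any environment interaction.
demo_q = [[0.0, 1.0] for _ in range(N_STATES)]
q = q_learning(episodes=50, q=demo_q)
policy = [max((0, 1), key=lambda a: q[s][a]) for s in range(N_STATES)]
print(policy)  # greedy policy after RL fine-tuning
```

Starting from the demonstration-seeded table, the agent rarely wastes episodes on the restart action, which is the sample-efficiency benefit the abstract describes.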

PhD defense of Sondes Abderrazek – 2 May 2023

2 May 2023

Date: 2nd of May at 2:00 PM
Place: Avignon, Centre d'Enseignement et de Recherche en Informatique (Ada Lovelace Auditorium)
The jury members:
Title: Assessment of Speech Intelligibility using Deep Learning: Towards Enhanced Interpretability in Clinical Phonetics

Abstract: Speech intelligibility is an essential component of effective communication. It refers to the degree to which a speaker's intended message can be understood by a listener. This capacity can be hampered by speech disorders, which results in a reduced quality of life for individuals. In the case of Head and Neck Cancer (HNC), speech may be affected by the presence of tumors in the speech production system, but the main cause of speech impairment is typically the tumor treatment, including surgery, radiotherapy, chemotherapy, or a combination of these. In such cases, the evaluation of speech quality is crucial to assess patients' communication deficit and develop targeted treatment plans. In clinical practice, perceptual measures are considered the gold standard for assessing speech disorders. Although widely used, they suffer from several limitations, the most important of which is their subjectivity. Consequently, the automatic assessment of speech disorders has emerged as a promising alternative to perceptual …

ANR TRADEF Project

1 January 2023

Tracking and Detecting Fake News and Deepfakes in Arabic Social Networks

The 4th Generation War (4GW) is known as an information war involving non-military populations. It is conducted by national or transnational groups following ideologies based on cultural beliefs, religion, economic or political interests, aiming to sow chaos in a targeted region globally. In 1989, authors discussing the 4th generation war, some of whom were military experts, explained that it would be widespread and challenging to define in the decades to come. With the emergence of social networks, the previously vague battlefield found a platform for 4GW. One of its penetration points is the extensive use of social networks to manipulate opinions, aiming to shape the targeted region's perspective to accept a certain state of affairs and render it socially and politically acceptable. Much like the 4th generation war, cognitive warfare aims to blur comprehension mechanisms regarding politics, economy, religion, etc. Its consequence is destabilizing and weakening the adversary. This cognitive war targets what is assumed to be the enemy's brain, altering reality by flooding the adversary's population with misleading information, rumors, or manipulated videos. Furthermore, the proliferation of social bots today enables automated dissemination of disinformation on social networks.

ANR ESSL Project

1 January 2023

Efficient Self-Supervised Learning for Inclusive and Innovative Speech Technologies

Self-Supervised Learning (SSL) has recently emerged as an incredibly promising artificial intelligence (AI) method. Through this method, massive amounts of accessible unlabeled data can be utilized by AI systems to surpass known performances. In particular, the field of Automatic Speech Processing (ASP) is swiftly being transformed by the arrival of SSL, thanks in part to massive industrial investments and the explosion of data, both provided by a handful of companies. The performance gains are impressive, but the complexity of SSL models requires researchers and industry professionals in the field to have extraordinary computational capacity, drastically limiting access to fundamental research in this area and its deployment in everyday products. For instance, a significant portion of work using an SSL model for ASP relies on a system maintained and provided by a single company (wav2vec 2.0). The entire lifecycle of the technology, from its theoretical foundations to its practical deployment and societal analysis, therefore depends solely on institutions with the physical and financial means to support the intensity of this technique's development. The E-SSL project aims to restore to the scientific community and the ASP industry the necessary control over self-supervised learning …
