ANR BRUEL Project

1 January 2023

Development of a methodology for evaluating voice identification systems

The BRUEL project concerns the evaluation/certification of voice identification systems against adversarial attacks. Speaker recognition systems are vulnerable not only to speech artificially produced by voice synthesis, but also to other forms of attack such as voice identity conversion and replay. The algorithms used to create or manipulate these fraudulent signals leave artifacts in the signal, which makes it possible to distinguish an original, genuine voice from a forged one. Under these conditions, detecting identity theft requires evaluating spoofing countermeasures jointly with speaker recognition systems. The BRUEL project aims to propose the first methodology for evaluating/certifying voice identification systems based on a Common Criteria approach.

List of partners: CEA, Eurecom, Service National de Police Scientifique, IRCAM, LIA (Laboratoire d'Informatique d'Avignon)
Project Coordinator: LIA
Scientific Manager for LIA: Driss Matrouf
Start Date: 01/01/2023 — End Date: 30/06/2026
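Evaluating a speaker recognition system or a spoofing countermeasure typically comes down to comparing score distributions for genuine and impostor (or spoofed) trials. As a minimal illustrative sketch, not the BRUEL methodology itself, the equal error rate (EER) can be computed as follows; all scores below are toy values, not project data.

```python
import numpy as np

def compute_eer(genuine_scores, impostor_scores):
    """Return the EER: the operating point where the false-acceptance
    rate (FAR) equals the false-rejection rate (FRR)."""
    thresholds = np.sort(np.concatenate([genuine_scores, impostor_scores]))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        far = np.mean(impostor_scores >= t)  # impostor/spoofed trials accepted
        frr = np.mean(genuine_scores < t)    # genuine trials rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

genuine = np.array([2.1, 1.8, 2.5, 0.9, 1.7])      # target-speaker trials (toy)
impostor = np.array([-1.2, 0.3, -0.7, 1.0, -0.5])  # spoofed/non-target trials (toy)
print(f"EER = {compute_eer(genuine, impostor):.2%}")  # prints "EER = 20.00%"
```

A certification-oriented evaluation such as BRUEL's would go beyond a single EER (e.g. assessing countermeasures and recognition jointly), but the score-based decision principle is the same.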

ANR EVA Project

1 January 2023

Explicit Voice Attributes

Describing a voice in a few words remains a very arbitrary task. We can speak with a "deep", "breathy", "bright" or "hoarse" voice, but fully characterizing a voice would require a closed set of rigorously defined attributes constituting an ontology. No such description grid exists today. Machine learning applied to speech suffers from the same weakness: in most automatic processing tasks, when a speaker is modeled, abstract global representations are used without making their characteristics explicit. For instance, automatic speaker verification/identification is usually tackled with the x-vector paradigm, which describes a speaker's voice by an embedding vector designed only to distinguish speakers. Despite their very good accuracy for speaker identification, x-vectors are usually unsuitable for detecting similarities between different voices that share common characteristics. The same observation applies to speech generation. We propose to carry out a comprehensive set of analyses to extract salient, as-yet-unaddressed voice attributes in order to enrich structured representations usable for synthesis and voice conversion.

Partner list:
Project leader: Orange
Scientific leader for LIA: Yannick Estève
Start date: 01/01/2023 — End date: 31/12/2025
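The x-vector comparison step mentioned above reduces speaker verification to a similarity score between two fixed-size embeddings. A minimal sketch of that idea, using random toy vectors rather than real x-vectors:

```python
import numpy as np

def cosine_score(emb_a, emb_b):
    """Cosine similarity between two speaker embeddings (higher = more similar)."""
    return float(np.dot(emb_a, emb_b) / (np.linalg.norm(emb_a) * np.linalg.norm(emb_b)))

rng = np.random.default_rng(0)
enroll = rng.normal(size=256)                # enrolment embedding (toy)
same = enroll + 0.1 * rng.normal(size=256)   # same speaker, slight perturbation
other = rng.normal(size=256)                 # an unrelated speaker

print(cosine_score(enroll, same))   # close to 1.0
print(cosine_score(enroll, other))  # near 0.0
```

Such a score separates speakers well, but, as the project description notes, it says nothing explicit about why two voices sound alike (e.g. both "breathy" or "hoarse"), which is exactly the gap EVA targets.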

PhD defense of Mathias Quillot – 27 September 2022

27 September 2022

Date: Tuesday, September 27th at 10:00 am in room S5.
Title: A first step towards characterizing the information conveyed by acted voices
Abstract: Before being distributed in different countries, a work such as a video game or a film needs adaptation. Subtitling and dubbing are two options for adapting a work. While subtitling is less costly to produce, dubbing better suits viewers who prefer to listen to the dialogue, usually in their native language, rather than read subtitles while listening to dialogue in another language. To dub a work, the first step is to select actors from a pool of candidates whose voices will replace the original ones. This selection process is called voice casting and is conducted by the artistic director (AD), sometimes referred to as the casting director. With the emergence of new streaming platforms such as Disney+ and Amazon Prime and the tremendous growth of the video game industry, the number of works to be distributed internationally is increasing significantly. In response to this demand, more and more actors are entering the voice market. The AD may miss out on talents unknown to them, as it is impossible to audition all candidates. Tools for …

H2020 SELMA Project

1 January 2021

Stream Learning for Multilingual Knowledge Transfer

The internet contains vast amounts of data and information in various languages, both written and audiovisual, and there is an increasing need to leverage this largely untapped resource. The SELMA project, funded by the EU, focuses on ingesting and monitoring large quantities of data. It systematically trains machine learning models to perform natural language tasks and uses these models to monitor data streams, aiming to enhance multilingual media monitoring and real-time content production. Ultimately, the project will advance the state of the art in language modeling, machine translation, speech recognition, and speech synthesis.

Project Coordinator: Deutsche Welle, DE
Scientific Lead for LIA: Yannick Estève
Start Date: 01/01/2021 — End Date: 30/12/2023

ANR muDialBot Project

1 January 2021

MUlti-party perceptually-active situated DIALog for human-roBOT interaction

In muDialBot, our ambition is to proactively incorporate human-like behavioral traits into human-robot spoken communication. We aim to reach a new stage in harnessing the rich information provided by audio and visual data streams from humans. In particular, extracting verbal and non-verbal events should enhance the decision-making abilities of robots, allowing them to manage turn-taking more naturally and to switch from group interactions to face-to-face dialogues according to the situation. There has been growing interest recently in companion robots capable of assisting individuals in their daily lives and communicating with them effectively. These robots are perceived as social entities, and studies have highlighted their relevance to health and psychological well-being. Patients, their families, and healthcare professionals will better appreciate the potential of these robots as certain limitations are overcome, such as their ability to move, see, and listen in order to communicate naturally with humans, beyond what touchscreen displays and voice commands already enable. The scientific and technological outcomes of the project will be implemented on a commercial social robot and tested and validated on multiple use cases in the context of a day hospital unit. Large-scale data collection will complement in-situ …

H2020 ESPERANTO Project

1 January 2021

Exchanges for SPEech ReseArch aNd TechnOlogies

Speech processing technologies are crucial for numerous commercial applications. The ESPERANTO project, funded by the EU, aims to make the next generation of AI algorithms used in speech processing more accessible: for instance, they should account for human involvement and be interpretable, so as to allow sensitive applications while safeguarding personal data. ESPERANTO envisions disseminating these technologies across European SMEs, expanding and securing their deployment for forensic, healthcare, and educational purposes. The project will support the development of freely accessible tools, conduct seminars on various speech processing themes to assist new students, researchers, and engineers working in speech AI, and contribute to the collection and sharing of linguistic and speech-related resources.

Project Coordinator: University of Le Mans, FR
Scientific Manager for LIA: Jean-François Bonastre
Start Date: 01/01/2021 — End Date: 30/06/2025

HDR defense of Richard Dufour – 8 December 2020

8 December 2020

Defense of the HDR entitled ‘Natural Language Processing: Studies and Contributions at the Frontiers of Interdisciplinarity’, on Tuesday, December 8, 2020, at 2:00 PM in the Thesis Room of Avignon University (Hannah Arendt Campus – City Center). The defense committee will be composed of:

ANR AISSPER Project

1 January 2020

AISSPER: Artificial Intelligence for Semantically controlled SPEech undeRstanding

Artificial Intelligence (AI) holds strategic importance at the national level due to the impressive results achieved by deep learning algorithms in domains such as natural language processing (NLP), medicine, and political analytics, across a wide range of applications. France has emerged as a leader in deep learning, owing to the political efforts of recent years. Over the last decade, substantial effort has been devoted to end-to-end Spoken Language Understanding (SLU) systems, driven by the feasibility of applications such as personal assistants and conversational systems. Superior results have been observed in automatic speech recognition (ASR) with architectures based on a hyper-complex number algebra, quaternions, which require less processing time (Morchid 2018) and fewer parameters to estimate than conventional models (Parcollet et al. 2018; 2019). Reducing the number of model parameters makes it possible to train neural architectures efficiently on limited quantities of data, which are often difficult to obtain for specific semantic concepts and contexts in specific domains. At the same time, intrinsically linked learning processes such as ASR and SLU hinder the parallelization of training examples, which is critical for long sequences since memory constraints limit batch processing. Furthermore, error analyses conducted in completed projects such as M2CR, JOKER, VERA, SUMACC, MEDIA, and DECODA highlighted the importance of …
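The parameter reduction attributed to quaternion architectures above comes from the Hamilton product: one quaternion weight (4 stored parameters) mixes four input components, where an equivalent real-valued 4x4 mapping would store 16. The following is an illustrative sketch of that product, not code from the AISSPER project.

```python
import numpy as np

def hamilton_product(q, p):
    """Hamilton product of two quaternions q = (r, x, y, z) and p."""
    r1, x1, y1, z1 = q
    r2, x2, y2, z2 = p
    return np.array([
        r1*r2 - x1*x2 - y1*y2 - z1*z2,  # real part
        r1*x2 + x1*r2 + y1*z2 - z1*y2,  # i component
        r1*y2 - x1*z2 + y1*r2 + z1*x2,  # j component
        r1*z2 + x1*y2 - y1*x2 + z1*r2,  # k component
    ])

w = np.array([0.5, 0.1, -0.2, 0.3])  # one quaternion weight: 4 parameters
x = np.array([1.0, 2.0, 3.0, 4.0])   # one quaternion input: 4 components
print(hamilton_product(w, x))        # prints [-0.3 -0.6  1.5  3. ]
```

In a quaternion layer, this product replaces the dense real-valued matrix-vector product, which is the source of the roughly fourfold parameter saving cited for these models.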

PhD defense of Titouan Parcollet – 3 December 2019

3 December 2019

The thesis defense of Titouan Parcollet, entitled "Artificial Neural Networks Based on Quaternion Algebra," will take place on Tuesday, December 3, 2019, at 2:30 PM in the Blaise Pascal amphitheater (CERI). The thesis will be presented before a jury composed of: The defense will be conducted in French. You are also invited to the reception following the defense in Room 5.
Abstract: In recent years, deep learning has become the preferred approach for developing modern artificial intelligence (AI). The significant increase in computing power, along with the ever-growing amount of available data, has made deep neural networks the most efficient solution for solving complex problems. However, accurately representing the multidimensionality of real-world data remains a major challenge for artificial neural architectures. To address this challenge, neural networks based on complex and hypercomplex number algebras have been developed: the multidimensionality of the data is integrated directly into the neurons, which become complex- or hypercomplex-valued components of the model. In particular, quaternion neural networks (QNNs) have been proposed to process three- and four-dimensional data, building on quaternions' ability to represent rotations in three-dimensional space. Unfortunately, unlike complex-valued neural networks, which are now accepted as an alternative to real-valued neural networks, QNNs suffer from several limitations, …

HDR defense of Mohamed Morchid – 26 November 2019

26 November 2019

On 26 November at 4 PM in the thesis room (Hannah Arendt campus). This HDR, entitled 'Neural Networks for Natural Language Processing', will be presented before a jury composed of:

Reviewers:
Mrs. Dilek Z. HAKKANI-TÜR, Senior Principal Scientist, Alexa AI, USA
Mr. Patrice BELLOT, Professor, AMU Polytech', LIS, Marseille
Mr. Frédéric ALEXANDRE, Research Director, INRIA, Bordeaux

Examiners:
Mr. Yannick ESTÈVE, Professor, AU, LIA, Avignon
Mr. Frédéric BÉCHET, Professor, AMU, LIS, Marseille
