DrBERT – French Biomedical Language Model

21 December 2023

DrBERT is a state-of-the-art language model for the French biomedical domain, based on the RoBERTa architecture and pretrained on the French biomedical corpus NACHOS. DrBERT was evaluated on 11 distinct practical biomedical tasks in French, including named entity recognition (NER), part-of-speech (POS) tagging, binary/multi-class/multi-label classification, and multiple-choice question answering. The results showed that DrBERT improved performance on most tasks compared to prior approaches, indicating that pretraining from scratch is still the most effective strategy for BERT language models in the French biomedical domain. DrBERT was trained and evaluated by Yanis Labrak (LIA, Zenidoc), Adrien Bazoge (LS2N), Richard Dufour (LS2N), Mickael Rouvier (LIA), Emmanuel Morin (LS2N), Béatrice Daille (LS2N) and Pierre-Antoine Gourraud (Nantes University). Website: https://drbert.univ-avignon.fr/

NACHOS – French Biomedical Corpus

21 December 2023

NACHOS is a French biomedical corpus. It is available for academic research only. If you are interested, contact Mickael Rouvier. Please include your first name, last name, affiliation, contact details and a brief description of how you intend to use NACHOS. Website: https://drbert.univ-avignon.fr/

PhD defense of Julio Perez-Garcia – 18 December 2023

14 December 2023

Place: University of Avignon, Campus Hannah Arendt, Salle des Thèses. Date: Monday, December 18, 2023 at 14:00. Title: Contribution to security and privacy in the Blockchain-based Internet of Things: Robustness, Reliability, and Scalability. Abstract: The Internet of Things (IoT) is a diverse network of objects or “things” typically interconnected via the Internet. Given the sensitivity of the information exchanged in IoT applications, it is essential to guarantee security and privacy. This problem is aggravated by the open nature of wireless communications and by the limited power and computing resources of most IoT devices. At the same time, existing IoT security solutions are based on centralized architectures, which raises scalability issues and creates a single point of failure, making them susceptible to denial-of-service attacks and technical failures. Blockchain has emerged as an attractive solution to IoT security and centralization issues. Blockchains replicate a permanent, append-only record of all transactions occurring on a network across multiple devices, keeping the copies synchronized through a consensus protocol. However, implementing a blockchain may involve high computational and energy costs for devices. Consequently, solutions based on Fog/Edge computing have been considered for integration with IoT. This approach shifts the higher computational load and higher energy consumption to the devices with higher [...] More info

ANR PARFAIT Project

14 December 2023

Planning And leaRning For AI-Edge compuTing. Period: 2023-2027.

DAPADAF-E Project

13 December 2023

Validity of an acoustic-phonetic decoding task with respect to anatomic deficits in the paramedical assessment of speech disorders for patients treated for oral or oropharyngeal cancer. More info

SLG Meeting – St Germes Bengono Obiang – 21/12/2023

12 December 2023

The next SLG meeting will be held in room S1 on Thursday, December 21st, from 12:00 PM to 1:00 PM. We will have the pleasure of hosting St Germes BENGONO OBIANG, a PhD student in speech processing, focusing on tone recognition in under-resourced languages. He is supervised by Norbert TSOPZE and Paulin MELATAGIA from the University of Yaoundé 1, as well as by Jean-François BONASTRE and Tania JIMENEZ from LIA. Abstract: Many sub-Saharan African languages are tone languages, and most of them are classified as low-resource languages because few resources and tools are available to process them. Identifying the tone associated with a syllable is therefore a key challenge for speech recognition in these languages. We propose models that automate the recognition of tones in continuous speech and that can easily be incorporated into a speech recognition pipeline for these languages. We have investigated different neural architectures as well as several speech feature extraction algorithms (filter banks, LEAF, cepstrogram, MFCC). In the context of low-resource languages, we also evaluated Wav2vec models for this task. In this work, we use a public speech recognition dataset for Yoruba. As for the results, using the [...] More info

PhD defense of Anais Chanclu – 11 December 2023

11 December 2023

Thesis defense of Anais Chanclu. Date: Monday 11 December 2023 at 14:30. Location: Thesis room, Hannah Arendt campus. Title: Recognizing individuals by their voice: defining a scientific framework to ensure the reliability of voice comparison results in forensic contexts. Abstract: In police investigations or criminal trials, voice recordings are often collected for comparison with the voices of suspects. Typically, these recordings, referred to as ‘traces’, come from phone taps, emergency service calls, or voicemail messages. Recordings of suspects, known as ‘comparison pieces’, are usually obtained by law enforcement through voice sampling. Since the traces and comparison pieces were not recorded under the same conditions, and the recording conditions of the traces are often poorly known or entirely unknown, the variability between the recordings being compared cannot be quantified. Numerous factors come into play, including audio file characteristics, linguistic content, the recording environment, and the speaker(s). Voice comparison practices have evolved throughout history without conforming to a scientific framework. This has called the reliability of voice expertise into question (as in the Trayvon Martin case) and has led to the use of fallacious practices (as in the Élodie Kulik case), potentially leading to judicial errors. Nowadays, the French Scientific Police (SNPS) and the [...] More info

Master Internship: Cyberdeception strategies using stochastic optimization and dynamic graphs

10 December 2023

Context: Cyber deception is a defense strategy, complementary to conventional approaches, used to enhance the security posture of a system. The basic idea of this technique is to deliberately conceal and/or falsify part of such a system by deploying and managing decoys (e.g., “honeypots”, “honeynets”, etc.), i.e., applications, data, network elements and protocols that appear to malicious actors as a legitimate part of the system, and to which their attacks are misdirected. The advantage of an effective cyber deception strategy is twofold: on the one hand, it depletes attackers’ resources while allowing system security tools to take the necessary countermeasures; on the other hand, it provides valuable insights into attackers’ tactics and techniques, which can be used to improve the system’s resilience to future attacks and to upgrade security policies accordingly. Although cyber deception has been successfully applied in some scenarios, existing deception approaches lack the flexibility to be operated seamlessly in highly distributed and resource-constrained environments. Indeed, while virtualization and cloud-native design approaches paved the way for ubiquitous deployment of applications, they also widened the attack surface that malicious actors might exploit. In such a scenario, it is practically infeasible to try to deploy decoys for each and every system’s service or application [...] More info

Master Internship: Impact of regional aggregation on energy scheduling flexibility performances

10 December 2023

Context: Large-scale problems arise in the electricity system on both the short term (e.g., the Unit Commitment problem) and the long term (system planning, e.g., “Generation Expansion Planning”). In these problems concerning the modern and future electricity system, the integration of energy consumption flexibility is a crucial question. This flexibility consists of “optimally” scheduling the power profile of particular electrical appliances; the most common and suitable ones for residential consumers are Electric Vehicles (EV) and Water-Heaters (WH). It makes it possible to reach a supply-demand equilibrium at a lower total system cost than when only production assets are controllable. The flexibilities related to “small” individual consumers (again, EV or WH) are so numerous that modelling them individually in typical electricity system optimization problems is inappropriate for tractability reasons; it thus seems relevant to consider an aggregate model of consumption flexibilities. In turn, the question of the “right level” of aggregation modelling is of particular importance. Aggregation/disaggregation techniques are widely studied in the context of smart grids. Objective: More precisely, the objective of this internship is to study, on a simple example, the impact of aggregation techniques and aggregation levels, and to solve an optimal energy scheduling [...] More info

PANG: Pattern-Based Anomaly Detection in Graphs

7 December 2023

PANG (Pattern-Based Anomaly Detection in Graphs) is an algorithm that represents and classifies a collection of graphs according to their frequent patterns (subgraphs). The details of this algorithm are described in the article below. This work was conducted in the framework of the DeCoMaP ANR project (Detection of corruption in public procurement markets, ANR-19-CE38-0004). More info
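
To give a rough idea of what a pattern-based graph representation looks like, here is a minimal sketch. It is not the PANG implementation: as a simplifying assumption, patterns are restricted to single labeled edges rather than general subgraphs, and the function names, node labels and toy graphs are all hypothetical.

```python
from collections import Counter

# Assumption: each graph is modeled as a set of labeled edges
# (source_label, target_label), and a "pattern" is a single labeled edge.
# General frequent subgraph mining is deliberately left out of this sketch.


def frequent_patterns(graphs, min_support):
    """Return the patterns that occur in at least min_support graphs."""
    support = Counter()
    for edges in graphs:
        support.update(set(edges))  # count each pattern once per graph
    return sorted(p for p, count in support.items() if count >= min_support)


def to_vectors(graphs, patterns):
    """Encode each graph as a binary vector of pattern presence."""
    return [[1 if p in edges else 0 for p in patterns] for edges in graphs]


# Hypothetical toy collection of three graphs.
graphs = [
    {("buyer", "supplier"), ("supplier", "bank")},
    {("buyer", "supplier"), ("buyer", "bank")},
    {("buyer", "supplier"), ("supplier", "bank"), ("bank", "bank")},
]

patterns = frequent_patterns(graphs, min_support=2)
vectors = to_vectors(graphs, patterns)
print(patterns)
print(vectors)
```

Vectors of this kind could then be passed to any standard classifier to separate normal graphs from anomalous ones.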
