The next SLG meeting will be held in room S1 on Thursday, December 21st, from 12:00 PM to 1:00 PM.
We will have the pleasure of hosting St Germes BENGONO OBIANG, a PhD student in speech processing, focusing on tone recognition in under-resourced languages. He is supervised by Norbert TSOPZE and Paulin MELATAGIA from the University of Yaoundé 1, as well as by Jean-François BONASTRE and Tania JIMENEZ from LIA.
Abstract: Many sub-Saharan African languages are tone languages, and most of them are classified as low-resource languages because few resources and tools are available to process them. Identifying the tone associated with a syllable is therefore a key challenge for speech recognition in these languages. We propose models that automate tone recognition in continuous speech and that can easily be incorporated into a speech recognition pipeline for these languages. We investigated different neural architectures as well as several speech feature extraction algorithms (filter banks, LEAF, cepstrogram, MFCC). In the context of low-resource languages, we also evaluated Wav2vec models for this task. In this work, we use a public Yoruba speech recognition dataset. Combining cepstrogram (CS) and filter bank (FB) features yields a minimum tone error rate (TER) of 19.54%, while the models using Wav2vec 2.0 reach a TER of 17.72%, showing that Wav2vec outperforms the models used in the literature for tone identification in low-resource languages.