PhD defense of Sondes Abderrazek – 2 May 2023 – Laboratoire Informatique d’Avignon

Date: 2 May 2023 at 2:00 pm.

Place: Avignon at the Centre d’Enseignement et de Recherche en Informatique (Ada Lovelace Auditorium)

The jury members:

PR. HENNEBERT Jean, HEIA-FR, HES-SO, Université de Fribourg (Reviewer)
PR. LOLIVE Damien, IRISA, ENSSAT, Université de Rennes (Reviewer)
PR. TRANCOSO Isabel, INESC-ID, IST, University of Lisbon (Examiner)
PH. WOISARD Virginie, Hospital Practitioner, Associate Professor, CHU de Toulouse, Université de Toulouse (Examiner)
PR. LARCHER Anthony, LIUM, Le Mans Université (Examiner)
PR. BONASTRE Jean-François, LIA, Université d’Avignon, INRIA (Examiner)
PR. FREDOUILLE Corinne, LIA, Université d’Avignon (Thesis supervisor)

Title: Assessment of Speech Intelligibility using Deep Learning: Towards Enhanced Interpretability in Clinical Phonetics.

Abstract: Speech intelligibility is an essential component of effective communication. It refers to the degree to which a speaker’s intended message can be understood by a listener. This capacity can be hampered by speech disorders, resulting in a reduced quality of life for the individuals affected. In the case of Head and Neck Cancer (HNC), speech may be affected by the presence of tumors in the speech production system, but the main cause of speech impairment is typically the tumor treatment, which may include surgery, radiotherapy, chemotherapy, or a combination of these. In such cases, the evaluation of speech quality is crucial to assess the communication deficit of patients and to develop targeted treatment plans. In clinical practice, perceptual measures are considered the gold standard for assessing speech disorders. Although these measures are widely used, they suffer from several limitations, the most important of which is their subjectivity. Consequently, since the 1990s, the automatic assessment of speech disorders has emerged as a promising alternative to perceptual measures.

In this thesis, we explore the potential of deep learning (DL) techniques to evaluate speech disorders while addressing the shortcomings of existing tools. In this sensitive clinical context, where the stakes are high and trust is paramount, we consider the explainability and interpretability of DL tools to be requirements rather than optional features. We therefore propose a three-step methodology based on deep learning and dedicated to an interpretable assessment of speech intelligibility in the context of speech disorders.

In the first step, we tackle a major issue in current automatic tools for disordered speech assessment: the limited insight they give into the relationship between speech disorders and the resulting assessment. To this end, we implement a DL-based model, trained on healthy speech and dedicated to an intermediate task, French phoneme classification. This methodological choice serves two purposes. The first is to take advantage of the phoneme-level knowledge obtained from the classification task to address the issue mentioned above: in a subsequent stage, it makes it possible to provide insightful information about the final assessment score at the phoneme level. The second is related to the use of healthy (normal) speech, which makes it possible to overcome the very limited amount of pathological data available while meeting the high data requirements of deep learning.

In the second step, the primary objective is to guarantee the interpretability of the developed solution, thereby fostering its acceptance in clinical practice. We thus investigate the capacity of the implemented phoneme classifier to yield relevant knowledge related to the characteristics of speech pathology. We then propose the Neuro-based Concept Detector (NCD), our general analytic framework for the explainability of the deep representations of a DL-based model. This framework highlights, within the classification model resulting from the first step, a representation of the acoustic and articulatory characteristics of healthy speech in terms of phonetic features, which can easily be interpreted in terms of alterations in the case of speech disorders. This methodological choice thus achieves two goals at once: not only do we actively take steps to mitigate the impact of the black-box nature of DL models, but we also provide an additional level of granularity that clinicians can use to link and interpret the final intelligibility assessment.

Finally, the third step is dedicated to the prediction of a final score assessing the speech intelligibility of a person. This step builds on the different levels of representation provided by the two previous steps, which makes it possible to relate the predicted intelligibility score to the degree of speech alteration at the phoneme and phonetic feature levels.

The overall proposed methodology thus provides clinicians with an interpretation of the speech assessment score grounded in phonetics. The promising results obtained on a population of HNC patients suggest the potential of such a methodology to monitor the progress of therapy or to develop tailored rehabilitation protocols that would improve the patient’s ability to communicate effectively and, consequently, their quality of life. The validation of this methodology in clinical practice is one of the many future directions opened by this thesis.