PhD Defense – Manh Tuan NGUYEN – 25/03/2026

Date: Wednesday, March 25th, 2026, at 2:00 PM

Place: Blaise amphitheater, CERI.

The presentation will be held in English.

Jury members:

  • Mr. Nicolas AUDIBERT, MCF/HDR, LPP – Université Sorbonne Nouvelle (Reviewer)
  • Mr. Benjamin LECOUTEUX, PR, LIG/GETALP – Université Grenoble Alpes (Reviewer)
  • Mr. Julien PINQUIER, PR, IRIT – Université de Toulouse (Examiner)
  • Mr. Yannick ESTÈVE, PR, LIA – Avignon Université (Examiner)
  • Ms. Virginie WOISARD, PH/PA, CHU de Toulouse – Université de Toulouse (Examiner)
  • Ms. Corinne FREDOUILLE, PR, LIA – Avignon Université (Thesis Supervisor)

Title: Considering the inter-judge variability in the perceptual evaluation of speech and voice disorders and its integration into an automatic decision support system

Abstract:
Perceptual judgments are widely used in domains that lack clear objective criteria or reliable measurement methods, requiring reliance on human expert evaluation. However, such judgments are inherently subjective, often leading to a lack of agreement and, consequently, variability when multiple experts assess the same material. This variability, referred to as inter-rater variability, is typically addressed by aggregating scores or applying majority voting to produce a consensus decision. While effective for obtaining a final decision, this approach leaves the underlying causes of inter-rater variability largely unexplored.
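As an illustration of the consensus step described above, the following minimal Python sketch aggregates per-expert scores by majority voting and then computes pairwise percent agreement, making visible the inter-rater variability that the consensus score alone conceals. The ratings, severity scale, and expert labels are invented for the example:

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: three experts each rate the same 8 speech samples
# on a 0-2 severity scale (0 = normal, 2 = severe).
ratings = {
    "expert_A": [0, 1, 2, 1, 0, 2, 1, 1],
    "expert_B": [0, 2, 2, 1, 1, 2, 0, 1],
    "expert_C": [1, 1, 2, 2, 0, 2, 1, 0],
}

def majority_vote(scores):
    """Consensus score per sample: the most frequent label
    (ties broken by the first-listed rater's score)."""
    return [Counter(col).most_common(1)[0][0] for col in zip(*scores)]

def percent_agreement(a, b):
    """Fraction of samples on which two raters give the same score."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

consensus = majority_vote(ratings.values())
print("consensus:", consensus)

# Pairwise agreement exposes the variability hidden by the consensus:
# two raters can each agree moderately with the majority while
# disagreeing strongly with each other.
for (name_a, sa), (name_b, sb) in combinations(ratings.items(), 2):
    print(f"{name_a} vs {name_b}: {percent_agreement(sa, sb):.2f}")
```

In practice, chance-corrected coefficients such as Cohen's or Fleiss' kappa are preferred over raw percent agreement, but the point stands either way: the single consensus vector discards exactly the per-expert disagreement patterns that this thesis sets out to model.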
This thesis aims to explain inter-rater variability rather than treating consensus decisions as an absolute reference. We argue that such variability may arise from systematic differences between experts, particularly in terms of professional background, training, and the perceptual dimensions emphasized during assessment. By reducing individual judgments to a single consensus score, traditional approaches implicitly discard valuable information about expert reasoning and decision-making strategies. Understanding and explaining this variability is therefore essential.
To address this objective, we propose a computational approach to model and interpret inter-rater variability. Leveraging the pattern-recognition capabilities of modern AI systems, we train models on perceptual data from individual experts, with the goal of capturing the perceptual dimensions each expert relies upon. By applying explainability methods to these expert-specific models, we seek to indirectly uncover the decision-making processes underlying human judgments.
Clinical speech assessment is chosen as the experimental testbed for this approach. In this context, we develop systems based on the pre-trained speech representation model Wav2Vec 2.0, fine-tuned to reproduce individual expert decisions. We then apply interpretability analyses to build expert-specific profiles. These profiles reveal systematic differences in how experts weight speech dimensions such as articulation, voice quality, and prosody. Overall, this work demonstrates that computational methods can identify and quantify the perceptual dimensions underlying expert judgments, transforming inter-rater variability from mere measurement noise into meaningful information about assessment strategies. The proposed methodology also shows strong potential for transfer to other domains involving subjective evaluations of multidimensional phenomena. Furthermore, it opens perspectives for integrating this knowledge into an automatic decision support system.