DrBERT – French Biomedical Language Model

21 December 2023

DrBERT is a state-of-the-art language model for the French biomedical domain, based on the RoBERTa architecture and pre-trained on the French biomedical corpus NACHOS. DrBERT was evaluated on 11 distinct practical biomedical tasks for French, including named entity recognition (NER), part-of-speech (POS) tagging, binary/multi-class/multi-label classification, and multiple-choice question answering. The results show that DrBERT improves performance on most tasks compared to prior approaches, indicating that pre-training from scratch remains the most effective strategy for BERT-style language models on French biomedical text.

DrBERT was trained and evaluated by Yanis Labrak (LIA, Zenidoc), Adrien Bazoge (LS2N), Richard Dufour (LS2N), Mickael Rouvier (LIA), Emmanuel Morin (LS2N), Béatrice Daille (LS2N) and Pierre-Antoine Gourraud (Nantes University).

Website: https://drbert.univ-avignon.fr/
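
For readers who want to try the model, the sketch below shows one way to load a DrBERT checkpoint with the Hugging Face Transformers library and run a masked-token prediction. The checkpoint identifier "Dr-BERT/DrBERT-7GB" is an assumption and not taken from this announcement; check the project website or the Hugging Face Hub for the exact name. The surrounding calls are standard Transformers usage.

    # Minimal sketch: loading a DrBERT checkpoint with Hugging Face Transformers.
    # The checkpoint name below is an assumption; verify it before use.
    from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline

    model_name = "Dr-BERT/DrBERT-7GB"  # assumed checkpoint identifier
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForMaskedLM.from_pretrained(model_name)

    # Fill-mask example on a French biomedical sentence
    # (RoBERTa-style models use the <mask> token).
    fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)
    predictions = fill_mask("Le patient souffre d'une <mask> chronique.")
    for p in predictions:
        print(p["token_str"], round(p["score"], 3))

For the downstream tasks listed above (NER, POS tagging, classification), the same checkpoint would typically be loaded with a task-specific head (e.g. AutoModelForTokenClassification) and fine-tuned on labelled data.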