This dataset contains conversations between Wikipedia editors, annotated at the message level for various types of abuse. It aligns two existing corpora:
- Messages and conversation structures from WikiConv (https://github.com/conversationai/wikidetox/tree/master/wikiconv)
- Manual toxicity annotations from the Wikipedia Comment Corpus (WCC, https://doi.org/10.6084/m9.figshare.4054689)

- URL: https://zenodo.org/doi/10.5281/zenodo.6817092
- Production date: 2019–2020
- Related publication:
  - Noé Cécillon, Vincent Labatut, Richard Dufour and Georges Linarès, “WAC: A Corpus of Wikipedia Conversations for Online Abuse Detection,” in 12th Language Resources and Evaluation Conference (LREC), 2020, pp. 1375–1383. ⟨hal-02497514⟩