WAC – Wikipedia Abusive Conversations

This dataset contains conversations between Wikipedia editors, annotated at the message level for various types of abuse. It aligns two existing corpora:

  1. Messages and conversation structures of WikiConv (https://github.com/conversationai/wikidetox/tree/master/wikiconv)
  2. Manual toxicity annotations from the Wikipedia Comment Corpus (WCC, https://doi.org/10.6084/m9.figshare.4054689)
  • URL: https://zenodo.org/doi/10.5281/zenodo.6817092
  • Production date: 2019–2020
  • Related publication:
    • Noé Cécillon, Vincent Labatut, Richard Dufour, and Georges Linarès, “WAC: A Corpus of Wikipedia Conversations for Online Abuse Detection,” in 12th Language Resources and Evaluation Conference (LREC), 2020, pp. 1375–1383. HAL: hal-02497514