Serial Speakers – Collection of Annotated TV Serials

This dataset consists of 3 TV series with manual annotations:

  1. Breaking Bad: S01–S05
  2. Game of Thrones: S01–S08
  3. House of Cards: S01–S02

All three files are in .json format and contain annotated TV series data. Each TV series is defined by its name and contains seasons, defined by their ids. Every season is made of episodes, defined by their ids, titles, durations and fps (frames per second). Each episode contains two basic kinds of data: scenes and speech segments. Scenes are defined by their starting points and are made of shots (Season 1 of each series only).
Shots are defined by their starting and ending positions and by recurring shot ids. Speech segments are defined by their starting and ending points, their textual content (encrypted here for copyright reasons), their speaker, and their possible interlocutors.
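
The hierarchy described above (series, seasons, episodes, then scenes and speech segments) can be traversed with a few nested loops. The following is only a minimal sketch: every key name ("seasons", "episodes", "speech_segments", "speaker", "start", "end", and so on), the time unit, and the sample values are assumptions made for illustration, not the dataset's documented schema, so they should be checked against the actual files before use.

```python
import json

# Hypothetical miniature record mirroring the hierarchy described above.
# All key names and values are assumptions, not the real schema.
SAMPLE = json.loads("""
{
  "name": "Example Series",
  "seasons": [
    {
      "id": 1,
      "episodes": [
        {
          "id": 1,
          "title": "Pilot",
          "duration": 120.0,
          "fps": 25.0,
          "scenes": [
            {"start": 0.0,
             "shots": [{"start": 0.0, "end": 4.0, "recurring_id": 0}]}
          ],
          "speech_segments": [
            {"start": 1.0, "end": 3.5, "text": "<encrypted>",
             "speaker": "A", "interlocutors": ["B"]},
            {"start": 4.0, "end": 5.0, "text": "<encrypted>",
             "speaker": "B", "interlocutors": ["A"]}
          ]
        }
      ]
    }
  ]
}
""")


def speaking_time(series):
    """Sum speech-segment durations per speaker over all seasons and episodes."""
    totals = {}
    for season in series["seasons"]:
        for episode in season["episodes"]:
            for seg in episode["speech_segments"]:
                length = seg["end"] - seg["start"]
                totals[seg["speaker"]] = totals.get(seg["speaker"], 0.0) + length
    return totals


print(speaking_time(SAMPLE))
```

With a real file, the inline sample would be replaced by `json.load` on an opened file, and the key names adjusted to whatever the file actually uses.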

  • URL: https://zenodo.org/doi/10.5281/zenodo.6815775
  • Production date: 2015–2020
  • Related publication:
    • Xavier Bost, Vincent Labatut and Georges Linarès. "Serial Speakers: a Dataset of
      TV Series". In: 12th Language Resources and Evaluation Conference (LREC). Marseille,
      FR, 2020, pp. 4249-4257. HAL: hal-02477736