This dataset consists of 3 TV series with manual annotations:
- Breaking Bad: S01–S05
- Game of Thrones: S01–S08
- House of Cards: S01–S02
All three files are in JSON format (.json) and contain annotated TV series data. Each TV series is defined by its name and contains seasons, which are identified by their ids. Every season is made of episodes, defined by their ids, titles, durations and frame rates (fps). Each episode contains two basic kinds of data: scenes and speech segments. Scenes are defined by their starting points and are made of shots (season 1 only).
A shot is defined by its starting and ending positions and by recurring shot ids. A speech segment is defined by its starting and ending points, its textual content (here encrypted for copyright reasons), its speaker, and its possible interlocutors.
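The hierarchy described above (series, seasons, episodes, then scenes and speech segments) can be walked with a few nested loops. The sketch below is only illustrative: every key name (`seasons`, `episodes`, `data`, `speech_segments`, `start`, `end`, `speaker`, and so on) is an assumption, as is the use of seconds for positions, and the inline sample is made up. Check the actual files for the real field names before reusing it.

```python
import json

# Hypothetical miniature document mirroring the described hierarchy.
# All key names and values here are assumptions, not the dataset's real schema.
sample = json.loads("""
{
  "name": "Example Series",
  "seasons": [
    {"id": 1,
     "episodes": [
       {"id": 1, "title": "Pilot", "duration": 3480.0, "fps": 25.0,
        "data": {
          "scenes": [
            {"start": 0.0,
             "shots": [{"start": 0.0, "end": 4.2, "recurring_id": 0}]}
          ],
          "speech_segments": [
            {"start": 1.0, "end": 3.5, "text": "xxxx",
             "speaker": "A", "interlocutors": ["B"]},
            {"start": 5.0, "end": 6.0, "text": "yyyy",
             "speaker": "B", "interlocutors": ["A"]}
          ]
        }}
     ]}
  ]
}
""")


def speech_time_per_speaker(series):
    """Sum the speech segment durations of each speaker over a whole series."""
    totals = {}
    for season in series["seasons"]:
        for episode in season["episodes"]:
            for segment in episode["data"]["speech_segments"]:
                length = segment["end"] - segment["start"]
                speaker = segment["speaker"]
                totals[speaker] = totals.get(speaker, 0.0) + length
    return totals


totals = speech_time_per_speaker(sample)
print(totals)
```

To apply the same traversal to one of the real files, replace the inline sample with `json.load` on an opened file and adjust the key names to those actually used.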
- URL:
- Production date: 2015–2020
- Related publication:
- Xavier Bost, Vincent Labatut and Georges Linarès. "Serial Speakers: a Dataset of
TV Series". In: 12th Language Resources and Evaluation Conference (LREC). Marseille,
FR, 2020, pp. 4249-4257. ⟨hal-02477736⟩
- Xavier Bost, Vincent Labatut et Georges Linarès. « Serial Speakers : a Dataset of