Date: 18th November at 2:00 pm.
Location: Thesis room, Hannah Arendt Campus (City Center).
The jury comprises:
- Mrs. B. Daille and Mr. G. Reyes: rapporteurs
- Mr. R. Elazouzi, Mr. A. Doucet, Mrs. P. Sébillot, Mr. R. Perez, Mr. L. Meneses: examiners
- Mr. J.-M. Torres and Mrs. R. Wedemann: co-directors
Abstract: In this thesis, we study creativity in general, with particular interest in how it can be produced by artificial means, and we present a more targeted, formal treatment of artificial literary text generation. In “The Creative Mind: Myths and Mechanisms” (Boden, 2004), Margaret Boden explains that the creative process is an intuitive path followed by humans to generate new artifacts valued for their novelty, societal significance, and beauty. She classifies creativity into three categories:
- Combinatorial creativity, where known elements are merged to generate new ones;
- Exploratory creativity, where generation arises from observation or exploration; and
- Transformational creativity, where generated elements result from modifications or experiments applied to objects produced by exploratory creativity.
The quest for automated processes capable of generating artifacts creatively has recently given rise to a research field called Computational Creativity, which opens intriguing prospects in artistic domains such as the visual arts, music, and literature. Although significant advances have been made in this field, difficulties and limitations remain, owing to the inherent complexity of understanding the human creative process.
Our primary focus in this study is Automatic Text Generation (ATG), specifically the generation of literary sentences. We therefore address the problem of developing automatic techniques (algorithms) that generate linguistic objects, namely sentences or parts of paragraphs, perceived as belonging to a literary text. Most ATG research avoids the literary genre because of its complexity. Fundamental difficulties include ambiguity of meaning and even the absence of a universal definition of what constitutes a literary text. Moreover, unlike genres devoted to communicating facts in writing, literary documents often refer to imaginary or allegorical worlds and situations. These characteristics, together with others such as elegance or the use of rare words, make the automatic generation and analysis of literary texts a complex and challenging task.
Given these difficulties, and to make the ATG problem tractable, we adopt a pragmatic standpoint and an operational definition of a literary sentence based on the structure of literary corpora: we consider a sentence literary if its grammatical structure and vocabulary occur within a sufficiently large corpus that readers regard as literary. To this end, we collected literary texts and built three corpora, in French, Spanish, and Portuguese, composed exclusively of literary documents such as novels, short stories, narratives, plays, and poetry.
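The operational definition above can be read as a simple membership test. The sketch below is only an illustration under stated assumptions: "grammatical structure" is approximated by the sequence of part-of-speech tags, sentences are assumed to be tagged beforehand by some external tagger, and the function names and toy corpus are hypothetical. It does not reproduce the thesis models.

```python
from typing import List, Set, Tuple

# A tagged sentence: a list of (word, part-of-speech tag) pairs.
# Assumption: tagging is done beforehand by an external POS tagger.
TaggedSentence = List[Tuple[str, str]]


def build_reference(corpus: List[TaggedSentence]) -> Tuple[Set[Tuple[str, ...]], Set[str]]:
    """Collect the POS-tag sequences ("structures") and the vocabulary of a corpus."""
    structures: Set[Tuple[str, ...]] = set()
    vocabulary: Set[str] = set()
    for sentence in corpus:
        structures.add(tuple(tag for _, tag in sentence))
        vocabulary.update(word.lower() for word, _ in sentence)
    return structures, vocabulary


def is_literary(sentence: TaggedSentence,
                structures: Set[Tuple[str, ...]],
                vocabulary: Set[str]) -> bool:
    """Operational test: the structure and every word must occur in the literary corpus."""
    structure = tuple(tag for _, tag in sentence)
    words_known = all(word.lower() in vocabulary for word, _ in sentence)
    return structure in structures and words_known


# Toy corpus of two pre-tagged sentences (illustrative data, not from the thesis corpora).
corpus = [
    [("La", "DET"), ("noche", "NOUN"), ("caía", "VERB"), ("lentamente", "ADV")],
    [("El", "DET"), ("viento", "NOUN"), ("susurraba", "VERB"), ("suavemente", "ADV")],
]
structures, vocabulary = build_reference(corpus)

candidate = [("El", "DET"), ("viento", "NOUN"), ("caía", "VERB"), ("lentamente", "ADV")]
print(is_literary(candidate, structures, vocabulary))  # True

unknown = [("El", "DET"), ("robot", "NOUN"), ("caía", "VERB")]
print(is_literary(unknown, structures, vocabulary))  # False
```

The accepted candidate never appears verbatim in the toy corpus: it reuses a shared structure with words drawn from both sentences, which is the kind of new sentence this definition admits.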
This thesis presents a novel approach to generating literary sentences. It relies on the three new literary corpora we built, together with artificial neural network techniques, language models, and shallow syntactic analysis. Our ATG models analyze the literary corpora to extract and exploit their grammatical, semantic, and linguistic structures. We also addressed the generation of rhymes (assonant and consonant), including semantic rhyme. We designed several manual evaluation protocols to measure the quality of the sentences generated by our literary ATG models. The results are encouraging: our systems generate grammatically correct and sufficiently coherent sentences that are, to a considerable extent, perceived as literary. Moreover, these results support our hypothesis that it is possible to generate, from known literary sentence structures, new sentences with new semantics, while also taking into account the emotional significance of the original texts.
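The abstract does not detail how rhymes are detected. As a rough illustration of the assonant/consonant distinction in Spanish, here is a minimal sketch that works on spelling rather than pronunciation: a consonant rhyme requires everything from the stressed vowel onwards to match, while an assonant rhyme requires only the stressed vowel and the final vowel to match. All function names are hypothetical, the stress rules cover only standard accentuation cases, and semantic rhyme is not modelled.

```python
# Simplified, spelling-based approximation of Spanish rhyme rules.
# Assumption: orthography stands in for pronunciation; 'y' as a vowel,
# silent 'h' and regional pronunciations (e.g. seseo) are not handled.
ACCENT_MAP = str.maketrans("áéíóúü", "aeiouu")
VOWELS = set("aeiouáéíóúü")
ACCENTED = set("áéíóú")
WEAK = set("iuü")  # unaccented i/u (and ü) glide into neighbouring vowels


def nuclei(word: str) -> list:
    """Group vowel indices into syllable nuclei, merging diphthongs."""
    groups: list = []
    for i, ch in enumerate(word):
        if ch not in VOWELS:
            continue
        joins_previous = (
            groups
            and groups[-1][-1] == i - 1
            and (ch in WEAK or word[i - 1] in WEAK)
        )
        if joins_previous:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups


def main_vowel(word: str, group: list) -> int:
    """Within a nucleus, the strong (non-glide) vowel carries the sound; else the last one."""
    strong = [i for i in group if word[i] not in WEAK]
    return strong[0] if strong else group[-1]


def stressed_vowel(word: str) -> int:
    """Index of the stressed vowel under standard Spanish accentuation rules."""
    groups = nuclei(word)
    if not groups:
        raise ValueError(f"no vowel in {word!r}")
    for i, ch in enumerate(word):
        if ch in ACCENTED:  # a written accent marks the stress explicitly
            return i
    if word[-1] in VOWELS or word[-1] in "ns":
        # Words ending in a vowel, 'n' or 's' stress the penultimate syllable.
        group = groups[-2] if len(groups) >= 2 else groups[-1]
    else:
        group = groups[-1]
    return main_vowel(word, group)


def rhyme_tail(word: str) -> str:
    """Everything from the stressed vowel onwards, vowel accents removed."""
    word = word.lower()
    return word[stressed_vowel(word):].translate(ACCENT_MAP)


def assonance_key(word: str) -> str:
    """Stressed vowel plus final vowel; post-tonic middle vowels are ignored."""
    word = word.lower()
    s = stressed_vowel(word)
    last = nuclei(word)[-1]
    key = word[s]
    if last[0] > s:
        key += word[main_vowel(word, last)]
    return key.translate(ACCENT_MAP)


def classify_rhyme(w1: str, w2: str) -> str:
    """Return 'consonant', 'assonant' or 'none' for a pair of words."""
    if rhyme_tail(w1) == rhyme_tail(w2):
        return "consonant"
    if assonance_key(w1) == assonance_key(w2):
        return "assonant"
    return "none"


for a, b in [("canción", "corazón"), ("casa", "cama"), ("pájaro", "largo"), ("luna", "cielo")]:
    print(a, b, classify_rhyme(a, b))
```

With these rules, canción/corazón is classified as consonant, casa/cama and pájaro/largo as assonant (the unstressed middle vowel of pájaro is ignored), and luna/cielo as no rhyme. A phonetic transcription step would be the natural replacement for the spelling heuristics.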