Bibliographic Metadata

Using natural language processing to automate the Bechdel test / by Krista Westphal
Additional Titles
Automatisierung des Bechdel Tests durch Verarbeitung natürlicher Sprache
Author: Westphal, Krista
Censor: Hanbury, Allan
Published: Wien, 2018
Description: xi, 68 pages : diagrams
Institutional Note: Technische Universität Wien, Diplomarbeit (diploma thesis), 2018
Summary in German
Variant title translated by the author
Document type: Thesis (Diplom)
Keywords (DE): Bechdel Test / Natürliche Sprachverarbeitung / Maschinelles Lernen
Keywords (EN): Bechdel Test / Natural Language Processing / Machine Learning
URN: urn:nbn:at:at-ubtuw:1-108929 (persistent identifier)
 The work is publicly available
Full text: Using natural language processing to automate the Bechdel test (PDF, 0.71 MB)
Abstract (English)

The Bechdel test asks three questions: Does a movie contain two named female characters? Do two female characters converse at some point during the movie? Is there at least one conversation between female characters that is not about a man? If all three questions can be answered positively, the film passes the Bechdel test. This thesis defines and implements methods for automating the Bechdel test for screenplays and novels. Automating this task would enable large-scale analyses, for example allowing researchers to study trends over long time periods that would otherwise require time-consuming manual work.

Previous research exists on automating the Bechdel test for screenplays, and it provided the basis for the approach described in this thesis. Although the Bechdel test was originally formulated for movies, its questions apply just as well to novels. However, as far as we could find, no previous research exists on automating the Bechdel test for novels.

For screenplays, we first parsed the text using a new rule-based approach that relies on the specialized formatting required for screenplays. We then identified all characters who appear in speaking roles and assigned each a gender using a newly developed algorithm that combines census data about names with Internet Movie Database (IMDb) information about the specific film. We also used a machine learning approach to predict whether there is at least one conversation between the identified female characters about something other than a man. The results achieved for screenplays are comparable to those of previously published work.

Because novels are structured differently from screenplays, they required a different approach. To identify all the characters in a novel, we used a named-entity recognizer together with a rule-based algorithm that links the different names used for each character throughout the text.
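The character-identification step for novels can be illustrated with a minimal Python sketch. It assumes the named-entity recognizer has already produced a list of name mentions, and it uses a hypothetical linking rule: a shorter name is treated as an alias of the only longer name that contains all of its tokens. This is an illustration under that assumption, not the rule-based algorithm developed in the thesis; the function name `link_aliases` and the example names are likewise hypothetical.

```python
def link_aliases(names: list[str]) -> dict[str, str]:
    """Map every name mention to a canonical character name.

    Hypothetical rule (not the thesis's exact algorithm): names are processed
    from longest to shortest; a name whose tokens are all contained in exactly
    one already-known canonical name is treated as an alias of that name.
    Ambiguous names (contained in several canonical names) stay separate.
    """
    canonical: dict[str, str] = {}
    # Longest names first; ties are broken alphabetically for determinism.
    for name in sorted(set(names), key=lambda n: (-len(n), n)):
        tokens = set(name.split())
        candidates = {c for c in set(canonical.values()) if tokens <= set(c.split())}
        # Link only when exactly one canonical name matches.
        canonical[name] = candidates.pop() if len(candidates) == 1 else name
    return canonical


# Hypothetical NER output for a short excerpt.
mentions = ["Elizabeth Bennet", "Elizabeth", "Jane Bennet", "Jane", "Bennet"]
aliases = link_aliases(mentions)
print(aliases["Elizabeth"])  # Elizabeth Bennet
print(aliases["Bennet"])     # Bennet (ambiguous, left unlinked)
```

Ambiguous mentions such as a surname shared by several characters are deliberately left unlinked, and honorifics such as "Mr." are not handled; a fuller implementation would need additional rules for such cases.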
Using quote attribution, we then determined which character speaks each line of dialogue and thus established who converses with whom. On a small dataset of five novels, the method developed for novels achieved perfect accuracy.
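To illustrate how attributed quotes can feed into the three Bechdel questions, the following sketch attributes only quotes that carry an explicit speech verb (for example `"...," said Jane`), treats each passage as one conversation, looks genders up in a hypothetical table, and replaces the thesis's machine learning classifier with a simple keyword check for references to men. All names (`attribute_quotes`, `mentions_a_man`, `bechdel`, `GENDER`, `MALE_WORDS`) are hypothetical, and these heuristics are much cruder than the methods the thesis describes.

```python
import re

# Hypothetical attribution pattern: '"...," said Jane' or '"...," Jane said'.
SPEECH_VERB = r"(?:said|asked|replied|cried)"
NAME = r"([A-Z][a-z]+(?: [A-Z][a-z]+)*)"
ATTRIBUTION = re.compile(
    r'"([^"]+)"\s*(?:' + SPEECH_VERB + r"\s+" + NAME
    + r"|" + NAME + r"\s+" + SPEECH_VERB + r")"
)

# Hypothetical gender table standing in for a real gender-assignment step.
GENDER = {"Jane": "female", "Elizabeth": "female", "Darcy": "male"}

# Keyword heuristic standing in for the thesis's machine learning classifier.
MALE_WORDS = {"he", "him", "his", "man", "husband", "father", "gentleman"}


def attribute_quotes(text: str) -> list[tuple[str, str]]:
    """Return (speaker, quote) pairs for quotes with an explicit speech verb."""
    return [(m.group(2) or m.group(3), m.group(1)) for m in ATTRIBUTION.finditer(text)]


def mentions_a_man(quote: str) -> bool:
    """True if the quote contains any word from MALE_WORDS."""
    words = set(re.findall(r"[a-z']+", quote.lower()))
    return bool(words & MALE_WORDS)


def bechdel(passages: list[str]) -> tuple[bool, bool, bool]:
    """Answer the three Bechdel questions.

    Simplifying assumptions: each passage (e.g. one scene) is one conversation,
    and only characters with attributed dialogue are counted.
    """
    conversations = [attribute_quotes(p) for p in passages]
    women = {s for conv in conversations for s, _ in conv if GENDER.get(s) == "female"}
    q1 = len(women) >= 2  # two named female characters
    q2 = q3 = False
    for conv in conversations:
        female_lines = [(s, q) for s, q in conv if s in women]
        if len({s for s, _ in female_lines}) >= 2:
            q2 = True  # two women speak in the same conversation
            if not any(mentions_a_man(q) for _, q in female_lines):
                q3 = True  # and none of their lines refers to a man
    return q1, q2, q3
```

For example, a passage in which Jane and Elizabeth discuss a ball satisfies all three questions, whereas a passage in which they only discuss whether "he" is rich fails the third. Quotes without an explicit speech verb, which are common in longer exchanges, are ignored here and would need the kind of turn-taking rules a real quote-attribution system provides.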
