The most used words for women vs. men
Likelihood that certain words appear after “she” vs. “he” in screen direction.
These are the most extreme examples. There is a high likelihood that women will snuggle, giggle, squeal, and sob, relative to men. Conversely, men are more likely to strap, gallop, shoot, howl, and kill.
Let’s now examine the 800 words most commonly paired with “she” or “he” in screen direction.
The top 800 words paired with “she” or “he”
Underlined words contain examples of their usage in screen direction.
Impact of the writer’s gender
Next, let’s examine how the writer’s gender affects characters’ behavior. Do women writers use different language for women’s roles? What are the words that both male and female writers use about equally when describing characters? Would results change dramatically if there were more women writers? First, we will narrow the data set to the 400 most commonly used words.
Comparing female vs. male writers
Words far away from an axis exhibit more dramatic differences. Bigger circles indicate words that are used more often.
There are some directions where the writer’s gender makes no difference. Relative to men, women gasp, hurry, smile, hesitate, and stir (mostly while cooking), regardless of whether the writer is a man or a woman. Men are consistently more likely to smash things, draw their weapons, grin, wink, point, talk, and speak.
When describing the opposite gender, both men and women use some overtly romantic and sexual words, such as “kiss” and “stroke,” as well as more subtle words including “respond” and “embrace.”
But there are differences. In our data set, 15% of film writers were women; 85% were men. Should Hollywood reach gender parity, we’d expect fewer women characters to respond, kiss, and cry. The increase in female writers would also mean women would be more likely to spy, find things, and, perhaps most remarkably, write on-screen.
Methodology
The code used in the analysis is publicly available on GitHub. The data set for this analysis included 1,966 scripts for films released between 1929 and 2015; most are from 1990 and after. Each script was processed to extract only the screen directions; dialogue was excluded. We then identified all bigrams (pairs of adjacent words) in these scripts that had either “he” or “she” as the first word.
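As a rough illustration of the bigram step, here is a minimal Python sketch. It is not the code from the GitHub repository. The function name `pronoun_bigrams`, the regex tokenizer, and the sample text are all assumptions made for this example, and the sketch does not stop bigrams at sentence boundaries.

```python
import re
from collections import Counter

# Pronouns that may appear as the first word of a bigram of interest.
PRONOUNS = {"he", "she"}

def pronoun_bigrams(text):
    """Count (pronoun, next_word) pairs in a block of screen direction.

    Hypothetical helper: tokenization is a simple lowercase regex,
    which is an assumption and not the original pipeline.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    # Walk over adjacent token pairs and keep those starting with a pronoun.
    for first, second in zip(tokens, tokens[1:]):
        if first in PRONOUNS:
            counts[(first, second)] += 1
    return counts

# Made-up screen direction, for illustration only.
sample = "She giggles and runs to the door. He shoots. She giggles again."
bigram_counts = pronoun_bigrams(sample)
```

In this sketch, `bigram_counts` maps each pair, such as `("she", "giggles")`, to the number of times it occurs in the text.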
Then, we calculated a log odds ratio to find words that exhibit the biggest differences between relative use for “she” and “he.” We removed stop words and did some other minimal text cleaning to maintain meaningful results. We calculated the overall log odds ratio for the 800 most commonly used words, and then log odds ratios for scripts with only male writers and female writers for the 400 most commonly used words. Scripts often have more than one writer and could be counted in both categories. To learn more about text mining analyses like this one and how to perform them, check out Julia’s book.
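To make the comparison concrete, here is a minimal Python sketch of one way to score words by how much more often they follow “she” than “he.” It is an illustration under stated assumptions, not the published analysis code. The add-one smoothing, the base-2 logarithm, the use of smoothed relative frequencies in place of strict odds, the function name `log_odds_ratio`, and the toy counts are all choices made for this example.

```python
import math

def log_odds_ratio(she_counts, he_counts, smoothing=1.0):
    """Score each word by log2 of its smoothed "she" rate over its "he" rate.

    Positive scores lean toward "she", negative toward "he".
    Assumptions for this sketch: add-one smoothing and log base 2.
    """
    words = set(she_counts) | set(he_counts)
    she_total = sum(she_counts.values())
    he_total = sum(he_counts.values())
    ratios = {}
    for word in words:
        # Smoothing keeps words seen with only one pronoun from producing log(0).
        she_rate = (she_counts.get(word, 0) + smoothing) / (she_total + smoothing)
        he_rate = (he_counts.get(word, 0) + smoothing) / (he_total + smoothing)
        ratios[word] = math.log2(she_rate / he_rate)
    return ratios

# Toy counts of words following each pronoun (made up for illustration).
she = {"giggles": 7, "runs": 1}
he = {"shoots": 7, "runs": 1}
ratios = log_odds_ratio(she, he)
```

With these toy counts, “giggles” gets a positive score, “shoots” gets a negative score of the same size, and “runs,” which follows both pronouns equally often, scores zero.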
Writers’ gender was determined via IMDb biographies, pictures, and names.
English has two singular third-person pronouns most often used for people, “he” and “she.” In this analysis, for both the text data and the identification of gender for film writers, we have chosen to identify men and women with the pronouns “he” and “she.” Using this type of classification, any writer or character associated with the pronoun “she” is classified as a woman.