When Women Make Headlines

A visual essay about the (mis)representation of women in the news

Today, research has suggested that women are significantly less likely to make the news compared to men. In the most recent report published by the Global Media Monitoring Project (GMMP), the largest and longest running research on gender in the world’s news media, women were found to make up just 24% of news subjects and sources reported. According to this report, this number has not changed since 2010.

In the context of news, headlines introduce, frame and contextualize a news story. Furthermore, research within the fields of educational and experimental psychology has demonstrated that news headlines can have a disproportionate impact on the reader’s mind, and that misleading headlines can bias readers toward a specific interpretation.

So, if women are underrepresented in the news to begin with, what does it look like when women do make headlines? And how have headlines about women changed over time?

To explore these questions, we have visualized the language used in women-centered headlines and how this language has (or has not) changed over time. Using keywords associated with the word “woman” (like girl, mother and lady), we collected and analyzed 382,139 headlines published between 2005 and 2021 by the top English-language news publications and news agencies in four countries: The United States of America (US), India, South Africa, and the United Kingdom (UK). A total of 186 publications were considered (i.e. 24 publications in South Africa, 51 publications in India, 57 publications in the UK, and 54 publications in the US).

Shown here are the 1,231 unique words most frequently used by Indian, South African, British, and American publishers in headlines that report on women. These words are arranged in decreasing order of frequency.

The same word appears at different ranks in different countries. For example, US headlines use the word sue significantly more than those from India, South Africa, or the UK, where suing may be a less common practice. “Suing” someone in an American court can also gain international attention, like when two women sued Jeffrey Epstein’s estate over alleged sexual abuse and when Rihanna sued her father for “exploiting her name.”

In our dataset, the word Black most often appeared in headlines speaking about Black women. Black women may be part of a marginalized group in the US, the UK, and South Africa, but they appear quite frequently in the news in these countries nonetheless. South Africa, a nation still recovering from the impact of apartheid, reports on its prominent Black female doctors and pioneering Black female PhD candidates.

In India, the racial divide is not as likely to make headlines as the religious one. Muslim appears more frequently in Indian headlines, compared to the UK, the US, and South Africa, as local news in India often reports the religion of the story’s subject. As in the case of Black women, nearly any event that hurts the Muslim women’s community, like acts of violence, or helps them, like Rashida Tlaib and Ilhan Omar becoming the first Muslim women in Congress, makes headlines.

Speaking of firsts, the first similarity we noticed between all the countries is the high frequency of the word first. There are thousands of mentions, nearly 8,000 to be precise, of trailblazing women who shattered glass ceilings. In the recent past, outside Kamala Harris’ widely covered ascension to the vice presidency, Zuzana Čaputová became Slovakia’s first female President, and Ana Brnabić became Serbia’s first gay and first female prime minister.

Flip a page or scroll a little more and you will see women from different races, ethnicities, and identities becoming the first to lead a US Marine Corps tank platoon, win an individual Olympic medal as an African American female swimmer, fly an F-35 fighter jet, drive cars in a country where it was once forbidden and, most of all, raise their voices.

In spite of these many firsts, the biggest similarity across headlines from the four countries is that the most mentioned word is “man.”

Man appears more than 30,000 times.

To gather more insight into the words used to report women-centered news, we categorized them into different themes (more on this in the method section). The first theme we noticed is crime & violence.

There are nearly 4,000 mentions of “death,” 6,000 mentions of rape, and more than 10,000 mentions of killing. Although there are stories where women perpetrate crime, the majority of these mentions are found in headlines where women are subjected to violence. When reporting these stories, many headlines refer to a woman in her capacity as a daughter, mother or wife.

This brings us to another prevalent theme in the set of words: gendered language. Many women-centered headlines use words that are explicitly gendered, such as “mother,” “waitress” or “policewoman”, or words that become associated with a particular sex or social gender because of underlying stereotypes, such as “sexy” or “beautiful.”

For example, we found words associated with gender stereotypes, like body, child, and marry repeatedly becoming the topics of conversation in women-centered headlines.

In analyzing this data, we also noticed many stories of women reclaiming their positions within headlines and refusing to remain in the margins. Stories of women from Black, Dalit or LGBTQIA2S+ communities leading this reclamation in Parliaments, Congresses, High Courts, cities and capitals pushed us to identify three more themes in addition to crime & violence and gendered language: empowerment, people & places, and race, ethnicity, & identity.

This process of reclaiming space has also led to the reclamation of some words that once belonged in a certain theme but do not anymore. For instance, while “slay-queen” is often used in a derogatory manner by men to describe women who like to show off a luxurious lifestyle, the word “slay” has also been reclaimed by these same women as a means to describe their successes as they “slay” it.

Many such words, like “hope,” “first” or “stand,” fought off attempts to be boxed into themes and chose to remain in the gray area. Hover over the colored blocks or search below to explore the words and themes on your own and drop us a note if you think you found another theme.

Finally, insights can be generated by comparing frequencies between themes. Here, each word is as high as the number of times it is used, and stacked upon other words in the same theme. As it turns out, for every occurrence of an empowering word, we read two words of crime and violence.

Be it empowerment or crime & violence, headlines are designed to get the attention of the reader. Oftentimes, headlines can inspire the reader to care about something, they can inform the reader about important world events, or they can present the reader with shocking imagery. Below, you can explore some headlines and see for yourself.

MAY 2015 | TELEGRAPH.CO.UK

Mum lets 6 year old daughter shave her head to prove girls don't have to be girly

NOV 2016 | DAILYMAIL.COM

Woman who gave birth on plane had no idea she was pregnant

APR 2016 | FOXNEWS.COM

More single women hunt for homes, not husbands

Sensationalism is an editorial tactic. The words in a sensational headline are carefully chosen for the purpose of enticing the reader. For example, a story titled “Woman, 28, charged after 'stabbing' in Barnsley town centre” could potentially garner a semi-conscious glance from its reader while one titled “Woman screamed 'KILL KILL KILL!' while repeatedly stabbing a man in Barnsley market attack” could be more likely to capture the reader’s attention.

But is this sensationalism more or less prominent in women-focused headlines? To find out, we used sentiment analysis, a natural language processing technique used to quantify emotion in text, to rate every headline with a polarity score. A headline with either extremely positive or extremely negative sentiment is, for our analysis, considered sensational, and hence it is given a higher polarity score. As it turns out, not only is sensationalism on the rise, but it has also been consistently higher in headlines with keywords associated with women, as opposed to headlines about other topics.

How do different news outlets compare to each other? In the chart below, the absolute and percent difference in polarity scores between headlines about women and all other headlines is displayed for a representative sample of the news outlets analyzed.

[Chart: Most outlets sensationalize headlines about women more than other topics. Comparing 65 news outlets over 10 years. Polarity of headlines about women versus headlines about other topics, on an axis running from less polarizing (left) to more polarizing (right).]

Outlets that sensationalize headlines about women less, like Nature and FiveThirtyEight, do not regularly deal with everyday news but go deeper into significant topical trends.

Using data from SimilarWeb, we then tied the monthly viewership of every publication to the average polarity score of its women-centered headlines. While all outlets sensationalize their news to some extent, news outlets on the left end of the spectrum (i.e. less sensational) tend to be the ones that focus on either financial news, like Bloomberg in the United States and LiveMint in India, or on tech news, such as TechRadar and CNET. Nature, a predominantly scientific publication, is the least sensational but it also has a more limited reach.

BBC and The New York Times are the largest publications with the least sensational headlines, in contrast to the Daily Mail, Huffington Post, Fox News or Aaj Tak, which publish more shock-value headlines.

Hover on each of the bubbles and see the headlines for yourself.

[Chart: News outlets arranged by polarity score, from less polarizing (left) to more polarizing (right).]

Read more about our polarity calculations

We measure polarity by performing sentiment analysis on each headline using the VADER Python package (vaderSentiment), where each headline gets a sentiment score from -1 to 1 (from most negative to most positive). Because we are interested in polarity, we take the absolute value of each headline's score.
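The calculation described above can be sketched as follows. This is a minimal illustration, not the code used for the essay: the helper `headline_polarity` and the stand-in scorer `toy_score` are hypothetical names, and the toy scores are made up. The only assumption about the real scorer is that it returns a value between -1 and 1, as VADER's compound score does.

```python
from typing import Callable


def headline_polarity(headline: str, score_fn: Callable[[str], float]) -> float:
    """Return a headline's polarity: the absolute value of its sentiment score.

    score_fn is assumed to return a score in [-1, 1].
    """
    score = score_fn(headline)
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"sentiment score out of range: {score}")
    return abs(score)


# Stand-in scorer for illustration only (hypothetical values, not VADER output).
def toy_score(headline: str) -> float:
    toy_scores = {
        "strongly negative headline": -0.8,
        "mildly positive headline": 0.3,
    }
    return toy_scores.get(headline, 0.0)
```

With vaderSentiment installed, the scorer would presumably be built from `SentimentIntensityAnalyzer` (imported from `vaderSentiment.vaderSentiment`) by passing `lambda text: analyzer.polarity_scores(text)["compound"]` as `score_fn`. Taking the absolute value means a strongly negative and a strongly positive headline are treated as equally polarizing.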

While the theme of crime and violence got us delving into how sensational women-centered headlines are, the theme of gendered language led to the idea of measuring bias.

Explicit use of gendered language in English — words like “actress,” “congresswoman” or “landlady” — emphasizes the gender of the subject when there is no need to do so. Research from Yasmeen Hitti et al. has suggested that both gendered language and words that reinforce societal and behavioral stereotypes, such as “beautiful,” “emotional,” “supportive” or “dramatic,” add to the bias of a sentence. Using their research methodology, we attributed a bias score to each headline.

For example, the headline that reads “Daughter in emotional meeting with woman given life back by selfless courage of her dead mother” gets a higher bias score than the headline that reads “Hillary Clinton speaks out for the same American values upheld in retracted embassy statement.” In the chart below, we visualize this bias index for each publication. In contrast to our results for polarity, there is a greater variance in bias scores across publications. The Daily Mail scores the highest while the BBC and ESPN are among those who score the lowest.

Go ahead, hover on the bubbles and see for yourself if you think these headlines are extremely gendered.

[Chart: News outlets arranged by bias index, from less biased (left) to more biased (right).]

Methods

To build the dataset of headlines, we scraped data from Google News, using RapidAPI, from the most visited publications and news agencies for readers in the US, the UK, India and South Africa according to SimilarWeb (as of 2021-06-06). To collect this data, we queried RapidAPI for headlines containing one or more of the following keywords: women, woman, girl, female, lady, ladies, she, her, herself, aunt, grandmother, mother, sister, daughter, wife, mom, mum, girlfriend, mrs, niece. As a result, our analysis encompasses 24 publications in South Africa (18,594 headlines), 51 publications in India (138,590 headlines), 57 publications in the United Kingdom (109,286 headlines), and 54 publications in the United States (115,669 headlines).
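As a rough illustration of the keyword criterion, the sketch below checks whether a headline contains at least one of the listed keywords. The keyword list comes from the paragraph above; the whole-word, case-insensitive matching rule and the function name `is_women_centered` are assumptions, and the actual matching performed by the news search may differ.

```python
import re

# Keywords listed in the Methods text.
KEYWORDS = [
    "women", "woman", "girl", "female", "lady", "ladies", "she", "her",
    "herself", "aunt", "grandmother", "mother", "sister", "daughter",
    "wife", "mom", "mum", "girlfriend", "mrs", "niece",
]

# Assumption: whole-word, case-insensitive matching, so "weather" does not match "her".
KEYWORD_PATTERN = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b", re.IGNORECASE)


def is_women_centered(headline: str) -> bool:
    """Return True if the headline contains at least one keyword as a whole word."""
    return KEYWORD_PATTERN.search(headline) is not None
```

Under this rule, inflected forms that are not in the list (for example "girls") are not matched, which is one reason the real query behavior may not be identical.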

Gendered language and bias calculation: To categorize words used in headlines as gendered, we manually curated two dictionaries: gendered words about women (words that are explicitly gendered in the English language, such as “actress,” “waitress,” “congresswoman,” “landlady” or “mother”) and words that denote societal and behavioral stereotypes about women (words like “beautiful,” “sexy,” “pregnant,” or “emotional”). These dictionaries were curated using existing research from Huimin Xu and team, published under the title “The Cinderella Complex: Word embeddings reveal gender stereotypes in movies and books,” and the incredible research done by The Swaddle team. These dictionaries can be found here. The methodology used to calculate bias was borrowed from the research done by Yasmeen Hitti and team, published under the title “Proposed Taxonomy for Gender Bias in Text.”
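To make the dictionary lookup concrete, here is a small sketch that counts how many words in a headline fall into either dictionary. It is explicitly not the scoring defined by Hitti and team, whose details are not reproduced here; the share-of-tokens measure, the function name `dictionary_hit_rate`, and the tiny word sets (taken from the examples quoted above) are stand-ins for illustration.

```python
import re

# Tiny illustrative subsets built from words quoted in the essay;
# the curated dictionaries are larger.
GENDERED_WORDS = {"actress", "waitress", "congresswoman", "landlady", "mother"}
STEREOTYPE_WORDS = {"beautiful", "sexy", "pregnant", "emotional"}


def dictionary_hit_rate(headline: str) -> float:
    """Share of tokens found in either dictionary.

    A stand-in proxy for illustration, not the published bias score.
    """
    tokens = re.findall(r"[a-z']+", headline.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in GENDERED_WORDS or t in STEREOTYPE_WORDS)
    return hits / len(tokens)
```

A proxy like this only captures how dense a headline is in dictionary words; it says nothing about context, which is part of what a fuller taxonomy of gender bias would address.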

Theme dictionaries: To categorize words used in headlines as part of a theme (i.e. crime and violence, empowerment, race, ethnicity and identity, people and places) we manually curated four dictionaries. These dictionaries can be found here. In cases where a word had more than one contextual usage (like “head” or “chair”), we only classified it inside a theme if it belonged to that theme in at least 90% of the cases. To analyze words and textual elements found in headlines, we used existing Natural Language Processing packages for Python (i.e. spacy, gensim, word2number, pycontractions, bs4, unidecode, textblob, nltk).
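The 90% rule for ambiguous words can be sketched as below. How usages were sampled and labeled is not described in the text, so the input format (one theme label per observed usage, with "none" when no theme applies) and the function name `assign_theme` are assumptions.

```python
from collections import Counter
from typing import Optional, Sequence

THRESHOLD = 0.90  # from the Methods text: at least 90% of the cases


def assign_theme(usage_labels: Sequence[str]) -> Optional[str]:
    """Return a word's theme if one theme covers at least 90% of its usages.

    usage_labels holds one label per observed usage of the word; the label
    "none" marks a usage that fits no theme (an assumed convention).
    """
    if not usage_labels:
        return None
    theme, count = Counter(usage_labels).most_common(1)[0]
    if theme != "none" and count / len(usage_labels) >= THRESHOLD:
        return theme
    return None
```

For example, a word labeled "crime and violence" in 9 of 10 usages would be assigned to that theme, while a word split evenly between a theme and "none" would stay unclassified.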

Polarity analysis: To analyze the polarity of each headline we used vaderSentiment. For the comparison of polarity between women-centered headlines and all other headlines, we scraped headlines using no keyword tags from the most visited publications and news agencies for readers in the US, the UK, India and South Africa according to SimilarWeb (as of 2021-06-06). With the use of such data, we were able to calculate baseline polarity scores for each news publication and news agency. Though constituting a representative sample of headlines, the number of headlines we used to calculate this baseline polarity is roughly equal to one third of the number of headlines that we used to calculate polarity for women-centered headlines.
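A comparison against the baseline could look like the sketch below, which returns the absolute and percent difference in mean polarity for one outlet. Expressing the percent difference relative to the baseline mean is an assumption, as is the function name `polarity_gap`; the essay does not state the exact formula.

```python
from statistics import mean
from typing import Sequence, Tuple


def polarity_gap(
    women_scores: Sequence[float], baseline_scores: Sequence[float]
) -> Tuple[float, float]:
    """Return (absolute difference, percent difference) in mean polarity.

    The percent difference is taken relative to the baseline mean (an assumption).
    """
    women_mean = mean(women_scores)
    baseline_mean = mean(baseline_scores)
    if baseline_mean == 0:
        raise ValueError("baseline mean polarity is zero; percent difference undefined")
    absolute = women_mean - baseline_mean
    percent = 100.0 * absolute / baseline_mean
    return absolute, percent
```

Because the baseline sample is roughly a third the size of the women-centered sample, comparing means rather than totals keeps the two groups on the same scale.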

With regard to the stacked bar chart (in the scrollytelling section), there were far more than 1,231 unique words in the original dataset. For visual and readability purposes, however, we only retained the 1,231 words that were most frequent and that were common across the four countries studied.
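The filtering step can be sketched as follows: keep only words that appear in every country's headlines, then take the most frequent ones. Ranking by total frequency across countries and breaking ties alphabetically are assumptions, and `common_top_words` is a hypothetical helper name.

```python
from collections import Counter
from typing import Dict, List, Mapping


def common_top_words(
    counts_by_country: Dict[str, Mapping[str, int]], limit: int
) -> List[str]:
    """Return the `limit` most frequent words that occur in every country."""
    countries = list(counts_by_country.values())
    if not countries:
        return []
    # Words present in all countries.
    shared = set(countries[0])
    for counts in countries[1:]:
        shared &= set(counts)
    # Total frequency of each shared word across countries (assumed ranking key).
    totals = Counter()
    for counts in countries:
        for word in shared:
            totals[word] += counts[word]
    # Sort by total frequency (descending), then alphabetically for a stable order.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:limit]]
```

In the essay's setting, `counts_by_country` would hold one word-count mapping per country and `limit` would be 1,231.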

All of the data used for this essay is available in this GitHub repo.

We collaborated with Jan Diehm, Rob Smith, Russell Samora, and Michelle McGhee for the piece and we’re quite grateful and happy for how it turned out!
