Today, research has suggested that women are significantly less likely to make the news compared to men. In the most recent report published by the Global Media Monitoring Project (GMMP), the largest and longest running research on gender in the world’s news media, women were found to make up just 24% of news subjects and sources reported. According to this report, this number has not changed since 2010.
So, if women are underrepresented in the news to begin with, what does it look like when women do make headlines? And how have headlines about women changed over time?
To explore these questions, we have visualized the language used in women-centered headlines and how this language has (or has not) changed over time. Using keywords associated with the word “woman” (like girl, mother and lady), we collected and analyzed 382,139 headlines published between 2005 and 2021 by the top English-language news publications and news agencies in four countries: The United States of America (US), India, South Africa, and the United Kingdom (UK). A total of 186 publications were considered (i.e. 24 publications in South Africa, 51 publications in India, 57 publications in the UK, and 54 publications in the US).
Shown here are the 1,231 unique words most frequently used by Indian, South African, British, and American publishers in headlines about women. These words are arranged in decreasing order of frequency.
In our dataset, the word Black most often appeared in headlines speaking about Black women. Black women may be part of a marginalized group in the US, the UK, and South Africa, but they appear quite frequently in the news in these countries nonetheless. South Africa, a nation still recovering from the impact of apartheid, reports on its prominent Black female doctors and pioneering Black female PhD candidates.
In India, the racial divide is not as likely to make headlines as the religious one. Muslim appears more frequently in Indian headlines than in those from the UK, the US, and South Africa, as local news in India often reports the religion of the story’s subject. As in the case of Black women, nearly any event that hurts the Muslim women’s community, like acts of violence, or helps them, like Rashida Tlaib and Ilhan Omar becoming the first Muslim women in Congress, makes headlines.
Speaking of firsts, the first similarity we noticed between all the countries is the high frequency of the word first. There are thousands of mentions, nearly 8,000 to be precise, of trailblazing women who shattered glass ceilings. In the recent past, beyond Kamala Harris’ widely covered ascension to the vice presidency, Zuzana Čaputová became Slovakia’s first female President, and Ana Brnabić became Serbia’s first gay and first female prime minister.
In spite of these many firsts, the biggest similarity across headlines from the four countries is that the most mentioned word is “man.”
Man appears more than 30,000 times.
To gather more insight into the words used to report women-centered news, we categorized them into different themes (more on this in the method section). The first theme we noticed is crime & violence.
There are nearly 4,000 mentions of “death,” 6,000 mentions of “rape,” and more than 10,000 mentions of “killing.” Although there are stories where women perpetrate crime, the majority of these mentions are found in headlines where women are subjected to violence. When reporting these stories, many headlines refer to a woman in her capacity as a daughter, mother or wife.
This brings us to another prevalent theme in the set of words: gendered language. Many women-centered headlines use words that are explicitly gendered, such as “mother,” “waitress” or “policewoman,” or words that become associated with a particular sex or social gender because of underlying stereotypes, such as “sexy” or “beautiful.”
For example, we found words associated with gender stereotypes, like body, child, and marry, repeatedly becoming the topics of conversation in women-centered headlines.
In analyzing this data, we also noticed many stories of women reclaiming their positions within headlines and refusing to remain in the margins. Stories of women from Black, Dalit or LGBTQIA2S+ communities leading this reclamation in Parliaments, Congresses, High Courts, cities and capitals pushed us to identify three more themes in addition to crime & violence and gendered language: empowerment; people & places; and race, ethnicity, & identity.
This process of reclaiming space has also led to the reclamation of some words that once belonged in a certain theme but do not anymore. For instance, while “slay-queen” is often used in a derogatory manner by men to describe women who like to show off a luxurious lifestyle, the word “slay” has also been reclaimed by these same women as a means to describe their successes as they “slay” it.
Many such words, like “hope,” “first” or “stand,” fought off attempts to be boxed into themes and chose to remain in the gray area. Hover over the colored blocks or search below to explore the words and themes on your own and drop us a note if you think you found another theme.
Finally, insights can be generated by comparing frequencies between themes. Here, the height of each word is proportional to the number of times it is used, and each word is stacked upon other words in the same theme. As it turns out, for every occurrence of an empowering word, we read two words of crime and violence.
Be it empowerment or crime & violence, headlines are designed to get the attention of the reader. Oftentimes, headlines can inspire the reader to care about something, they can inform the reader about important world events, or they can present the reader with shocking imagery. Below, you can explore some headlines and see for yourself.
MAY 2015 | TELEGRAPH.CO.UK
Mum lets 6 year old daughter shave her head to prove girls don't have to be girly
NOV 2016 | DAILYMAIL.COM
Woman who gave birth on plane had no idea she was pregnant
But, is this sensationalism more or less prominent in women-focused headlines? To find out, we used sentiment analysis, a natural language processing technique used to quantify emotion in text, to rate every headline with a polarity score. A headline with either extremely positive or extremely negative sentiment is, for our analysis, considered sensational, and hence it is given a higher polarity score. As it turns out, not only is sensationalism on the rise, but it has also been consistently higher in headlines with keywords associated with women, as opposed to headlines about other topics.
How do different news outlets compare to each other? In the chart below, the absolute and percent difference in polarity scores between headlines about women and all other headlines is displayed for a representative sample of the news outlets analyzed.
[Chart: Most outlets sensationalize headlines about women more than other topics. Comparing 65 news outlets over 10 years. Legend: headlines about other topics, headlines about women. Axis: less polarizing (left) to more polarizing (right).]
Outlets that sensationalize headlines about women less, like Nature and FiveThirtyEight, do not regularly deal with everyday news but go deeper into significant topical trends.
Here’s how polarizing headlines are
Using data from SimilarWeb, we then tied the monthly viewership of every publication to the average polarity score of their women-centered headlines. While all outlets sensationalize their news to some extent, news outlets on the left end of the spectrum (i.e. less sensational) tend to be the ones that focus on either financial news, like Bloomberg in the United States and LiveMint in India, or on tech news, such as TechRadar and CNET. Nature, a predominantly scientific publication, is the least sensational but it also has a more limited reach.
Among the largest publications, BBC and The New York Times have the least sensational headlines, whereas the Daily Mail, Huffington Post, Fox News and Aaj Tak publish more shock-value headlines.
Hover on each of the bubbles and see the headlines for yourself.
[Chart: News outlets arranged by polarity score, from less polarizing (left) to more polarizing (right).]
Read more about our polarity calculations
We measure polarity by performing sentiment analysis on each headline using the VADER Python package, which gives each headline a sentiment score from -1 (most negative) to 1 (most positive). Because we are interested in polarity rather than direction, we take the absolute value of each headline's score.
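The absolute-value step can be illustrated with a minimal Python sketch. It is not the code used for this essay: the example scores are hypothetical, and it assumes the per-headline score is VADER's "compound" value (the source does not say which field was used), which the vaderSentiment package returns from SentimentIntensityAnalyzer().polarity_scores(text).

```python
from statistics import mean


def polarity(sentiment_score):
    """Return the polarity of a headline: the absolute value of its
    sentiment score, which is expected to lie in [-1, 1]."""
    if not -1.0 <= sentiment_score <= 1.0:
        raise ValueError("sentiment score must be between -1 and 1")
    return abs(sentiment_score)


# Hypothetical sentiment scores (assumed to be VADER compound scores).
example_scores = {
    "headline A": -0.74,  # strongly negative
    "headline B": 0.05,   # close to neutral
    "headline C": 0.62,   # strongly positive
}

# Strongly negative and strongly positive headlines both end up with
# a high polarity; the near-neutral one stays low.
polarities = {name: polarity(score) for name, score in example_scores.items()}

# Average polarity over a set of headlines, e.g. all headlines of one outlet.
average_polarity = mean(polarities.values())
```

Taking the absolute value means a very negative and a very positive headline are treated as equally polarizing, which matches the definition of sensationalism used above.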
Here’s how biased headlines are
While the theme of crime and violence got us delving into how sensational women-centered headlines are, the theme of gendered language led to the idea of measuring bias.
Explicit use of gendered language in English — words like “actress,” “congresswoman” or “landlady” — emphasizes the gender of the subject when there is no need to do so. Research from Yasmeen Hitti et al. has suggested that both gendered language and words that reinforce societal and behavioral stereotypes, such as “beautiful,” “emotional,” “supportive” or “dramatic,” add to the bias of a sentence. Using their research methodology, we attributed a bias score to each headline.
Go ahead, hover on the bubbles and see for yourself if you think these headlines are extremely gendered.
[Chart: News outlets arranged by bias index, from less biased (left) to more biased (right).]
Read more about our bias calculations
We measure gender bias by tracking the combined occurrence of gendered language and social stereotypes usually associated with women. We do this in two steps:
1) We check if a headline contains gendered language (e.g. “spokeswoman,” “chairwoman,” “she,” “her,” “bride,” “daughter,” “daughters,” “female,” “fiancee,” “girl,” “girlfriend”).
2) If it contains gendered language, we then count the number of words that are considered to be social stereotypes about women (e.g. “weak,” “modest,” “virgin,” “slut,” “whore,” “sexy,” “feminine,” “sensitive,” “emotional,” “gentle,” “soft,” “pretty,” “bitch,” “sexual”).
Finally, we normalize this count for all headlines within each outlet as a score between 0 and 1, and we aggregate (i.e. average) this score for each outlet.
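The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used for this essay: the word sets are short excerpts of the examples listed above, the tokenizer is deliberately simple, the sample headlines are invented, and the normalization (dividing each count by the largest count within the outlet) is only one possible reading of "a score between 0 and 1."

```python
import re

# Excerpts from the example words above; the full dictionaries are larger.
GENDERED = {"spokeswoman", "chairwoman", "she", "her", "bride", "daughter",
            "daughters", "female", "fiancee", "girl", "girlfriend"}
STEREOTYPES = {"weak", "modest", "sexy", "feminine", "sensitive",
               "emotional", "gentle", "soft", "pretty", "sexual"}


def tokenize(headline):
    """Lowercase a headline and split it into simple word tokens."""
    return re.findall(r"[a-z']+", headline.lower())


def stereotype_count(headline):
    """Steps 1 and 2: count stereotype words, but only when the
    headline also contains gendered language."""
    tokens = tokenize(headline)
    if not any(token in GENDERED for token in tokens):
        return 0
    return sum(token in STEREOTYPES for token in tokens)


def outlet_bias(headlines):
    """Final step: normalize counts to [0, 1] within one outlet and
    average them. ASSUMPTION: normalization divides by the outlet's
    maximum count; the source does not specify the exact formula."""
    counts = [stereotype_count(h) for h in headlines]
    if not counts or max(counts) == 0:
        return 0.0
    top = max(counts)
    return sum(count / top for count in counts) / len(counts)


# Invented headlines for one hypothetical outlet.
sample = [
    "She is emotional and sensitive",  # gendered, 2 stereotype words
    "Girl wins pretty trophy",         # gendered, 1 stereotype word
    "Markets fall",                    # no gendered language
]
sample_bias = outlet_bias(sample)
```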
Headline trends: Less gendered language. More empowering words.
In this final chart, we have visualized how the words used in headlines about women have changed over time.
Among other trends that can be observed from this chart, we found that while the use of many gendered words (e.g. “sexy,” “fat,” “housewife” or “gossip”) has faded out over time, the use of empowering words (e.g. “founder,” “activist,” “leader” or “appoint”) has increased. Other words (e.g. “death,” “hurt,” and “drama”) have instead stood the test of time, as their use has remained consistent since 2005.
For each word’s ebb or flow, we tried to find a “remember when” memory to explain it. Remember when Caitlyn Jenner came out as transgender? That was part of a wave of increased trans visibility that helps to explain why “transgender” shot up in 2015. Remember when the #MeToo movement took off? That adds context to the sharp rise of “harassment” in 2017, and the sharp rise of the word “equality” in recent years. Such world events are arranged as bubbles in the timeline above the chart.
If you see an interesting rise, hover over one such bubble to see if you can find a world event that can explain it. If you think that we’ve missed out on an important event in your part of the world, let us know.
The story of when women make headlines is, like most stories about people, full of contradictions. It is violent, sensational, biased, hopeful and empowering, though not all in equal measure. This visual essay suggests that headlines used to report women-centered news can be biased and can reinforce existing stereotypes. These headlines also tend to be more sensational than those on other news topics, and they tend to represent women in situations of crime and violence. As a growing body of research has already indicated, this could imply that women are not only underrepresented in the news but also misrepresented.
Nonetheless, this visual essay also suggests that some progress has been made. Over time, we saw that the use of many empowering words has risen sharply while the use of some gender stereotypes has plummeted. Let’s hope this trend continues and, in the meantime, enjoy our news with a little grain of salt. After all, when women make headlines, no words, sensational or not, biased or not, can truly explain the nuance behind the event because words can only approximate.
To build the dataset of headlines, we scraped data from Google News, using RapidAPI, from the most visited publications and news agencies for readers in the US, the UK, India and South Africa according to SimilarWeb (as of 2021-06-06). To collect this data, we queried RapidAPI for headlines containing one or more of the following keywords: women, woman, girl, female, lady, ladies, she, her, herself, aunt, grandmother, mother, sister, daughter, wife, mom, mum, girlfriend, mrs, niece. As a result, our analysis encompasses 24 publications in South Africa (18,594 headlines), 51 publications in India (138,590 headlines), 57 publications in the United Kingdom (109,286 headlines), and 54 publications in the United States (115,669 headlines).
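As an illustration of this keyword criterion, the following Python sketch checks whether a headline contains at least one of the keywords listed above. The matching rule (case-insensitive, whole words only) is an assumption; the actual matching was done by the RapidAPI query, whose exact behavior is not described here.

```python
import re

# Keywords exactly as listed in the paragraph above.
KEYWORDS = [
    "women", "woman", "girl", "female", "lady", "ladies", "she", "her",
    "herself", "aunt", "grandmother", "mother", "sister", "daughter",
    "wife", "mom", "mum", "girlfriend", "mrs", "niece",
]

# ASSUMPTION: whole-word, case-insensitive matching.
KEYWORD_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(word) for word in KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def is_women_centered(headline):
    """Return True if the headline contains at least one keyword."""
    return KEYWORD_PATTERN.search(headline) is not None
```

With whole-word matching, "Sheriff" does not count as a match for "she," but plural forms that are not in the list, such as "girls," are not matched on their own either.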
Gendered language and bias calculation: To categorize words used in headlines as gendered, we manually curated two dictionaries: gendered words about women (words that are explicitly gendered in the English language, such as “actress,” “waitress,” “congresswoman,” “landlady” or “mother”) and words that denote societal and behavioral stereotypes about women (words like “beautiful,” “sexy,” “pregnant,” or “emotional”). These were curated using existing research from Huimin Xu and team, published under the title “The Cinderella Complex: Word embeddings reveal gender stereotypes in movies and books,” and the incredible research done by The Swaddle team. These dictionaries can be found here. The methodology used to calculate bias was borrowed from the research done by Yasmeen Hitti and team, published under the title “Proposed Taxonomy for Gender Bias in Text.”
Theme dictionaries: To categorize words used in headlines as part of a theme (i.e. crime and violence; empowerment; race, ethnicity and identity; people and places), we manually curated four dictionaries. These dictionaries can be found here. In cases where a word had more than one contextual usage (like “head” or “chair”), we only classified it inside a theme if it belonged to that theme in at least 90% of the cases. To analyze words and textual elements found in headlines, we used existing Natural Language Processing packages for Python (i.e. spacy, gensim, word2number, pycontractions, bs4, unidecode, textblob, nltk).
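The 90% rule for ambiguous words can be expressed as a small Python sketch. It assumes that each sampled usage of a word has been hand-labeled with a theme (or with None when no theme applies). The function name and the labeled samples are hypothetical, and the source does not describe how the usages were actually sampled or labeled.

```python
from collections import Counter


def assign_theme(usage_labels, threshold=0.9):
    """Return the dominant theme of a word if it covers at least
    `threshold` of the labeled usages; otherwise return None."""
    if not usage_labels:
        return None
    theme, count = Counter(usage_labels).most_common(1)[0]
    if theme is not None and count / len(usage_labels) >= threshold:
        return theme
    return None


# Hypothetical labels for ten sampled usages of two ambiguous words.
chair_labels = ["people & places"] * 9 + [None]   # 90% in one theme
head_labels = ["people & places"] * 6 + [None] * 4  # only 60%
```

Here, assign_theme(chair_labels) keeps the word in its theme, while assign_theme(head_labels) leaves it unclassified.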
Polarity analysis: To analyze the polarity of each headline we used vaderSentiment. For the comparison of polarity between women-centered headlines and all other headlines, we scraped headlines using no keyword tags from the most visited publications and news agencies for readers in the US, the UK, India and South Africa according to SimilarWeb (as of 2021-06-06). With this data, we were able to calculate baseline polarity scores for each news publication and news agency. Although it constitutes a representative sample of headlines, the set we used to calculate this baseline polarity is roughly one third the size of the set we used to calculate polarity for women-centered headlines.
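The comparison against the baseline can be sketched in Python as an absolute and a percent difference between two average polarity scores. The outlet names and values below are invented, and expressing the percent difference relative to the baseline is an assumption rather than a documented detail of the analysis.

```python
def polarity_gap(women_mean, baseline_mean):
    """Return (absolute difference, percent difference) between the
    average polarity of women-centered headlines and the baseline.

    ASSUMPTION: the percent difference is relative to the baseline.
    It is None when the baseline is zero.
    """
    absolute = women_mean - baseline_mean
    if baseline_mean == 0:
        return absolute, None
    return absolute, 100.0 * absolute / baseline_mean


# Invented averages: (women-centered headlines, all other headlines).
hypothetical_outlets = {
    "Outlet A": (0.30, 0.25),
    "Outlet B": (0.18, 0.20),
}

gaps = {
    name: polarity_gap(women, baseline)
    for name, (women, baseline) in hypothetical_outlets.items()
}
```

A positive gap means an outlet's women-centered headlines are more polarizing than its other headlines; a negative gap means the opposite.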
With regard to the stacked bar chart (in the scrollytelling section), there were far more than 1,231 unique words in the original dataset. For visual and readability purposes, however, we only retained the 1,231 words that were most frequent and that were common across the four countries studied.
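One way to express this selection is sketched below in Python. It keeps only the words that appear in every country's headlines and then ranks them by total frequency. The ranking rule (summing counts across countries) and the toy counts are assumptions for illustration only.

```python
from collections import Counter


def top_common_words(counts_by_country, n):
    """Return the n most frequent words among those that appear in
    every country's word counts, most frequent first."""
    if not counts_by_country:
        return []
    # Words present in all countries.
    shared = set.intersection(*(set(c) for c in counts_by_country.values()))
    # ASSUMPTION: overall frequency is the sum across countries.
    totals = Counter()
    for counts in counts_by_country.values():
        for word in shared:
            totals[word] += counts[word]
    # Sort by descending frequency, then alphabetically for stable ties.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:n]]


# Toy word counts, not taken from the dataset.
toy_counts = {
    "US": Counter(man=5, first=3, black=2),
    "UK": Counter(man=4, first=1, mum=6),
    "India": Counter(man=2, first=4, muslim=3),
    "South Africa": Counter(man=1, first=2, black=2),
}
```

In this toy example only "man" and "first" appear in all four countries, so words like "mum" are dropped even though they are frequent in one country.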