Today, research has suggested that women are
significantly less likely to make the news compared to men. In the most recent
report published by the Global Media Monitoring Project
(GMMP), the largest and longest-running study of gender in the world’s news media, women were found to
make up just 24% of news subjects and sources reported. According to this report, this number has not changed
since 2010.
So, if women are underrepresented in the news to begin with, what does it look like when
women do make headlines? And how have headlines about women changed over time?
To explore these questions, we have visualized the language used in women-centered
headlines and how this language has (or has not) changed over time. Using keywords associated with the word
“woman” (like girl, mother and lady), we collected and analyzed 382,139 headlines published between 2005 and
2021 by the top English-language news publications and news agencies in four countries: The United States of
America (US), India, South Africa, and the United Kingdom (UK). A total of 186 publications were considered
(i.e. 24 publications in South Africa, 51 publications in India, 57 publications in the UK, and 54 publications
in the US).
Shown here are the 1,231 unique words most frequently used by Indian, South African,
British, and American publishers in headlines that report about women. These words are arranged in decreasing order of frequency.
In our dataset, the word Black most often appeared in headlines speaking about Black
women. Black women may be part of a marginalized group in the US, the UK, and South Africa, but they
appear quite frequently in the news in these countries nonetheless. South Africa, a nation still
recovering from the impact of apartheid, is reporting on its prominent
Black female doctors and pioneering Black female
PhD candidates.
In India, the racial divide is not as likely to make headlines as the religious one.
Muslim appears more frequently in Indian headlines, compared to the UK,
the US, and South Africa, as local news in India often reports the religion of the story’s subject. As
in the case of Black women, nearly any event that hurts the Muslim women’s community, like acts of
violence, or helps them, like Rashida Tlaib and Ilhan Omar becoming the first
Muslim women in Congress, makes headlines.
Speaking of firsts, the first similarity we noticed between all the countries is the
high frequency of the word first. There are thousands of mentions, nearly 8,000 to be precise, of
trailblazing women who shattered glass ceilings. In the recent past, outside Kamala Harris’ widely
covered ascension to the vice presidency, Zuzana
Čaputová became Slovakia’s first female President,
and Ana Brnabić became Serbia’s first gay and first female prime
minister.
In spite of these many firsts, the biggest similarity across headlines from the four
countries is that the most mentioned word is “man.”
Man appears more than 30,000 times.
To gather more insight into the words used to report women-centered news, we
categorized them into different themes (more on this in the method section). The first theme we noticed
is crime & violence.
There are nearly 4,000 mentions of “death,” 6,000 mentions of rape, and
more than 10,000 mentions of killing. Although there are stories
where women perpetrate crime, the majority of these mentions are found in headlines
where women
are subjected to violence. When reporting these stories, many headlines refer to a woman in her
capacity as a daughter, mother
or wife.
This brings us to another prevalent theme in the set of words: gendered language.
Many women-centered headlines use words that are explicitly gendered, such as “mother,” “waitress” or
“policewoman,” or words that become associated with a particular sex or social gender because of underlying stereotypes, such as “sexy” or “beautiful.”
For example, we found words associated with gender stereotypes, like body, child,
and marry repeatedly becoming the topics of conversation in women-centered
headlines.
In analyzing this data, we also noticed many stories of women reclaiming their
positions within headlines and refusing to remain in the margins. Stories of women, from Black, Dalit or
LGBTQIA2S+ communities, leading this reclamation in Parliaments, Congresses, High Courts, cities and
capitals pushed us to identify three more themes in addition to crime & violence and gendered
language: empowerment, people & places, and race, ethnicity, & identity.
This process of reclaiming space has also led to the reclamation of some words that
once belonged in a certain theme but do not anymore. For instance, while “slay-queen” is often used in a
derogatory manner by men to
describe women who like to show off a luxurious lifestyle, the word “slay” has also been reclaimed
by these same women as a means to describe their successes as they “slay” it.
Many such words, like “hope,” “first” or “stand,” fought off attempts to be boxed
into themes and chose to remain in the gray area. Hover over the colored
blocks or search below to explore the words and themes on your own and drop us a note if you
think you found another theme.
Finally, comparing frequencies between themes yields further insight. Here,
the height of each word reflects the number of times it is used, and words in the same theme are stacked
on top of one another. As it turns out, for every occurrence of an empowering word, we read two
words of crime and violence.
Be it empowerment or crime & violence, headlines are designed to get the attention of the
reader. Oftentimes, headlines can inspire the reader to care about something, they can inform the reader about
important world events, or they can present the reader with shocking imagery. Below, you can explore some
headlines and see for yourself.
MAY 2015 | TELEGRAPH.CO.UK
Mum lets 6 year old daughter shave her head to prove girls don't have to be girly
NOV 2016 | DAILYMAIL.COM
Woman who gave birth on plane had no idea she was pregnant
But, is this sensationalism more or less prominent in women-focused headlines? To find
out, we used sentiment analysis, a natural language processing technique used to quantify emotion in text, to
rate every headline with a polarity score. A headline with either extremely positive or extremely negative
sentiment is, for our analysis, considered sensational, and hence it is given a higher polarity score. As it
turns out, not only is sensationalism on the rise, but it has also been consistently higher in headlines with
keywords associated with women, as opposed to headlines about other topics.
How do different news outlets compare to each other? In the chart below, the absolute and
percent difference in polarity scores between headlines about women and all other headlines is displayed for a
representative sample of the news outlets analyzed.
[Chart: Most outlets sensationalize headlines about women more than other topics. Comparing 65 news outlets over 10 years. Series: headlines about other topics, headlines about women. Axis: less polarizing (left) to more polarizing (right).]
Outlets that sensationalize headlines about women less, like Nature and FiveThirtyEight, do
not regularly deal with everyday news but go deeper into significant topical trends.
Here’s how polarizing headlines are
Using data from SimilarWeb, we then tied the monthly viewership of every
publication to the average polarity score of its women-centered headlines. While all outlets sensationalize
their news to some extent, news outlets on the left end of the spectrum (i.e. less sensational) tend to be the
ones that focus on either financial news, like Bloomberg in the United States and LiveMint in India, or on tech
news, such as TechRadar and CNET. Nature, a predominantly scientific publication, is the least sensational, but
it also has a more limited reach.
Among the largest publications, BBC and The New York Times have the least sensational
headlines, whereas the Daily Mail, Huffington Post, Fox News and Aaj Tak publish more shock-value
headlines.
Hover on each of the bubbles and see the headlines for yourself.
[Chart: News outlets arranged by polarity score. Axis: less polarizing (left) to more polarizing (right).]
Read more about our polarity calculations
We measure polarity by performing sentiment analysis on each
headline using the VADER Python package (vaderSentiment), where each headline gets a sentiment score from -1 to 1 (from
more negative to more positive). Because we are interested in polarity, we take the absolute value of
each headline's score.
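The polarity step can be sketched roughly as follows. This is an illustration under stated assumptions, not the code used for the essay: the compound scores are made-up placeholders standing in for the values that vaderSentiment's `SentimentIntensityAnalyzer().polarity_scores(headline)["compound"]` would return, and the outlet names and the helper names `polarity` and `mean_polarity_by_outlet` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def polarity(compound_score: float) -> float:
    """Polarity is the absolute value of a sentiment score in [-1, 1]."""
    if not -1.0 <= compound_score <= 1.0:
        raise ValueError("sentiment score must lie in [-1, 1]")
    return abs(compound_score)


def mean_polarity_by_outlet(scored_headlines):
    """Average polarity per outlet from (outlet, compound_score) pairs."""
    by_outlet = defaultdict(list)
    for outlet, score in scored_headlines:
        by_outlet[outlet].append(polarity(score))
    return {outlet: mean(values) for outlet, values in by_outlet.items()}


# Hypothetical (outlet, compound score) pairs; real scores would come from VADER.
scored = [
    ("outlet_a", -0.8),  # strongly negative headline
    ("outlet_a", 0.4),   # mildly positive headline
    ("outlet_b", 0.1),
    ("outlet_b", -0.1),
]
averages = mean_polarity_by_outlet(scored)
```

Because the sign is discarded, a strongly negative and a strongly positive headline count as equally polarizing, which matches the essay's definition of a sensational headline.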
Here’s how biased headlines are
While the theme of crime and violence got us delving
into how sensational women-centered headlines are, the theme of gendered language led to the idea of measuring
bias.
Explicit use of gendered language in English — words like “actress,” “congresswoman” or
“landlady” — emphasizes the gender of the subject when there is no need to do so. Research from Yasmeen Hitti et al. has suggested that both gendered language
and words that reinforce societal and behavioral stereotypes, such as “beautiful,” “emotional,” “supportive”
or “dramatic,” add to the bias of a sentence. Using their research methodology, we attributed a bias score to
each headline.
Go ahead, hover on the bubbles and see for yourself if you
think these headlines are extremely gendered.
[Chart: News outlets arranged by bias index. Axis: less biased (left) to more biased (right).]
Read more about our bias calculations
We measure gender bias by tracking the combined occurrence of
gendered language and social stereotypes usually associated with women. We do this in two steps:
1) We check if a headline contains gendered language (e.g. “spokeswoman,”
“chairwoman,” “she,” “her,” “bride,” “daughter,” “daughters,” “female,” “fiancee,” “girl,” or
“girlfriend”).
2) If it contains gendered language, we then count the number of words that are
considered to be social stereotypes about women (e.g. “weak,” “modest,” “virgin,” “slut,” “whore,”
“sexy,” “feminine,” “sensitive,” “emotional,” “gentle,” “soft,” “pretty,” “bitch,” or “sexual”).
Finally, we normalize this count for all headlines within each outlet as a score
between 0 and 1, and we aggregate (i.e. average) this score for each outlet.
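The two steps and the normalization above might look something like the sketch below. Several things here are assumptions rather than the essay's actual implementation: the word sets are tiny subsets of the examples quoted above, the tokenizer is a simple regular expression, and the normalization divides each count by the largest count within the outlet, since the text does not say exactly how counts are scaled to the 0 to 1 range. The names `stereotype_count` and `outlet_bias` are hypothetical.

```python
import re
from statistics import mean

# Small illustrative subsets of the two dictionaries described in the text.
GENDERED = {"spokeswoman", "chairwoman", "she", "her", "bride", "daughter", "girl"}
STEREOTYPES = {"weak", "modest", "sexy", "emotional", "gentle", "pretty"}


def tokenize(headline):
    """Lowercase the headline and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", headline.lower())


def stereotype_count(headline):
    """Step 1: require gendered language. Step 2: count stereotype words."""
    tokens = tokenize(headline)
    if not any(token in GENDERED for token in tokens):
        return 0
    return sum(1 for token in tokens if token in STEREOTYPES)


def outlet_bias(headlines):
    """Scale each count by the outlet's maximum count, then average.

    Dividing by the maximum is an assumed normalization, not a confirmed one.
    """
    counts = [stereotype_count(h) for h in headlines]
    top = max(counts, default=0)
    if top == 0:
        return 0.0
    return mean(count / top for count in counts)


# Made-up headlines for one hypothetical outlet.
sample = [
    "She is pretty and emotional",      # gendered, two stereotype words
    "Pretty weak quarterly earnings",   # stereotype words but no gendered word
    "Bride gives gentle speech",        # gendered, one stereotype word
]
score = outlet_bias(sample)
```

Note that the second headline scores zero even though it contains two stereotype words, because the first step only admits headlines that also contain gendered language.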
Headline trends: Less gendered language. More empowering words.
In this final chart, we have visualized how the words
used in headlines about women have changed over time.
Among other trends that can be observed from this chart, we found that while the use of
many gendered words (e.g. “sexy,” “fat,” “housewife” or “gossip”) has faded out over time, the use of empowering
words has increased over time (e.g. “founder,” “activist,” “leader” or “appoint”). Other words (e.g. “death,”
“hurt,” and “drama”) have instead stood the test of time, as their use has remained consistent since 2005.
For each word’s ebb or flow, we tried to find a “remember when” memory to explain it.
Remember when Caitlyn Jenner came out as transgender? That was part of a wave of increased trans visibility that
helps to explain why “transgender” shot up in 2015. Remember when the #MeToo movement took off? That adds
context to the sharp rise of “harassment” in 2017, and the sharp rise of the word “equality” in recent years.
Such world events are arranged as bubbles in the timeline above the chart.
If you see an interesting rise, hover over one such bubble to see if you can find a world
event that can explain it. If you think that we’ve missed out on an important event in your part of the world,
let us know.
The story of when women make headlines is, like most stories about people, full of
contradictions. It is violent, sensational, biased, hopeful and empowering, although not all in equal
measure. This visual essay suggests that headlines used to report women-centered news can be biased and can
reinforce existing stereotypes. These headlines also tend to be more sensational than headlines on other news topics, and
they tend to represent women in situations of crime and violence. As a growing body of research has already indicated,
this could imply that women are not only underrepresented in the news but also misrepresented.
Nonetheless, this visual essay also suggests that some progress has been made. Over time,
we saw that the use of many empowering words has risen sharply while the use of some gender stereotypes has
plummeted. Let’s hope this trend continues and, in the meantime, enjoy our news with a little grain of salt.
After all, when women make headlines, no words, sensational or not, biased or not, can truly explain the nuance
behind the event because words can only approximate.
Methods
To build the dataset of headlines, we scraped data from Google News, using
RapidAPI, from the
most visited publications and news agencies for readers in the US, the UK, India and South Africa according to
SimilarWeb (as of 2021-06-06). To collect this data, we
queried RapidAPI for headlines containing one or more of the following keywords: women, woman, girl, female,
lady, ladies, she, her, herself, aunt, grandmother, mother, sister, daughter, wife, mom, mum, girlfriend, mrs,
niece. As a result, our analysis encompasses 24 publications in South Africa (18,594 headlines), 51 publications
in India (138,590 headlines), 57 publications in the United Kingdom (109,286 headlines), and 54 publications in
the United States (115,669 headlines).
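As a rough illustration of the keyword criterion, the sketch below checks whether a headline contains any of the listed keywords and confirms that the per-country figures add up to the totals quoted earlier in the essay. Whole-word, case-insensitive matching is an assumption: the text does not describe how the RapidAPI query matched keywords, and this sketch does not call that API. The names `matches_keywords` and `COUNTRY_FIGURES` are hypothetical.

```python
import re

# Keyword list as given in the Methods text.
KEYWORDS = [
    "women", "woman", "girl", "female", "lady", "ladies", "she", "her",
    "herself", "aunt", "grandmother", "mother", "sister", "daughter",
    "wife", "mom", "mum", "girlfriend", "mrs", "niece",
]

# Assumption: whole-word, case-insensitive matching, so that "she" does not
# match inside an unrelated word such as "sheriff".
KEYWORD_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def matches_keywords(headline: str) -> bool:
    """Return True if the headline contains at least one keyword."""
    return KEYWORD_PATTERN.search(headline) is not None


# Per-country figures reported in the text: (publications, headlines).
COUNTRY_FIGURES = {
    "South Africa": (24, 18_594),
    "India": (51, 138_590),
    "United Kingdom": (57, 109_286),
    "United States": (54, 115_669),
}
total_publications = sum(pubs for pubs, _ in COUNTRY_FIGURES.values())
total_headlines = sum(heads for _, heads in COUNTRY_FIGURES.values())
```

The totals work out to 186 publications and 382,139 headlines, which agrees with the figures given in the introduction.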
Gendered language and bias calculation: To categorize words used in
headlines as gendered, we manually curated two dictionaries — gendered words about women (words that are
explicitly gendered in the English language, such as “actress,” “waitress,” “congresswoman,” “landlady” or
“mother”) and words that denote societal and behavioral stereotypes about women (words like “beautiful,” “sexy,”
“pregnant,” or “emotional”). These were curated using existing research from Huimin Xu and team, published under
the title “The Cinderella Complex: Word embeddings reveal
gender stereotypes in movies and books” and the incredible research done by The
Swaddle team. These dictionaries can be found here. The
methodology used to calculate bias was borrowed from the research done by Yasmeen Hitti and team, published
under the title “Proposed Taxonomy for Gender Bias in Text.”
Theme dictionaries: To categorize words used in headlines as part of
a theme (i.e. crime and violence, empowerment, race, ethnicity and identity, people and places) we manually
curated four dictionaries. These dictionaries can be found here. In
cases where a word had more than one contextual usage (like “head” or “chair”), we classified it under a
theme only if it belonged to that theme in at least 90% of cases. To analyze words and textual elements
found in headlines, we used existing Natural Language Processing packages for Python (i.e. spacy, gensim, word2number, pycontractions, bs4, unidecode, textblob, nltk).
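The 90% rule for ambiguous words could be expressed as in the sketch below. It assumes that each usage of a word in a sample of headlines has been labelled by hand with a theme name, or with `None` when the usage fits no theme. The text does not describe how usages were actually sampled or labelled, so the function name `assign_theme` and the input format are hypothetical.

```python
from collections import Counter

THRESHOLD = 0.9  # "no less than 90% of the cases", per the text


def assign_theme(usage_labels, threshold=THRESHOLD):
    """Return the dominant theme if it covers at least `threshold` of usages.

    `usage_labels` holds one label per observed usage of a word: a theme
    name, or None when that usage does not fit any theme.
    """
    if not usage_labels:
        return None
    theme, count = Counter(usage_labels).most_common(1)[0]
    if theme is None:
        return None
    return theme if count / len(usage_labels) >= threshold else None


# Hypothetical labels for two ambiguous words.
mostly_one_theme = ["people and places"] * 19 + [None]    # 95% in one theme
mixed_usage = ["people and places"] * 8 + [None] * 2      # only 80%

kept = assign_theme(mostly_one_theme)
dropped = assign_theme(mixed_usage)
```

Under this rule a word whose usages are split across contexts stays unclassified, which is consistent with the essay's observation that some words remain in a gray area.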
Polarity analysis: To analyze the polarity of each headline we used
vaderSentiment. For the comparison of polarity between
women-centered headlines and all other headlines, we scraped headlines using no keyword tags from the most
visited publications and news agencies for readers in the US, the UK, India and South Africa according to SimilarWeb (as of 2021-06-06). With this data,
we were able to calculate baseline polarity scores for each news publication and news agency. Although these
headlines constitute a representative sample, the number of headlines we used to calculate this baseline
polarity is roughly equal to one third of the number of headlines that we used to calculate polarity for
women-centered headlines.
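The comparison against the baseline reduces to simple arithmetic, sketched below. The input values are invented for illustration and are not results from the essay. Expressing the percent difference relative to the baseline is also an assumption, as the text does not state which denominator was used. The function name `polarity_gap` is hypothetical.

```python
def polarity_gap(women_mean: float, baseline_mean: float):
    """Return (absolute, percent) difference between two mean polarity scores.

    The percent difference is taken relative to the baseline, which is an
    assumed convention.
    """
    if baseline_mean == 0:
        raise ValueError("baseline mean polarity must be non-zero")
    absolute = women_mean - baseline_mean
    percent = 100.0 * absolute / baseline_mean
    return absolute, percent


# Invented example: mean polarity for women-centered vs. all other headlines.
absolute_diff, percent_diff = polarity_gap(women_mean=0.30, baseline_mean=0.24)
```

A positive result means that an outlet's women-centered headlines are, on average, more polarizing than its other headlines.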
With regard to the stacked bar chart (in the scrollytelling section), there were far
more than 1,231 unique words in the original dataset. For visual and readability purposes, however, we only
retained the 1,231 words that were most frequent and that were common across the four countries studied.
All of the data used for this essay is available in this GitHub repo.