On November 7, 1916, Jeannette Rankin became the first woman elected to the House of Representatives. Since then, 289 different women have represented their constituents in the House, and women now make up 19% of its members. In this article, we follow their stories through data and, with the help of machine learning, identify their contributions.
Let's draw a line for each woman who has served in the House of Representatives. For example, this line belongs to the longest-serving woman House member, Democrat Marcy Kaptur, who represents Ohio's 9th District.
Here are all the women who have served in the House of Representatives. Democrats are shown in blue and Republicans in red, their usual party colors. Let's look at a few prominent women.
This chart shows the percentage of women elected to comparable national legislatures in other OECD member countries. In Norway, this is the Stortinget. In Mexico, it's the Cámara de Diputados. In the United Kingdom, it's the House of Commons.
The Nordic countries lead in equal gender representation, and Mexico has also made great strides. Many of these countries, such as Norway, Sweden, and Mexico, have also implemented gender quotas to improve representation, although Iceland, notably, has not. None of these countries has reached gender parity. In fact, the only two countries in the world where women hold 50% or more of the seats in the national legislature are Bolivia and Rwanda (not visualized here), both of which also have gender quotas.
The US, on the other hand, ranks 28th out of 34.
With gender parity, constituents might hope that elected officials focus on more inclusive issues (as, for example, Patsy Mink did with Title IX legislation). To better understand how a gender shift in the House might affect the topics debated by our representatives, consider the speeches made by House members.
From the Congressional Record, we compiled a collection of 500,000 speeches delivered by men and women in the House from 1994 to the present, including monologues, debates, single-sentence replies, and even acknowledgments praising particular individuals.
We then used a probabilistic machine learning model[1] to infer the topics within each speech. The model looks for words that often appear together and clusters them into the same group[2] (e.g., energy policy, jobs, economy, taxes, and healthcare).

[1] While a number of machine learning techniques can do this, including neural networks, the beauty of Latent Dirichlet Allocation (LDA) is that it is an unsupervised learning algorithm. Unlike supervised algorithms, it does not require pre-labeled speeches to train the model, which would bias the inference process by selecting the topics in advance. The only necessary inputs are the corpus of speeches and a rough guess of the total number of topics within the corpus.

[2] For example, if we choose a small number of topics (e.g., 10), the model will identify the 10 most overarching themes in congressional speeches. We wanted to identify more subtle topics and themes, and after a bit of experimentation, we settled on 75 topics. As is often the case with LDA models, many of the topics identified were nonsensical, but the majority were well-defined.
| Topic 1 | Topic 2 |
|---|---|
| renewable energy | natural disaster |
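The topic model described above can be sketched in a few dozen lines. The authors credit gensim, whose LDA implementation uses online variational Bayes; the sketch below swaps in collapsed Gibbs sampling, a different inference method for the same LDA model, written in plain Python so it runs without dependencies. The function name `lda_gibbs`, the priors `alpha=0.1` and `beta=0.01`, the iteration count, and the four toy "speeches" are illustrative assumptions, not the project's settings (which used 75 topics on the full speech corpus).

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling (illustrative sketch).

    docs: list of token lists. Returns (doc_topic, topic_words):
    doc_topic[d][k] is the estimated share of document d assigned to topic k,
    topic_words[k] maps each word to its estimated probability under topic k.
    alpha and beta are assumed symmetric Dirichlet priors.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    doc_topic_counts = [[0] * num_topics for _ in docs]
    topic_word_counts = [defaultdict(int) for _ in range(num_topics)]
    topic_totals = [0] * num_topics

    # Randomly assign an initial topic to every token and record the counts.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic_counts[d][k] += 1
            topic_word_counts[k][w] += 1
            topic_totals[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take the current token out of the counts before resampling it.
                k = assignments[d][i]
                doc_topic_counts[d][k] -= 1
                topic_word_counts[k][w] -= 1
                topic_totals[k] -= 1
                # p(z = t | rest) is proportional to
                # (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta)
                weights = [
                    (doc_topic_counts[d][t] + alpha)
                    * (topic_word_counts[t][w] + beta)
                    / (topic_totals[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic_counts[d][k] += 1
                topic_word_counts[k][w] += 1
                topic_totals[k] += 1

    # Smoothed estimates; each row and each topic distribution sums to 1.
    doc_topic = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic_counts)
    ]
    topic_words = [
        {w: (topic_word_counts[k][w] + beta) / (topic_totals[k] + V * beta) for w in vocab}
        for k in range(num_topics)
    ]
    return doc_topic, topic_words


# Hypothetical toy corpus with two themes.
speeches = [
    "renewable energy solar wind power".split(),
    "solar wind renewable energy grid".split(),
    "hurricane flood disaster relief recovery".split(),
    "flood hurricane disaster emergency relief".split(),
]
doc_topic, topic_words = lda_gibbs(speeches, num_topics=2)
for k, words in enumerate(topic_words):
    print(k, sorted(words, key=words.get, reverse=True)[:3])
```

Each token's topic is resampled in proportion to how heavily its speech already uses that topic and how strongly that topic already uses the word, which is what pulls words that co-occur into the same cluster.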
The result: a rough estimate of the fraction of time each representative has spent on each issue.
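One way to turn per-speech topic mixtures into a per-member estimate is to pool each member's speeches, weighting each speech by its length. The article does not specify its aggregation, so the word-count weighting, the function name `member_topic_shares`, and the sample members below are hypothetical, shown as a minimal sketch.

```python
from collections import defaultdict


def member_topic_shares(speeches):
    """Estimate each member's share of speech devoted to each topic.

    speeches: iterable of (member, num_words, topic_mix) tuples, where
    topic_mix maps a topic name to that speech's inferred proportion for it.
    Each speech is weighted by its word count (an assumption), so a
    one-sentence reply counts for less than a long floor speech.
    """
    weighted = defaultdict(lambda: defaultdict(float))
    total_words = defaultdict(int)
    for member, num_words, topic_mix in speeches:
        total_words[member] += num_words
        for topic, share in topic_mix.items():
            weighted[member][topic] += num_words * share
    return {
        member: {topic: w / total_words[member] for topic, w in topics.items()}
        for member, topics in weighted.items()
        if total_words[member] > 0  # skip members with no recorded words
    }


# Hypothetical members and topic mixtures, for illustration only.
speeches = [
    ("Rep. A", 300, {"energy policy": 0.5, "healthcare": 0.5}),
    ("Rep. A", 100, {"healthcare": 1.0}),
    ("Rep. B", 200, {"energy policy": 1.0}),
]
shares = member_topic_shares(speeches)
print(shares["Rep. A"])  # {'energy policy': 0.375, 'healthcare': 0.625}
```

Weighting by length keeps a member's many short acknowledgments from outweighing a few long policy speeches; an unweighted average per speech would be an equally plausible choice.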
Here is the topic "energy policy". Each circle in this chart represents a particular House member: men on the left and women on the right. The higher the circle, the greater the proportion of their time in the House that member spent talking about energy policy.
We can represent the difference between the median share for men and the median share for women with a line.
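The line's two endpoints are simply the group medians. The sketch below assumes each member's topic shares have already been estimated; the dictionary layout, the `"M"`/`"F"` labels, the function name `median_shares`, and the choice to count a member who never mentions a topic as 0 are all hypothetical.

```python
from statistics import median


def median_shares(members, topic):
    """Return (median for men, median for women) of one topic's share.

    members: list of dicts with hypothetical keys "gender" ("M" or "F")
    and "shares" (topic -> fraction of that member's speech).
    A member who never touched the topic counts as 0.
    Raises statistics.StatisticsError if either group is empty.
    """
    men = [m["shares"].get(topic, 0.0) for m in members if m["gender"] == "M"]
    women = [m["shares"].get(topic, 0.0) for m in members if m["gender"] == "F"]
    return median(men), median(women)


# Hypothetical members, for illustration only.
members = [
    {"gender": "M", "shares": {"healthcare": 0.02}},
    {"gender": "M", "shares": {"healthcare": 0.03}},
    {"gender": "M", "shares": {}},
    {"gender": "F", "shares": {"healthcare": 0.05}},
    {"gender": "F", "shares": {"healthcare": 0.07}},
    {"gender": "F", "shares": {"healthcare": 0.06}},
]
print(median_shares(members, "healthcare"))  # (0.02, 0.06)
```

The median is used rather than the mean so that a handful of members who speak almost exclusively about one topic do not drag the group value.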
Men and women representatives appear to devote roughly the same amount of their speeches to energy policy. Next, let’s examine another topic, one on which women focus more than men: “healthcare”.
While healthcare is an important topic for all representatives, women devoted more than twice as large a share of their speeches to healthcare and medical issues as men did.
Even after accounting for party, the result is almost exactly the same: Republican women are just as far apart from Republican men as Democratic women are from Democratic men.[3]

[3] View the breakdown by party here.
Now let's examine a topic favored by men: "government budget".
Men devoted a greater share of their speeches to the government budget and taxes than women did, although the difference isn't as extreme as it is for healthcare.
Let's take the median difference between men and women for the government budget (depicted as a line) and plot it alongside the same difference for every other topic.
A difference of +100% (as is the case with "child welfare") means that women are twice as likely as men to refer to the topic in a speech. A difference of -100% means the opposite: men are twice as likely as women to refer to the topic.
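The article does not spell out the formula behind this scale. One definition consistent with the description, assumed here, measures the gap relative to the smaller of the two medians, which makes the axis symmetric: +100% and -100% both mean "twice as much," just in opposite directions. The function name `signed_relative_difference` is hypothetical.

```python
def signed_relative_difference(men_median, women_median):
    """Signed percent gap between two group medians (assumed definition).

    Positive when women's median is larger, negative when men's is larger,
    measured relative to the smaller median so the scale is symmetric:
    +100 means women's median is twice men's; -100 means men's is twice women's.
    """
    if men_median <= 0 or women_median <= 0:
        raise ValueError("medians must be positive")
    if women_median >= men_median:
        return (women_median / men_median - 1) * 100
    return -(men_median / women_median - 1) * 100


print(signed_relative_difference(0.02, 0.04))  # 100.0
print(signed_relative_difference(0.04, 0.02))  # -100.0
```

Under this assumed definition, healthcare's "more than twice" gap would plot above +100%, while a topic such as energy policy, where the medians are roughly equal, would sit near 0.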
Women representatives focus more of their speeches on healthcare, civil rights, labor rights and the environment.
Men focus more of their speeches on the military, agriculture, and the budget.
As of June 2018, a record number of women have filed (or are planning to file) to run for a House seat in the upcoming midterm elections, and there is still time for more to file. Democratic women candidates in particular are running in record numbers, but Republican women candidates are also on the rise. The outcome has the potential to dramatically shift the gender breakdown of the House.
All the code for this project is available on GitHub.
Data on women representatives and on the total number of representatives is from Wikipedia.
Data on women in parliaments across the world is from the Inter-Parliamentary Union and can be accessed on the World Bank's data repository.
Speech transcripts are from the Congressional Record, scraped with the help of this awesome tool.
Data on women candidates running for Congress is from the Center for American Women and Politics, Eagleton Institute of Politics, Rutgers University.
Finally, this work wouldn't have been possible without machine learning libraries such as spaCy and gensim, data libraries such as pandas, visualization libraries such as D3, and the Python and R ecosystems.