What Does the Path to Fame Look Like?
Data and Methods
Data for this story were collected and processed using the Wikipedia API. The collection period ran from July 1, 2015, to September 13, 2018, and covered English Wikipedia. Any person who appeared in the top 1,000 pages for at least one day in that range was considered. The full source is on GitHub.
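As a rough illustration of what collecting one top-1,000 list per day involves, the sketch below enumerates one request URL for each day in the collection period. The article only says "the Wikipedia API"; the use of the Wikimedia Pageviews REST API's daily "top" endpoint here is an assumption, and the function name is hypothetical. The snippet builds URLs only and makes no network requests.

```python
from datetime import date, timedelta

# Assumption: the Wikimedia Pageviews REST API "top" endpoint for English
# Wikipedia. The article does not say which endpoint was actually used.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/top/en.wikipedia/all-access"


def daily_top_urls(start, end):
    """Return one top-pages URL per day from start to end, inclusive."""
    urls = []
    day = start
    while day <= end:
        # The endpoint expects the date as zero-padded YYYY/MM/DD.
        urls.append(f"{BASE}/{day:%Y/%m/%d}")
        day += timedelta(days=1)
    return urls


# The collection period stated in the article.
urls = daily_top_urls(date(2015, 7, 1), date(2018, 9, 13))
```

Each URL would then be fetched and its list of pages stored by date, with rate limiting and error handling left out of this sketch.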
Wikipedia’s aggregation of notable births, a foundation of over 40,000 people, was the starting point for deciding who counts as a celebrity. In addition, each top 1,000 page with “(born” in its text, a consistent characteristic of people pages on Wikipedia, was added to this database, so that people not yet notable enough to appear on the births pages were still included.
We started with those who had little to no pageviews in the second half of 2015, eliminating already known celebrities. The methodology for defining rising celebrities centered on a series of levels of sustained pageviews. Levels were assigned based on monthly average pageviews. There were eight levels (like karate belts): (1) 50, (2) 100, (3) 200, (4) 500, (5) 1,000, (6) 2,000, (7) 5,000, and (8) 10,000 pageviews. If someone hit a new level of pageviews and never dipped below that level’s threshold again, they were assigned the level, hence the term “sustained pageviews.” If a person hit level 5 (1,000 pageviews), for example, but then dropped below 1,000 pageviews the following month, they would still be a level 4.
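The sustained-level rule can be sketched in a few lines. This is one reading of the description above, not the code behind the story: a person's level in a given month is taken to be the highest threshold that their monthly average never falls below from that month onward. The function names and the sample numbers are hypothetical, and people below 50 pageviews are treated as level 0, which the article does not name.

```python
from bisect import bisect_right

# Monthly-average pageview thresholds for levels 1 through 8, from the article.
THRESHOLDS = [50, 100, 200, 500, 1_000, 2_000, 5_000, 10_000]


def level_for(views):
    """Level whose threshold is the highest one at or below `views` (0 if none)."""
    return bisect_right(THRESHOLDS, views)


def sustained_levels(monthly_averages):
    """Sustained level for each month, given chronological monthly averages.

    Assumption: a level counts as sustained from a month onward only if no
    later month dips below its threshold, so each month's level is based on
    the minimum of that month and every month after it.
    """
    levels = []
    running_min = float("inf")
    # Walk backwards so running_min is the minimum from this month onward.
    for views in reversed(monthly_averages):
        running_min = min(running_min, views)
        levels.append(level_for(running_min))
    levels.reverse()
    return levels


# Hypothetical person: spikes past 1,000 in month 3, then settles in the 600s.
example = sustained_levels([30, 120, 1500, 600, 700])
```

In the hypothetical series, the spike to 1,500 does not earn level 5 because the following month drops to 600, so the person is a level 4 from that point on, matching the example in the text.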
After assigning levels, anyone with (1) a beginning level lower than 4, (2) a level change of more than 4 levels, and (3) a level below 2 in 2015 was included in the final list. People above level 6 were considered to have risen to fame. Anyone who satisfied those criteria but remained below level 6 was considered rising.
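The three inclusion criteria and the two labels can be expressed as a small filter. This is a minimal sketch under stated assumptions: the function and parameter names are hypothetical, the beginning level and the 2015 level are passed separately because the text lists them as separate conditions, and a final level of exactly 6 is returned as "unspecified" because the text defines only "above" and "below" level 6.

```python
def classify(begin_level, final_level, level_2015):
    """Return "risen", "rising", "unspecified", or None if the person is excluded."""
    qualifies = (
        begin_level < 4                       # (1) beginning level lower than 4
        and final_level - begin_level > 4     # (2) level change of more than 4
        and level_2015 < 2                    # (3) below level 2 in 2015
    )
    if not qualifies:
        return None
    if final_level > 6:
        return "risen"
    if final_level < 6:
        return "rising"
    # The article does not say how a final level of exactly 6 is labeled.
    return "unspecified"


# Hypothetical people, given as (beginning, final, 2015) levels.
labels = [classify(0, 8, 0), classify(0, 5, 0), classify(4, 8, 0)]
```

Under these rules a person who starts at level 0 needs to reach at least level 5 to be included at all, since the change must exceed four levels.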
By Russell Samora and Caitlyn Ralph. For questions, comments, etc., sup@pudding.cool.