Data & Methods
An interactive explainer for the choices in A Map of Places with the Same Name
Introduction
We chose to blend three properties to match our perception of how this plays out in reality:
Distance: the proximity to a place.
Population: the size of a place is a pretty good indicator of its importance.
Wikipedia article length: an extra layer of nuance for the cultural importance/awareness of places that may not have a large population.
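To make the blend concrete, here is a minimal sketch that assumes the three properties are first scaled to 0 to 1 and then combined as a weighted sum. The weights are hypothetical; the only thing the source says is that distance is weighted more heavily. With these numbers, the distance weight exceeds the other two combined, so a place you are standing in always outscores a far-away place, however big and famous that place is.

```python
# Hypothetical weights: the source only states that distance is weighted most heavily.
WEIGHTS = {"distance": 2.0, "population": 1.0, "wikipedia": 0.5}


def place_score(distance_score: float, population_score: float, wikipedia_score: float) -> float:
    """Blend three property scores (each assumed to be in 0..1) into one raw score."""
    return (
        WEIGHTS["distance"] * distance_score
        + WEIGHTS["population"] * population_score
        + WEIGHTS["wikipedia"] * wikipedia_score
    )


# A small town you are standing in vs. a huge, well-documented place far away.
nearby = place_score(1.0, 0.1, 0.0)
far_away = place_score(0.0, 1.0, 1.0)
print(nearby > far_away)  # True
```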
Value
There are a few different metrics we computed to see what matched our intuition the best:
score: the raw numerical score as calculated according to the three properties.
share: the score as a percent of the sum of all place scores for each county.
shareDelta: how much bigger the highest score is than the next highest.
Score on its own looks at each place in isolation, which felt too siloed. We then tried share, because it accounts for the effect of adding more places to the mix and gives each one a bit of gravity. We ended up using shareDelta because of the small nuance it adds. Often two or more places have quite high shares, but since we only pick the top one, the others are drowned out. Looking at how much higher the top share is than the runner-up better reflects the likelihood of one place over another.
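The three metrics can be sketched for a single county as follows. This assumes shares are percentages of the county's total score and that shareDelta is the top share minus the runner-up share, in percentage points; the place names and scores are made up for illustration.

```python
def county_metrics(scores):
    """Return (top place, top share, shareDelta) for one county.

    `scores` maps each candidate place to its raw score (assumed positive).
    Shares are percentages of the county total; shareDelta is assumed to be
    the top share minus the runner-up share.
    """
    total = sum(scores.values())
    shares = {place: 100 * score / total for place, score in scores.items()}
    ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    top_place, top_share = ranked[0]
    runner_up_share = ranked[1][1] if len(ranked) > 1 else 0.0
    return top_place, top_share, top_share - runner_up_share


print(county_metrics({"A": 3.0, "B": 0.5, "C": 0.5}))  # ('A', 75.0, 62.5)
print(county_metrics({"A": 3.0, "B": 2.5, "C": 0.5}))  # A still wins, but shareDelta is only about 8.3
```

Both counties pick A, but the much smaller shareDelta in the second one signals that B is a close contender.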
Scaling relative to counties means comparing a value against the results for all counties rather than looking at each county independently. For example, say the maximum share for any county is 40. We would then normalize every county's value against that maximum, so a share of 20 would become 50%.
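The normalization in that example is a division by the maximum across all counties. A minimal sketch, with hypothetical county keys:

```python
def scale_relative(values):
    """Express each county's value as a percent of the maximum across all counties."""
    max_value = max(values.values())
    return {county: 100 * value / max_value for county, value in values.items()}


print(scale_relative({"county_1": 40, "county_2": 20}))
# {'county_1': 100.0, 'county_2': 50.0}
```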
We chose to weight distance more heavily because if you are in or near the place you are referring to, it should always win, regardless of how big an external place is. We chose a relative comparison because names shared by a very large number of places (like Washington) dilute the absolute results.
Opacity Threshold
Based on the resulting value from above, the threshold decides which counties get full opacity (probably) and which get reduced opacity (maybe). The right values here are the most driven by feel, and they vary greatly depending on which metric you select above and whether the scores are relative or not.
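The thresholding step amounts to a simple comparison. In this sketch the threshold and both opacity levels are hypothetical placeholders, since the real values were tuned by feel:

```python
# Hypothetical values; the actual threshold was tuned by eye.
PROBABLY_THRESHOLD = 50.0
FULL_OPACITY = 1.0     # "probably"
REDUCED_OPACITY = 0.4  # "maybe"


def county_opacity(value: float, threshold: float = PROBABLY_THRESHOLD) -> float:
    """Full opacity at or above the threshold, reduced opacity below it."""
    return FULL_OPACITY if value >= threshold else REDUCED_OPACITY
```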
Distance Scale
We chose a logarithmic scale because it felt most accurate: if you are really close to a place, its gravity should be felt more heavily than if you are farther away. Anything under the minimum (we chose 50 miles) gets the highest score (1), and anything over the maximum (300 miles) gets 0.
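A clamped logarithmic scale interpolates between the two limits in log space, so each doubling of distance costs the same amount of score. This sketch uses the 50- and 300-mile limits from the text; it behaves like D3's `scaleLog` with domain [50, 300], range [1, 0] and clamping, though the exact implementation used for the map is an assumption.

```python
import math

MIN_MILES = 50.0   # at or under this distance: score 1
MAX_MILES = 300.0  # at or over this distance: score 0


def distance_score(miles: float) -> float:
    """Clamped logarithmic scale mapping 50..300 miles onto 1..0."""
    clamped = min(max(miles, MIN_MILES), MAX_MILES)
    t = (math.log(clamped) - math.log(MIN_MILES)) / (math.log(MAX_MILES) - math.log(MIN_MILES))
    return 1.0 - t
```

Going from 50 to 100 miles drops the score by about 0.39, while going from 250 to 300 miles drops it by only about 0.10, which is the "gravity" effect described above.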
Population Scale
Like distance, we chose to clamp the maximum value here: beyond a certain size, there is little real distinction. We used a computed threshold, the 99.9th percentile (the top 0.1% of places), as our maximum clamp. We also chose an exponential scale to curb the effects of the lopsided distribution, so medium-sized places still carry some weight and aren’t lumped in with places that have very low numbers.
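Here is a minimal sketch of the clamp plus curbing scale. It assumes "exponential scale" means a power scale with an exponent below 1 (a square root here, similar to D3's `scalePow`), which is what lifts medium values away from the bottom of a long-tailed distribution; the exponent and the linear-interpolation percentile are assumptions. The same helper would apply unchanged to Wikipedia article lengths.

```python
import math


def percentile(values, p):
    """Linearly interpolated percentile of `values`, with p in 0..100."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * p / 100
    low, high = math.floor(rank), math.ceil(rank)
    return ordered[low] + (ordered[high] - ordered[low]) * (rank - low)


def make_power_scale(values, top_percent=0.1, exponent=0.5):
    """Build a 0..1 scale clamped at the (100 - top_percent)th percentile.

    The exponent is a hypothetical choice; anything below 1 boosts
    medium values relative to a linear scale. Assumes the cap is positive.
    """
    cap = percentile(values, 100 - top_percent)

    def scale(value):
        clamped = min(max(value, 0.0), cap)
        return (clamped / cap) ** exponent

    return scale
```

With a square root, a place at 1% of the cap scores 0.1 instead of 0.01, so it stays distinguishable from the very smallest places.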
Wikipedia Article Scale
As with population, we chose to clamp the maximum article length, since beyond a certain length there is little real distinction. Again we used the 99.9th percentile (the top 0.1% of places) as our maximum clamp, and an exponential scale to curb the effects of the lopsided distribution, so places with medium-length articles still carry some weight and aren’t lumped in with places that have very short ones.
Miscellaneous Details
If you want a good example of the results not “working” with our presets, check out Cambridge, where we would’ve expected Cambridge, Massachusetts to have more influence.
We conceived of this project thinking about how people talk about places, so places are grouped together if they share either the same spelling or the same pronunciation (using phoneme matching).
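Grouping by "same spelling or same pronunciation" is transitive: if A is spelled like B and B sounds like C, all three end up together. This sketch uses a union-find structure to capture that. The phoneme strings are made-up, ARPAbet-style stand-ins for whatever pronunciation data was actually used, and exact string equality is an assumed simplification of phoneme matching.

```python
from collections import defaultdict


def group_places(places):
    """Group (name, phonemes) pairs that share a spelling or a pronunciation."""
    parent = list(range(len(places)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    first_seen = {}  # (kind, key) -> index of the first place with that key
    for i, (name, phonemes) in enumerate(places):
        for key in (("spelling", name.lower()), ("sound", phonemes)):
            if key in first_seen:
                union(i, first_seen[key])
            else:
                first_seen[key] = i

    groups = defaultdict(list)
    for i, (name, _) in enumerate(places):
        groups[find(i)].append(name)
    return list(groups.values())


# Hypothetical records: the two "Lima" entries share a spelling,
# and the made-up "Leema" shares a pronunciation with the second one.
print(group_places([
    ("Lima", "L AY M AH"),
    ("Lima", "L IY M AH"),
    ("Leema", "L IY M AH"),
    ("Paris", "P EH R IH S"),
]))
# [['Lima', 'Lima', 'Leema'], ['Paris']]
```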
We only used international cities with populations over 100,000 since it is unlikely that smaller ones are well-known to most people in the US.
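The population cutoff is a simple filter. This sketch assumes US places are exempt from the cutoff and that countries are identified by a code such as "US"; both are assumptions for illustration.

```python
MIN_INTERNATIONAL_POPULATION = 100_000


def keep_place(country: str, population: int) -> bool:
    """Keep every US place; keep international cities only above 100,000 people."""
    return country == "US" or population > MIN_INTERNATIONAL_POPULATION
```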
Although places can be in multiple counties, we are only displaying the first one in the tables for brevity.
We used an amalgamation of data sources:
We are constantly updating the data, so if you spot something wrong with the places being used (or not), help us improve it!
Get in touch at russell@pudding.cool, and consider supporting The Pudding on Patreon!