Data & Methods

An interactive explainer for the choices in
A Map of Places with the Same Name

Introduction

Explanation

We chose to blend three properties to match our perception of how this plays out in reality:

Distance: the proximity to a place.

Population: the size of a place is a pretty good indicator of its importance.

Wikipedia article length: an extra layer of nuance for the cultural importance/awareness of places that may not have a large population.

Value

metric
scale relative to counties?

Distance (weight: 2)

Population (weight: 1)

Wiki Length (weight: 1)
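
As a rough illustration of how the three weights listed above might combine, here is a minimal sketch. It assumes each property has already been scaled to a value between 0 and 1 and that the blend is a weighted average. Only the weights themselves come from the controls; the function name and the example inputs are made up.

```python
# Weights as listed in the controls: distance counts double.
WEIGHTS = {"distance": 2, "population": 1, "wiki": 1}


def blended_score(scaled, weights=WEIGHTS):
    """Blend per-property values (each already scaled to 0..1) into one score.

    Assumption: the blend is a weighted average. The page only states the
    weights, not the exact formula.
    """
    total_weight = sum(weights.values())
    return sum(weights[name] * scaled[name] for name in weights) / total_weight


# Made-up inputs: a small place right next door vs. a big, famous place far away.
nearby_town = blended_score({"distance": 1.0, "population": 0.1, "wiki": 0.2})
distant_city = blended_score({"distance": 0.0, "population": 1.0, "wiki": 1.0})
```

With these inputs the nearby town still scores higher than the distant city, which is the effect the heavier distance weight is meant to produce.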

Explanation

There are a few different metrics we computed to see what matched our intuition the best:

score: the raw numerical score as calculated according to the three properties.

share: the score as a percent of the sum of all place scores for each county.

shareDelta: how much bigger the highest score is than the next highest.

Score on its own looks at each place in isolation, which felt too siloed. We then tried share because it accounts for the effect of adding more places to the mix, each with a bit of gravity. We ended up using shareDelta because of the small nuance it adds. There are often times when two or more place shares are quite high, but since we only pick the top one, the others are drowned out. Looking at how much higher the top one is better reflects the likelihood of one place over another.
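
To make the three metrics concrete, here is a minimal sketch for a single county. It assumes shareDelta is the top share minus the runner-up share, which is one reading of "how much bigger"; the function name and the example scores are hypothetical.

```python
def county_metrics(scores):
    """Compute score, share, and shareDelta for the winning place in one county.

    scores: dict mapping place label -> raw score for this county.
    Assumption: shareDelta = top share minus second-highest share, in
    percentage points.
    """
    total = sum(scores.values())
    if total == 0:
        return None  # no place has any pull on this county

    # Share: each score as a percent of the county's total.
    shares = {place: 100 * score / total for place, score in scores.items()}
    ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)

    top_place, top_share = ranked[0]
    runner_up_share = ranked[1][1] if len(ranked) > 1 else 0.0
    return {
        "place": top_place,
        "score": scores[top_place],
        "share": top_share,
        "shareDelta": top_share - runner_up_share,
    }


# Made-up scores for three same-named places competing for one county.
result = county_metrics({"Place A": 0.5, "Place B": 0.375, "Place C": 0.125})
```

Here Place A wins with a share of 50, but its shareDelta is only 12.5 because Place B is close behind.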

Scale relative to counties determines whether a value is compared against all results or considered on its own. Say, for example, the maximum share for any county is 40. We would then normalize all county values to that maximum, so a share of 20 would become a value of 50%.
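
The normalization in that example can be sketched in a few lines. The function name and county labels are hypothetical; the numbers are the ones from the example.

```python
def relative_to_counties(values):
    """Rescale each county's value as a percent of the largest county value."""
    max_value = max(values.values())
    if max_value == 0:
        return {county: 0.0 for county in values}
    return {county: 100 * value / max_value for county, value in values.items()}


# The maximum share across counties is 40, so a share of 20 becomes 50%.
normalized = relative_to_counties({"County A": 40, "County B": 20, "County C": 10})
```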

We chose to weight distance more heavily because if you are in or near the place you are referring to, it should always win, regardless of how big a place elsewhere is. We chose a relative comparison because names with really high place counts (like Washington) dilute the absolute results.

Opacity Threshold

Score >= 0.2 is 1

Score >= 0.05 is 0.75

Explanation

Based on the resulting value from above, these thresholds decide which places get full opacity (probably) and which get reduced opacity (maybe). The right values here are the most driven by feel, and they vary greatly depending on which metric you select above and whether the scores are relative or not.
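
The two thresholds shown above translate into a simple step function. This is a sketch with a hypothetical function name; what happens below the lower threshold is not stated, so the fallback of 0 (hidden) is an assumption exposed as a parameter.

```python
def opacity_for(value, full_threshold=0.2, reduced_threshold=0.05, fallback=0.0):
    """Map a value to an opacity using the two preset thresholds.

    Assumption: values under the lower threshold get `fallback` (0 here);
    the page does not say what they receive.
    """
    if value >= full_threshold:
        return 1.0  # "probably"
    if value >= reduced_threshold:
        return 0.75  # "maybe"
    return fallback


opacities = [opacity_for(v) for v in (0.35, 0.2, 0.1, 0.01)]
```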

Distance Scale

Distance Scale (mi)
scale type

Exponent

Min

Max

Explanation

We chose a logarithmic scale because it felt most accurate: if you are really close to a place its gravity should be felt more heavily than if you get farther away. Anything under the minimum (we chose 50 miles) gets the highest score (1), anything over the maximum (300 miles) gets 0.
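
One way to read the clamped logarithmic scale is sketched below. The 50 and 300 mile bounds come from the text; the function name and the exact interpolation (linear in the logarithm of the distance) are assumptions.

```python
import math


def distance_score(miles, min_miles=50.0, max_miles=300.0):
    """Score a distance from 1 (at or under the minimum) to 0 (at or over the maximum).

    Assumption: between the bounds the score falls off linearly in
    log(distance), so it drops fastest just past the minimum.
    """
    if miles <= min_miles:
        return 1.0
    if miles >= max_miles:
        return 0.0
    log_span = math.log(max_miles) - math.log(min_miles)
    return 1.0 - (math.log(miles) - math.log(min_miles)) / log_span


# Made-up distances: inside the minimum, in between, and past the maximum.
scores_by_distance = {d: distance_score(d) for d in (10, 100, 200, 500)}
```

Under this reading the score reaches 0.5 at the geometric midpoint of the two bounds (roughly 122 miles) rather than at the arithmetic midpoint of 175 miles.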

Population Scale

Population Scale (people)
scale type

Exponent

Min

Max

Explanation

Like distance, we chose to clamp the maximum value here. At a certain size, there is really little distinction. We used a computed threshold as our max clamp: the cutoff for the top 0.1% of places (the 99.9th percentile). We also chose an exponential scale to curb the effects of the lopsided distribution, so the medium-sized places still have some weight and aren’t lumped with the places with very low numbers.
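
The clamp and the curved scale could look something like the sketch below. It assumes the "exponential scale" behaves like a power scale with an exponent under 1; the exponent of 0.5, the nearest-rank percentile method, and both function names are hypothetical.

```python
import math


def percentile_cutoff(values, fraction=0.999):
    """Nearest-rank cutoff: the value at or below which `fraction` of the data falls."""
    ordered = sorted(values)
    index = max(0, math.ceil(fraction * len(ordered)) - 1)
    return ordered[min(index, len(ordered) - 1)]


def power_scale(value, max_value, exponent=0.5, min_value=0.0):
    """Clamp a value to [min_value, max_value], then scale it to 0..1 on a power curve.

    Assumption: an exponent below 1 lifts mid-range values relative to a
    linear scale. The 0.5 default is illustrative, not the page's preset.
    """
    clamped = min(max(value, min_value), max_value)
    return ((clamped - min_value) / (max_value - min_value)) ** exponent


# Made-up numbers: with a max clamp of 1,000,000 people, a place of 250,000
# scores about 0.5 instead of the 0.25 a linear scale would give.
medium_place = power_scale(250_000, max_value=1_000_000)
huge_place = power_scale(8_000_000, max_value=1_000_000)
```

In practice `max_value` would come from something like `percentile_cutoff(all_populations)`, so that the handful of very largest places all receive the top score.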

Wikipedia Article Scale

Wiki Scale (article length)
scale type

Exponent

Min

Max

Explanation

Like distance, we chose to clamp the maximum value here. At a certain length, there is really little distinction. We used a computed threshold as our max clamp: the cutoff for the top 0.1% of articles (the 99.9th percentile). We also chose an exponential scale to curb the effects of the lopsided distribution, so the medium-length articles still have some weight and aren’t lumped with the articles with very low numbers.

Miscellaneous Details

Explanation

If you want a good example of the results not “working” with our presets, check out Cambridge, where we would’ve expected Cambridge, Massachusetts to have more influence.

We conceived of this project thinking about how people talk about places, so places are grouped together if they share either the same spelling or the same pronunciation (using phoneme matching).

We only used international cities with populations over 100,000 since it is unlikely that smaller ones are well-known to most people in the US.

Although places can be in multiple counties, we are only displaying the first one in the tables for brevity.

We used an amalgamation of data sources:

Incorporated US cities

Unincorporated US cities

Countries

International cities

States

We are constantly updating the data, so if you spot something wrong with the places being used (or not), help us improve it!

Get in touch at russell@pudding.cool, and consider supporting The Pudding on Patreon!