Comparisons as Predictable as the Sunrise
An analysis of 200,000 similes from popular fiction.
By Russell Samora. Design & Illustration by Shelly Tan.




Similes are all around us. But, if you haven’t considered this figure of speech since grade school, here’s a refresher: similes compare a shared quality of two things, often using “like” or “as.”
I pulled every simile in the form “as ___ as ___” from tens of thousands of fiction books for the top 500 most common adjectives.
To put you in a writer’s mindset, fill in the blank of the simile below.


Above are real results from my extensive analysis of this specific form of simile. Once you start looking, you see them everywhere, from the classics like Jane Eyre to last year’s darling Heart the Lover.
I thought it would be a trivial exercise, but the more I poked around, the more questions I had.
Why “as ___ as ___”: English has lots of ways to make comparisons. “Eyes like daggers,” “razor-sharp wit.” Most of these figurative forms are difficult to extract from text at scale. “As ___ as ___” is the exception because of its rigid structure. This makes it the most reliable to parse programmatically while minimizing the need for human judgement. It’s also surprisingly common in all forms of fiction. So while it’s just a sample of figurative language, it provides a quantifiable glimpse into the topic.

Every Adjective has a Fingerprint
So that we’re on the same page, here is a structural diagram for this form of simile:
- Tenor → the thing being described
- Ground → the shared quality (adjective)
- Vehicle → the comparison (noun)
My mouth has gone as dry as sawdust.
First, a notable disclaimer. We’ll mostly look at a simplified version of this form: grounds that are adjectives and vehicles that are nouns (specifically unigrams or bigrams, i.e., one or two words). This reduces the noise in favor of a clearer signal. It leaves out longer vehicles, for example:
“You looked as surprised as a senator who’s passed a lie detector test.”
These one-offs can be fun and evocative if you know the context, but aren’t helpful when looking for data trends.
In similes, every adjective has a distinct shape. If you look at the usage of the nouns that follow, you can see whether an adjective is dominated by a single cliché or has range. Here is the shape of dry from our pop quiz.
You’ll notice that the top 3—bone, desert, and dust—make up 43% of all usage, and there is a pretty quick drop-off after that with a long tail of rarer choices. Let’s zoom out to see the shape of more adjectives. Below, each tiny chart is an adjective, and its bars show the top 20 nouns it pairs with. I’ve included every adjective with at least 200 occurrences.
Most adjectives’ shapes follow a similarly skewed distribution, with some key distinctions:
- Gentler slopes tell us there are no dominant idioms.
- Many have a couple go-to nouns, then a long tail.
- The clichés are obvious, marked by a single tall spike that overshadows the others.
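One way to put a number on an adjective's shape is the share of usage covered by its top few nouns. The snippet below is a minimal sketch of that calculation, not the code behind the charts; the function name `top_k_share` and the counts are made up for illustration.

```python
from collections import Counter


def top_k_share(vehicle_counts: Counter, k: int = 3) -> float:
    """Fraction of all uses covered by the k most common vehicles."""
    total = sum(vehicle_counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in vehicle_counts.most_common(k))
    return top / total


# Toy counts for "as dry as ___" (illustrative only, not the article's data).
dry = Counter({"bone": 3, "desert": 2, "dust": 1, "sawdust": 1,
               "toast": 1, "paper": 1, "leaves": 1})

print(top_k_share(dry, k=3))  # 6 of 10 uses -> 0.6
```

A share close to 1 points to a cliché-dominated adjective, while a low share suggests a gentler slope with no dominant idiom.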
While some writers reach for a novel comparison to make it their own, most just want one that’s reliable and accessible. Check out this 1964 textbook Examine Your English, which implants the patterns from the start.

Plenty of these idioms have stood the test of time, like “as busy as a bee” or “as fit as a fiddle,” though many have fallen out of fashion in recent decades, like “as drunk as a lord” or “as rich as Croesus.”

Specialists and Generalists
Now let’s flip it around. While most adjectives lean on a small set of go-to comparisons, the nouns do the opposite work: a handful get reused constantly, often to make different points. Some are wielded as comparisons for dozens of different adjectives (generalists), while others are uniquely tied to a single quality (specialists).
Nothing exemplifies the specialist better than the cucumber, as in “as cool as a cucumber.”

“Cucumber” (or “cowcumber,” as I learned it was sometimes called) doesn’t even crack the top ten in usage for any other adjective. On the other hand, you have the noun “hell,” which is a top-10 noun for 17 different adjectives.
To quantify this, I looked at the diversity of adjectives for each noun using the Simpson index. Put plainly: if you pick two similes for a noun at random, what is the chance that the adjectives match? For example, if you were to randomly look at two sentences with “as ___ as a cucumber,” there is a 92% chance they are both “cool.”
(Simpson’s diversity index)
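The "pick two at random" description can be sketched in a few lines. The article doesn't say which variant of the index was used, so this assumes the without-replacement form, sum of n_i(n_i - 1) divided by N(N - 1); the with-replacement form is simply the sum of squared proportions. The counts are toy values, not the dataset's.

```python
from collections import Counter


def simpson_index(adjective_counts: Counter) -> float:
    """Chance that two similes drawn at random (without replacement)
    for the same noun use the same adjective."""
    n = sum(adjective_counts.values())
    if n < 2:
        return 0.0
    matches = sum(c * (c - 1) for c in adjective_counts.values())
    return matches / (n * (n - 1))


# Toy counts (illustrative only).
cucumber = Counter({"cool": 9, "crisp": 1})      # specialist-like
hell = Counter({"hot": 2, "cute": 2, "sexy": 2})  # generalist-like

print(simpson_index(cucumber))  # 72 / 90 -> 0.8
print(simpson_index(hell))      # 6 / 30  -> 0.2
```

Values near 1 mean a noun is tied to one adjective (a specialist); values near 0 mean its uses are spread across many adjectives (a generalist).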
Most nouns are closer to being generalists, sometimes because a few distinct qualities give them reliable pairings (e.g., glass is smooth, fragile, transparent) and sometimes because they attract conceptually similar adjectives (e.g., the sun is bright, brilliant, radiant).
But like most things, the interesting stuff lies at the extremes. Certain nouns are so tightly coupled with a single adjective that they’ve become idioms, while others are so versatile or overused that they’ve become generic.
Let’s focus on four of the generalist nouns that each reveal something different about how writers use comparisons.

As ___ as a Cat
Cat is a noun that is used to represent all sorts of different qualities. Writers use it to mean everything from graceful to weak.
While most animals are pigeon-holed—think stubborn as a mule—cats span a huge range of observed behavior. And cat is the most used noun for four different adjectives: nervous, active, agile, nimble.
| Animal | Unique adjectives | Top-5 appearances | #1 appearances |
|---|---|---|---|
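The three column headers above (unique adjectives, top-5 appearances, #1 appearances) could be tallied per noun roughly as follows. This is a hedged sketch: `noun_rank_stats` is a hypothetical helper, the pairs are invented, and ties are broken by first appearance, which may differ from how the article ranked them.

```python
from collections import Counter, defaultdict


def noun_rank_stats(pairs, top_n=5):
    """For each noun: how many adjectives it pairs with, and for how many
    of those it ranks in the top_n or at #1 by usage."""
    by_adjective = defaultdict(Counter)
    for adjective, noun in pairs:
        by_adjective[adjective][noun] += 1

    stats = defaultdict(lambda: {"unique_adjectives": 0, "top_n": 0, "first": 0})
    for nouns in by_adjective.values():
        # most_common() sorts by count; ties keep insertion order.
        for rank, (noun, _) in enumerate(nouns.most_common()):
            stats[noun]["unique_adjectives"] += 1
            if rank < top_n:
                stats[noun]["top_n"] += 1
            if rank == 0:
                stats[noun]["first"] += 1
    return dict(stats)


# Invented (adjective, noun) pairs for illustration only.
pairs = [
    ("nervous", "cat"), ("nervous", "cat"), ("nervous", "kitten"),
    ("agile", "cat"),
    ("stubborn", "mule"), ("stubborn", "mule"), ("stubborn", "cat"),
]

stats = noun_rank_stats(pairs)
print(stats["cat"])   # {'unique_adjectives': 3, 'top_n': 3, 'first': 2}
print(stats["mule"])  # {'unique_adjectives': 1, 'top_n': 1, 'first': 1}
```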

As ___ as Stone
Stone has one dominant physical quality, for which it tops seven adjectives (hard, solid, impenetrable, etc.). But where a cat is defined by the range of things it does, stone is often defined by what it lacks.
These four buckets are mostly a bunch of different ways to say “nothing is happening here,” which makes stone a perfectly blank, and oft-used, canvas for a simile.

As ___ as a Child
In fiction, a child has become shorthand for two things: being defenseless and being wholesome. The scale tips towards the defenseless side, with the highest usage coming from “as helpless as a child.”
On the wholesome side, the top usage is “as innocent as a child.” But child is only the second most-used noun for both of these adjectives, trailing the less evolved form: baby.

As ___ as Hell
Hell is one of the most versatile nouns in the dataset. However, most of its pairings have nothing to do with hell itself.
“As hot as hell” makes sense, if you subscribe to the religious imagery. But “as cute as hell?” “As sexy as hell?” Similes usually work because the noun possesses the quality. Hell often functions more as an amplifier.
I was curious about when this phenomenon started. According to Google Books Ngram Viewer, the first use of “sexy as hell” that I could verify was from the 1948 novel Innocent Villa:
“She looks like an old harpy but thinks she’s sexy as hell.”

If we want to get pedantic, the first use with the exact structure (“as ___ as ___”) was from the 1954 novel The Refuge:
“‘Any woman with eyes like hers gets a reputation for being as sexy as hell, old boy.’”
This makes hell pretty unique in the dataset. Most nouns earn their place by embodying something, whereas hell just became another way to say “very.”

The Ironic Ones
Not all similes play it straight. Take the classic “as clear as mud.” It inverts the expectation by using a contrasting quality of the noun, usually for humor. The artistry deepens as the comparison gets more specific.
“He looks about as happy as a dad at a Taylor Swift concert, but at least he’s in control of his rage.”
(As a dad, this would be me.)
While these are a small subset of the dataset, a kind of belligerent counterculture of the simile world, they often paint more vivid and memorable pictures. They tend to work best with positive-sentiment adjectives, setting up the noun as the punch line.

Data and Methods
For this project, we focused on the most recognizable form: classic “as ___ as ___” similes. Those give us a clean ground (e.g., “white”) paired with a clean vehicle (e.g., “snow”), which makes them easier to count, compare, and cluster.
I used a natural language library to scan grammatical patterns looking for the simile form, restricted to the top 500 most-used adjectives according to this word frequency corpus.
A lot of time was spent filtering out junk: common idioms (“as soon as”) and structural patterns such as comparisons to pronouns (“as tall as him”), proper nouns (“as fast as John”), or possessives (“as nasty as their reputation”). In short, I removed anything that wasn’t a figurative comparison. A second pass was done via three LLMs (Gemini, OpenAI, Anthropic) to help detect false positives and flag whether each simile was figurative or literal. LLMs were also used to extract and normalize the tenor and vehicle.
A handful of genuine similes probably got caught in the net and filtered out, but I did a lot of spot-checking and refining the scripts to minimize these occurrences.
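To make the extract-then-filter idea concrete, here is a deliberately crude sketch. The article used a natural language library and grammatical patterns; this swaps in plain token matching from the standard library, and every word list below is a hypothetical, tiny stand-in. It also keeps only a one-word vehicle, whereas the analysis allowed one or two words.

```python
import re

# Hypothetical stand-ins for the real adjective list and junk filters.
ADJECTIVES = {"dry", "cool", "tall", "fast", "nasty", "soon"}
IDIOM_GROUNDS = {"soon"}  # e.g. "as soon as"
PRONOUNS = {"i", "me", "you", "he", "him", "she", "her", "it",
            "we", "us", "they", "them"}
POSSESSIVES = {"my", "your", "his", "her", "its", "our", "their"}
DETERMINERS = {"a", "an", "the"}


def extract_similes(sentence):
    """Return (ground, vehicle) pairs for "as ADJ as NOUN" matches."""
    tokens = re.findall(r"[A-Za-z']+", sentence)
    found = []
    for i in range(len(tokens) - 3):
        if tokens[i].lower() != "as" or tokens[i + 2].lower() != "as":
            continue
        ground = tokens[i + 1].lower()
        if ground not in ADJECTIVES or ground in IDIOM_GROUNDS:
            continue
        j = i + 3
        if tokens[j].lower() in POSSESSIVES:
            continue  # "as nasty as their reputation"
        if tokens[j].lower() in DETERMINERS:
            j += 1  # skip "a", "an", "the"
            if j >= len(tokens):
                continue
        head = tokens[j]
        # Capitalization is used as a rough proper-noun signal.
        if head.lower() in PRONOUNS or head[0].isupper():
            continue  # "as tall as him", "as fast as John"
        found.append((ground, head.lower()))
    return found


print(extract_similes("My mouth has gone as dry as sawdust."))
# [('dry', 'sawdust')]
print(extract_similes("He was as tall as him and as fast as John."))
# []
```

A rule set this blunt will both miss real similes and let literal comparisons through, which is the gap the LLM second pass described above is meant to close.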
The most involved part of the pipeline was vehicle aggregation. What should get binned together is sometimes obvious and sometimes blurry. For example, “wolf,” “the wolves,” and “a pack of wolves” are the same figurative image to humans, but to a computer, those are three different items. Alternatively, is a “kitten” the same as a “cat”? (I chose different.) I combined many techniques to solve this problem, including normalization, embedding similarity, synonym detection, containment checks, and LLM judging. Generally speaking, I leaned towards being conservative so things weren’t falsely merged.
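The normalization step alone might look something like the sketch below. It covers only the simplest rules (dropping determiners, unwrapping a few group nouns, naive singularization) and none of the embedding, synonym, or LLM steps. The word lists and the `normalize_vehicle` helper are assumptions for illustration, and the plural rules are intentionally naive.

```python
import re

# Hypothetical, intentionally small rule sets.
DETERMINERS = {"a", "an", "the"}
GROUP_NOUNS = {"pack", "herd", "flock"}
IRREGULAR_PLURALS = {"wolves": "wolf", "children": "child", "mice": "mouse"}


def singularize(word):
    """Very naive singularizer; real pipelines would use a lemmatizer."""
    if word in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[word]
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def normalize_vehicle(phrase):
    """Map surface variants of a vehicle to one canonical key."""
    words = [w for w in re.findall(r"[a-z']+", phrase.lower())
             if w not in DETERMINERS]
    # "pack of wolves" -> "wolves", but only for known group nouns,
    # so unrelated "X of Y" phrases are not merged by accident.
    if len(words) >= 3 and words[0] in GROUP_NOUNS and words[1] == "of":
        words = words[2:]
    if not words:
        return ""
    words[-1] = singularize(words[-1])
    return " ".join(words)


for phrase in ["wolf", "the wolves", "a pack of wolves"]:
    print(normalize_vehicle(phrase))  # wolf, each time

print(normalize_vehicle("kitten") == normalize_vehicle("cat"))  # False
```

Restricting the "X of Y" rule to a short list of group nouns mirrors the conservative stance described above: it is better to leave two variants unmerged than to collapse images that differ.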
The dataset is not perfect, but it is a good sample of how writers compare things in English-language fiction. It is accurate in aggregate but imperfect at the sentence level.
Most ranking stats about noun usage referenced in the text were based on nouns with 100+ occurrences in the dataset.