Attempt #1: Zero-Shot

We used an approach that we’re calling the Zero-Shot: we gave a language model a description of the image and some context about our goal, and let it generate the captions.

#757: Here is the cartoon for May 24, 2021 by Carolita Johnson
[Image: captionless cartoon by Carolita Johnson]

Our submission: “I’m sorry about this whole climate change thing.”

“The weather was awful and he was so boring.”

7% of readers thought it was funny.

“I’m sorry about this whole climate change thing.”

25% of readers thought it was funny.

“The best case scenario is that those are just rainclouds. If not, I'll have to figure out how to keep my margarita from spilling.”

12% of readers thought it was funny.

About this approach

To produce the captions this week we “prompted” the GPT-3 model, davinci-instruct-beta, through the OpenAI API Playground. This model is specially trained to “follow instructions,” so we gave it the instruction “Write a funny caption for a New Yorker cartoon” plus a description we wrote of the cartoon illustrated by Carolita Johnson. The model parameters were set to their defaults.

In machine learning, zero-shot learning refers to a paradigm in which a model relies solely on its general knowledge of the world to solve a problem at test time, without being shown any examples of that specific task. Say we play you a song by Olivia Rodrigo but you’ve never heard her music before. Can you guess that she’s the singer? What if you’ve heard a lot of Taylor Swift songs and have read a review of “drivers license” that talks about Olivia Rodrigo and her Swift-ian inspirations?

GPT-3 has never seen this particular description of a New Yorker cartoon. But it has read a lot of text on the internet that describes New Yorker cartoons and their captions. It has also read a lot of New Yorker articles and has some understanding of the magazine’s readership and what they find interesting or funny. And it has seen a lot of comics with associated text, so it presumably understands what is meant by a “cartoon caption.” Here is the full prompt:

Write a funny caption for a New Yorker cartoon.
Cartoon description: A man and a woman are having a picnic in a park. The man is talking. There are anthropomorphic clouds that are angry.
Cartoon caption:

That’s it! And GPT-3 fills in the blank.
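
For readers who want to try this themselves, here is a minimal sketch of how the prompt above could be assembled in Python. We did our prompting by hand in the Playground, so this is an illustration and not the code we ran. The names build_zero_shot_prompt and generate_caption are hypothetical, and the complete argument is a stand-in for whatever text-completion function you have access to.

```python
from typing import Callable

# Instruction and description copied from the prompt shown above.
INSTRUCTION = "Write a funny caption for a New Yorker cartoon."
DESCRIPTION = (
    "A man and a woman are having a picnic in a park. The man is talking. "
    "There are anthropomorphic clouds that are angry."
)


def build_zero_shot_prompt(description: str, instruction: str = INSTRUCTION) -> str:
    """Assemble the three-line prompt. No example captions are included,
    which is what makes this zero-shot."""
    return (
        f"{instruction}\n"
        f"Cartoon description: {description.strip()}\n"
        "Cartoon caption:"
    )


def generate_caption(description: str, complete: Callable[[str], str]) -> str:
    """Send the prompt to a completion function and tidy up the reply.

    `complete` is a hypothetical placeholder: any function that takes a
    prompt string and returns the model's continuation as a string.
    """
    raw = complete(build_zero_shot_prompt(description)).strip()
    # Keep only the first non-empty line of the continuation as the caption.
    return raw.splitlines()[0].strip() if raw else ""


if __name__ == "__main__":
    print(build_zero_shot_prompt(DESCRIPTION))
```

The prompt deliberately ends with “Cartoon caption:” and nothing after it, so the most natural continuation for the model is the caption itself.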

