Attempt #1: Zero-Shot
We used a process that we’re calling The Zero-Shot: we gave a computer model a description of the image and some context about our goal, and asked it to generate the results.
Our submission: “I’m sorry about this whole climate change thing.”
Check out all three generated captions.
The weather was awful and he was so boring.
7% of readers thought it was funny.
“I’m sorry about this whole climate change thing.”
25% of readers thought it was funny.
“The best case scenario is that those are just rainclouds. If not, I'll have to figure out how to keep my margarita from spilling.”
12% of readers thought it was funny.
About this approach
To produce this week’s captions, we “prompted” the GPT-3 model davinci-instruct-beta through the OpenAI API Playground. This model is specially trained to “follow instructions,” so we gave it the instruction “Write a funny caption for a New Yorker cartoon” plus a description we wrote of the cartoon, which was illustrated by Carolita Johnson. The model parameters were left at their defaults.
In machine learning, zero-shot learning refers to a paradigm in which a model relies solely on its general knowledge of the world to solve a problem at test time, without seeing any examples of that specific task. Say we play you a song by Olivia Rodrigo, but you’ve never heard her music before. Can you guess that she’s the singer? What if you’ve heard a lot of Taylor Swift songs and have read a review of “drivers license” that talks about Olivia Rodrigo and her Swift-ian inspirations?
GPT-3 has never seen this particular description of a New Yorker cartoon. But it has read a lot of text on the internet that describes New Yorker cartoons and their captions. It has also read a lot of New Yorker articles and has some understanding of the magazine’s readership and what they find interesting or funny. And it has seen a lot of comics with associated text, so it presumably understands what is meant by a “cartoon caption.” Here is the full prompt:
Write a funny caption for a New Yorker cartoon.
"""
Cartoon description: A man and a woman are having a picnic in a park. The man is talking. There are anthropomorphic clouds that are angry.
Cartoon caption:
That’s it! And GPT-3 fills in the blank.
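For readers who want to reproduce the setup outside the Playground, here is a minimal sketch of how the prompt above could be assembled in Python. We ran everything through the Playground, so this is an illustration rather than our actual code: the function names are made up for this example, and `complete` is a hypothetical stand-in for whatever call sends text to the model and returns its continuation. Sampling three captions simply mirrors the three results shown above.

```python
from typing import Callable, List

# Instruction and description text are copied from the prompt shown above.
INSTRUCTION = "Write a funny caption for a New Yorker cartoon."
DESCRIPTION = (
    "A man and a woman are having a picnic in a park. "
    "The man is talking. "
    "There are anthropomorphic clouds that are angry."
)


def build_zero_shot_prompt(instruction: str, description: str) -> str:
    """Assemble the zero-shot prompt: instruction, separator, description,
    then an open "Cartoon caption:" slot for the model to fill in."""
    return (
        f"{instruction}\n"
        '"""\n'
        f"Cartoon description: {description}\n"
        "Cartoon caption:"
    )


def generate_captions(
    complete: Callable[[str], str], prompt: str, n: int = 3
) -> List[str]:
    """Ask the model for n completions and strip surrounding whitespace.

    `complete` is a hypothetical placeholder (an assumption of this sketch),
    not a real library function: it takes a prompt and returns model text.
    """
    return [complete(prompt).strip() for _ in range(n)]


if __name__ == "__main__":
    # Print the prompt exactly as it would be sent to the model.
    print(build_zero_shot_prompt(INSTRUCTION, DESCRIPTION))
```

Note that the prompt contains no example captions at all. That absence is what makes the approach zero-shot: the model sees only the instruction and the description.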