
CAN DATA DIE?

Why One of the Internet’s Oldest Images Lives On Without Its Subject’s Consent

In 2021, sharing content is easier than ever. Our lingua franca is visual: memes, infographics, TikToks. Our references cross borders and platforms, shared and remixed a hundred different ways in minutes. Digital culture is collective by default and has us 😂, 😭, and 😍 together all around the world.

But as the internet reaches its “dirty 30s,” what happens when pieces of digital culture that have been saved, screenshotted, and reposted for years need to retire? Let’s dig into the story of one of these artifacts: The Lenna image.

The Lenna image may be relatively unknown in pop culture today, but in the engineering world, it remains an icon. I first encountered the image in an undergrad class, then again in grad school, and then all over the sites and software I use every day as a tech worker, like GitHub, OpenCV, Stack Overflow, and Quora.

This is Lenna.

To understand where the image is today, you have to understand how it got here. So I decided to scrape Google Scholar, Google Search, and Google reverse image search results to track down thousands of instances of the image across the internet (see more in the Methods section).

Lena Forsén, the real person behind the Lenna image, first appeared in Playboy in 1972. Soon after, USC engineers searching for a suitable test image for their image processing research sought inspiration from the magazine. They deemed Lenna the right fit and scanned the image into digital, RGB existence.

From here, the story of the image follows the story of the internet. Lenna was one of the first inhabitants of ARPANET, the internet's predecessor, and then of the World Wide Web.

The image's reach was limited to a few research papers in the '70s and '80s, but in 1991, Lenna was featured on the cover of an engineering journal alongside another popular test image, Peppers.

This caught the attention of Playboy, which threatened a copyright infringement lawsuit. Engineers who had grown attached to Lenna fought back. Ultimately, they prevailed, and as a Playboy VP reflected on the drama: "We decided we should exploit this because it is a phenomenon."

The Playboy controversy canonized Lenna in engineering folklore and prompted an explosion of conversation about the image. Hits for the image on the internet peaked in 1995.

In the 21st century, the image has remained a common sight in classrooms and has even made it onto TV, including an appearance in the series Silicon Valley in 2014.

Pushback against the use of the image also grew throughout the 2010s, leading up to 2019, when the documentary Losing Lena was released.

Forsén shares her side of the story and asks for her image to be retired: “I retired from modelling a long time ago. It’s time I retired from tech, too. We can make a simple change today that creates a lasting change for tomorrow. Let’s commit to losing me.”

After the film's release, many of my female colleagues shared stories about their own encounters with the image throughout their careers. When one of the few women this widely referenced, respected, and remembered in your field is known for a nude photo that was taken of her and is now used without her consent, it inevitably shapes perceptions of women's place in tech and of the value of our contributions.

The film called on the engineering community to stop spreading the image and to use alternatives instead. This led to efforts to remove the image from textbooks and production code, and to a slow but noticeable decline in the image's use in research.

But despite this progress, almost two years later, the use of Lenna continues. Over the last decade, the image has appeared on the internet in more than 30 languages, including more than 10 in 2021 alone.

The image's spread across digital geographies has mirrored this linguistic growth, moving from mostly .org domains before 1990 to over 100 different domain endings today, notably .com and .edu.

Within the .edu world, the Lenna image continues to appear in homework questions and class slides and to be hosted on educational and research sites, ensuring that it is passed down to new generations of engineers. Whether it's due to institutional negligence or defiance, it seems that, for now, the image is here to stay.

Having known Lenna for almost a decade, I have struggled to understand what the story of the image means for what tech culture is and what it is becoming.

To me, the crux of the Lenna story is how little power we have over our data and how it is used and abused. This threat seems disproportionately high for women, who are often overrepresented in internet content but underrepresented in internet company leadership and decision making. Given this reality, engineering and product decisions will continue to consciously (and unconsciously) exclude our needs and concerns.

While social norms are turning against non-consensual data collection and data exploitation, digital norms seem to be moving in the opposite direction. Advancements in machine learning algorithms and data storage capabilities are only making data misuse easier. Whether the outcome is revenge porn or targeted ads, surveillance or discriminatory AI, if we want a world where our data can retire when it has outlived its time, or when it is directly harming our lives, we must create the tools and policies that empower data subjects to have a say in what happens to their data… including allowing their data to die.

Methods & Notes

To build our dataset of instances of Lenna on the internet, I used three sources: Google Scholar, Google Search, and Google reverse image search, which returned 5,300, 1,100, and 720 results, respectively. For the two text-based search engines (Google Scholar and Google Search), I scraped hits for combinations of keywords, such as variations on "lenna+jpg"+OR+"lenna+jpeg" or "lenna+computer+vision". For Google Scholar, I pulled data over different time intervals from 1970 to 2021, usually in 5-year chunks. For Google Search, I manually checked the results to filter irrelevant pages out of the dataset. To identify variations of the Lenna image, I used four different base images for Google reverse image search: the standard image, a wider cropped image, a grayscale image, and a flipped grayscale image. In all cases, I pulled only the first 100 pages of results. Data was pulled for dates through September 2021.
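The Google Scholar step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's scraper: it assumes uniform 5-year windows (the text says "usually"), uses two example keyword combinations modeled on the ones above, and builds result-page URLs with Google Scholar's `q`, `as_ylo`, and `as_yhi` query parameters (assumed here to set the search terms and the year range). `year_intervals` and `scholar_urls` are hypothetical helper names, and the sketch only builds URLs; it does not fetch or parse any pages.

```python
from urllib.parse import urlencode

# Assumed Google Scholar search endpoint (illustrative only).
SCHOLAR_BASE = "https://scholar.google.com/scholar"

# Example keyword combinations, modeled on the queries described above.
QUERIES = ['"lenna jpg" OR "lenna jpeg"', '"lenna computer vision"']


def year_intervals(start, end, width=5):
    """Split the years start..end (inclusive) into consecutive, non-overlapping ranges."""
    intervals = []
    lo = start
    while lo <= end:
        hi = min(lo + width - 1, end)  # the last window may be shorter
        intervals.append((lo, hi))
        lo = hi + 1
    return intervals


def scholar_urls(queries, start=1970, end=2021, width=5):
    """Build one search URL per (query, year window) pair.

    as_ylo / as_yhi are assumed to restrict results to the given year range.
    """
    urls = []
    for q in queries:
        for lo, hi in year_intervals(start, end, width):
            params = {"q": q, "as_ylo": lo, "as_yhi": hi}
            urls.append(f"{SCHOLAR_BASE}?{urlencode(params)}")
    return urls
```

With the defaults, `year_intervals(1970, 2021)` yields ten full 5-year windows plus a final 2020–2021 window, so each query expands into 11 URLs. Splitting by time keeps each individual query under the 100-page cap mentioned above.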

Next, I aggregated the three resulting datasets into a combined dataset and extracted the title, dates, and URL/domain information from each hit (e.g., domain endings like .com and .edu). After the data was cleaned and processed, it was loaded into Google Sheets to identify the language of each title and description using the DETECTLANGUAGE function, which draws on Google Translate.
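The aggregation step might look something like the sketch below. It is an assumption-laden illustration, not the code in the repo: each hit is assumed to be a dictionary with hypothetical `url` and `year` keys, hits are deduplicated across the three sources by URL, and the "domain ending" is taken to be the last label of the hostname. That last choice is a simplification; a site under .co.uk, for example, is counted as .uk.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_ending(url):
    """Return the last label of the URL's hostname, e.g. '.edu' for 'cs.example.edu'."""
    host = urlparse(url).hostname or ""  # hostname is None for scheme-less strings
    if "." not in host:
        return ""
    return "." + host.rsplit(".", 1)[1]


def combine(*datasets):
    """Merge lists of hit dicts, keeping the first hit seen for each URL."""
    seen = {}
    for dataset in datasets:
        for hit in dataset:
            seen.setdefault(hit["url"], hit)
    return list(seen.values())


def count_endings(hits):
    """Count unique hits per domain ending."""
    return Counter(domain_ending(hit["url"]) for hit in hits)


def hits_per_year(hits):
    """Count unique hits per year, skipping hits with no known date."""
    return Counter(hit["year"] for hit in hits if hit.get("year") is not None)
```

For example, combining a Scholar hit and a Search hit that share the same URL keeps only one copy, so a paper hosted on an .edu site is counted once, whichever source found it.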

The data and code can be found in this repo. To make sure we weren't contributing to the spread of the Lenna image, the full resolution image is not hosted on this site. When it appears in the article, it's either a pixelated or a blurred version.
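The article does not say how its pixelated versions were produced. One common approach is block averaging: every tile of the image is replaced by its mean intensity, which discards the detail inside each tile. The sketch below applies it to a grayscale image stored as a list of rows of 0–255 values; `pixelate` is a hypothetical helper, and a real pipeline would more likely use an imaging library.

```python
def pixelate(pixels, block):
    """Replace each block x block tile with its (rounded) mean intensity.

    pixels: list of rows of grayscale values; tiles at the right and bottom
    edges may be smaller than block x block. The input is not modified.
    """
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    out = [row[:] for row in pixels]  # copy so the original stays intact
    for top in range(0, height, block):
        for left in range(0, width, block):
            rows = range(top, min(top + block, height))
            cols = range(left, min(left + block, width))
            tile = [pixels[r][c] for r in rows for c in cols]
            mean = round(sum(tile) / len(tile))
            for r in rows:
                for c in cols:
                    out[r][c] = mean
    return out
```

Larger `block` values remove more detail; for a 4x2 image, `pixelate(image, 2)` collapses it into two flat tiles.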