Beyond Regurgitation: Transformative Fair Use In The Training Of AI

AI and Fair Use

When the New York Times sued OpenAI, the allegation grabbing the most headlines concerned ChatGPT sometimes producing, as output to the user, all or substantially all of an actual article previously published by the Times. It has been widely reported that OpenAI accuses the Times of deliberately instigating GPT to "regurgitate" Times articles by "manipulating" prompts in bad faith, if not actually directing GPT to infringe the Times' own copyrights.

The wholesale unauthorized copying and distribution of copyrighted literary works to your own paying users is obviously an infringing use, every time. This is as true in 2024 with AI as it was nearly 200 years earlier, when the accomplished U.S. Representative from Massachusetts, the Reverend Charles Wentworth Upham, included 353 pages of previously published letters written by George Washington in the Reverend's own nearly 900-page, two-volume work, leading him to lose what is widely considered the first "fair use" case in the United States, the 1841 federal circuit court decision in Folsom v. Marsh.

In terms of the merits and importance of the Times case (and others of its ilk, including those involving photographs and other visual art), this narrow focus on AI "regurgitation" of complete copies of copyrighted works is a red herring. It distracts us from more nuanced and important issues, such as whether using copyrighted material as inputs to train AI models is fair use and whether expressive works made by or with AI can be copyrighted in the first instance.

An Appetite for Instruction

AI models have appetites for training and testing content as voracious as that of a Westerosi dragon landing in a pasture full of sheep after a big battle. Even when the purveyors of a model are able to strike deals to license copyrighted material as training inputs, as OpenAI has done with the likes of Reddit and Time magazine (discussed in detail in Part I of this post), the sheer volume of data needed to train the models makes it prohibitive to negotiate licenses with all of the thousands, if not millions, of copyright holders whose online content might be used to train AI, likely without their knowledge or consent.

Set forth in Section 107 of the Copyright Act, fair use is a doctrine under U.S. copyright law that allows for limited use of copyrighted material without permission under certain conditions. The factors determining fair use are: (1) the purpose and character of the use, including whether it is of a commercial nature or for nonprofit educational purposes; (2) the nature of the copyrighted work; (3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and (4) the effect of the use upon the potential market for or value of the copyrighted work.

Transformative Use of Creative Works

In Andy Warhol Foundation for the Visual Arts, Inc. v. Goldsmith, decided by the Supreme Court in 2023, well-known photographer Lynn Goldsmith sued the Andy Warhol Foundation for copyright infringement. Warhol had produced a series of silk-screened artworks based on Goldsmith's copyrighted photo of Prince, altering its composition and color palette. One of these works, "Orange Prince," ended up on the cover of a commemorative magazine published by Condé Nast, the parent company of Vanity Fair, without any attribution or compensation for Goldsmith.

Only the first factor was before the Supreme Court, which ruled for Goldsmith, holding that "Orange Prince" was a commercial use of the copyrighted work with a "purpose and character" insufficiently "transformed" from that of the original to constitute fair use.

"Transformative" use is not mentioned anywhere in the statutory text of Section 107. Nevertheless, it has become the touchstone of contemporary fair use jurisprudence, and it is likely to be at the center of the eventual court opinions coming out of fair use cases centered on AI.

The Court deemed "Orange Prince" not transformative enough to be fair use because Warhol basically created a "commercial substitute" composed of substantial portions of the original photo, for use in the same market as the original and for essentially the same purpose. By contrast, the Court said, works like 2 Live Crew's version of Roy Orbison's "Pretty Woman" (the subject of a 1994 fair use case) and Warhol's own iconic soup can paintings were transformative enough to constitute fair use. The former qualified because the rap group's version of the song was enough of a parody to create a new work that critiques or comments on the original. The latter qualified because the paintings serve "a completely different purpose, to comment on consumerism rather than to advertise soup."

Transformative Use Based on Non-Expressive Aspects

The Warhol decision came on the heels of the Supreme Court's foray into fair use two years earlier in Google LLC v. Oracle America, Inc., which addressed "non-expressive use" rather than the "creative" or "expressive" use at issue in Warhol. Oracle sued Google for infringement based on Google's use of over 11,000 lines of Oracle's Java API code to develop the Android operating system. The Supreme Court held that by creating a new platform for mobile devices, Google had sufficiently repurposed the Java API, which was originally designed for desktop applications, to constitute a transformative fair use.

Perhaps the most revealing of the higher-profile cases about transformative fair use, in terms of what it says about the future legality of AI-driven products like ChatGPT, is another, earlier Google case decided by the Second Circuit in 2015. In Authors Guild v. Google, the court found Google's digitization of millions of books to create a searchable database to be a transformative fair use in that it offered access to capabilities such as data mining and comprehensive text searches, which were fundamentally different from simply reading a book and which arguably could expand the market for the books in the database.

How to Train Your AI Dragon

When training AI, the use of copyrighted works to develop new functionalities or applications may be deemed transformative if the AI creates something significantly new and different. However, if the AI merely replicates or makes minor modifications to the original works without adding substantial new meaning or value, it is unlikely to be considered transformative.

Courts are generally more reluctant to find fair use when the original works are highly creative. For fair use to apply, the resulting AI product must utilize the original creative works in a way that is distinctly different, adds new value or utility, and does not adversely affect the market for the original works.

The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.