The Generative AI Copyright Disclosure Act: Congress Grapples With Transparency

Finnegan, Henderson, Farabow, Garrett & Dunner, LLP


Generative AI has captured the attention of the world. As the public has embraced this revolutionary technology, some creative communities have raised concerns about their copyrighted works being ingested or fed into AI models for training purposes.

Among other things, content creators have called for transparency around AI model training, so that copyright owners can determine whether their works were, in fact, used to train AI models.

These concerns have not gone unheard: both the US Senate and US House of Representatives have taken notice.

The Route to Legislation

In the Senate, the Bipartisan Senate AI Working Group—led by US Senate Majority Leader Chuck Schumer and Senators Mike Rounds, Martin Heinrich, and Todd Young—has conducted a series of nine closed-door forums to build consensus on how AI legislation should be crafted, and recently issued a 'roadmap' for AI policy development in the Senate.

Among other recommendations, the roadmap encourages the relevant congressional committees to "[c]onsider federal policy issues related to the datasets used by AI developers to train their models, including datasets that might contain sensitive personal data or are protected by copyright, and evaluate whether there is a need for transparency requirements."

Members of the US House of Representatives have similarly addressed the need for transparency requirements for training datasets by introducing various proposed legislative solutions.

For example, in December 2023, Representatives Anna Eshoo and Don Beyer introduced the AI Foundation Model Transparency Act (H.R. 6881), which would direct the Federal Trade Commission (FTC) to work with the National Institute of Standards and Technology (NIST), the Director of the Office of Science and Technology Policy, and the Register of Copyrights, to establish rules for reporting training data transparency.

Specifically, the FTC would promulgate regulations identifying the information needed to "improve the transparency of foundation models ... with respect to training data", and issue guidance to covered AI models on how to comply with the FTC's regulations.

The FTC would also identify which information the covered AI models would be required to provide to the FTC (some of which could be made available on the FTC's website), and which information the covered AI models would be required to make publicly available on their own websites.

Support from the Creative Community

Taking a slightly different approach to training data transparency, in April 2024, Representative Adam Schiff introduced the Generative AI Copyright Disclosure Act (H.R. 7913).

This Act would require anyone who creates or materially alters a training dataset for a generative AI system to file a notice with the US Copyright Office containing a URL for the dataset and "a sufficiently detailed summary of any copyrighted works used" to create or materially alter the dataset.

A notice would need to be filed with the Copyright Office no later than 30 days before the generative AI system for which the training dataset would be used would be made available to consumers; or, for an already-available generative AI system, no later than 30 days after the effective date of the Act.

In addition, the Copyright Office would be required to issue regulations regarding civil penalties for non-compliance (in an amount not less than $5,000), and establish and maintain a publicly available online database containing each notice filed with the Office.

The Act has garnered significant support from the creative community, including the Recording Industry Association of America, Copyright Clearance Center, Directors Guild of America, Authors Guild, Screen Actors Guild-American Federation of Television and Radio Artists, and Black Music Action Coalition.

Wide-ranging Ramifications

If implemented, the Act could have implications for creators, developers of AI models, and the Copyright Office alike.

For copyright owners, the bill would address their concerns regarding transparency and AI model training. Specifically, it seems to respond to their concerns that AI models have been using their copyrighted works without permission for training purposes, and without their knowledge.

The bill would not prohibit the use of copyrighted works for AI model training, nor require AI models to compensate copyright owners for such use of their works.

But copyright owners may say it attempts to provide them with the means to determine whether their works have indeed been used for training purposes, and in which training datasets—and that having such information could benefit them in multiple ways.

As more copyright owners enter agreements with AI companies to license their content for training purposes (the Associated Press's agreement with OpenAI, for example), copyright owners may also say that a searchable public database identifying which works have been used in training datasets could give them leverage to seek such licensing agreements themselves.

In addition, copyright owners may say the bill could give them the potential to enforce their rights against unauthorised use in the future, should US courts ever determine that ingestion of copyrighted works into AI models for training purposes constitutes copyright infringement (as opposed to fair use, as asserted by certain AI models).

For AI models, some may say that the burden of reporting (particularly, retroactively) the identity of the millions of copyrighted works used in AI model training in a single notice is too great.

Others may assert that submitting notices to the Copyright Office with "sufficiently detailed summaries" of which copyrighted works were used in training may lead to a public database containing inconsistent and difficult-to-search information. Indeed, data problems and the inability to identify copyrighted works and their owners correctly are not uncommon.

For example, the Music Modernization Act, enacted in 2018, created a government-designated entity to administer a new blanket license for interactive streaming services, in response to the significant data problems that had long surrounded the identification of musical works and who should be paid for their use.

For the Copyright Office, the Generative AI Copyright Disclosure Act could present significant IT implications. If enacted, the Copyright Office would be required to establish and maintain a publicly available online database containing each notice filed with the office identifying copyrighted works used in training datasets.

Given the sheer volume of data used in training datasets, maintaining such a database could prove to be a massive IT undertaking—and the Copyright Office is already in the midst of multiple IT efforts.

For the last several years, the Copyright Office has been undertaking the Herculean task of modernising its IT systems, including its registration and recordation systems. While those plans are still being implemented, imposing an additional and significant IT burden on the Copyright Office may not be practical.

Overall, the Generative AI Copyright Disclosure Act is one solution introduced amid Congress's seeming desire to address the transparency of AI model training. As AI models continue to evolve and the potential for a licensing market takes shape, time will tell if a legislative solution is indeed needed.

Originally published by World Intellectual Property Review, 20 June 2024

The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.
