ARTICLE
17 April 2025

Copyright In The Age Of AI: Why Publicly Visible Content Isn't Free For The Taking

Traverse Legal

Contributor

Founded in 2004 as a start-up, Traverse Legal created a new business model for the practice of law that is now used by some of the biggest law firms in the country. We built technology into our processes and client relations in ways that remain innovative and unique. We have represented clients of all types in connection with technology, internet law, intellectual property, and business matters. We can help you.

As a niche law firm with controlled overhead and specialized practice areas, we can provide more cost-effective, knowledgeable, and strategic representation than the large law firms we go up against every day. Our clients are based in over 25 different countries around the globe. There is a reason why some of the largest and most successful companies in the world select Traverse Legal to handle matters within our areas of experience.


Generative AI depends on data—and lots of it. From search engines and large language models to image generators and music tools, these systems are trained on massive datasets, often pulled directly from the public web. Think news articles, social media posts, lyrics, books, stock images, etc.

But here's the catch: just because content is publicly viewable doesn't mean it's legally free to use.

That assumption—that anything online is fair game for scraping and training—has fueled the rapid expansion of AI innovation. Now, it's facing serious legal pushback. Copyright owners across industries are fighting back, and the courts are starting to weigh in. The most notable signal came in early 2025, when a federal judge in Thomson Reuters v. Ross Intelligence rejected a fair use defense in an AI training context. It marked a clear warning: copyright law still applies in the age of AI.

While these court cases will no doubt take years to resolve, companies using AI tools, companies building AI tools, and everyone in between face risk right now. Upstream and downstream players need to understand the uncertainty and the potential for being sued or receiving a threat letter. The companies building AI models are at the highest risk of being sued. As a practical matter, however, most development work in artificial intelligence involves building tools on top of existing AI models through an API. And yes, companies building AI tools connected to a model through the model's API can be sued for copyright infringement. End users of AI tools are less likely to be sued, but they too face potential liability under established copyright law.

In today's AI-driven economy, innovation is moving faster than the law, but that gap is closing quickly. As courts scrutinize how generative models are trained, a new legal frontier is taking shape around copyright, fair use, and content ownership. This article unpacks the emerging risks of scraping publicly accessible content, highlights the pivotal court rulings reshaping the rules, and offers practical insight into what AI companies, content creators, and investors must do now to adapt. The compliance window is narrowing, and those who act early will be best positioned to lead.

The Legal Battleground: Copyright Infringement vs. Fair Use

How AI Training Works—and Why It Raises Legal Flags

At the heart of this legal fight is how AI models "learn." To function, generative AI systems must process vast volumes of content. During training, models ingest and copy content—often word-for-word or pixel-for-pixel—to identify patterns, relationships, and structures.
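
To make that copying step concrete, below is a minimal, purely illustrative sketch of a text-ingestion pipeline of the kind described above. The URLs, file format, and function names are hypothetical assumptions rather than any particular company's system; the point is simply that verbatim copies of source content are made and stored before a model is ever trained on them.

    # Hypothetical illustration of a training-data ingestion step.
    # The key observation: the pipeline stores verbatim copies of source
    # content, which is where copyright questions begin, regardless of
    # what the trained model later outputs.

    import json
    import urllib.request

    # Illustrative URLs only; a real crawl would cover millions of pages.
    SOURCE_URLS = [
        "https://example.com/news/article-1",
        "https://example.com/blog/post-2",
    ]

    def ingest(urls, corpus_path="training_corpus.jsonl"):
        with open(corpus_path, "a", encoding="utf-8") as corpus:
            for url in urls:
                # Download the page: a literal, full copy of the content.
                with urllib.request.urlopen(url) as response:
                    text = response.read().decode("utf-8", errors="ignore")
                # Store the verbatim text for later tokenization and training.
                corpus.write(json.dumps({"source": url, "text": text}) + "\n")

    if __name__ == "__main__":
        ingest(SOURCE_URLS)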

That act of copying, even if not shown directly in the AI's final outputs, is where the copyright issues begin. For copyright holders, it's not just about what the model produces but how it was trained. Copying entire books, articles, or image libraries without a license raises a serious question: is this a transformative, fair use of content, or simply unlicensed reproduction for commercial gain?

As more plaintiffs allege infringement, the risk to AI companies is no longer theoretical. It's immediate, it's growing, and it's moving toward the courtroom.

Fair Use Isn't a Free Pass

Many AI companies lean on the doctrine of "fair use" as a shield. But that protection is more nuanced—and more fragile—than many assume.

Courts use a four-factor test to determine whether copying qualifies as fair use:

  1. Purpose and character of the use — Is it transformative? Commercial? Educational?
  2. Nature of the copyrighted work — Is the original factual or creative?
  3. Amount and substantiality used — How much was taken, and was it the "heart" of the work?
  4. Effect on the market — Does the use harm the market for the original?

Apply this framework to AI, and things get murky. Is training an LLM on entire novels "transformative"? Does using thousands of stock photos to teach an image model reduce licensing demand? The answers vary—and courts are just beginning to grapple with these questions.

What's clear is that "publicly available" is not synonymous with "public domain." Just because content is out in the open doesn't mean it's fair game for commercial AI training. And now, with high-stakes lawsuits gaining traction, that line is being tested—and redrawn—in real time.

A Legal Turning Point: Thomson Reuters v. Ross Intelligence

What Happened

In one of the first major rulings to directly address copyright and AI training, the court in Thomson Reuters v. Ross Intelligence delivered a decisive blow to the "fair use" argument many AI companies rely on. Ross, a legal tech startup, had used Westlaw's proprietary headnotes—concise, editor-authored legal summaries—to train its AI-based legal research assistant.

Thomson Reuters, the parent company of Westlaw, sued for copyright infringement. Ross argued that its use of the headnotes was transformative and should be protected under fair use. But the court disagreed. It found that Ross had directly copied the content, that the headnotes reflected significant editorial judgment, and that the copying served a commercial purpose. As a result, Ross's fair use defense failed—and the court held that the company had infringed Thomson Reuters' copyrights.

Why It Matters

This case represents a critical inflection point in the legal treatment of generative AI. It reinforced a key message: publicly visible content—even in a professional or technical context—is not automatically free for training use. The court was especially concerned with the originality and editorial value of the headnotes, as well as the competitive harm to Westlaw's market position.

In other words, if your AI model is trained on curated, human-authored content—particularly when that content reflects intellectual labor and is commercially licensed—you may be stepping into copyright infringement territory.

Industry Implications

The implications of this ruling stretch far beyond the legal tech world. For startups building AI products, and for investors backing them, the Ross case should trigger a strategic reassessment of how training data is sourced, documented, and licensed.

This ruling may become a persuasive precedent for other courts evaluating the same question across different industries—whether it's music, journalism, education, or entertainment. It signals a shift: the era of unchecked web scraping is giving way to an age of accountability.

Going forward, AI companies should assume that if the training data has commercial value and was created with human authorship, copyright protections likely apply—and ignoring them is no longer a viable business strategy.

Pending Lawsuits That Could Reshape AI Development

Across every major content category—news, books, images, music—copyright holders are filing lawsuits that could fundamentally alter how generative AI is built. These cases aren't just about past misuse. They're setting the legal blueprint for what's permissible going forward—and where the lines will be drawn.

News Content Under Fire

  • The New York Times v. OpenAI & Microsoft
    In one of the most high-profile lawsuits to date, The New York Times accuses OpenAI and Microsoft of using its articles—without permission—to train large language models like GPT. The case has cleared initial legal hurdles and is moving forward, signaling that courts are willing to scrutinize how foundational AI models were developed. The Times argues that its reporting is being regurgitated by the models, undermining subscriptions and licensing revenue.
  • Daily News & Other Publishers v. OpenAI & Microsoft
    A growing number of news organizations are following suit—literally. Outlets including the Chicago Tribune, Orlando Sentinel, and others allege their content was scraped and used without authorization. Their central claim: AI tools trained on their journalism devalue their original reporting and erode traffic and ad revenue.
  • Dow Jones & New York Post v. Perplexity AI
    Dow Jones, the publisher of The Wall Street Journal, and New York Post have sued Perplexity AI, a startup whose AI assistant synthesizes answers based on real-time web data. The plaintiffs argue that Perplexity is republishing news content scraped from their sites—without licensing or attribution—effectively competing with them using their own work.

Books and Literature

  • Authors Guild v. OpenAI
    Representing a broad coalition of writers, the Authors Guild has filed a consolidated class action against OpenAI. Named plaintiffs include authors like Michael Lewis, Nicholas Basbanes, and Michael Alter. The claim: their copyrighted books were included in GPT training datasets without consent, compensation, or any licensing. The suit raises key questions about the use of entire literary works in training and whether this qualifies as fair use under copyright law.

Image-Based AI Models

  • Getty Images v. Stability AI
    Getty Images has brought copyright claims against Stability AI, the company behind image generator Stable Diffusion, for allegedly scraping millions of copyrighted images from Getty's library. The lawsuit is playing out in both the U.S. and the UK, giving it international scope. Getty argues that Stability not only copied its images but used them to create a product that directly competes with its licensed photography business.

Music and Lyrics

  • Music Publishers v. Anthropic
    Music publishers, including Universal and Concord, have sued Anthropic, the developer behind Claude AI, for allegedly training on copyrighted song lyrics without permission. In their filings, the publishers demonstrate that Claude can reproduce copyrighted lyrics on command—suggesting that the model has retained, and can replicate, protected material.

International Dimensions

  • Asian News International (ANI) v. OpenAI
    In a sign that legal exposure is not limited to U.S. jurisdictions, Indian news agency ANI has filed a complaint alleging OpenAI's models reproduced its copyrighted content without authorization. This case introduces complex questions about cross-border copyright enforcement and may foreshadow similar suits in other countries where generative AI is expanding quickly but operating in legal gray areas.

Key Takeaways from the Legal Landscape

As courts begin to weigh in on how copyright law applies to AI, several critical insights are emerging. These takeaways are essential for AI developers, product leaders, content owners, and investors looking to navigate the evolving legal and business environment:

Publicly Visible ≠ Public Domain

Just because content is online doesn't mean it's free to use. Courts are drawing a clear distinction between access and ownership. Public visibility does not eliminate copyright protection, especially for content that reflects original, human-authored effort.

Companies relying on the assumption that "public" equals "permissible" are increasingly facing legal scrutiny—and losing that argument in court.

AI Developers Face Real Legal Risk

The use of unlicensed content in training datasets carries tangible legal exposure. From copyright infringement claims to potential injunctions or forced model retraining, the risks are no longer hypothetical.

AI companies—especially those whose products are built on large-scale scraping—need to evaluate the sustainability and legality of their data practices. IP compliance is becoming a material concern in investment, acquisition, and regulatory conversations.

Content Creators Are Reclaiming Control

Recent rulings, such as in Thomson Reuters v. Ross Intelligence, suggest a shifting balance of power toward content owners. Courts are recognizing the value of editorial judgment, creative effort, and the commercial markets for original works.

As lawsuits gain traction, content creators across industries—news, publishing, music, and beyond—are finding new leverage to enforce their rights. The future may favor frameworks that include licensing, attribution, and negotiated data access agreements.

Looking Ahead: Will AI Be Forced to License Content?

The AI industry is entering a new phase—one where copyright compliance and content rights are becoming part of the business model, not an afterthought.

A New Licensing Economy

As legal challenges mount, AI companies are likely to face increasing pressure to license the content they rely on. That means negotiating with publishers, authors, image libraries, and other rights holders. While this shift may increase development costs, it also opens the door to new commercial models—where content is used with permission, attribution, and compensation.

Strategic partnerships could replace adversarial lawsuits, but only if companies proactively rethink how they source and use data.

Innovation vs. Regulation

Legal uncertainty may create short-term friction, especially for smaller startups or open-source models. But in the long run, building with compliance in mind is the only sustainable path forward. Systems that respect intellectual property will be more defensible in court, more attractive to partners, and better positioned to scale.

AI developers should begin preparing for a future where rights management and data governance are core infrastructure—not optional add-ons.

Policy Pressure Is Growing

Legislators are starting to weigh in. From proposed regulations on data transparency to hearings on fair use and AI accountability, public policy is beginning to shape the rules of the road. At the same time, industry-led standards may emerge—defining best practices for how generative models interact with protected content.

The companies that thrive in this environment will be the ones that take the lead, not the ones waiting for court orders or regulatory mandates.

Strategic Advice: What Businesses and Creators Should Do Now

AI Developers and Tech Companies

For developers, founders, and tech companies building generative models, the first and most urgent step is to gain visibility into the training data that powers their systems. That means auditing not only what's inside the models today, but where that content came from and whether its use carries legal risk. Relying on scraped or aggregated datasets without understanding their origins is no longer a defensible position—and the courts are making that clear.
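
As one purely illustrative starting point (the field names and license categories below are assumptions, not an industry standard), an initial audit can be as simple as attaching provenance and license metadata to every document in a training corpus and flagging anything whose rights status is undocumented:

    # Hypothetical provenance audit for a training dataset.
    # Field names and license labels are illustrative assumptions.

    import json

    PERMITTED_LICENSES = {"owned", "licensed", "public-domain", "cc0"}

    def audit(corpus_path="training_corpus.jsonl"):
        flagged = []
        with open(corpus_path, encoding="utf-8") as corpus:
            for line in corpus:
                record = json.loads(line)
                license_status = record.get("license", "unknown")
                # Anything without a documented, permitted license is a risk item.
                if license_status not in PERMITTED_LICENSES:
                    flagged.append({
                        "source": record.get("source", "unknown"),
                        "license": license_status,
                    })
        return flagged

    if __name__ == "__main__":
        for item in audit():
            print(f"REVIEW: {item['source']} (license: {item['license']})")

Anything such an audit flags becomes a candidate for licensing, removal, or a documented fair use analysis.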

Looking ahead, companies should proactively explore licensing models for high-value content types such as journalism, books, images, and music—especially in areas where copyright holders are already taking legal action. In parallel, it's critical to stay on top of the fast-moving legal landscape. Court decisions over the next 12 to 24 months will likely reshape the boundaries of fair use and copyright infringement in the AI space, with significant implications for both risk and innovation.

Equally important is the legal support behind the technology. Partnering with law firms that understand not only intellectual property law but also the underlying AI systems is essential. At Traverse Legal, we work directly with forward-thinking companies to help them build AI tools that are both innovative and compliant—designing risk-aware strategies that won't unravel when the next lawsuit hits.

Publishers and Content Owners

On the other side of the equation, content creators and publishers need to approach this moment with both vigilance and intent. Generative AI tools are already ingesting and reproducing their work, often without permission or attribution. Understanding how content is being used—and how it might be embedded in commercial AI tools—is a necessary first step in asserting control. As legal frameworks continue to evolve, some perspectives suggest that content owners are beginning to shape the conversation around copyright and generative AI, including how licensing models and enforcement mechanisms are being reconsidered in this context.

For some, enforcement will make sense. That may mean pursuing individual legal action, joining collective suits, or simply putting platforms on notice. But there's also a parallel opportunity: to help shape the next generation of content licensing. Creators don't need to sit on the sidelines while others profit from their work. They can and should evaluate licensing frameworks that both protect their intellectual property and support responsible AI development.

Ultimately, whether you're building the next breakthrough AI product or protecting the creative output that fuels it, your legal strategy must evolve as quickly as the technology itself. Navigating that tension—between innovation and protection—is where strategic counsel makes the difference. Traverse Legal stands at that intersection, helping both sides move forward with clarity and confidence.

Rewriting the AI Playbook

The era of assuming that publicly available content is free for the taking is coming to an end. Courts are beginning to draw firmer lines between visibility and ownership, particularly when it comes to how generative AI models are trained. The once-blurry boundaries around fair use, scraping, and derivative outputs are becoming sharper—and the legal risks harder to ignore.

This shift has real consequences. AI companies that built on unlicensed content must now rethink their data practices. Creators and publishers, once sidelined in the AI conversation, are reclaiming control over their work. And investors and product leaders are quickly learning that legal compliance isn't a roadblock to innovation—it's part of building something that lasts.

The rules are changing. The stakes are rising. And the businesses that succeed in this next chapter will be the ones that treat copyright, licensing, and content rights not as legal afterthoughts—but as strategic foundations.

Reviewed by Kareh Wangari Kimani

The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.
