Encyclopedia Britannica and Merriam-Webster filed a copyright lawsuit against OpenAI on March 16, alleging that ChatGPT was trained on nearly 100,000 copyrighted articles without permission. The case is the 91st copyright lawsuit filed against an AI company in the United States and adds another front to the legal battle over how generative AI models are built.

What Happened

Britannica, which owns Merriam-Webster, filed the complaint in Manhattan federal court. The lawsuit makes three core allegations: that OpenAI scraped nearly 100,000 online articles to train its language models, that ChatGPT generates outputs containing "full or partial verbatim reproductions" of Britannica's content, and that OpenAI's retrieval-augmented generation (RAG) workflow directly uses Britannica articles to generate answers.

Britannica also alleges trademark violations under the Lanham Act, arguing that ChatGPT sometimes generates hallucinated information and falsely attributes it to Britannica, damaging the publisher's reputation for accuracy.

Why It Matters for Creators

This lawsuit turns on the same legal question that underlies nearly every generative AI tool creators use: whether training on copyrighted content without permission is fair use. How courts answer it matters well beyond text models. Image generators like Midjourney and Stable Diffusion, video models like Sora and Runway, and music generators like Suno all face the same fundamental question about their training data.

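The RAG allegation adds a separate dimension. Training uses content to build a model in advance, while RAG retrieves source documents at the moment a user asks a question and feeds them to the model to ground its answer. If courts rule that using copyrighted content this way is infringement, AI companies could be forced to license the sources they retrieve and reference, not just the data they trained on. That would likely raise costs across the entire AI stack.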

For context, the Anthropic copyright case (Bartz v. Anthropic) reached a $1.5 billion class action settlement in 2025, the largest in the AI copyright litigation wave. The existing OpenAI multidistrict litigation consolidating over a dozen publisher lawsuits is approaching the close of fact discovery, with no fair use ruling expected before summer 2026.

Key Details

  • Plaintiffs: Encyclopedia Britannica and Merriam-Webster (Britannica owns Merriam-Webster)
  • Defendant: OpenAI
  • Court: U.S. District Court for the Southern District of New York (SDNY), in Manhattan
  • Claims: Copyright infringement (training data and RAG); trademark violation under the Lanham Act (false attribution of hallucinated content)
  • Scale: Nearly 100,000 online articles allegedly scraped
  • Lawsuit count: 91st copyright lawsuit against an AI company in the US
  • Related case: The same plaintiffs sued Perplexity in September 2025 on similar grounds

What to Do Next

Creators who rely on AI-generated content for commercial work should watch how the training-data lawsuits are resolved. If fair use defenses fail broadly, expect higher prices for AI tools as companies negotiate licensing deals or retrain on licensed datasets. Platforms like Adobe Firefly, which Adobe says is trained only on licensed and public-domain content, may gain a competitive advantage.

A separate lawsuit by independent artists against Google over its Lyria 3 model raises similar questions for music generation. Together, these cases are building the legal framework that will shape how creative AI models are built and priced going forward.