Adobe Faces Class-Action Lawsuit Over Alleged Misuse of Authors' Works for AI Training

2025-12-18



Like nearly every other tech company, Adobe has made significant investments in artificial intelligence over the past few years. Since 2023, the software giant has launched several AI-driven services, most notably Firefly, its generative AI content creation suite. However, this aggressive push into AI may now be backfiring, as a new lawsuit alleges the company used pirated books to train its artificial intelligence models.


A class-action suit filed by Oregon-based author Elizabeth Lyon claims that Adobe used numerous copyrighted books, including her own, to train the models behind its SlimLM project.


Adobe describes SlimLM as a family of compact language models optimized for document-related tasks on mobile devices. According to court documents, SlimLM was pre-trained on SlimPajama-627B, an open-source, deduplicated, multilingual dataset released by Cerebras in June 2023. Lyon, who has authored multiple nonfiction writing guides, asserts that some of her published works were included in the pre-training data sources used by Adobe.


First reported by Reuters, Lyon’s complaint argues that her writings were incorporated into a processed subset of training data forming the foundation of Adobe’s model: “The SlimPajama dataset was created by copying the RedPajama dataset—including the replication of Books3,” the filing states. “As a derivative copy of RedPajama, SlimPajama therefore contains the Books3 dataset, which includes copyrighted works belonging to the plaintiff and proposed class members.”


“Books3”—a massive collection of approximately 191,000 copyrighted books compiled for training generative AI systems—has become a recurring source of legal controversy within the tech industry. RedPajama itself has also been cited in multiple lawsuits. In September, a lawsuit against Apple alleged the company used copyrighted materials to train its Apple Intelligence models, referencing the same dataset and accusing the firm of reproducing protected works without permission, attribution, or compensation. A similar case filed last October against Salesforce claimed the company leveraged RedPajama for AI training purposes as well.


Unfortunately, such litigation has become increasingly common. AI systems are often trained on vast datasets, some of which are believed to contain pirated or unauthorized content. In September, Anthropic agreed to pay $1.5 billion to settle claims from a group of authors who accused the company of using illegally obtained books to train its Claude chatbot. That case is seen as a landmark development in the ongoing legal debate over copyright compliance in AI training data.