The Price of Machine Learning: How a $1.5 Billion Settlement Redefines AI's Legal Landscape
How book, music, image and news lawsuits are quietly turning AI’s data hunger into a priced—and licensed—commodity
In August 2025, Anthropic did something unprecedented. The AI company agreed to pay $1.5 billion, with preliminary court approval following in September, to settle copyright claims from authors whose books trained its Claude chatbot. At roughly $3,000 per book for an estimated 500,000 works, the settlement represents the first major resolution in the escalating legal battle over whether AI companies can use copyrighted material freely for training. The deal may reshape the economics of AI development, suggesting that at least some large AI companies can compensate creators at scale without immediately halting innovation. But this settlement represents just one resolution in a sprawling legal conflict where different courts have reached contradictory conclusions about identical questions. [1]
Critically, the settlement releases Anthropic only from liability for specified past conduct through August 2025. It does not establish a licensing scheme for future AI training, cover claims based on AI outputs, or affect Anthropic’s ability to train on lawfully acquired materials.
When Judges Disagree: The ‘Fair Use’ Puzzle
The Anthropic settlement emerged from Bartz v. Anthropic, where Judge Alsup had previously ruled in June 2025 that training AI on copyrighted books constitutes fair use—characterising the practice as “quintessentially transformative” and even “spectacularly so.” Just two days later, Judge Chhabria in Kadrey v. Meta reached similar conclusions about Meta’s Llama model, finding the training “highly transformative” despite Meta having sourced books from unauthorised pirate repositories. [2] The California judges emphasised that AI training served a different purpose from the original works, focusing on building language‑understanding systems rather than distributing books to readers.
Yet barely four months earlier, on 11 February 2025, U.S. Circuit Judge Stephanos Bibas—sitting by designation in the U.S. District Court for the District of Delaware—ruled the exact opposite in Thomson Reuters v. ROSS Intelligence. [3] He held that ROSS’s AI-powered legal research tool did not qualify for fair use because it served the same commercial purpose as Thomson Reuters’ competing product. Where copying serves to build direct market competitors, Judge Bibas reasoned, it looks like appropriation rather than transformation.
This divergence creates difficult uncertainty for anyone in the AI ecosystem. Closely analogous conduct—training models on copyrighted works—has been treated as fair use in some Northern District of California cases yet rejected on a different fact pattern in the District of Delaware, where the AI system competed directly with the rightsholder’s product. Companies attempting legal compliance face contradictory signals depending on jurisdiction, evidentiary record and the perceived proximity of the AI system to the rightsholder’s market. Appeals are likely to shape the contours of a workable standard, but until then, the risks remain highly fact‑dependent.
The divergence suggests that courts currently take different views of what AI systems do and how closely they compete with the markets for the underlying works. Judges favouring AI companies emphasise generative capabilities: the systems’ ability to produce novel outputs and assist with unprecedented tasks. Judges ruling against AI companies focus on competitive harm and market substitution: when an AI system serves the same function as the works it was trained on, copying appears more like theft than innovation. Both perspectives cite established precedent, both claim fidelity to copyright principles, and both present logical reasoning.
Music’s Unique Challenge: When AI Learns to Sing
While text-based AI battles dominate headlines, music presents distinct challenges because audio carries different cultural and economic significance. Three major record labels (Universal Music Group, Sony Music, and Warner Music Group) filed lawsuits in June 2024 through the Recording Industry Association of America (‘RIAA’) against AI music generators Suno and Udio. [4] The labels allege that these services trained on copyrighted sound recordings to create systems that generate remarkably similar music.
The labels demonstrated that careful prompting could coax these systems into producing tracks resembling iconic songs. Suno allegedly generated music containing recognisable elements from Chuck Berry’s “Johnny B. Goode,” B.B. King’s “The Thrill Is Gone,” and James Brown’s “I Got You (I Feel Good).” Udio reportedly created outputs with striking similarities to Michael Jackson’s “Billie Jean,” ABBA’s “Dancing Queen,” and Mariah Carey’s “All I Want For Christmas Is You.” The companies could not have built models producing such similar audio without initially copying those recordings, the labels argue.
The complaints allege that both companies trained on vast catalogues of copyrighted recordings without authorisation and that the services can generate outputs that emulate protected expression. [4] Both companies have broadly maintained that training is lawful, while disputing the labels’ characterisation of the technology as a substitute for recorded music. The cases have become focal points for whether courts will apply different fair‑use or analogous standards to music than to text or images, given music’s highly recognisable and replayable nature.
Music publishers separately pursued Anthropic in Concord Music Group v. Anthropic, filed in October 2023. [5] Eight publishers including Universal Music and ABKCO alleged that Anthropic’s Claude could reproduce copyrighted song lyrics when prompted, sometimes entire verses verbatim. Unlike claims directed at training, these allegations focused on output: Claude allegedly generated content that directly reproduced protected works. The case continues, with litigation ongoing regarding both training practices and Claude’s ability to reproduce lyrics.
The litigation landscape transformed dramatically in late 2025. Universal Music Group and Udio announced a settlement and strategic agreement on October 29, 2025, followed by Warner Music Group’s settlement with Udio on November 19, 2025 and its settlement and partnership with Suno on November 25, 2025. [4] Each arrangement signals a pivot from pure litigation toward licensing‑style frameworks. The companies have indicated that new services and models are expected to launch in 2026 trained on licensed and authorised content, with current disputed models to be retired or materially re‑worked. The financial terms have largely not been publicly disclosed.
These settlements do not resolve all claims or all claimants. Sony Music’s actions against Suno remain active, and other disputes continue, including class actions brought by independent artists in Illinois in October 2025 alleging “stream‑ripping” and other unlicensed acquisition of copyrighted recordings. [4] In parallel, European collecting societies have also pursued enforcement strategies. GEMA filed proceedings against Suno in the Munich Regional Court in January 2025 and, in November 2025, obtained a landmark ruling against OpenAI concerning the reproduction of song lyrics by ChatGPT. [6] The combined effect is a bifurcated landscape: major labels are increasingly negotiating structured deals, while independents and collecting societies continue to litigate. For AI music developers, these developments signal that in key markets, courts and major rightsholders increasingly expect training on licensed or otherwise authorised catalogues.
Music cases highlight how different creative industries face distinct challenges. Text can be paraphrased and summarised; musical elements like melody, rhythm, and distinctive vocal styling are harder to transform beyond recognition. When AI produces audio that sounds remarkably similar to copyrighted recordings, the copying becomes aurally obvious in ways that text transformation might obscure. These cases will test whether courts apply different fair use standards for different creative media.
Images and the Compressed Copy Theory
Visual artists were among the first to challenge AI training in court. [7] Sarah Andersen, Kelly McKernan, and Karla Ortiz filed Andersen v. Stability AI in January 2023, targeting image generation systems trained on billions of scraped images. Their initial complaint faced scepticism from Judge William Orrick, who dismissed significant portions in October 2023, questioning whether they could prove infringement when AI outputs don’t reproduce specific training images exactly.
In August 2024, the artists refined their theory, arguing that model parameters embody ‘compressed copies’ of training images. This ‘compressed‑copy’ theory, now proceeding past an early motion to dismiss, could pose a fundamental challenge to prevailing technical assumptions if ultimately accepted by courts.
The compressed copy theory may represent a fundamental challenge to AI companies’ technical self-understanding. Engineers describe training as statistical pattern extraction without storing retrievable copies—more like learning correlations than recording data. But if courts accept that model parameters themselves constitute copies, the entire foundation of modern machine learning faces legal jeopardy. The theory’s implications extend beyond images: if image models contain compressed copies, what about language models trained on text?
Meanwhile, Getty Images pursued a parallel strategy, suing Stability AI in both US and English courts in early 2023. [8] The English case advanced more rapidly, reaching trial in June 2025 and judgment on November 4, 2025. Getty abandoned its primary English copyright infringement claims mid‑trial, focusing instead on secondary infringement under English law. The English High Court rejected Getty’s secondary infringement theory, holding on the evidence before it that diffusion‑model weights are not copies of training images and do not store infringing copies, while noting that different models that actually retain works could be treated differently. “The model weights are not themselves an infringing copy and they do not store an infringing copy,” Mrs. Justice Smith wrote, also finding that inference “does not require the use of any training data and the model itself does not store training data.” This stands in tension with US pleadings advancing the compressed‑copy theory, illustrating how different legal traditions can produce materially different results when confronted with identical technology. [9]
Getty secured limited trademark victories in the UK ruling. The court found that earlier Stable Diffusion versions had infringed Getty’s trademarks by generating outputs with distorted watermarks, though these violations affected only outdated software that had already been superseded. Mrs. Justice Smith added an obiter observation: if she was wrong about the weights not being copies, Stability AI would be liable because staff knew works were scraped without consent and discussed removing watermarks from training data. Both sides claimed victory, with Getty emphasising trademark findings while Stability AI celebrated the copyright rejection. The judge cautioned that her ruling applies specifically to diffusion models and that “if their models actually keep works in their memory,” other AI companies “could be infringing under English copyright law.” Getty’s American case continues in San Francisco, where different fair use doctrine may produce entirely different outcomes.
A Landmark German AI Copyright Ruling: GEMA v. OpenAI
On 11 November 2025, the 42nd Civil Chamber of the Munich I Regional Court (Landgericht München I) largely upheld GEMA’s claims against two OpenAI group companies (Az. 42 O 14139/24), granting injunctive relief and ordering the provision of information and damages in relation to the alleged use and reproduction of protected song lyrics. The ruling concerned lyrics from nine well‑known German songs. [6]
According to the court’s press summary, infringement was found on two levels: (i) reproduction of the relevant lyrics within the language models (described as “memorisation” embodied in model parameters) and (ii) reproduction/making available of the lyrics in ChatGPT outputs generated in response to simple user prompts. The court rejected arguments that liability lay only with end‑users and held that these acts were not excused by copyright limitations, including the EU text‑and‑data‑mining exception. [6]
The court dismissed GEMA’s additional claim based on a violation of general personality rights arising from the incorrect attribution of modified lyrics. Even so, the decision is an important signal that—at least on the court’s current analysis—developers and operators can face direct exposure where models reproduce protected expressive content, and where licensing/opt‑out and output‑controls are not robust. [6]
The first‑instance Munich ruling sharpens an emerging divergence: it treats memorisation in model parameters as sufficient ‘embodiment’ for reproduction under German law, in contrast to the English court’s approach. The decision is appealable, and OpenAI has indicated it is considering next steps, so the legal position in Germany may still evolve. In the US, fair-use doctrine is evolving along a different path, centred on transformativeness and market substitution. [9][6]
The case is therefore likely to remain in flux, but the direction of travel is clear. Systems capable of outputting near‑verbatim protected lyrics are likely to face increasing pressure in major markets to rely on licensed datasets and implement tighter output controls, especially if similar rulings proliferate. [6]
Publishers Fight Back: Traditional Media’s Counter-Offensive
The New York Times elevated the stakes when it sued OpenAI and Microsoft in December 2023 [10], bringing institutional credibility and resources to challenge AI training. The newspaper advanced a sophisticated multi-layered theory spanning training, memorisation, and market substitution. Training involved literal copying—transferring Times articles from Times servers to OpenAI infrastructure. Memorisation occurred when ChatGPT sometimes reproduced substantial article portions verbatim when prompted strategically. Market substitution happened when users accessed Times reporting through ChatGPT instead of subscribing.
Procedural developments, including the denial of OpenAI’s motion to dismiss core copyright claims and expansive preservation orders, indicate that the court is treating the New York Times’ allegations as substantial enough to warrant extensive discovery. Judge Sidney Stein rejected OpenAI’s motion to dismiss on April 4, 2025, allowing primary copyright claims to advance. Then in May 2025, Magistrate Judge Ona T. Wang issued a preservation order requiring OpenAI to retain all ChatGPT conversation logs, affecting over 400 million users globally. When OpenAI challenged this burden, Judge Stein upheld the order, demonstrating willingness to impose substantial discovery requirements.
The scale of this discovery highlights unprecedented challenges. Traditional copyright disputes might involve thousands of documents; here, relevant evidence potentially encompasses billions of user interactions, raising complex questions about privacy, privilege, and trade secrets. Yet the courts appear willing to mandate this level of discovery, recognising that without it, AI companies could exploit technical complexity to shield themselves from scrutiny.
A different challenge emerged with Dow Jones & Company and the New York Post’s lawsuit against Perplexity AI, filed on October 21, 2024. Unlike traditional AI training cases, this focused on “retrieval-augmented generation” (‘RAG’), a technique that combines pre-trained models with real-time database queries. Perplexity allegedly scrapes news content into RAG databases, allowing users to bypass publishers’ websites entirely. The company marketed this as “Skip the links”, which the plaintiffs characterised as a brazen admission of market substitution intent. [11]
Perplexity defended its approach vigorously, arguing that AI-enhanced search represents transformative technology that benefits users by efficiently delivering information. The company claimed publishers wish this technology didn’t exist because they would prefer a world where “publicly reported facts are owned by corporations.” However, the plaintiffs distinguished Perplexity from traditional search engines, stating that traditional search enables the discovery of their work, while Perplexity provides a substitute for it.
RAG technology presents distinct legal questions from training-based models. Traditional AI training involves one-time copying into model parameters; RAG involves ongoing scraping and database maintenance. The copying is more direct and continuous, and more obviously serves the same purpose as the original journalism. If courts ultimately treat some training uses as fair but view certain RAG implementations as non‑fair, AI companies may need to make difficult architectural choices about how they ingest and serve copyrighted content.
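The architectural distinction drawn above can be made concrete with a toy example. The following is a minimal sketch, not a description of Perplexity's or any other company's system: the `ToyRagIndex` class, the `build_prompt` helper, the example URLs and the word-overlap ranking are all hypothetical and chosen only for illustration. It shows the two features the plaintiffs emphasise: source text is re-copied into a store on every crawl, and stored text is pasted into the prompt at answer time.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    url: str
    text: str


@dataclass
class ToyRagIndex:
    """Hypothetical in-memory store standing in for a RAG database."""
    articles: dict[str, Article] = field(default_factory=dict)
    ingest_count: int = 0  # counts every copy made into the store

    def ingest(self, article: Article) -> None:
        # Each crawl stores (copies) the article text, replacing any older copy.
        self.articles[article.url] = article
        self.ingest_count += 1

    def retrieve(self, query: str, k: int = 1) -> list[Article]:
        # Naive relevance score: how many query words appear in the text.
        words = set(query.lower().split())

        def score(a: Article) -> int:
            return len(words & set(a.text.lower().split()))

        ranked = sorted(self.articles.values(), key=score, reverse=True)
        return [a for a in ranked[:k] if score(a) > 0]


def build_prompt(query: str, index: ToyRagIndex) -> str:
    # Retrieved source text is inserted into the prompt at answer time.
    context = "\n".join(a.text for a in index.retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"


index = ToyRagIndex()
index.ingest(Article("https://example.com/a", "Court approves settlement in copyright case"))
index.ingest(Article("https://example.com/b", "Label signs licensing deal with music startup"))
# A later crawl re-copies the first page: the copying is ongoing, not one-time.
index.ingest(Article("https://example.com/a", "Court approves settlement in copyright case, appeal expected"))

prompt = build_prompt("copyright settlement", index)
print(prompt)
```

In this sketch the store holds two articles but has made three copies, and the answer prompt carries the latest version of the source text verbatim. A model that relies only on its trained parameters would, on the engineers' account described earlier, have no such retrievable copy to paste.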
Regulatory Frameworks: When Legislatures Step In
While courts battle with questions of doctrine, regulators worldwide construct frameworks that may render moot some of the legal debates. From August 2, 2026, the EU AI Act’s applicable provisions will impose transparency and documentation obligations on many AI developers, including significant disclosure duties regarding training data and model behaviour, backed by fines of up to €15 million or a percentage of global turnover. [12] California’s AB 2013, effective January 2026, introduces training‑data‑transparency requirements for certain AI services. Together with the EU regime, these measures are likely to influence global practice, because major AI companies serve both markets and cannot practically maintain different disclosure practices for different jurisdictions. [13]
The United Kingdom has so far pursued a more flexible, sector‑led approach, including exploring opt‑out mechanisms that would allow creators to exclude works from AI training. This approach acknowledges creators’ concerns while preserving AI companies’ ability to train on non-excluded material without licensing. Whether this balances competing interests or simply shifts burdens to creators remains contested, but it represents an alternative to Europe’s mandatory disclosure model.
Market-based solutions have emerged alongside litigation and regulation. Some AI companies now pursue direct licensing deals with major publishers and content providers. OpenAI signed agreements with the Associated Press and Axel Springer. Perplexity launched a Publisher Program in July 2024 sharing revenue with participating content sources. These voluntary arrangements suggest possible paths forward if litigation creates sufficient pressure.
Technical infrastructure for creator control is developing rapidly. DeviantArt implemented opt-out systems in November 2022, allowing artists to exclude work from training datasets. Spawning’s ‘Have I Been Trained’ platform has facilitated over 80 million artwork opt-outs. The Coalition for Content Provenance and Authenticity (‘C2PA’), founded in 2021, expects ISO standardisation soon for technical standards allowing creators to embed machine-readable licensing preferences directly into digital files.
Whether these voluntary frameworks achieve critical mass adoption depends largely on litigation outcomes. If courts consistently rule AI training constitutes fair use, licensing incentives diminish. If courts find against AI companies, licensing becomes necessary for legal operation. The Anthropic settlement suggests a middle path. Companies may choose to pay rather than litigate even when fair use arguments have merit, valuing certainty over risk.
The Philosophical Question: Who Should Pay for Progress?
Beneath technical legal debate lies a fundamental philosophical question about the proper costs of innovation and who shoulders the risk during legal uncertainty. One perspective characterises AI training as analogous to human learning—reading books to acquire knowledge, studying art to understand techniques, examining code to grasp algorithms. These activities have never required the licensing of every studied work. Demanding that for AI training would impose prohibitive costs that would prevent the emergence of beneficial technology. Creators have no greater claim on AI training data than authors have on readers who learn from their books, this view holds.
The opposing perspective characterises AI training as industrial-scale commercial exploitation, not individual education. Companies aren’t reading to understand human culture—they’re ingesting millions of works to construct profit-generating products. Scale, purpose, and the commercial nature fundamentally distinguish this from the context of traditional fair use. Permitting unconstrained copying creates what critics term “content kleptocracy,” where a creator’s life work becomes the raw material for corporate profit, without consent or compensation.
Both narratives contain elements of validity. AI models genuinely transform training data into novel capabilities, assisting with tasks and generating content that did not exist in the source material. But they also depend entirely on that training data, and commercial success is derived directly from millions of creators’ unpaid labour. The transformation is authentic; so is the appropriation.
The Anthropic settlement reframes this debate by suggesting that, at least for well‑capitalised firms, large‑scale compensation to creators is financially feasible. The company agreed to pay $1.5 billion, a sum that is substantial but not existential for a firm that had just closed a $13 billion funding round valuing it at $183 billion. “This settlement marks the beginning of a necessary evolution toward a legitimate, market-based licensing scheme for training data,” observed tech industry lawyer Cecilia Ziniti. “It’s not the end of AI, but the start of a more mature, sustainable ecosystem where creators are compensated.” [14]
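The scale comparison can be checked with simple arithmetic. The short sketch below uses only the approximate figures quoted in this article (the $1.5 billion settlement, the estimated 500,000 works, the $13 billion funding round and the $183 billion valuation). The variable names are illustrative, and the results are rough orders of magnitude, not precise accounting.

```python
# Figures as stated in the article; all are approximate.
settlement_usd = 1_500_000_000      # settlement amount
works = 500_000                     # estimated number of books covered
funding_round_usd = 13_000_000_000  # reported funding round
valuation_usd = 183_000_000_000     # reported valuation

per_work = settlement_usd / works
share_of_round = settlement_usd / funding_round_usd
share_of_valuation = settlement_usd / valuation_usd

print(f"Per work: ${per_work:,.0f}")                      # Per work: $3,000
print(f"Share of funding round: {share_of_round:.1%}")    # Share of funding round: 11.5%
print(f"Share of valuation: {share_of_valuation:.2%}")    # Share of valuation: 0.82%
```

On these figures the settlement amounts to roughly a ninth of a single funding round and under one percent of the reported valuation, which is the sense in which it is substantial but not existential.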
Current practice places the costs of uncertainty overwhelmingly on the creators, who watch their work fuel AI systems, without compensation or control, unless and until they win lawsuits or secure regulatory protection. An alternative would place those costs on AI companies: if you want industrial-scale training on copyrighted works, obtain licences or accept liability risk should courts later determine you should have done so. In practice, there is little neutral ground: any legal framework, or its absence, tends to shift risk either toward creators or toward AI developers. In several prominent sectors—especially recorded music—licensing‑first or licensed‑only training is emerging as a de facto expectation among major rightsholders.
The fundamental tension persists regardless of judicial rulings, settlements, or regulatory actions. Technology enabling machines to learn from human creativity offers genuine benefits alongside genuine harms. Creators’ rights to control and profit from their work rest on centuries of legal tradition and international treaty obligations. We are negotiating boundaries between legitimate but competing claims, and negotiations have barely begun.
The next two to three years should prove critical. Major trials and appellate inflection points loom: the remaining issues in the US book‑training cases; continued motion practice in The New York Times v. OpenAI; the progression of Getty’s re‑filed US claims in Northern California; and, in the visual‑artist litigation, Andersen v. Stability AI, which is currently on a schedule pointing toward 2027. In Europe, the GEMA v. OpenAI decision has reframed the risk analysis for any model capable of reproducing protected text on demand, while the EU AI Act’s transparency and governance obligations begin to bite on a staged timetable. These developments will determine not merely who profits from AI, but what we believe creativity is worth, and what obligations we owe those whose expression trained the machines now augmenting or replacing them.
What emerges will shape not just technology law but our entire creative culture. The Anthropic settlement suggests one possible future where compensation flows alongside innovation. The judicial split suggests another where geography determines legality. The aggressive litigation suggests yet another where uncertainty paralyses development. Which future we inhabit depends on choices being made right now in courtrooms, legislative chambers, and corporate boardrooms—choices that will echo through decades of creative and technological development.
NOTES
[1] Bartz v. Anthropic PBC, No. 3:24-cv-05417 (N.D. Cal.) (order on fair use June 23, 2025); settlement announced Aug. 26, 2025; preliminarily approved Sept. 25, 2025.
[2] Kadrey v. Meta Platforms, Inc., No. 3:23-cv-03417 (N.D. Cal. June 25, 2025).
[3] Thomson Reuters Enter. Ctr. GmbH v. ROSS Intel. Inc., 765 F. Supp. 3d 382 (D. Del. 2025).
[4] UMG Recordings, Inc. v. Uncharted Labs, Inc., No. 1:24-cv-04777 (S.D.N.Y. filed June 24, 2024), settled Oct. 29, 2025; Warner Music Group settled with Udio (Nov. 19, 2025) and Suno (Nov. 25, 2025); Sony Music Entm’t v. Suno, Inc., No. 1:24-cv-11611 (D. Mass. filed June 24, 2024) (pending as of Dec. 2025). See also GEMA, “Suno AI and Open AI: GEMA sues for fair compensation” (stating proceedings against Suno in Munich Regional Court in Jan. 2025).
[5] Concord Music Grp., Inc. v. Anthropic PBC, No. 3:23-cv-01092 (M.D. Tenn. filed Oct. 18, 2023), transferred to No. 5:24-cv-03811 (N.D. Cal.).
[6] GEMA v. OpenAI, Az. 42 O 14139/24 (Munich I Regional Court / Landgericht München I, 11 Nov. 2025) (press summary); see Landgericht München I press release “Urteil GEMA gegen Open AI” (11 Nov. 2025) and the unofficial English translation circulated by IFRRO (press release translation dated 11 Nov. 2025). Practitioner commentary: Bird & Bird, “Landmark ruling of the Munich Regional Court (GEMA v OpenAI) on copyright and AI training” (14 Nov. 2025). Reporting on appeal posture: Reuters, “OpenAI used song lyrics in violation of copyright laws, German court says” (11 Nov. 2025) (OpenAI “considering next steps”; decision appealable). Chronology of proceedings: GEMA, “Suno AI and Open AI: GEMA sues for fair compensation” (stating proceedings against OpenAI filed in Nov. 2024; proceedings against Suno in Munich).
[7] Andersen v. Stability AI Ltd., No. 3:23-cv-00201 (N.D. Cal. filed Jan. 13, 2023); Order Granting in Part Motion to Dismiss, Oct. 30, 2023.
[8] Getty Images (US), Inc. v. Stability AI, Inc., No. 1:23-cv-00135 (D. Del. filed Feb. 3, 2023); voluntarily dismissed and refiled as Getty Images (US), Inc. v. Stability AI, Ltd., No. 3:25-cv-06891 (N.D. Cal. filed Aug. 14, 2025).
[9] Getty Images (US), Inc. v. Stability AI Ltd., [2025] EWHC 2863 (Ch) (Nov. 4, 2025).
[10] The New York Times Co. v. OpenAI, Inc., No. 1:23-cv-11195 (S.D.N.Y. filed Dec. 27, 2023); Order Denying Motion to Dismiss, Apr. 4, 2025.
[11] Dow Jones & Co. v. Perplexity AI Inc., No. 1:24-cv-07984 (S.D.N.Y. filed Oct. 21, 2024).
[12] Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 laying down harmonised rules on artificial intelligence (AI Act), O.J. (L) 2024/1689, entered into force Aug. 1, 2024, full application Aug. 2, 2026.
[13] Cal. A.B. 2013, Artificial Intelligence Training Data Transparency Act (2024), effective Jan. 1, 2026.
[14] https://www.yahoo.com/news/articles/anthropic-reaches-1-5-billion-221849839.html


