Is it legal to train LLMs using copyright protected works if the output is likely to compete with the works that the AI model was trained on? Or is training LLMs no different to teaching a child to write well?
We now have two contrasting court judgments on the legality of using pirated books to train LLMs (Bartz v Anthropic (Claude) and Kadrey v Meta (Llama)). One decision resulted in a $1.5bn payment for the plaintiff authors – likely the largest copyright settlement in history. The other decision resulted in the plaintiffs’ copying claims being dismissed under the USA defence of "fair use".
In Anthropic, it was held that:
• Where the purpose of copying pirated books is not solely to train LLMs, none of the copying will amount to fair use, and authors must be paid for it.
• But copying purchased books (where the hard copy is subsequently destroyed) is fair use, even where the purpose is not solely to train LLMs, as this is merely "format shifting".
• The subsequent training step (for both purchased and pirated books) was also fair use: even though this involved Claude “memorising” copies of its training material, LLM training is a “highly transformative” purpose.
• The fact that LLM outputs may compete with its training material is irrelevant to the "fair use" defence, because training LLMs is no different to teaching children to write well.
In Meta, it was held that:
• The basis of copyright protection is incentivising human authors and the risk of disincentivising them is the key consideration in the "fair use" analysis.
• Where LLM outputs don’t compete with their training data and the sole purpose of copying is to train LLMs, that copying is "fair use" even if the training data has been pirated.
• The fair use defence is not likely to succeed where LLM outputs directly compete with the works that the LLM was trained on, thereby undermining the market for the original works and the incentive for humans to create. Had the plaintiffs run this argument, they probably would have won.
Somewhat perplexingly:
• the potentially “winning argument” that Judge Chhabria criticised the plaintiffs for not running in Meta was the very same argument that Judge Alsup dismissed as “irrelevant” in Anthropic; and
• the same use of pirated works to train LLMs that was deemed fair use in Meta led to a $1.5bn settlement in Anthropic.
How did these cases result in such different outcomes? And what do the common threads of analysis in these decisions mean for the remaining AI copyright litigation currently making its way through the courts?
Fair use factors
The USA "fair use" defence, which is different to the “fair dealing” exceptions in New Zealand, is considered by analysing four factors, with differing weight given to each:
(1) The purpose of the use, including whether it was commercial – and whether it was transformative or merely supplants the original.
(2) The nature of the copyright work – with greater protection for works of "creative expression".
(3) The amount and substantiality of the portion used.
(4) The effect on the market for (or value of) the copyright protected works.
In both Anthropic and Meta, factor two favoured the plaintiffs because their works were creative expression (books). However, factor two is typically given less weight in the overall analysis.
Factor three was deemed irrelevant in both cases, because while 100% of the plaintiffs' books were copied and “memorised” by the LLMs, neither model was capable of reproducing substantial snippets of their original training material, due to filters applied during programming.
Both decisions turned on the application of factors one and four. These are considered to be the most important factors in fair use analysis, with most weight given to factor four. The courts’ views on each are considered below:
Factor one: Training LLMs is a transformative purpose, but creating a central library is not
Anthropic acknowledged that it had set out to create a “central library of all the world’s books” by copying over seven million pirated books and by purchasing hard copy books that it then digitised. Subsets of the library were then used to train Claude; however, not all books were used for training, and after training the library was maintained for future uses, which may or may not include training AI models.
This acknowledgement effectively cost Anthropic $1.5bn. Judge Alsup held that “pirating copies to build a research library without paying for it, and to retain copies should they prove useful for one thing or another, was its own ‘use’ and not a transformative one.”
By contrast, Meta witnesses insisted that the sole purpose of torrenting pirated books was to train Llama, a factual position which may have saved it billions.
Judge Chhabria noted that LLMs have many different purposes, many of which do not compete with books. Because training Llama was held to be Meta’s only intended use, and that purpose was transformative, the pirating of the books it was trained on was transformative too.
Factor four: Competition with the training works
Three options for market harm to training works were considered under factor four:
(1) regurgitation of training works so that the users could access them for free;
(2) damage to the market for licensing works for AI training; and
(3) the generation of new works that compete with their training material.
The first two options were dismissed by both Judges. Due to filters applied during programming, neither model was capable of producing infringing outputs. If either model had been capable of producing infringing outputs, the outcome could have been quite different. The requisite “market harm” was held not to include harm to the plaintiffs’ ability to license their books for training purposes. If the "fair use" defence applied, they reasoned, there would be no need for such a licence and therefore no market to damage.
Judges Alsup and Chhabria then arrived at vastly different conclusions regarding the relevance of LLMs competing with (or diluting the market for) their training material.
While Judge Alsup accepted that Claude will create an “explosion of works” competing with the plaintiffs’ works, he determined that (unless those works had been pirated) this would not displace demand for the plaintiffs’ books. Anthropic’s training of Claude was no different to training children to write well by reading books, he concluded, which is not the kind of competitive or creative displacement that he considered the Copyright Act should be concerned with.
Judge Chhabria took a very different view. He held that using books to teach children to write is “not remotely like” using books to train AI models. No matter how transformative LLM training may be, he stressed that it will be difficult for AI developers to establish "fair use" where the evidence supports a finding that copyright protected works are being used to make billions of dollars “while enabling the creation of an endless stream of competing works that significantly harm the market” for the works the AI model was trained on.
If the plaintiffs in Meta had alleged market harm by dilution and substitution, ultimately leading to the undermining of the incentive for humans to create, that argument would likely have won their case. Supreme Court support for this position can be found in the Warhol v Goldsmith decision, where it was held that the most important "fair use" factor is whether the new work is likely to substitute for the original that was copied.
However, unfortunately for the plaintiffs in Meta, they did not run this argument. Judge Chhabria recorded that his judgment “does not stand for the proposition that Meta’s use of copyrighted materials to train its language models is lawful. It stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one”. It is a statement that will no doubt make litigators everywhere cringe.
So, what does this mean for future cases?
These cases suggest that if AI developers pirate copyright protected works for the sole purpose of training AI models (and no other ancillary purpose), then their use is likely to amount to "fair use". However, the defence may not succeed if outputs are capable of competing with the original training works, thereby undermining the incentive for further human creation, or where the AI model’s outputs infringe copyright.
Both cases alluded to the possibility that the liability outcome might have been different had the LLM outputs been infringing. Meta suggests that if outputs are substantially similar to the original works, there may be market dilution and harm to the plaintiffs. Market harm by way of dilution/substitution or infringing output could be a highly relevant factor for pending cases where the outputs, such as images, video, music and code, arguably can compete with and/or substantially replicate their training data. There are many such cases currently making their way through the courts.
The disagreement on the significance of market harm also leaves room for future debates. Anthropic supports the idea that such harm is outside the scope of the Copyright Act, while Meta suggests that better evidence of market harm through competition or substitution “will often cause plaintiffs to decisively win the fourth factor – and thus win the fair use question overall”.
New Zealand copyright law
Under the Copyright Act 1994, New Zealand has a "fair dealing" exception, which allows the use of copyright protected work without the author’s permission for the purposes of research, private study, criticism, review and news reporting only. These discrete categories are significantly more limited than the USA’s "fair use" defence, and AI developers of commercially released models in this jurisdiction may struggle to fall within them.
Currently, there are no proposed reforms to copyright laws in New Zealand in relation to AI, and while Australia is consulting on possible copyright law updates, it has recently ruled out a broad text and data mining exception to copyright infringement, which had been advocated for by members of the technology sector. In July 2025, MBIE released a paper noting that “fairly attributing and compensating creators and authors of copyright works can support continued creation, sharing, and availability of new works to support ongoing training and refinement of AI models and systems”.
This position may resonate with Judge Chhabria, who noted that the incentivisation of human creativity (such as through attribution and compensation) is not likely to stifle AI innovation, particularly in light of the substantial projected revenues in the generative AI sector.
