Dopious

And they got away with it legally.
I missed this one myself, so I thought I'd share it here in case anyone else did too.
Earlier this year, Meta was accused of downloading a gigantic 82TB torrent of books to train language models. Meta's defense was that they seeded the file as little as possible and thus did not spread the copyrighted material further.
Competitor Anthropic, according to court documents, has taken a different approach. The developer bought millions of physical books, scanned the texts into digital files, and then destroyed the originals. The books are needed to train artificial intelligence, which requires astronomical amounts of training data.
The arrangement provides some compensation to the book industry and is on firmer legal ground, but it may seem wasteful to buy and destroy millions of paper books. Hopefully, all the pulp was recycled. According to the report, Tom Turvey was given the task of acquiring “all the world’s books” for Anthropic in early 2024. Turvey previously worked on Google’s book scanning project (Google Books).
Scanning is sometimes destructive, but Anthropic stands out because of the sheer volume of books destroyed. Google, for its part, used a non-destructive method to scan books that were borrowed from libraries and then returned.
A judge ruled that Anthropic's method was fair use because the books were purchased legally beforehand and scanned into files that were never distributed further.
Source: https://arstechnica.com/ai/2025/06/...llions-of-print-books-to-build-its-ai-models/