Some funny shit; Anthropic bought and destroyed millions of physical books

Dopious

And they got away with it legally.

I missed this one myself, so I thought I'd share it here in case anyone else missed it.

Earlier this year, Meta was accused of downloading a gigantic 82TB torrent of books to train language models. Meta's defense was that they seeded the file as little as possible and thus did not spread the copyrighted material further.

Competitor Anthropic, according to court documents, has taken a different approach. The developer bought millions of physical books, scanned the texts into digital files, and then destroyed the originals. The books are needed to train artificial intelligence, which requires astronomical amounts of training data.

The arrangement provides some compensation to the book industry and is on firmer legal ground, but it may seem wasteful to buy and destroy millions of paper books. Hopefully, all the pulp was recycled. According to the report, Tom Turvey was given the task of acquiring “all the world’s books” for Anthropic in early 2024. Turvey previously worked on Google’s book scanning project (Google Books).

Scanning is sometimes destructive, but Anthropic stands out because of the sheer volume of books destroyed. Google, by contrast, also used a non-destructive method to scan books that were borrowed from libraries and then returned.

A judge ruled that Anthropic's method was fair use because the books were purchased legally beforehand and scanned into files that were never distributed further.

Source: https://arstechnica.com/ai/2025/06/...llions-of-print-books-to-build-its-ai-models/
 
That's some serious tonnage. 2024 wasn't that long ago, so that's a rather impressively short time to buy and scan all them books as well!
 