cross-posted from: https://lemmy.dbzer0.com/post/41302017
The SoA is organising a day of protest against Meta following revelations that pirated books were used to train its large language models.
On Thursday 20 March, The Atlantic broke the story of how Meta used the Library Genesis (LibGen) dataset, which is full of pirated material, to develop its AI systems.
The revelations detailed by The Atlantic come against the background of the recent government consultation on Artificial Intelligence (AI) and copyright, and of the #MakeItFair campaign, in which the UK creative industries are fighting back against proposed changes to copyright law that would favour multinational tech companies but irremediably damage the creative industries.
Piracy is not theft.
Let’s be honest here: if the “product” someone sells is data (video, audio, text, programs, etc.) and you copy it without giving the creator a cent, that’s pretty much theft. ALSO: piracy itself isn’t the issue. That’s something everyone (me included) does. And to some extent it’s free advertising for the creator of the work, expanding the market for their creations many times over. Also, “piracy” of OLD CONTENT is basically a necessity for the digital preservation of many pieces of media art.
BUT
AI training is different. Left unchecked, it will eat up the whole market with cheap knockoffs and enshittify everything.
I mean, what Meta did hardly counts as piracy imo. They used the authors’ works without their permission to train their AI for profit. There’s a big difference between that and individuals pirating books to read, and maybe making those books more accessible to others for free.
Commercial and state-backed AI crawlers that overburden services, forcing admins to spend more money and time dealing with what amount to DDoS attacks, are much closer to theft than piracy itself. Piracy doesn’t make people lose money; AI crawlers do.
If I host a website for the general public, I’m not paying so that 200 foreign AI crawlers can consume most of the bandwidth and CPU and leave the legitimate users I created the website for with scraps. Even Wikipedia is feeling it.
Many AI crawlers are immoral for other reasons as well, especially when we’re talking about companies (Meta, Google) or states (the CCP) that are known for cooperating with intelligence/defense agencies or are engaged in human rights abuses.
Turning this discussion to be about piracy is imo a distraction.
While I agree, I also hate Meta. They should have seeded.
I’m not mad about Meta’s piracy; I’m mad they did it without seeding the content. Poor internet citizen.