this post was submitted on 14 Jan 2024
Technology
They're the same issue tho. Piracy and using books for corporate AI training should both be fine. The same people going after data freedom are pushing this AI drama too. There's too much money in copyright holding, and it's not being held by your favorite DeviantArt artists.
It's not the same issue at all.
Piracy distributes power. It allows disenfranchised or marginalized people to access information and participate in culture, no matter where they live or how much money they have. It subverts a top-down read-only culture by enabling read-write access for anyone.
Large-scale computing services like these so-called AIs consolidate power. They displace access to the original information and the headwaters of culture. They are for-profit services, tuned to the interests of specific American companies. They suppress read-write channels between author and audience.
One gives power to the people. One gives power to 5 massive corporations.
Extremely well-said.
Also, it's important to point out that the one that empowers people is the one that is consistently punished far more harshly.
We have governments blocking the likes of Sci-Hub, Libgen, and Anna's Archive, but nobody is blocking Meta's LLMs for the same thing.
If they were treated similarly, I would be far less upset about Meta's arguments. However, it's clear that governments prioritize the success of business over the success of humanity.
I wish we could be talking about the power imbalances of corporate bodies exercised through the use of capital ownership, instead of squabbling about how that differential is manifested through a specific act of piracy.
The reason we view acts of piracy differently when they are committed by corporate bodies is the power of their capital, not because the act itself is any different. The issue with Meta and OpenAI using pirated data to produce LLMs is that they keep ownership of the final product and profit from it, not that the LLM comes to exist in the first place (even if it is through questionable means). Had they created these models from data they already owned (I need not remind you that they have already claimed their right to a truly sickening amount of it without paying a cent), their profiting from it wouldn't be any less problematic: LLMs would still undermine the security of the working class and consolidate wealth into fewer and fewer hands. If we were to apply copyright here as it's being advocated, nothing fundamental would change in that dynamic; in fact, it would only reinforce the basis of that power imbalance (ownership of capital being the primary vehicle) and delay the inevitable (continued consolidation).
If you're really concerned about these corporations growing larger and their influence spreading further, then you should be directing your efforts at disrupting that vehicle of influence, not legitimizing it. I understand there's an enraging double standard at play here, but the solution isn't to double down on private ownership; it should be to undermine it and seize it for common ownership so that everyone benefits from the advancement.
So why are Meta and, say, Sci-Hub treated so differently? I don't necessarily disagree, but it's interesting that we legally attack people who share data altruistically (Sci-Hub gives research away for free so more research can be done; scientific research should be free to the world because it benefits all of mankind), but when companies break the same laws just to make more money, that's somehow fine.
It's like trying to improve the world is punished, and being a selfish greedy fucking pig is celebrated and rewarded.
Sci-Hub is so vilified that it can be blocked at the ISP level (depending on where you live), and politicians are pushing for DNS-level blocking. The same can be said for Libgen or Anna's Archive. Is anything like that happening to Meta? No? Huh, interesting. I wonder why Meta gets different treatment for similar behavior.
I am willing to defend Meta's use of this kind of data after the world has changed how they treat entities like Sci-Hub. Until that changes, all you are advocating for is for corporations to be able to break the law and for altruistic people to be punished. I agree they're the same, but until the law treats them the same, you're just giving freebies to giant corporations while fucking yourself in the ass.
To me it always seems to come back to nobility. Big corpo is the new nobility and they have certain privileges not available to the common folk. In theory it shouldn't exist but in practice it most certainly does.
The aristocracy never died, it just got a new name.
I mean, the US is literally built on the fact that the American aristocracy didn't actually want to lose their station, so they built a democracy that included many anti-democratic measures, from the Senate to the Electoral College to only allowing land-owning white men to vote. The US was purpose-built to serve the rich while paying lip service to the poor.
"Conservatives" were literally always those who wanted to conserve the monarchy and aristocracy. Those were the things they originally wanted to conserve, and plainly still fucking do.
That people do not see this is a complete farce.
They are not. Meta is being sued, just like Sci-Hub was sued. So, one difference is that the suit involving Meta is still ongoing.
In any case, Meta did not create the dataset. IDK if they even shared it. The researcher who did is also being sued. The dataset has been taken down in response to a copyright complaint. IDK if it is available anywhere anymore. So the dataset was treated just like Sci-Hub. The sharing of the copyrighted material was stopped.
Meta downloading these books for AI training seems like fairly straightforward fair use to me. I don't see how what Meta did is anything like what Sci-Hub did.
So ISPs are blocking Meta for their breaking of copyright?
Because ISPs block Sci-Hub.
No, one of them has governments trying to kick it off the internet, and the other is allowed to keep doing what it's doing, and the worst it'll face is a fine. Not even close to the same; it's completely disproportionate. If they were blocking all Meta LLMs until all copyrighted material had been removed, maybe we could say they're treated the same.
ISPs may block sites to prevent unauthorized copying. It's not a punishment for past wrongdoing. I'm not sure about the details; I think this differs a lot between jurisdictions. But basically, ISPs are involved in the unauthorized act of copying: their servers copy the data to the end user/customer. So they may be on the hook for infringement themselves if they don't act.
Again, I am not aware of Meta sharing the copyrighted books in question. So, I don't know what the legal basis for blocking Meta would be. If ISPs block a site without a legal basis, they are probably on the hook for breach of contract.
IDK on what basis the sharing of Meta's LLMs could be stopped. If anyone could claim copyright it would be Meta itself and they allow sharing them. (I have doubts if AI models are copyrightable under current US law.)
https://www.nytimes.com/2024/01/08/technology/openai-new-york-times-lawsuit.html
If the New York Times' evidence is true (I haven't seen the evidence, so I can't comment on veracity), then you can recreate copyrighted works with LLMs, and as such, they're doing the same thing as the Pirate Bay, distributing copyrighted works without authorization and making money off the venture.
So far, no ISPs are blocking Meta for this.
They pirated the books. Is that not legally relevant?