Meta admits using pirated books to train AI, but won't pay for it (www.techspot.com)

submitted 2 years ago by throws_lemy@lemmy.nz to c/technology@lemmy.world

[–] SnotFlickerman@lemmy.blahaj.zone 3 points 2 years ago* (last edited 2 years ago) (2 children)

So why are Meta, and say, Sci-Hub treated so differently? I don't necessarily disagree, but it's interesting that we legally attack people who are sharing data altruistically (Sci-Hub gives research away for free so more research can be done; scientific research should be free to the world, because it benefits all of mankind), but when it comes to companies who break the same laws to just make more money, that's fine somehow.

It's like trying to improve the world is punished, and being a selfish greedy fucking pig is celebrated and rewarded.

Sci-Hub is so vilified, it can be blocked at an ISP level (depending on where you live) and politicians are pushing for DNS-level blocking. The same can be said for Libgen or Anna's Archive. Is anything like that happening to Meta? No? Huh, interesting. I wonder why Meta gets different treatment for similar behavior.

I am willing to defend Meta's use of this kind of data after the world has changed how they treat entities like Sci-Hub. Until that changes, all you are advocating for is for corporations to be able to break the law and for altruistic people to be punished. I agree they're the same, but until the law treats them the same, you're just giving freebies to giant corporations while fucking yourself in the ass.

[–] SlopppyEngineer@lemmy.world 3 points 2 years ago (1 children)

To me it always seems to come back to nobility. Big corpo is the new nobility and they have certain privileges not available to the common folk. In theory it shouldn't exist but in practice it most certainly does.

[–] SnotFlickerman@lemmy.blahaj.zone 2 points 2 years ago* (last edited 2 years ago)

The aristocracy never died, it just got a new name.

I mean, the US is literally built on the fact that the aristocracy in the US didn't actually want to lose station, so they built a democracy that included many anti-democratic measures, from the Senate to the Electoral College to only allowing land-owning white men to vote. The US was purpose-built to serve the rich while paying lip service to the poor.

"Conservatives" were literally always those who wanted to conserve the monarchy and aristocracy. Those were the things they originally wanted to conserve, and plainly still fucking do.

How people do not see this is a complete farce.

[–] General_Effort@lemmy.world 1 points 2 years ago (2 children)

So why are Meta, and say, Sci-Hub treated so differently?

They are not. Meta is being sued, just like Sci-Hub was sued. So, one difference is that the suit involving Meta is still ongoing.

In any case, Meta did not create the dataset. IDK if they even shared it. The researcher who did is also being sued. The dataset has been taken down in response to a copyright complaint. IDK if it is available anywhere anymore. So the dataset was treated just like Sci-Hub. The sharing of the copyrighted material was stopped.

Meta downloading these books for AI training seems fairly straight-forward fair use to me. I don't see how what Meta did is anything like what Sci-Hub did.

[–] SnotFlickerman@lemmy.blahaj.zone 1 points 2 years ago* (last edited 2 years ago) (1 children)

So ISPs are blocking Meta for their breaking of copyright?

Because ISPs block Sci-Hub.

No, one of them has governments trying to kick it off the internet, and the other is allowed to continue doing what they're doing and the worst they'll face is a fine. Not even close to the same, completely disproportionate. If they were blocking all Meta LLMs until they had removed all copyrighted material, maybe we could say the same.

[–] General_Effort@lemmy.world 2 points 2 years ago (1 children)

ISPs may block sites to prevent unauthorized copying. It's not a punishment for past wrongdoing. I'm not sure about the details; I think this differs a lot between jurisdictions. But basically, as ISPs they are involved in the unauthorized act of copying. Their servers copy the data to the end user/customer. So, they may be on the hook for infringement themselves if they don't act.

Again, I am not aware of Meta sharing the copyrighted books in question. So, I don't know what the legal basis for blocking Meta would be. If ISPs block a site without a legal basis, they are probably on the hook for breach of contract.

IDK on what basis the sharing of Meta's LLMs could be stopped. If anyone could claim copyright it would be Meta itself and they allow sharing them. (I have doubts if AI models are copyrightable under current US law.)

[–] SnotFlickerman@lemmy.blahaj.zone 1 points 2 years ago* (last edited 2 years ago)

https://www.nytimes.com/2024/01/08/technology/openai-new-york-times-lawsuit.html

In its lawsuit Wednesday, the Times accused Microsoft and OpenAI of creating a business model based on “mass copyright infringement,” stating that the companies’ AI systems were “used to create multiple reproductions of The Times’s intellectual property for the purpose of creating the GPT models that exploit and, in many cases, retain large portions of the copyrightable expression contained in those works.”

Publishers are concerned that, with the advent of generative AI chatbots, fewer people will click through to news sites, resulting in shrinking traffic and revenues.

The Times included numerous examples in the suit of instances where GPT-4 produced altered versions of material published by the newspaper.

In one example, the filing shows OpenAI’s software producing almost identical text to a Times article about predatory lending practices in New York City’s taxi industry.

But in OpenAI’s version, GPT-4 excludes a critical piece of context about the sum of money the city made selling taxi medallions and collecting taxes on private sales.

In its suit, the Times said Microsoft and OpenAI’s GPT models “directly compete with Times content.”

If the New York Times' evidence is true (I haven't seen the evidence, so I can't comment on veracity), then you can recreate copyrighted works with LLMs, and as such, they're doing the same thing as the Pirate Bay, distributing copyrighted works without authorization and making money off the venture.

So far, no ISPs are blocking Meta for this.

[–] antonim@lemmy.dbzer0.com 0 points 2 years ago

Meta downloading these books for AI training seems fairly straight-forward fair use to me.

They pirated the books. Is that not legally relevant?