This post was submitted on 02 Aug 2023
38 points (100.0% liked)

Free and Open Source Software


We are now facing unprecedented growth of AI as a whole. Do you think it is time for the FSF to elaborate a new version of the GPL that incorporates the new challenges of AI in software development, to keep protecting users' freedom?

all 15 comments
[–] jcolag@lemmy.sdf.org 26 points 1 year ago* (last edited 1 year ago)

I keep saying "no" to this sort of thing, for a variety of reasons.

  1. "You can use this code for anything you want as long as you don't work in a field that I don't like" is pretty much the opposite of the spirit of the GPL.
  2. The enormous companies slurping up all content available on the Internet do not care about copyright. The GPL already forbids adapting and redistributing code without licensing under the GPL, and they're not doing that. So another clause that says "hey, if you're training an AI, leave me out" is wasted text that nobody is going to read.
  3. Making "AI" an issue instead of "big corporate abuse" means that academics and hobbyists can't legally train a language model on your code, even if they would otherwise comply with the license.
  4. The FSF has never cared about anything unless Stallman personally cared about it on his personal computer, and they've recently proven that he matters to them more than the community, so we probably shouldn't ever expect a new GPL.
  5. The GPL has so many problems (because it's been based on one person's personal focus), problems the FSF either doesn't care about or isolates into random silos (like the AGPL, as if the web were still a fringe thing), that AI barely seems relevant.

I mean, I get it. The language-model people are exhausting, and their disinterest in copyright law is unpleasant. But asking an organization that doesn't care to add restrictions to a license that the companies don't read isn't going to solve the problem.

[–] wave_walnut@kbin.social 13 points 1 year ago (1 children)

The problem of recent AI is about fair use of data, not about copyright. To solve the AI problem, we need laws to stop abuse of data rather than to stop copying of code.

[–] BarryZuckerkorn@beehaw.org 5 points 1 year ago

Some portion of the "data" fed into these models is copyrighted, though. GitHub's Copilot is trained on code. Does it violate the GPL to train an AI model on all the GPL source code published out there?

[–] jarfil@beehaw.org 7 points 1 year ago

Too soon. The GPL is a license aligning prevalent copyright laws to some ideological goals. There are no prevalent copyright laws regarding AI yet, so there is nothing to base a copyright license on.

First step: introduce AI into copyright law (and pray The Mouse doesn't introduce it first).

[–] ashley@lemmy.ca 5 points 1 year ago (1 children)

It might be time to start thinking about it; however, it will depend on the consensus within the legal system on whether you need to provide attribution through AI.

[–] lemmyvore@feddit.nl 3 points 1 year ago (1 children)

There is already consensus, it just hasn't been concluded explicitly yet.

There is no "AI" and there's no "learning", so there's no new unbeaten path in law. like some would make you believe. LLMs are data processing software that take input data and output other data. In order to use the input data you have to conform to its licensing, and you can't hide behind arguments like "I don't know what the software is doing with the data" or "I can't identify the input data in the output data anymore".

LLM companies will eventually be found guilty of copyright infringement, and they'll settle and start observing licensing terms like everybody else. There are plenty of media companies with lots of money and a vested interest in copyright.

[–] db0@lemmy.dbzer0.com 6 points 1 year ago* (last edited 1 year ago) (2 children)

That's not how copyrights work. They only care about copying or replicating that data. The hint is in the name.

[–] lemmyvore@feddit.nl 2 points 1 year ago* (last edited 1 year ago) (1 children)

Copyright is not just about copying the data. It's a name that stuck, but it's more accurate to call it "author rights". The law awards the rights holder extensive rights, including deciding how the data is used.

And (as an aside) permission by omission doesn't work as an excuse either: if the right to use the data in some way hasn't been explicitly granted, it most likely doesn't apply.

[–] db0@lemmy.dbzer0.com 2 points 1 year ago (2 children)

No, that's not what copyrights are. The idea that they're "author rights" has no basis in law.

[–] lemmyvore@feddit.nl 1 points 1 year ago

Why insist on arguing this point when a simple visit to Wikipedia will show I'm right?

[–] smileyhead@discuss.tchncs.de 2 points 1 year ago

Richard Stallman talked about this topic here: https://framatube.org/w/1DbsMfwygx7rTjdBR4DPXp

Can't find timestamp tho.

GPLv3 already covers all of that. Programs that train AI have normal licensing applied. Programs that were modified by AI must be under the GPL too. The neural network itself is not a program; it's a format, and it is always modifiable anyway, as there is no source code. You can take any neural network and train it further without the data it was trained on before.
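
For illustration, here is a minimal sketch of that last point, assuming the Hugging Face transformers and datasets libraries; the checkpoint name and the toy training texts are just placeholders:

```python
# Minimal sketch: continue training a pretrained model on new data,
# without any access to the corpus it was originally trained on.
# Assumes Hugging Face transformers + datasets; names are placeholders.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)
from datasets import Dataset

model_name = "distilgpt2"  # any pretrained causal LM checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2-style models have no pad token

# New data the original authors never saw (hypothetical examples).
texts = ["Example sentence one.", "Example sentence two."]

def tokenize(batch):
    out = tokenizer(batch["text"], truncation=True,
                    padding="max_length", max_length=32)
    out["labels"] = out["input_ids"].copy()  # standard causal-LM labels
    return out

dataset = Dataset.from_dict({"text": texts}).map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned",
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=dataset,
)
trainer.train()  # updates the weights; the original dataset is never needed
```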

[–] bedrooms@kbin.social 1 points 1 year ago* (last edited 1 year ago)

To license what? Code or text? I don't think either would have enough impact or adoption.