this post was submitted on 09 Jun 2023
22 points (100.0% liked)

Technology

[–] cwagner@discuss.tchncs.de 10 points 1 year ago (2 children)

Stack Overflow senior leadership is working on a strategy to protect Stack Overflow data from being misused by companies building LLMs. While working on this strategy, we decided to stop the dump until we could put guardrails in place.

I do not see this working in any way :( Might be time to delete my SO history as well.

[–] tojikomori@kbin.social 6 points 1 year ago* (last edited 1 year ago)

This reply's interesting:

How can data licensed under the CC-BY-SA licenses (that SO content is licensed under) be "misused"? The license explicitly allows others to do essentially anything they want with the data as long as attribution is given, in particular profit off of it.

When SO content is applied as parametric knowledge I'd expect the outcome to fail both the "BY" and the "SA" clauses, since model interpreters can't provide attribution for it and their output won't share the license. That's true even if output is considered public domain: CC-BY-SA content can't be moved into a public domain equivalent license. It seems practically indistinguishable from using any other in-copyright content as training material.

None of that's to say SO is right to stop data dumps. It feels like they're trying to find a technical solution to a legal problem, perhaps even one that rises to criminality on the part of OpenAI and others?
