this post was submitted on 21 Aug 2024
132 points (89.8% liked)


My original, editorialized title: Ars Technica Sells Out


Linking to this because I know people here read Ars Technica, and I totally didn't become a subscriber three days before this was announced. Nope. No sir.

[–] azl@lemmy.sdf.org 6 points 2 months ago

I want Ars content to be part of whatever training data is provided to the best models. How does that get done without appearing like they are being bought?

Even if their contract explicitly states that it is a data sharing agreement only, and that the media organization's output (articles/investigations) is not grounds for breach or retaliation, people will assume that future reporting is now less impartial.

So, for all media companies, the options seem to be:

  1. Contribute to the greater good by openly permitting site scraping (for $0)
  2. Allow data sharing to contracted parties only (for a fee)
  3. Publicly or privately prohibit use of any data, and then seek damages down the road for theft/copyright infringement once the legal framework has been established.

Is there a GPL-style or other license structure that permits data sharing for LLM training while keeping the data from being transformed into something evil?