this post was submitted on 25 Oct 2023

Technology

BetaDoggo_@lemmy.world 5 points 1 year ago

There is likely some CSAM in the training data of most of these models, since filtering it out of a dataset of several billion images is nearly impossible, even with automated methods. However, this material probably has little to no effect on outputs, since it is likely scarce and was probably tagged incorrectly.

The bigger concern is downstream users fine-tuning models on their own datasets containing this material. This has been happening for a while, though I won't point fingers (Japan).

There's not a whole lot that can be done about it, but I also don't think anything needs to be done. It's already illegal, and it's already removed semi-automatically from most platforms. Having more of it won't change that.