this post was submitted on 23 Mar 2025
772 points (97.8% liked)
then again
The made-up bullshit aside, this should be a pretty clear indicator of an actual GDPR breach
Maybe he has an Insta profile with the names of his kids in his bio
How would that be a GDPR breach?
Irrelevant. The data being public does not make it up for grabs.
They store his personal data without his permission.
also
Storing it badly does not make them exempt.
If you run a chatbot with integrated web search, it grabs that info the same way a web crawler does; that does not mean the data is really part of the "knowledge"/statistics of the AI itself.
Nobody stores the information in that case, it is only used temporarily to generate that specific output.
(You cannot use ChatGPT without web search on the chatgpt domain; only if you self-host, or use a service like DDG.)
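To make the "only temporarily used" point concrete, here is a rough sketch in Python of how a search-augmented chatbot typically handles fetched data (all function names are made-up placeholders, not any real vendor API):

```python
# Rough sketch: search results are fetched per request, passed to the model
# as prompt context, and then dropped. Nothing is written into the weights.

def web_search(query: str) -> list[str]:
    # Placeholder: a real implementation would fetch live page snippets,
    # much like a crawler does.
    return [f"snippet about: {query}"]

def generate_answer(question: str, context: list[str]) -> str:
    # Placeholder for the LLM call; the snippets only exist inside
    # this one prompt.
    return f"Answer to '{question}' based on {len(context)} snippet(s)."

def answer_question(question: str) -> str:
    snippets = web_search(question)              # fetched on demand
    reply = generate_answer(question, snippets)
    # `snippets` goes out of scope here: the fetched data is only held
    # transiently to produce this single output.
    return reply

print(answer_question("example query"))
```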
That is another great question. If it is transformative use of the primary data source, then it is likely illegal, because nobody gave them permission to transform and process that personal data. If, on the other hand, it is not transformative and it just gives access to the primary source like a search engine does, then the problem is that returning copyrighted data most likely stops being fair use.
That's a good point, and it muddies the waters a bit. It makes it hard to say whether it's spouting info from the web or whether it's data from the model.
I can't comment on the actual legality in this case, but I feel that handling personal data like this, even from the open web, in a context where hallucinations are an overwhelming possibility, is still morally wrong. I don't know the GDPR well enough to say whether it covers temporary information like this, but I kinda hope it does.
Lol, I definitely hope not 🤪 Imagine a web without search engines; if the GDPR counted for temporary information as well, it would not be feasible to offer them.
hmm, true enough. But in my mind there's a clear difference between showing information unedited and referring to its source, and this.
Most LLMs these days show what they searched when generating the post, but not many people seem to manually validate the LLM's summary by clicking on those links…