
Selfhosted


A place to share alternatives to popular online services that can be self-hosted without giving up privacy or locking you into a service you don't control.


Basically what the title says. I know online providers like GPTZero exist, but when dealing with sensitive documents I would prefer to keep things in-house. A lot of people like to talk big about open-source models for generating content, but I feel the detection side doesn't get discussed nearly as much.

I wonder if this kind of local capability could be stitched into a browser plugin. Hell, it doesn't even need to be a locally hosted service on my home network; a local on-machine app would be fine. But being able to host it as a service that other machines can use would be interesting.

I haven't been able to do a proper search yet, but the first-glance results are either about people trying to evade these detectors or about people trying to locally host language models.

top 9 comments
[–] MartianSands@sh.itjust.works 62 points 4 days ago (1 children)

Be cautious about trusting AI-detection tools: they're not much better than the AI they're trying to detect, and they're just as prone to false positives and false negatives as the models they claim to catch.

It's also inherently an arms race: if a tool existed that could easily and reliably detect AI-generated content, the AI companies would just use that tool during training instead of whatever they currently use, and the models would quickly learn to defeat it. They also wouldn't have to worry about their training data being contaminated by the output of existing AI, which is becoming a genuine problem right now.

[–] iii@mander.xyz 18 points 4 days ago

> if a tool existed that could easily and reliably detect AI-generated content, the AI companies would just use that tool during training

Generative Adversarial Networks are an example of that idea in action.
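
For readers who haven't met the term, a minimal sketch of the adversarial setup is below (PyTorch assumed, toy data, illustrative names only): the discriminator plays the role of the "detector", and the generator is trained directly against its verdicts, which is exactly why any reliable, widely available detector would quickly stop working.

```python
# Minimal GAN sketch on toy vectors (not text): the discriminator learns to
# separate "real" samples from generated ones, and the generator is optimised
# to fool it. A public AI detector would play the discriminator's role.
import torch
import torch.nn as nn

dim = 16  # toy sample dimensionality

generator = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, dim))
discriminator = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 1))

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(1000):
    real = torch.randn(64, dim) + 2.0      # stand-in for "human" data
    fake = generator(torch.randn(64, 8))   # stand-in for "AI" data

    # Detector step: label real as 1, generated as 0.
    d_loss = loss_fn(discriminator(real), torch.ones(64, 1)) + \
             loss_fn(discriminator(fake.detach()), torch.zeros(64, 1))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Generator step: push the detector to call generated samples "real".
    g_loss = loss_fn(discriminator(fake), torch.ones(64, 1))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
```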

[–] null_dot@lemmy.dbzer0.com 28 points 4 days ago

There are no decent GPT-detection tools.

If there were, they would themselves be locally hosted language models, and you'd need a reasonable GPU to run them.

[–] Eheran@lemmy.world 15 points 4 days ago* (last edited 4 days ago)

What do you want to achieve with it? These tools were still (super) unreliable last time I checked. Unreliable as in "you might as well roll a die". Oh, and they all claim to be world-leading, the best, etc.

[–] splendoruranium@infosec.pub 7 points 4 days ago* (last edited 4 days ago)

> Basically what the title says. I know online providers like GPTZero exist, but when dealing with sensitive documents I would prefer to keep things in-house. A lot of people like to talk big about open-source models for generating content, but I feel the detection side doesn't get discussed nearly as much.
> I wonder if this kind of local capability could be stitched into a browser plugin. Hell, it doesn't even need to be a locally hosted service on my home network; a local on-machine app would be fine. But being able to host it as a service that other machines can use would be interesting.
> I haven't been able to do a proper search yet, but the first-glance results are either about people trying to evade these detectors or about people trying to locally host language models.

In general it's a fool's errand, I'm afraid. What's the specific context in which you're trying to apply this?

[–] sobchak@programming.dev 0 points 3 days ago

Excessive use of em-dashes, emoji, and other characters that aren't on standard keyboards. I think these companies purposely have the models generate this stuff so it's easily detectable (so they can avoid training on their own slop).
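
As a rough illustration of that heuristic, here's a sketch that scores text by the fraction of "non-keyboard" characters (the character ranges are my own guess at what to count, not anything validated). Plenty of humans type em-dashes and curly quotes too, so this is a hint at best.

```python
# Naive character-frequency heuristic, not a real detector: counts em-dashes,
# curly quotes, ellipses and emoji, and reports them as a fraction of the text.
import re

SUSPECT = re.compile(
    r"[\u2014\u2013\u201c\u201d\u2018\u2019\u2026]"   # em/en dashes, curly quotes, ellipsis
    r"|[\U0001F300-\U0001FAFF\u2600-\u27BF]"          # emoji and misc symbols
)

def suspicion_score(text: str) -> float:
    """Fraction of characters that fall in the 'non-keyboard' set."""
    if not text:
        return 0.0
    return len(SUSPECT.findall(text)) / len(text)

print(suspicion_score("Great question — let's dive in! 🚀✨"))   # relatively high
print(suspicion_score("plain ascii text typed on a keyboard"))  # 0.0
```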

[–] FreedomAdvocate@lemmy.net.au -2 points 3 days ago

The best way to detect LLM-made text would be, funnily enough, to pass it into an LLM and ask it.
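
For what it's worth, a self-hosted version of that idea could look something like the sketch below. It assumes an Ollama server on its default port with some model already pulled (the model name here is just an example); the reply is just one more LLM opinion, not ground truth.

```python
# Sketch: ask a locally hosted model (via Ollama's HTTP API, assumed to be
# running on localhost:11434) whether a passage reads as AI-generated.
import json
import urllib.request

def ask_local_llm(text: str, model: str = "llama3") -> str:
    prompt = (
        "Does the following text read as AI-generated or human-written? "
        "Answer with one word, AI or HUMAN, then give a short reason.\n\n" + text
    )
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

print(ask_local_llm("In today's fast-paced digital landscape, it is crucial to..."))
```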

[–] falseWhite@lemmy.world 0 points 4 days ago

I would guess you'd need a model that's at least as powerful and smart as the original model that created the content. Otherwise it's like asking a 10-year-old to proofread an article written by an adult.