this post was submitted on 20 Oct 2025
Selfhosted

Basically what the title says. I know online providers like GPTZero exist, but when dealing with sensitive documents, I'd prefer to keep things in-house. A lot of people like to talk big about open source models for generating stuff, but the detection side doesn't get as much discussion, I feel.

I wonder if this kind of local capability could be stitched into a browser plugin. Hell, it doesn't even need to be a locally hosted service on my home network; a local app on this machine would be fine. But being able to host it as a service and use it from other machines would be interesting.

I haven't been able to give this a proper search yet, but the first-glance results are either from people trying to evade these detectors or from people trying to locally host language models.

null_dot@lemmy.dbzer0.com 28 points 5 days ago

There are no decent GPT-detection tools.

If there were, they would be locally hosted language models, and you'd need a reasonable GPU.

ggtdbz@lemmy.dbzer0.com 1 point 11 hours ago

I think I should have been clearer: this is exactly what I'm asking about. I'm somewhat surprised by the reaction this post got; this seems like a very normal thing to want to host.

It doesn't help that some people here are replying as if I were asking to locally host the "trick" of feeding a chatbot some text and asking it whether it's machine-generated. Ideally the software I'm looking for would have a bank of LLM models and do some sort of statistical magic to estimate how likely a block of tokens is to have been generated by them. It would probably need quantized models just to run at a reasonable speed. So it would, for example, feed the first x tokens in, take stock of the probability table for the next token, compare that to the actual next token in the block, and so on.
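For what it's worth, the jargon for this scoring scheme is "perplexity": how surprised a language model is by each actual next token. Here's a toy sketch of the mechanics, assuming a stand-in bigram table where a real detector would use an actual LLM's logits (the table and its numbers are entirely made up for illustration):

```python
import math

# Made-up stand-in for a language model: P(next_token | current_token).
# A real detector would get these probabilities from an LLM's output logits.
TOY_MODEL = {
    "the": {"cat": 0.5, "dog": 0.4, "flange": 0.1},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"sat": 0.2, "ran": 0.8},
}

def perplexity(tokens):
    """Exponentiated average negative log-probability of each actual
    next token. Low values mean the model finds the text predictable,
    which is the usual (imperfect) heuristic for "machine-generated"."""
    total, count = 0.0, 0
    for cur, nxt in zip(tokens, tokens[1:]):
        p = TOY_MODEL.get(cur, {}).get(nxt, 1e-6)  # floor for unseen pairs
        total += -math.log(p)
        count += 1
    return math.exp(total / count) if count else float("inf")

predictable = perplexity(["the", "dog", "ran"])     # high-probability path
surprising = perplexity(["the", "flange", "ran"])   # low-probability path
print(predictable, surprising)
```

The catch, and why detectors disagree so much, is that the score depends on the model doing the scoring: text that's "predictable" to one model may not be to another, which is why some tools ensemble over several models.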

Maybe this is already a thing and I just don't know the jargon for it. I'm pretty sure I'm more informed about how these transformer algorithms work than the average user of them, but only just.

null_dot@lemmy.dbzer0.com 1 point 3 hours ago

Sorry, I'm still not really sure what you're asking for.

I use Open WebUI, which is the worst name ever, but it's a web UI for interacting with chat-style gen AI models.

You can install that locally and point it at any of the models hosted remotely by an inference provider.

So you host the UI but someone else is doing the GPU intensive "inference".

There seem to be some models for this task available on Hugging Face, like this one:

https://huggingface.co/fakespot-ai/roberta-base-ai-text-detection-v1

The difficulty may be finding a model that an inference provider actually hosts. Most of the models on Hugging Face are just the model weights, which you download and run locally. The popular ones are hosted by inference providers, so you can just point a query at their API and get a response.
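For the hosted route, here's a rough sketch of what such a query could look like. Heavy caveats: the endpoint shape is modeled on the Hugging Face Inference API, the auth header and `hf_xxx` token are placeholders, and whether this particular model is hosted at all is exactly the open question above. The request is only built here, not sent:

```python
import json

# Hypothetical endpoint for the model linked above (assumption: it would
# follow the standard Hugging Face Inference API URL scheme if hosted).
API_URL = ("https://api-inference.huggingface.co/models/"
           "fakespot-ai/roberta-base-ai-text-detection-v1")

def build_request(text: str, token: str):
    """Return (headers, body) for a detection query; send with any
    HTTP client. Response format depends on the model's labels."""
    headers = {
        "Authorization": f"Bearer {token}",  # placeholder API token
        "Content-Type": "application/json",
    }
    body = json.dumps({"inputs": text}).encode("utf-8")
    return headers, body

headers, body = build_request("Some paragraph to check.", "hf_xxx")
print(json.loads(body)["inputs"])
```

If the model turns out not to be hosted, the fallback is downloading the weights and running them locally, which loops back to needing a reasonable GPU.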

As an aside, it's possible or even likely that you know more about how gen AI works than I do, but I think this "probability table for the next token" framing is from the earlier generations. Or maybe this kind of probability inference is a foundational concept with a lot more sophistication layered on top now. I genuinely don't know. I'm super interested in these technologies, but there's a lot to learn.