this post was submitted on 24 Jun 2026

140 points (81.2% liked)

Selfhosted

61003 readers

650 users here now

A place to share alternatives to popular online services that can be self-hosted without giving up privacy or locking you into a service you don't control.

Rules:

Detailed Rules Post

Be civil.
No spam.
Posts are to be related to self-hosting.
Don't duplicate the full text of your blog or readme if you're providing a link.
Submission headline should match the article title.
No trolling.
Promotion posts require active participation, with an account that is at least 30 days old. F/LOSS without a paywall has exceptions, with requirements. See the rules link for details. Tags [CBH] or [AIP] are required, see the links in Rule 8 for details.
AI-related discussions and AI-involved promotional posts have additional requirements for tagging, as noted in Rule 7 and the AI & Promotional Post Expanded Rules post, and find example disclosures here.

Resources:

selfh.st Newsletter and index of selfhosted software and apps
awesome-selfhosted software
awesome-sysadmin resources
Self-Hosted Podcast from Jupiter Broadcasting

Any issues on the community? Report it using the report flag.

Questions? DM the mods!

founded 3 years ago

MODERATORS

curbstickle@anarchist.nexus

curbstickle_lw@lemmy.world

140

Do you host your own AI? (aussie.zone)

submitted 1 month ago by SuspiciousCarrot78@aussie.zone to c/selfhosted@lemmy.world

203 comments fedilink hide all child comments

Do you host your own ML / AI / LLM? What do you use, and what do you use it for?

top 50 comments

sorted by: hot top controversial new old

[–] algernon@lemmy.ml 98 points 1 month ago (6 children)

Yes. My Actual Intelligence lives in my head, and runs mostly on coffee.

[–] portifornia@piefed.social 34 points 1 month ago (2 children)

Just coffee?!? That's cool.

Mine runs on:

coffee
spite
tortilla chips
& shame

[–] searabbit@piefed.social 12 points 1 month ago

If that's not already on a shirt it should be

[–] algernon@lemmy.ml 9 points 1 month ago (2 children)

Mostly on coffee, not exclusively. Noticable amounts of spite & tortilla chips are also present, yes, but... no shame.

load more comments (2 replies)

[–] tal@lemmy.today 14 points 1 month ago (1 children)

Do you get many hallucinations?

[–] algernon@lemmy.ml 15 points 1 month ago (2 children)

Only when I'm deprived of coffee.

load more comments (2 replies)

[–] SuspiciousCarrot78@aussie.zone 11 points 1 month ago* (last edited 1 month ago) (3 children)

I'll make sure to send you flowers, Algernon lol

load more comments (3 replies)

[–] GreenCrunch@piefed.blahaj.zone 6 points 1 month ago

critical security bug: if coffee is taken away my head hurts :(

[–] zitrone@europe.pub 5 points 1 month ago (1 children)

As we know AI stands for "An Indian", so if you're not from India, its actually impossible to self host.

Well, unless you manage to trap one in your basement, but that would violate human rights and hopefully also break the laws of your country.

load more comments (1 replies)

[–] ButteredBread@sh.itjust.works 4 points 1 month ago (2 children)

That doesn't sound artificial.

[–] thenextguy@lemmy.world 4 points 1 month ago

With sufficient coffee, mine shows considerable artifice.

load more comments (1 replies)

[–] brucethemoose@lemmy.world 70 points 1 month ago* (last edited 1 month ago) (8 children)

An aside for anyone reading this:

https://sleepingrobots.com/dreams/stop-using-ollama/

And that barely scratches the surface. Please.

Use anything but Ollama. Even APIs.

[–] SuspiciousCarrot78@aussie.zone 13 points 1 month ago (3 children)

Llama.cpp or death!

load more comments (3 replies)

[–] plasma8726@lemmy.today 10 points 1 month ago (3 children)

Thanks for this link. Because of this article, I had claude stand up a llama.cpp container next to my already running ollama container. It ran side by side tests with the same model and parameters, and the results blew ollama out of the water. I'm in the process of moving hermes and openwebgui over to the llama.cpp instance to see how it goes day to day.

load more comments (3 replies)

[–] pinball_wizard@lemmy.zip 9 points 1 month ago

I agree that the concerns listed there are smells, and I wasn't aware of some of the options listed there.

Thank you for sharing this!

[–] vagabond@lemmy.dbzer0.com 5 points 1 month ago

Didn't know this. Going to switch this weekend, thanks for sharing this!

load more comments (4 replies)

[–] frongt@lemmy.zip 43 points 1 month ago (4 children)

Yes. Openwebui/ollama for LLM, comfyui for stable diffusion. I just dick around with it as a toy.

[–] mesamunefire@piefed.social 11 points 1 month ago* (last edited 1 month ago) (1 children)

Same. Its somewhat useful on some very small scripting or tasks...but its mostly just to try out a new model or two. Its not really useful for anything big.

I will have to say....even my tiny models are about as good as Chatgpt/Claude/etc... which makes me think about how much people are spending on tokens regularly. I was able to get the same kind of python script started with my local tiny model that was comparable to the newest Claude code offerings.

[–] Lettuceeatlettuce@lemmy.ml 5 points 1 month ago (3 children)

What local models have you been using? And what hardware are you running them on? I've been playing with local LLMs a bit for exactly your use case.

I have zero interest in vibe coding or full agentic workflows. But having a local LLM generate a Bash script to help me automate parts of my home lab infrastructure would be nice.

load more comments (3 replies)

[–] D_Air1@lemmy.ml 23 points 1 month ago

Yeah, I'm using qwen 31b a3b on an amd 9070xt requires a bit of cpu offloading, but still plenty fast. Using it wall llama.cpp. Combine that with some mcp's such as ddg-search to make it truly useful by actually being able to search online.

I mostly use it for small tedious tasks with well defined inputs and outputs. For example when hyprland recently changed from their own configuration language to lua. At first I started going line by line translating my config to the new lua language until I realized oh wait this is exactly the type of thing that ML is useful for. Going from the well defined hyprland configuration language to their also well defined lua syntax. It banged it out in less than a minute with only a single mistake which I easily fixed. The mistake it made was that it forgot to translate the comments to lua. It did it in less than a minute and worked first try. Where as I had made several typos and gotten a few lines wrong when I was doing it by hand.

Not to say that I couldn't do it. I would have gotten it done in about half an hour, but less than a minute is a lot faster.

I also used it to transform a bunch of unstructured data into json data, so that I could then use purpose built tools like jq to parse that. If I'm having trouble finding certain information. I'll ask it to find me some resources to look at.

Basically small well defined tasks and parsing data is what I use it for and it seems to be pretty good at that.

What I don't like is the way companies try to market it to people. I don't believe people should be trying to summarize emails or messages from loved ones, writing essays or any other creative tasks for the most part. Translating is okay. I don't expect a machine to be able to decide things for me or to be some filter between me and others.

[–] slazer2au@lemmy.world 18 points 1 month ago

Nope.

[–] fluxx@mander.xyz 15 points 1 month ago

I do, but I am becoming increasingly more disappointed as time goes on. Not just self hosted, llms in general. They sometimes help, but they mislead so many times and waste time that you don't even notice. I think that's the trap. When you succeed at a task, you become impressed but don't notice how many times it failed doing a simple task. And as soon as you scratch the surface, you see how you would have done it differently and perhaps in a better way. Even just googling is bad. It does research for you, but it has no critical thinking and can't decide what is better from the results it gets (other than google ranking) so it often leads you to think it did as good as you would, when it's nowhere near as good. Every time I did the googling myself after it did, I did it much better. And I mean MUCH better. Ask it to find the app, it misses the most important ones, hallucinates a bunch, for ex. I found this to be the case with frontier models as well.

Self hosting has its benefits, but seeing how the ecosystem looks right now, concluding this is a huge bubble is inevitable. It reminds me of crypto so much. It looks rich and plentiful, but as soon as you dig a mm under the surface - nobody has tested it, it's got a critical bug, it is overblown and there are issues with no response. No docs, no info, no nothing. For the biggest thing in technology in history, it is awfully hollow. I don't mean it in a condescending way, in fact community is enthusiastic and very helpful, it's just that it doesn't live up to what most would expect.

A caveat I need to mention is I have not used it for coding - I have an irrational fear and resistance towards it, being a programmer. I just won't touch it, even if it means the end of my career. I'm trying to be grown-up about it, but so far, I dont want to use it, for good and bad reasons.

[–] domi@lemmy.secnd.me 14 points 1 month ago (8 children)

Yes, I got a Strix Halo machine before the RAM price hike and use it to run all my ML stuff on it.

Currently using llama-swap with llama.cpp/ComfyUI and opencode/Open WebUI as frontend.

I'm running Qwen3.6-27b, Voxtral Mini 4b, Piper and Qwen Image. Also, some embedding and reranking models.

I use them for:

Tagging and classification of my documents in Paperless
Home Assistant (voice assistant)
Translations (both text and image)
Transcriptions
Some light coding and debugging
Avatar/Backdrop generation for DnD sessions

load more comments (8 replies)

[–] PetteriPano@lemmy.world 13 points 1 month ago (3 children)

Running qwen3.6 27b through llama.cpp.

It's about as capable as sonnet 3.5.

I use it for light scripting, but real coding is done by cloud models.

I'm also using it as the brain for my Hermes agent. It sends me digests of news, subreddits, chats that I'd like to read but don't have time for. It does a great job researching things on the web for me, too.

load more comments (3 replies)

[–] Strider@lemmy.world 11 points 1 month ago

No. I still have no use for it and everything I use is automated without at a far lower footprint.

[–] atzanteol@sh.itjust.works 9 points 1 month ago (15 children)

I've tried a few times but with only 8gig of vram it's simply not worth it.

load more comments (15 replies)

[–] Nednarb44@lemmy.world 8 points 1 month ago (2 children)

I do, I use ollama. I mostly just tinker, but I use with with home assistant for a quasi Alexa like experience with the voice assistant, I use it for summarizing some YouTube transcripts in too lazy to read/watch, and I've tried to see how capable it is with coding.

load more comments (2 replies)

[–] Steve@startrek.website 7 points 1 month ago (7 children)

I recently gave it a try with qwen3.5 and deepseek coder v2. I have a RTX3090 and these are the largest models that can run comfortably on it.

Conclusion, they are both fucking useless. Free tier claude runs circles.

load more comments (7 replies)

[–] Meatwagon@lemmy.dbzer0.com 6 points 1 month ago

I tried but I only have 16g of ram and it wouldn't complete a thought alas

[–] orenj@leminal.space 6 points 1 month ago

If I wanted AI for some reason, it'd be self-host or nothing.

[–] alzymologist@sopuli.xyz 6 points 1 month ago

Technically, TTS/STT are mostly MLs; I'm pretty sure many people run these. I have a setup but I'm better with buttons that with spoken words, and I listen to ambient sounds or music. I think some day I'll make voice assistant for talking to while driving, but that's not a trivial task hardware-wise, even if I used cloud LLM layer, which I won't. Putting AI on baremetal sounds like an interesting project.

I have a homemade "local agent" that can actually "code" somewhat, I use it just to figure out how this thing works on the inside practically. Mostly useless otherwise (also I have GPU that's older than AI, so it's kind of fun technical task to run this stuff on pure RAM+swap). Feels like the whole hype is greatly overrated, but I appreciate a chance to learn something new anyway.

[–] Shipgirlboy@sh.itjust.works 6 points 1 month ago

I've thought about it, but I actually could never think of anything I would do with it.

[–] rimu@piefed.social 5 points 1 month ago* (last edited 1 month ago)

The other day I made a machine learning model that classifies images as either 'a certain type of undesirable image' (no, not porn) or 'any other image'. It is 96.4% accurate and takes 14 ms to classify one image (using CPU only - with a GPU it could be 5x - 10x faster).

I plan to offer this as an API service that social media networks can use to filter posts.

[–] robber@lemmy.ml 5 points 1 month ago

I currently run Qwen3.6-27b on llama.cpp and use it via openwebui. Mostly, I use it for web research via tavily, to a lesser extent for coding and interactively learning about things that are new to me but common in training data (such as basic math or ML concepts).

[–] Sabata11792@ani.social 5 points 1 month ago* (last edited 1 month ago)

Running decencored Qwen3.6-27b and a 9b Gemma for RAG and scrapes on Ollama with a mostly vibe coded discord bot. Just got it to run tools and scrape and post news on a schedule. The first model I can run locally that's smart enough to be useful. May give Jan a try for the back end after reading that other guys rant.

Mostly use it for stupid questions I could have googled and to brag to friends.

[–] wrinkle2409@lemmy.cafe 5 points 1 month ago (2 children)

I set up ollama on our thinkstation in the lab and I use it for looking up documentation, generating readmes, searching papers, and sometimes coding when I know what to do but don't feel it is worth it to spend time on it myself. So basically the chat with web search.

load more comments (2 replies)

[–] november@piefed.blahaj.zone 5 points 1 month ago

Why would I?

[–] Jakeroxs@sh.itjust.works 5 points 1 month ago

Yes, llama-swap and I use it for home assistant text-gen notifications, basic coding tasks, etc

If anyone here self-hosts definitely check out llama-swap as it has some nifty features for hotswapping LLMs, image generation models and voice models.

[–] dfgxx@lemmy.zip 5 points 1 month ago

I ran through lmstudio because it really eazy, I ran some kind of qwen 3.6 27b imatrix neo code DI, it is the best local model for coding I tried, I think it can be better than some cloud model

[–] queerlilhayseed@piefed.blahaj.zone 5 points 1 month ago (12 children)

Yup, ollama, various models. I initially downloaded it because I, along with thousands of other people, wanted to see what would happen if I made models debate with each other after RAGging them with various books (The Prince, The Art of War, The complete works of Shakespeare, etc.).

The results were uninteresting and I abandoned the project pretty quickly. I'll sometimes use them for code analysis but they're too slow on my rig to be really useful.

load more comments (12 replies)

[–] iceberg314@slrpnk.net 4 points 1 month ago (1 children)

Ollama with gemma 4 for LLM stuff, coding brainstorming, etc.

Comfy ui with z-image or stable diffusion for images.

load more comments (1 replies)

[–] chaospatterns@lemmy.world 4 points 1 month ago* (last edited 1 month ago)

Partially. I started with hosting my own llama3.2 + granite4 models using Ollama for my Home Assistant smart home and for general chat with OpenWebUI. I also run whisper for speech-to-text locally on my 1080 Ti GPU. I like the privacy and ownership of my self-hosted models, but I started to run into limitations with the small weights. So I built some tools that allow me to selectively route traffic to larger models hosted on DeepInfra depending on my need. For example, to GLM/Kimi models for code reviews or for my custom harnesses or harder problems.

[–] jaykrown@lemmy.world 4 points 1 month ago

I hosted Qwen 3.5 9b uncensored on my site at https://masland.tech/ for a while. I didn't really use it and no one else used it so I took it down. These days I'm spending most of my time finding uses for AI and accessibility. One of the next things I'm planning is a video to text reasoning system, primarily for the purpose of grading used electronic devices.

load more comments