this post was submitted on 24 Jun 2026
139 points (81.2% liked)

Selfhosted

60366 readers
937 users here now

A place to share alternatives to popular online services that can be self-hosted without giving up privacy or locking you into a service you don't control.

Rules:

Detailed Rules Post

  1. Be civil.

  2. No spam.

  3. Posts are to be related to self-hosting.

  4. Don't duplicate the full text of your blog or readme if you're providing a link.

  5. Submission headline should match the article title.

  6. No trolling.

  7. Promotion posts require active participation, with an account that is at least 30 days old. F/LOSS without a paywall has exceptions, with requirements. See the rules link for details.

Resources:

Any issues on the community? Report it using the report flag.

Questions? DM the mods!

founded 3 years ago
MODERATORS
 

Do you host your own ML / AI / LLM? What do you use, and what do you use it for?

top 50 comments
sorted by: hot top controversial new old
[–] Kazumara@discuss.tchncs.de 3 points 6 days ago

No, I'm not interested in that topic

[–] vegetaaaaaaa@lemmy.world 1 points 5 days ago
[–] dotAlexX@lemmy.world 1 points 6 days ago

I would love to run and host a local LLM on my phone just to tinker and learn. I found a tutorial on setting DeepSeek on your Android phone using Termux but it is a year old. I'm sure there are better more efficient LLMs that can run on a phone now.

[–] algernon@lemmy.ml 98 points 1 week ago (8 children)

Yes. My Actual Intelligence lives in my head, and runs mostly on coffee.

[–] portifornia@piefed.social 34 points 1 week ago (2 children)

Just coffee?!? That's cool.

Mine runs on:

  • coffee
  • spite
  • tortilla chips
  • & shame
[–] searabbit@piefed.social 12 points 1 week ago

If that's not already on a shirt it should be

[–] algernon@lemmy.ml 9 points 1 week ago (2 children)

Mostly on coffee, not exclusively. Noticable amounts of spite & tortilla chips are also present, yes, but... no shame.

load more comments (2 replies)
[–] tal@lemmy.today 14 points 1 week ago (1 children)

Do you get many hallucinations?

[–] algernon@lemmy.ml 15 points 1 week ago (2 children)

Only when I'm deprived of coffee.

load more comments (2 replies)
[–] SuspiciousCarrot78@aussie.zone 11 points 1 week ago* (last edited 1 week ago) (3 children)

I'll make sure to send you flowers, Algernon lol

load more comments (3 replies)

critical security bug: if coffee is taken away my head hurts :(

[–] zitrone@europe.pub 5 points 1 week ago (1 children)

As we know AI stands for "An Indian", so if you're not from India, its actually impossible to self host.

Well, unless you manage to trap one in your basement, but that would violate human rights and hopefully also break the laws of your country.

load more comments (1 replies)
load more comments (3 replies)
[–] brucethemoose@lemmy.world 69 points 1 week ago* (last edited 1 week ago) (8 children)

An aside for anyone reading this:

https://sleepingrobots.com/dreams/stop-using-ollama/

And that barely scratches the surface. Please.

Use anything but Ollama. Even APIs.

[–] Kroko@feddit.online 1 points 5 days ago

Thanks. Good to know

[–] SuspiciousCarrot78@aussie.zone 13 points 1 week ago (3 children)
load more comments (3 replies)
[–] plasma8726@lemmy.today 10 points 1 week ago (3 children)

Thanks for this link. Because of this article, I had claude stand up a llama.cpp container next to my already running ollama container. It ran side by side tests with the same model and parameters, and the results blew ollama out of the water. I'm in the process of moving hermes and openwebgui over to the llama.cpp instance to see how it goes day to day.

load more comments (3 replies)
[–] pinball_wizard@lemmy.zip 9 points 1 week ago

I agree that the concerns listed there are smells, and I wasn't aware of some of the options listed there.

Thank you for sharing this!

[–] vagabond@lemmy.dbzer0.com 5 points 1 week ago

Didn't know this. Going to switch this weekend, thanks for sharing this!

load more comments (3 replies)
[–] frongt@lemmy.zip 43 points 1 week ago (4 children)

Yes. Openwebui/ollama for LLM, comfyui for stable diffusion. I just dick around with it as a toy.

[–] mesamunefire@piefed.social 11 points 1 week ago* (last edited 1 week ago) (1 children)

Same. Its somewhat useful on some very small scripting or tasks...but its mostly just to try out a new model or two. Its not really useful for anything big.

I will have to say....even my tiny models are about as good as Chatgpt/Claude/etc... which makes me think about how much people are spending on tokens regularly. I was able to get the same kind of python script started with my local tiny model that was comparable to the newest Claude code offerings.

[–] Lettuceeatlettuce@lemmy.ml 5 points 1 week ago (3 children)

What local models have you been using? And what hardware are you running them on? I've been playing with local LLMs a bit for exactly your use case.

I have zero interest in vibe coding or full agentic workflows. But having a local LLM generate a Bash script to help me automate parts of my home lab infrastructure would be nice.

load more comments (3 replies)
load more comments (3 replies)
[–] D_Air1@lemmy.ml 23 points 1 week ago

Yeah, I'm using qwen 31b a3b on an amd 9070xt requires a bit of cpu offloading, but still plenty fast. Using it wall llama.cpp. Combine that with some mcp's such as ddg-search to make it truly useful by actually being able to search online.

I mostly use it for small tedious tasks with well defined inputs and outputs. For example when hyprland recently changed from their own configuration language to lua. At first I started going line by line translating my config to the new lua language until I realized oh wait this is exactly the type of thing that ML is useful for. Going from the well defined hyprland configuration language to their also well defined lua syntax. It banged it out in less than a minute with only a single mistake which I easily fixed. The mistake it made was that it forgot to translate the comments to lua. It did it in less than a minute and worked first try. Where as I had made several typos and gotten a few lines wrong when I was doing it by hand.

Not to say that I couldn't do it. I would have gotten it done in about half an hour, but less than a minute is a lot faster.

I also used it to transform a bunch of unstructured data into json data, so that I could then use purpose built tools like jq to parse that. If I'm having trouble finding certain information. I'll ask it to find me some resources to look at.

Basically small well defined tasks and parsing data is what I use it for and it seems to be pretty good at that.

What I don't like is the way companies try to market it to people. I don't believe people should be trying to summarize emails or messages from loved ones, writing essays or any other creative tasks for the most part. Translating is okay. I don't expect a machine to be able to decide things for me or to be some filter between me and others.

[–] slazer2au@lemmy.world 18 points 1 week ago
[–] fluxx@mander.xyz 15 points 1 week ago

I do, but I am becoming increasingly more disappointed as time goes on. Not just self hosted, llms in general. They sometimes help, but they mislead so many times and waste time that you don't even notice. I think that's the trap. When you succeed at a task, you become impressed but don't notice how many times it failed doing a simple task. And as soon as you scratch the surface, you see how you would have done it differently and perhaps in a better way. Even just googling is bad. It does research for you, but it has no critical thinking and can't decide what is better from the results it gets (other than google ranking) so it often leads you to think it did as good as you would, when it's nowhere near as good. Every time I did the googling myself after it did, I did it much better. And I mean MUCH better. Ask it to find the app, it misses the most important ones, hallucinates a bunch, for ex. I found this to be the case with frontier models as well.

Self hosting has its benefits, but seeing how the ecosystem looks right now, concluding this is a huge bubble is inevitable. It reminds me of crypto so much. It looks rich and plentiful, but as soon as you dig a mm under the surface - nobody has tested it, it's got a critical bug, it is overblown and there are issues with no response. No docs, no info, no nothing. For the biggest thing in technology in history, it is awfully hollow. I don't mean it in a condescending way, in fact community is enthusiastic and very helpful, it's just that it doesn't live up to what most would expect.

A caveat I need to mention is I have not used it for coding - I have an irrational fear and resistance towards it, being a programmer. I just won't touch it, even if it means the end of my career. I'm trying to be grown-up about it, but so far, I dont want to use it, for good and bad reasons.

[–] PetteriPano@lemmy.world 13 points 1 week ago (3 children)

Running qwen3.6 27b through llama.cpp.

It's about as capable as sonnet 3.5.

I use it for light scripting, but real coding is done by cloud models.

I'm also using it as the brain for my Hermes agent. It sends me digests of news, subreddits, chats that I'd like to read but don't have time for. It does a great job researching things on the web for me, too.

load more comments (3 replies)
[–] domi@lemmy.secnd.me 13 points 1 week ago (8 children)

Yes, I got a Strix Halo machine before the RAM price hike and use it to run all my ML stuff on it.

Currently using llama-swap with llama.cpp/ComfyUI and opencode/Open WebUI as frontend.

I'm running Qwen3.6-27b, Voxtral Mini 4b, Piper and Qwen Image. Also, some embedding and reranking models.

I use them for:

  • Tagging and classification of my documents in Paperless
  • Home Assistant (voice assistant)
  • Translations (both text and image)
  • Transcriptions
  • Some light coding and debugging
  • Avatar/Backdrop generation for DnD sessions
load more comments (8 replies)
[–] Strider@lemmy.world 11 points 1 week ago

No. I still have no use for it and everything I use is automated without at a far lower footprint.

[–] atzanteol@sh.itjust.works 9 points 1 week ago (15 children)

I've tried a few times but with only 8gig of vram it's simply not worth it.

load more comments (15 replies)
[–] Nednarb44@lemmy.world 8 points 1 week ago (2 children)

I do, I use ollama. I mostly just tinker, but I use with with home assistant for a quasi Alexa like experience with the voice assistant, I use it for summarizing some YouTube transcripts in too lazy to read/watch, and I've tried to see how capable it is with coding.

load more comments (2 replies)
[–] Meatwagon@lemmy.dbzer0.com 6 points 1 week ago

I tried but I only have 16g of ram and it wouldn't complete a thought alas

[–] Steve@startrek.website 6 points 1 week ago (7 children)

I recently gave it a try with qwen3.5 and deepseek coder v2. I have a RTX3090 and these are the largest models that can run comfortably on it.

Conclusion, they are both fucking useless. Free tier claude runs circles.

load more comments (7 replies)
[–] orenj@leminal.space 6 points 1 week ago

If I wanted AI for some reason, it'd be self-host or nothing.

[–] Shipgirlboy@sh.itjust.works 6 points 1 week ago

I've thought about it, but I actually could never think of anything I would do with it.

[–] alzymologist@sopuli.xyz 6 points 1 week ago

Technically, TTS/STT are mostly MLs; I'm pretty sure many people run these. I have a setup but I'm better with buttons that with spoken words, and I listen to ambient sounds or music. I think some day I'll make voice assistant for talking to while driving, but that's not a trivial task hardware-wise, even if I used cloud LLM layer, which I won't. Putting AI on baremetal sounds like an interesting project.

I have a homemade "local agent" that can actually "code" somewhat, I use it just to figure out how this thing works on the inside practically. Mostly useless otherwise (also I have GPU that's older than AI, so it's kind of fun technical task to run this stuff on pure RAM+swap). Feels like the whole hype is greatly overrated, but I appreciate a chance to learn something new anyway.

[–] rimu@piefed.social 5 points 1 week ago* (last edited 1 week ago)

The other day I made a machine learning model that classifies images as either 'a certain type of undesirable image' (no, not porn) or 'any other image'. It is 96.4% accurate and takes 14 ms to classify one image (using CPU only - with a GPU it could be 5x - 10x faster).

I plan to offer this as an API service that social media networks can use to filter posts.

[–] robber@lemmy.ml 5 points 1 week ago

I currently run Qwen3.6-27b on llama.cpp and use it via openwebui. Mostly, I use it for web research via tavily, to a lesser extent for coding and interactively learning about things that are new to me but common in training data (such as basic math or ML concepts).

[–] Sabata11792@ani.social 5 points 1 week ago* (last edited 1 week ago)

Running decencored Qwen3.6-27b and a 9b Gemma for RAG and scrapes on Ollama with a mostly vibe coded discord bot. Just got it to run tools and scrape and post news on a schedule. The first model I can run locally that's smart enough to be useful. May give Jan a try for the back end after reading that other guys rant.

Mostly use it for stupid questions I could have googled and to brag to friends.

[–] wrinkle2409@lemmy.cafe 5 points 1 week ago (2 children)

I set up ollama on our thinkstation in the lab and I use it for looking up documentation, generating readmes, searching papers, and sometimes coding when I know what to do but don't feel it is worth it to spend time on it myself. So basically the chat with web search.

load more comments (2 replies)
[–] november@piefed.blahaj.zone 5 points 1 week ago

Why would I?

[–] Jakeroxs@sh.itjust.works 5 points 1 week ago

Yes, llama-swap and I use it for home assistant text-gen notifications, basic coding tasks, etc

If anyone here self-hosts definitely check out llama-swap as it has some nifty features for hotswapping LLMs, image generation models and voice models.

load more comments
view more: next ›