this post was submitted on 16 Jan 2026

59 points (86.4% liked)

Selfhosted

54641 readers

1588 users here now

A place to share alternatives to popular online services that can be self-hosted without giving up privacy or locking you into a service you don't control.

Rules:

Be civil: we're here to support and learn from one another. Insults won't be tolerated. Flame wars are frowned upon.
No spam posting.
Posts have to be centered around self-hosting. There are other communities for discussing hardware or home computing. If it's not obvious why your post topic revolves around selfhosting, please include details to make it clear.
Don't duplicate the full text of your blog or github here. Just post the link for folks to click.
Submission headline should match the article title (don’t cherry-pick information from the title to fit your agenda).
No trolling.
No low-effort posts. This is subjective and will largely be determined by the community member reports.

Resources:

selfh.st Newsletter and index of selfhosted software and apps
awesome-selfhosted software
awesome-sysadmin resources
Self-Hosted Podcast from Jupiter Broadcasting

Any issues on the community? Report it using the report flag.

Questions? DM the mods!

founded 2 years ago

MODERATORS

ruud@lemmy.world

Loki@lemmy.world

CannaVet@lemmy.world

HybridSarcasm@lemmy.world

devve@lemmy.world

HybridSarcasm@lemmy.hybridsarcasm.xyz

Selfhosted coding assistant? (midwest.social)

submitted 2 days ago* (last edited 2 days ago) by wasp_eggs@midwest.social to c/selfhosted@lemmy.world

22 comments fedilink hide all child comments

I'd like to set up a local coding assistant so that I can stop using Google to ask complex questions to for search results.

I really don't know what I'm doing or if there's anything that's available that respects privacy. I don't necessarily trust search results for this kind of query either.

I want to run it on my desktop, Ryzen 7 5800xt + Radeon RX 6950xt + 32gb of RAM. I don't need or expect data center performance out of this thing. I'm also a strict Sublime user so I'd like to avoid VS Code suggestions as much as possible.

My coding laptop is an oooooold MacBook Air so I'd like something that can be ran on my desktop and used from my laptop if possible. No remote access needed, just to use from the same home network.

Something like LM Studio and Qwen sounds like it's what I'm looking for, but since I'm unfamiliar with what exists I figured I would ask for Lemmy's opinion.

Is LM Studio + Qwen a good combo for my needs? Are there alternatives?

I'm on Lemmy Connect and can't see comments from other instances when I'm logged in, but to whomever melted down from this question your relief is in my very first sentence:

to ask complex questions to for search results.

all 23 comments

sorted by: hot top controversial new old

[–] perry@aussie.zone 4 points 1 day ago (3 children)

Qwen coder model from Huggingface, following the instructions there to run it in llama.cpp. Once that’s up: OpenCode and use the custom OpenAI API to connect it.

You’ll get far better results than trying to use other local options out of the box.

There may be better models potentially but I’ve found Qwen 2.5 etc to be pretty fantastic overall, and definitely a fine option beside Claude/ChatGPT/Gemini. I’ve tested the lot and it’s usually far more down to instruction and AGENTS.md instructions/layout than it is down to just the model.

[–] madcaesar@lemmy.world 2 points 1 day ago

Do you mind sharing your agents md?

[–] melfie@lemy.lol 1 points 1 day ago* (last edited 1 day ago)

The main thing that has stopped me from running models like this so far is VRAM. My server has a RTX 4060 with 8GB, and not sure that can reasonably run a model like this.

Edit:

This calculator seems pretty useful: https://apxml.com/tools/vram-calculator

According to this, I can run Qwen3 14B with 4B quant and 15-20% CPU/NVMe offloading and get 41 tokens / s. It seems 4B quant reduces accuracy by 5-15%.

The calculator even says I can run the flagship model with 100% NVMe offloading and get 4 tokens / s.

I didn’t realize NVMe offloading was even a thing and not sure if it actually is supported or works well in practice. If so, it’s a game changer.

Edit:

The llama.cpp docs do mention that models are memory mapped by default and loaded into memory as needed. Not sure if that means that a MoE model like qwen3 235b can run with 8GB of VRAM and 16GB of RAM, albeit at a speed that is an order of magnitude slower like the calculator suggests is possible.

[–] 70k32@sh.itjust.works 1 points 1 day ago

This. Llama.cpp with Vulkan backend running in docker-compose, some Qwen3-Coder quantization from huggingface and pointing Opencode to that local setup with a OpenAI-compatible is working great for me.

[–] Coolcoder360@lemmy.world 21 points 2 days ago

I've not found them useful yet for more than basic things. I tried Ollama, it let's you run locally, has simple setup, stays out of the way.

[–] ryokimball@infosec.pub 11 points 2 days ago (1 children)

I have heard good things about LM Studio from several professional coders and tinkers alike. Not tried it myself yet though, but I might have to bite the bullet because I can't seem to get ollama to perform how I want.

TabbyML is another thing to try.

[–] wasp_eggs@midwest.social 2 points 2 days ago (1 children)

Thanks for the reply!

I had noticed TabbyML but something about their wording made me rethink and then the next day I saw a post on here regarding the same phrasing, I decided to leave it alone after that

[–] scrubbles@poptalk.scrubbles.tech 5 points 2 days ago

Yeah I tried tabby too and they had like a mandatory "we share your code " line and I hoped out. Like if you're going to do that I might as well just use claude

[–] herseycokguzelolacak@lemmy.ml 4 points 2 days ago

I recommend llama.cpp instead of LM Studio.

[–] hummingbird@lemmy.world 9 points 2 days ago

LM Studio in combination with Kilo Code for IDE integration works pretty nicely locally. Here is a good video covering the basics to get you going: https://www.youtube.com/watch?v=rp5EwOogWEw

[–] TomAwezome@lemmy.world 8 points 2 days ago

I get good mileage out of the Jan client and Void editor, various models will work but Jan-4B tends to do OK, maybe a Meta-Llama model could do alright too. The Jan client has settings where you can start up a local OpenAI-compatible server, and Void can be configured to point to that localhost URL+port and specific models. If you want to go the extra mile for privacy and you're on a Linux distro, install firejail from your package manager and run both Void and Jan inside the same namespace with outside networking disabled so it only can talk on localhost. E.g.: firejail --noprofile --net=none --name=nameGoesHere Jan and firejail --noprofile --net=none --join=nameGoesHere void, where one of them sets up the namespace (--name=) and the other one joins the namespace (--join=)

[+] artwork@lemmy.world -15 points 2 days ago* (last edited 2 days ago) (3 children)

"Self-hosted mind atrophy with skills degradation running in parallel."
No. Absolutely no. You should code with your mind, and stay creative.

[–] IMALlama@lemmy.world 13 points 2 days ago (2 children)

Straight up vibe coding is a horrible idea, but I'll happily take tools to reduce mundane tasks.

The project I'm currently working on leans on Temporal for durable execution. We define the activities and workflows in protobufs and utilize codegen for all the boring boiler plate stuff. The project hasa number of http endpoints that are again defined in protos, along with their inputs and outputs. Again, lots of code gen. Is code gen making me less creative or degrading my skills? I don't think so. It sure makes the output more consistent and reduces the opportunity for errors.

If I engage gen AI during development, which isn't very often, my prompts are very targeted and the scope is narrow. However, I've found that gen AI is great for writing and modifying tests and with a little prompting you can get pretty solid unit test coverage for a verity of different scenarios. In the case of the software I write at work the creativity is in the actual code and the unit tests are often pretty repetitive (happy path, bad input 1...n, no result, mock an error at this step, etc). Once you know how to do that there's no reason not to offload it IMO.

[–] artwork@lemmy.world -1 points 2 days ago

Thank you! You do you.

[–] MasterBlaster@lemmy.world 2 points 1 day ago (1 children)

We can't ignore this. We need to know how it is done if we want to earn salaries. Reality rarely makes a dent in the corporate herd until years later.

By then, careers are obliterated.

There are ways to protect your mind in the meantime.

[–] sem@lemmy.blahaj.zone 1 points 1 day ago

What is the way

[–] Shimitar@downonthestreet.eu 1 points 2 days ago (3 children)

While you are correct, as all tools AI is not bad per se.

If you use ai to replace more lengthy documentation searches and write your own code that works out pretty well and speed up your work without degrading your coding. Granted, I got plainly incorrect answers as well, but at least I managed to be much more efficient.

Treat LLMs/ai as a glorified documentation aggregator and that's how you correctly use that tool.

Like, use a knife to cut and cook meat, not to cut another person body, and that's how you correctly use that tool too.

[–] poVoq@slrpnk.net 2 points 2 days ago* (last edited 2 days ago) (1 children)

In my experience using AI for that replaces legthy documentation searches with reading lengthy AI output that turns out to be full of halucinations. Net time saved usually negative.

[–] Shimitar@downonthestreet.eu 2 points 1 day ago

My recent experience regarding questions on documentation:

dovecot: shitty useless responses, totally made up
Gentoo linux: to be checked twice and mostly wrong or fake
godot: accurate and correct almost always, maybe examples not always 100% correct
C++ standard 17: correct, never had a wrong reply from llm, also the exact ples where on point and correct

I think that's all what I have used it for in the last six months.

Note: I used only Google search AI llm, nothing else.

So it seems that depends on what you ask.

[–] artwork@lemmy.world 0 points 2 days ago* (last edited 2 days ago) (1 children)

No, thank you. Sorry, never.

Not only that, but the huge probability of mistakes is just deafening. The last time I used LLM was in 2023 someone recommended for a task at paper work, and I got a literal headache in 10 minutes... Since then I never ever will use that sorrow for anything that is not for blackbox pentesting or experimental unverified data generated you may find in medicine or military isolated solutions.

That deafening feel that every single bit of output from that LLM or void machine may contain a mistake no soul is accountable for to ask about... A generated bit of someone's work you just cannot verify since no source nor human is available... How would you trace the rationale that resulted in the output shown?

Faster? Is that so... Doesn't verification of every output require even more time to test it and consider stable, to prove it is correct, to stay accountable for the knowledge and actions you perform as a developer, artist, researcher... human?

Your mind is to be trained to do a research, remember, and do not depend on someone's service to a level of predominance/replacement.
Meanwhile, effort, passion, creativity, empathy, and love, in turn, you carry, supports in long-term.

You may not care now, though, but you do you. It's your mind and memory you develop.

[–] Shimitar@downonthestreet.eu 0 points 2 days ago

Maybe you should check 5hat again. I never used llm before 2025, but proved itself useful for a few tasks. Yes check and verification still needed, but indeed made my life easier and got taken done faster. Quality was still a good as what I could do myself. Maybe that doesn't speak well of myself I don't know.

[–] Appoxo@lemmy.dbzer0.com -2 points 2 days ago

I like it to generate my git comments.
Sometimes I just don't know how to actually describe what I did.