Selfhosted
A place to share alternatives to popular online services that can be self-hosted without giving up privacy or locking you into a service you don't control.
Rules:
- Be civil: we're here to support and learn from one another. Insults won't be tolerated. Flame wars are frowned upon.
- No spam posting.
- Posts have to be centered around self-hosting. There are other communities for discussing hardware or home computing. If it's not obvious why your post topic revolves around self-hosting, please include details to make it clear.
- Don't duplicate the full text of your blog or GitHub here. Just post the link for folks to click.
- Submission headline should match the article title (don't cherry-pick information from the title to fit your agenda).
- No trolling.
- No low-effort posts. This is subjective and will largely be determined by community member reports.
Resources:
- selfh.st: newsletter and index of self-hosted software and apps
- awesome-selfhosted software
- awesome-sysadmin resources
- Self-Hosted Podcast from Jupiter Broadcasting
Any issues in the community? Report them using the report flag.
Questions? DM the mods!
The main thing that has stopped me from running models like this so far is VRAM. My server has an RTX 4060 with 8GB, and I'm not sure that can reasonably run a model like this.
Edit:
This calculator seems pretty useful: https://apxml.com/tools/vram-calculator
According to this, I can run Qwen3 14B at 4-bit quantization with 15-20% CPU/NVMe offloading and get 41 tokens/s. It seems 4-bit quantization reduces accuracy by 5-15%.
The calculator even says I can run the flagship model with 100% NVMe offloading and get 4 tokens/s.
I didn't realize NVMe offloading was even a thing, and I'm not sure whether it's actually supported or works well in practice. If so, it's a game changer.
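For intuition about where those offload percentages come from, here's a rough back-of-envelope sketch of the arithmetic a calculator like that is presumably doing. The function name and the KV-cache/overhead constants are my own assumptions, and ~4.5 bits/weight is just a typical figure for a Q4_K_M GGUF:

```python
# Back-of-envelope VRAM estimate, mirroring what calculators like the one
# linked above presumably do. All constants here are rough assumptions.

def estimate_vram_gb(params_b: float, bits_per_weight: float,
                     kv_cache_gb: float = 1.0, overhead_gb: float = 0.8) -> float:
    """Weights + KV cache + runtime overhead, in GB."""
    weights_gb = params_b * bits_per_weight / 8  # 14B at ~4.5 bpw ~= 7.9 GB
    return weights_gb + kv_cache_gb + overhead_gb

# Qwen3 14B at ~4.5 bits/weight (typical Q4_K_M GGUF) on an 8 GB card:
need = estimate_vram_gb(14, 4.5)
have = 8.0
print(f"~{need:.1f} GB needed vs {have:.0f} GB VRAM "
      f"-> offload ~{max(0.0, 1 - have / need):.0%}")
```

With those assumptions, a 14B model needs roughly 9.7 GB, so about 17% of it has to live outside the 8 GB card, which lines up with the calculator's 15-20%.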
Edit:
The llama.cpp docs do mention that models are memory-mapped by default and loaded into memory as needed. I'm not sure if that means a MoE model like Qwen3 235B can run with 8GB of VRAM and 16GB of RAM, albeit at a speed an order of magnitude slower, as the calculator suggests is possible.
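If anyone wants to poke at this, the llama-cpp-python bindings expose the relevant knobs. This is a minimal sketch, not a tested config; the GGUF path is a placeholder and the right n_gpu_layers value depends on how many layers actually fit in 8GB:

```python
# Minimal sketch using llama-cpp-python (Python bindings for llama.cpp).
# The model path is a placeholder; n_gpu_layers needs tuning per card.

from llama_cpp import Llama

llm = Llama(
    model_path="./qwen3-14b-q4_k_m.gguf",  # hypothetical local file
    n_gpu_layers=30,  # layers kept in VRAM; the rest stay in RAM / on disk
    use_mmap=True,    # default: weights are memory-mapped, paged in on demand
    n_ctx=4096,       # context window; the KV cache grows with this
)

out = llm("Explain memory mapping in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```

With use_mmap on (the default), weights that don't fit get paged in from disk on demand, which seems to be what the calculator means by NVMe offloading; how usable that is in practice presumably depends on the drive.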