literal_garbage_man

joined 1 year ago
[–] literal_garbage_man@alien.top 1 points 11 months ago

Yeah, running a 4U case assembled from “plain desktop” hardware, but rack mounted and headless, is definitely an option too. I might be asking too much of server hardware by taking R730s (or any racked datacenter hardware) and fitting them to a role they weren’t designed for. These are good thoughts and useful links, thank you.


Are you self-hosting LLMs (AI models) on your headless servers? I’d like to hear about your hardware setup. What server do you have your GPUs in?

When I do a hardware refresh I’d like to ensure my next server can support GPU(s?) for local LLM inference. I figured I could maybe fit either a 4090 or two 3090s(?) into an R730, but I’ve only barely started to research this. Maybe it isn’t practical.
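For sizing, this is the rough back-of-envelope I’ve been using to compare a single 4090 (24 GB) against two 3090s (48 GB). It’s just a sketch assuming ~4-bit quantization; the 0.5 bytes/param and 20% overhead figures are my own ballpark, not benchmarks:

```python
# Rough VRAM estimate for quantized LLM inference.
# Assumptions (my ballpark, not measured): ~0.5 bytes per parameter at 4-bit,
# plus ~20% overhead for KV cache, activations, and CUDA context.

def vram_needed_gb(params_billions: float, bytes_per_param: float = 0.5,
                   overhead: float = 1.2) -> float:
    """Estimate VRAM (GB) needed to load a quantized model for inference."""
    return params_billions * bytes_per_param * overhead

cards = {"single 4090": 24, "two 3090s": 48}
for model_b in (13, 34, 70):
    need = vram_needed_gb(model_b)
    fits = [name for name, gb in cards.items() if need <= gb] or ["neither"]
    print(f"{model_b}B @ 4-bit ≈ {need:.0f} GB -> fits: {', '.join(fits)}")
```

By that math a 70B model at 4-bit only fits the dual-3090 option, which is why I keep going back and forth on it.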

I don’t know much about hardware lineups besides the Dell R7xx line.

I host oobabooga on an R710 as a model-server API, along with SillyTavern and Stable Diffusion as clients of it. The R710 does inference on CPU only, so as you can imagine it’s so slow it’s basically unusable, but I wired it up as a proof of concept.
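For reference, this is roughly how my clients talk to it. A minimal sketch, assuming oobabooga (text-generation-webui) is started with its OpenAI-compatible API enabled (`--api`, default port 5000); the hostname and prompt below are placeholders for my setup:

```python
import requests

# Hostname of the headless R710 running oobabooga (placeholder for my LAN).
R710_HOST = "http://r710.lan:5000"

resp = requests.post(
    f"{R710_HOST}/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Say hello from the rack."}],
        "max_tokens": 64,
        "temperature": 0.7,
    },
    timeout=300,  # CPU-only inference on the R710 is painfully slow
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

SillyTavern just points at the same endpoint from its connection settings, so nothing on the client side cares whether the backend is CPU or GPU.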

I’m curious what other people who self-host LLMs do. I’m aware of remote options like Mancer or Runpod, but I’d like the option of purely local inference.

Thanks all

Will try this out on Runpod. Thanks for the heads up.

[–] literal_garbage_man@alien.top 1 points 1 year ago (4 children)

We need ways of distributing LLMs besides Hugging Face.