Do you host LLMs? Rackable server with GPU options? (alien.top)

submitted 1 year ago by literal_garbage_man@alien.top to c/homelab@selfhosted.forum

Are you self-hosting LLMs (AI models) on your headless servers? I’d like to hear about your hardware setup. What server do you have your GPUs in?

When I do a hardware refresh, I’d like to make sure my next server can support one or more GPUs for local LLM inference. I figured I could put either a 4090 or maybe 2x 3090s into an R730, but I’ve only barely started researching this. Maybe it isn’t practical.

I don’t know many hardware lineups other than the Dell R7xx line.

I host oobabooga on an R710 as a model server API, and I also host SillyTavern and Stable Diffusion, which connect to oobabooga as clients. The R710 runs inference on CPU only, so as you can imagine it’s so slow it’s basically unusable. But I wired it up as a proof of concept.

I’m curious what other people who self-host LLMs do. I’m aware of remote options like Mancer or Runpod. I’d like the option for purely local inferencing.

Thanks all

PDXSonic@alien.top (1 year ago)

https://www.ebay.com/itm/364128788438?mkcid=16&mkevt=1&mkrid=711-127632-2357-0&ssspo=XkOKzd0RR_6&sssrc=4429486&ssuid=9jfKf00cSoK&var=&widget_ver=artemis&media=COPY

At least according to this fairly detailed eBay listing, you might be limited in which GPUs you can run in an R730. It states a maximum of 300 W per card and double-width cards, which would rule out the 3090/4090 on both power and physical size. You could run, say, some Tesla P40s, but they would be a bit slower.
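
A rough way to sanity-check a card against those two limits is sketched below. The 300 W and double-width limits are the ones quoted from the listing; the per-card board power and slot widths are assumptions based on reference specs (partner cards can be larger or hungrier), and the `Gpu` class and `fit_problems` helper are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    name: str
    board_power_w: int  # rated board power, assumed from reference specs
    slot_width: int     # slots occupied, assumed from reference designs


# Limits as quoted from the eBay listing for the R730.
MAX_POWER_W = 300
MAX_SLOT_WIDTH = 2

# Assumed reference figures; check the exact card model before buying.
CARDS = [
    Gpu("RTX 3090", 350, 3),
    Gpu("RTX 4090", 450, 3),
    Gpu("Tesla P40", 250, 2),
]


def fit_problems(gpu: Gpu) -> list[str]:
    """Return the reasons a card would not fit; an empty list means it fits."""
    problems = []
    if gpu.board_power_w > MAX_POWER_W:
        problems.append(f"{gpu.board_power_w} W exceeds {MAX_POWER_W} W per card")
    if gpu.slot_width > MAX_SLOT_WIDTH:
        problems.append(f"{gpu.slot_width}-slot card exceeds double width")
    return problems


for gpu in CARDS:
    problems = fit_problems(gpu)
    print(gpu.name, "->", "fits" if not problems else "; ".join(problems))
```

With these assumed figures, only the P40 passes both checks.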

Another option would be to just buy a rack-mount 4U case and, say, an X99 motherboard (same-era CPU as the R730), which would give you a bit more flexibility in running 3090/4090 cards, so long as you had a 1200 W or so PSU.
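
For the PSU sizing, a back-of-the-envelope budget might look like the sketch below. Everything except the 1200 W figure is an assumption: 350 W and 450 W are the rated board power of a 3090 and 4090, 140 W is a typical TDP for a high-end X99-era CPU, and 100 W for drives, fans, and the motherboard is a guess. These are nameplate numbers, so real draw can briefly spike above them.

```python
def psu_headroom_w(psu_w: int, gpu_w: int, gpu_count: int,
                   cpu_w: int = 140, other_w: int = 100) -> int:
    """Watts left over after the estimated sustained load.

    cpu_w and other_w are assumed defaults, not measured values.
    """
    load_w = gpu_w * gpu_count + cpu_w + other_w
    return psu_w - load_w


PSU_W = 1200  # the PSU size suggested above

# Hypothetical builds: (label, watts per GPU, number of GPUs)
builds = [
    ("1x RTX 4090", 450, 1),
    ("2x RTX 3090", 350, 2),
    ("2x RTX 4090", 450, 2),
]

for label, gpu_w, count in builds:
    headroom = psu_headroom_w(PSU_W, gpu_w, count)
    print(f"{label}: about {headroom} W of headroom on a {PSU_W} W PSU")
```

Under these assumptions, a single 4090 or a pair of 3090s leaves a comfortable margin on 1200 W, while two 4090s would be cutting it very close.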

literal_garbage_man@alien.top (1 year ago)

Yeah, running a 4U case and assembling it with “plain desktop” hardware, but rack mounted and headless, is definitely an option too. I might be asking too much of server hardware by taking R730s (or any racked datacenter hardware) and fitting them to a role they weren’t designed for. These are good thoughts and useful links, thank you.