this post was submitted on 15 Nov 2023
2 points (75.0% liked)

Homelab

371 readers
3 users here now

Rules

founded 1 year ago
MODERATORS
 

Are you self-hosting LLMs (AI models) on your headless servers? I’d like to hear about your hardware setup. What server do you have your GPUs in?

When I do a hardware refresh I’d like to ensure my next server can support GPU(s?) for local LLM inferencing. I figured I could put in either a 4090 or x2 3090’s(?) maybe into an R730. But I’ve only barely started to research this. Maybe it isn’t practical.

I don’t know much other hardware lineups besides the Dell R7xx lineup.

I host oobagooba on an R710 as a model server API, and host sillytavern and stable diffusion which use oobagooba as clients. I use an R710 using a CPU, so as you can imagine inferencing is so slow it’s basically unusable. But I wired it up as a proof of concept.

I’m curious what other people who self-host LLMs do. I’m aware of remote options like Mancer or Runpod. I’d like the option for purely local inferencing.

Thanks all

you are viewing a single comment's thread
view the rest of the comments
[–] PDXSonic@alien.top 1 points 1 year ago (1 children)

https://www.ebay.com/itm/364128788438?mkcid=16&mkevt=1&mkrid=711-127632-2357-0&ssspo=XkOKzd0RR_6&sssrc=4429486&ssuid=9jfKf00cSoK&var=&widget_ver=artemis&media=COPY

At least according to this fairly detailed eBay listing you might be limited in what GPUs you can run in an R730. It states a 300w max per card and double width, which would eliminate both the power and physical requirements of the 3090/4090. You could run say some Tesla P40s but they would be a bit slower.

Another option would to be just buy a rack mount 4U case and say a X99 motherboard (same era CPU as the R730) which would give you a bit more flexibility in running 3090/4090 cards so long as you had a 1200w or so PSU.

[–] literal_garbage_man@alien.top 1 points 1 year ago

Yeah running a 4U case and assembling it with “plain desktop” hardware but rack mounted and headless is definitely an option too. I might be asking too much of server hardware to take R730s (or any racked datacenter hardware) and fit them to a role they weren’t designed for. These are good thoughts and useful links, thank you.