I really wish there were a site where you could plug in your hardware and see what t/s you could expect from it, so if anyone has a link like that, I'd be interested. I haven't been able to find one, and I feel like a noob when it comes to understanding which parts of the hardware matter for local fine-tuning and inference, so please bear with me while I ask a bunch of probably dumb questions.
Broadly and in order, I think single-GPU VRAM matters most (the more GB the better), then system RAM (capacity, but I think speed matters too?), then PCIe bus bandwidth in GB/s, then additional GPUs (for roughly 60%, then 30%, then diminishing speedups from there), and finally CPU and/or NVMe space might matter a little. Does that sound broadly correct?
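For ballpark t/s numbers, single-stream generation is usually memory-bandwidth bound, so each token roughly requires streaming the whole model from memory once. A crude estimate (my own rule of thumb, not an official calculator; the efficiency factor is a guess) is:

```python
# Back-of-envelope tokens/sec estimate for single-batch inference.
# Assumption: generation is memory-bandwidth bound, so every generated
# token streams the full set of model weights once.
def estimate_tps(model_size_gb: float, mem_bandwidth_gbs: float,
                 efficiency: float = 0.6) -> float:
    """efficiency is a fudge factor; real runs rarely hit peak bandwidth."""
    return mem_bandwidth_gbs * efficiency / model_size_gb

# Example: a ~7.5 GB quantized model on a 3090 (~936 GB/s VRAM bandwidth)
print(estimate_tps(7.5, 936))  # roughly 75 t/s, ballpark only
```

This is why VRAM capacity and bandwidth dominate the ranking: once a model spills out of VRAM into system RAM, the bandwidth term drops by an order of magnitude and t/s drops with it.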
So the situation is I've got a ton of 30-series NVIDIA GPUs from a mining operation I wrapped up.
I could never sell them on r/hardwareswap or anywhere else, because nobody would buy in bulk, and I'm sure as hell not wasting my time selling and shipping 75+ individual GPUs to whoever. I do have racks and mobos and power supplies and whatnot too, but I don't think that matters. I also have a decent number of 6800, 6700 XT, and 5700 XT AMD cards, but I don't think those matter either. Please correct me if I'm wrong.
I'd like to use as many GPUs as possible for local fine-tuning and inference, and I'm trying to figure out the best path for that. After reading about PCIe bandwidth and the speedups from 2 and 3 additional GPUs, I'm afraid the real answer is "sell some GPUs and buy an M2 Ultra Mac Pro" or something like that, but if that route is out, what's the best path forward?
An EPYC server build with as many 3090s and 3080s as I can fit, plus either 96 GB of RAM (2 sticks, full DDR5 speed) or 192 GB (4 sticks, downclocked to roughly DDR4-class speed)? Which RAM config is better? I think the DDR5 vs. DDR4 speed actually makes a difference, but I'm not sure how much.
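Rough peak bandwidth for a RAM config is just channels × transfer rate (MT/s) × 8 bytes per transfer, which puts numbers on the 2-vs-4-stick tradeoff. The DIMM speeds below are illustrative guesses, not any specific board's spec:

```python
# Peak theoretical memory bandwidth in GB/s.
# channels: populated memory channels; mts: transfer rate in MT/s.
def mem_bw_gbs(channels: int, mts: int) -> float:
    return channels * mts * 8 / 1000  # 8 bytes per 64-bit transfer

print(mem_bw_gbs(2, 6000))  # dual-channel DDR5-6000: 96 GB/s
print(mem_bw_gbs(2, 3600))  # dual-channel at DDR4-3600-class speed: 57.6 GB/s
print(mem_bw_gbs(8, 3200))  # 8-channel DDR4-3200 (server EPYC): 204.8 GB/s
```

One thing worth noting: the 2-vs-4-stick speed penalty is a consumer dual-channel issue; a proper server EPYC board runs 8+ registered channels, so even at DDR4 rates its aggregate bandwidth beats a fast dual-channel DDR5 desktop config.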
Researching EPYC mobos, I think I can fit maybe 6 or 7 GPUs into an EPYC build. Does that sound about right? Does anyone know of any PCIe-rich mobos or architectures that could fit notably more GPUs than that? I do have a bunch of mining mobos, but I don't think they're usable?
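For lane counting: a single-socket EPYC exposes 128 PCIe lanes, so a quick budget sketch supports the 6-7 GPU estimate (the lanes reserved for NVMe/NICs here are my assumption; check the actual board manual):

```python
# PCIe lane budget sketch for a single-socket EPYC board.
total_lanes = 128
reserved = 16     # assumption: NVMe drives, NICs, onboard devices
per_gpu_x16 = 16  # one full x16 slot per card
per_gpu_x8 = 8    # cards dropped to x8

gpus_full = (total_lanes - reserved) // per_gpu_x16
gpus_x8 = (total_lanes - reserved) // per_gpu_x8
print(gpus_full)  # 7 GPUs at x16
print(gpus_x8)    # 14 GPUs if every slot drops to x8
```

So 6-7 at full x16 is about the ceiling; going past that means x8 links or PLX switches, which is how the mining-style boards fan out so many slots (at x1, which is the problem).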
I'm pretty sure nothing like a Beowulf cluster of mining boards + GPUs is possible for fine-tuning or running models, is that correct?
I also have a Threadripper Linux box I could upgrade that can currently fit 4-6 GPUs, and I could move to an AM5 mobo and a 7950X3D CPU pretty easily. I don't know how this stacks up against an EPYC build; does anyone have any thoughts on that?
I looked up my current Linux box's mobo and the PCIe slots only have 32 GB/s of bandwidth, so I think a mobo upgrade to AM5 with PCIe 5.0 (roughly double the per-slot bandwidth) would be necessary to get decent speeds. Does that sound right?
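For reference, per-lane PCIe throughput per direction is roughly 0.985 GB/s (gen3), 1.969 GB/s (gen4), and 3.938 GB/s (gen5) after encoding overhead, so a slot's ceiling can be sketched as:

```python
# Approximate usable PCIe bandwidth per direction, GB/s per lane,
# after 128b/130b encoding overhead (8b/10b for gen3's line rate).
per_lane = {"gen3": 0.985, "gen4": 1.969, "gen5": 3.938}

def slot_bw(gen: str, lanes: int) -> float:
    return per_lane[gen] * lanes

print(round(slot_bw("gen4", 16)))  # ~32 GB/s
print(round(slot_bw("gen5", 16)))  # ~63 GB/s
```

One caveat on the upgrade math: 30-series (Ampere) cards are PCIe 4.0 devices, so they'd negotiate gen4 speeds even in a gen5 slot; the 32 GB/s x16 figure is already their ceiling, and the bigger win from newer platforms is usually more full-width slots rather than a faster link per card.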
Sorry for all the questions and my general lack of knowledge; any guidance or suggestions on making the most of a pile of GPUs are very welcome.
One GPU per layer is an interesting approach ;)