How bottlenecked are LLMs by CPU clock? (Budget options to host multiple GPUs) (alien.top)

submitted 11 months ago by Infinite100p@alien.top to c/localllama@poweruser.forum


Running multiple GPUs requires PCIe lanes. Consumer PCs have too few of them to run even two GPUs at full bandwidth (two x16 links).

Threadrippers are prohibitively expensive for many.

AMD announced EPYC 8004 Siena in September. These low-power server CPUs start at 8 cores for ~$400 and offer 96 lanes. The catch is that the clock speed is pretty low.

So, the question is: How bottlenecked are LLMs by CPU clock?

I.e., would it make much of a difference whether you run 4x 3090s on a $2000+ Threadripper or a $400 EPYC 8004?
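For reference, the lane arithmetic behind the question can be sketched in a few lines. This is only a rough budget check: the function names are made up for illustration, the 96-lane figure is the one quoted above, and it ignores lanes taken by NVMe drives, network cards, or the chipset.

```python
def lanes_needed(num_gpus: int, lanes_per_gpu: int = 16) -> int:
    """Total PCIe lanes required to give every GPU a link of the given width."""
    return num_gpus * lanes_per_gpu


def max_gpus(available_lanes: int, lanes_per_gpu: int = 16) -> int:
    """How many GPUs fit in the lane budget at the given link width."""
    return available_lanes // lanes_per_gpu


# 96 lanes is the figure quoted for EPYC 8004 in the post above.
# Other devices (storage, NICs) are NOT accounted for here.
print(lanes_needed(4))   # lanes for 4 GPUs at x16
print(max_gpus(96))      # x16 GPUs that fit in 96 lanes
```

So four cards at x16 would use 64 of the 96 lanes, leaving headroom for storage and networking.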


[–] Imaginary_Bench_7294@alien.top 1 points 11 months ago

So that really depends. You're talking about running a multi-GPU setup. If the whole model fits in GPU memory, then your processor will not be a bottleneck at all. The clock speed of the PCIe bus is independent of the CPU core clock, unless you're messing with overclocking. That's why they advertise PCIe 3.0, 4.0, 5.0, etc.: the PCIe version dictates the bandwidth per lane.
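To put rough numbers on "bandwidth per lane", here is a small sketch. It uses the published per-lane transfer rates (8, 16, and 32 GT/s for PCIe 3.0, 4.0, and 5.0) and the 128b/130b line encoding those generations share. The results are theoretical one-direction maximums; real throughput is lower because of protocol overhead. The function name is just for illustration.

```python
# Per-lane raw transfer rate in gigatransfers per second (one bit per transfer).
TRANSFER_RATE_GT_S = {"3.0": 8.0, "4.0": 16.0, "5.0": 32.0}

# PCIe 3.0 and later send 128 payload bits in every 130-bit block.
ENCODING_EFFICIENCY = 128 / 130


def pcie_bandwidth_gb_s(version: str, lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s for a PCIe link."""
    bits_per_second_g = TRANSFER_RATE_GT_S[version] * ENCODING_EFFICIENCY
    return bits_per_second_g / 8 * lanes  # 8 bits per byte


for version in ("3.0", "4.0", "5.0"):
    for lanes in (4, 8, 16):
        print(f"PCIe {version} x{lanes}: {pcie_bandwidth_gb_s(version, lanes):.1f} GB/s")
```

None of these figures depends on the CPU core clock, only on the PCIe generation and link width that the CPU and motherboard support.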

That being said, multi-GPU setups do introduce some overhead. If a model is split between GPUs, the PCIe interface becomes a modest bottleneck as they pass data back and forth. The more GPUs the model is split across, the greater the bottleneck.
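A back-of-the-envelope sketch of that overhead, under loud assumptions: the model is split layer-wise, so one activation vector per generated token crosses each GPU-to-GPU boundary; activations are 16-bit; and the hidden size of 8192 is just an example value, not a figure from this thread. It counts bandwidth time only and ignores per-transfer latency, which in practice can matter more than the raw byte count.

```python
def boundary_transfer_bytes(hidden_size: int, bytes_per_value: int = 2) -> int:
    """Bytes handed from one GPU to the next for a single token (assumed layer-wise split)."""
    return hidden_size * bytes_per_value


def per_token_transfer_seconds(num_gpus: int, hidden_size: int,
                               link_gb_s: float, bytes_per_value: int = 2) -> float:
    """Bandwidth-only time spent on inter-GPU hand-offs for one token."""
    boundaries = max(num_gpus - 1, 0)  # N GPUs in a chain have N-1 hand-offs
    payload = boundary_transfer_bytes(hidden_size, bytes_per_value)
    return boundaries * payload / (link_gb_s * 1e9)


# Example values only: hidden size 8192, roughly 31.5 GB/s (PCIe 4.0 x16 theoretical).
for gpus in (1, 2, 4):
    t = per_token_transfer_seconds(gpus, hidden_size=8192, link_gb_s=31.5)
    print(f"{gpus} GPU(s): {t * 1e6:.2f} microseconds per token")
```

Under these assumptions the cost grows linearly with the number of GPUs in the chain, which matches the point above: more splits, more time spent passing data back and forth.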