this post was submitted on 30 Nov 2023

LocalLLaMA

Running multiple GPUs requires PCIe lanes. Consumer PCs have too few of those to even run 2x GPUs at full bandwidth (2x16).

Threadrippers are prohibitively expensive for many.

AMD announced EPYC 8004 Siena in September. These low-power server CPUs start at 8 cores @ ~$400 and offer 96 lanes. The catch is that the clock is pretty low.

So, the question is: How bottlenecked are LLMs by CPU clock?

I.e., would it make much of a difference if you ran 4x 3090s on the $2000+ Threadripper vs the $400 EPYC 8004?

[–] Imaginary_Bench_7294@alien.top 1 points 9 months ago

So that really depends. You're talking about running a multi-GPU setup. If all of your model is in GPU memory, then your processor will not be a bottleneck at all. The clock speed of the PCIe bus is independent of the CPU core clock, unless you're messing with overclocking. That's why they advertise PCIe 3.0, 4.0, 5.0, etc. The PCIe version dictates the bandwidth per lane.
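To put rough numbers on "bandwidth per lane", here's a quick sketch. The per-lane transfer rates (8, 16, 32 GT/s) and the 128b/130b encoding are from the PCIe specs; the function name is just something I made up, and the results are theoretical one-direction maximums, so real-world throughput will come in lower.

```python
# Raw per-lane transfer rate in GT/s for each PCIe generation.
TRANSFER_RATE_GT_S = {"3.0": 8.0, "4.0": 16.0, "5.0": 32.0}

# PCIe 3.0 and later use 128b/130b line encoding: 128 payload bits per 130 sent.
ENCODING_EFFICIENCY = 128 / 130


def link_bandwidth_gb_s(version: str, lanes: int) -> float:
    """Theoretical one-direction link bandwidth in GB/s (hypothetical helper).

    Ignores packet/protocol overhead, so treat it as an upper bound.
    """
    gigabits_per_lane = TRANSFER_RATE_GT_S[version] * ENCODING_EFFICIENCY
    return gigabits_per_lane / 8 * lanes  # bits -> bytes, then scale by lane count


if __name__ == "__main__":
    for version in TRANSFER_RATE_GT_S:
        for lanes in (4, 8, 16):
            bandwidth = link_bandwidth_gb_s(version, lanes)
            print(f"PCIe {version} x{lanes}: {bandwidth:.2f} GB/s")
```

A 3090 is a PCIe 4.0 x16 card, so it tops out around 31.5 GB/s each way at x16 and about half that if the slot only gives it x8. Lane count and PCIe version set that ceiling, not the CPU clock.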

That being said, multi-GPU setups do introduce some overhead. If a model is split between GPUs, the PCIe interface becomes a modest bottleneck as they pass data back and forth. The greater the number of GPUs the model is split across, the greater the bottleneck.