overview for 0xd00d

How bottlenecked are LLMs by CPU clock? (Budget options to host multiple GPUs) in c/localllama@poweruser.forum

[–] 0xd00d@alien.top 1 points 2 years ago (1 children)

I would imagine that the new option you're talking about will be a good budget inference workhorse when paired with multiple cards such as 3090s. 96 lanes of PCIe Gen 5 will be a real enabler. That said, I think Zen 2 EPYCs, which provide Gen 4 lanes, are cheaper still, so there are good options available.
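
For a rough sense of what those lane counts mean, here is a back-of-the-envelope sketch. It assumes the standard per-lane signalling rates (16 GT/s for Gen 4, 32 GT/s for Gen 5) and 128b/130b line encoding, and it ignores packet and protocol overhead, so real-world throughput will be somewhat lower. The function name `pcie_gb_per_s` is just made up for illustration.

```python
# Approximate one-direction PCIe bandwidth, accounting only for
# 128b/130b line encoding (used by Gen 3 and later).
GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}  # raw gigatransfers/s per lane

def pcie_gb_per_s(gen: int, lanes: int) -> float:
    """Return approximate one-direction bandwidth in GB/s."""
    raw_gbit = GT_PER_S[gen] * lanes      # one bit per transfer per lane
    return raw_gbit * (128 / 130) / 8     # encoding overhead, then bits -> bytes

for gen, lanes in [(4, 16), (5, 16), (5, 96)]:
    print(f"Gen {gen} x{lanes}: {pcie_gb_per_s(gen, lanes):.1f} GB/s")
```

Keep in mind that a 3090 is a Gen 4 card, so on a Gen 5 platform each card would still link at Gen 4 speeds; the benefit there is mostly having enough lanes to give every card a full x16 link.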

Dual 3090,24GB & 1070 worth it? in c/localllama@poweruser.forum

[–] 0xd00d@alien.top 1 points 2 years ago

Be sure to prioritize the 3090s' PCIe lanes.

Is Upgrading from NVIDIA H100 to H200 Worth It? in c/localllama@poweruser.forum

[–] 0xd00d@alien.top 1 points 2 years ago

I suppose the big factor in scalability isn't necessarily CUDA itself but TensorRT, which, yes, is built on top of CUDA. I haven't been keeping up with the actual hardware capabilities of AMD's stuff with respect to tensor cores, but basically what we're seeing is that TensorRT makes better use of Nvidia's tensor cores and extracts much more out of the available memory bandwidth. If AMD can get close (it seems like we can only hope for them to get close), if they can produce significantly beefier hardware that sells for less, and if the software can actually come close (this is the crux of it), then we may have some real competition.