this post was submitted on 22 Nov 2023

LocalLLaMA


Community to discuss Llama, the family of large language models created by Meta AI.


I'm trying to decide whether to run a second GPU in my second full-length slot. My mobo manual says the fastest that slot can run is PCIe 2.0 at x4 lanes. A paltry 2 GB/s, correct?
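
(For reference, the back-of-the-envelope math behind that 2 GB/s figure; a minimal Python sketch using the per-lane transfer rates and encoding overheads from the PCIe specs. Real-world throughput runs somewhat lower because of protocol overhead.)

```python
# Theoretical one-direction PCIe bandwidth per generation and lane count.
# Gens 1-2 use 8b/10b encoding; gens 3+ use 128b/130b.
GT_PER_LANE = {1: 2.5, 2: 5.0, 3: 8.0, 4: 16.0, 5: 32.0}  # gigatransfers/s
ENCODING = {1: 8 / 10, 2: 8 / 10, 3: 128 / 130, 4: 128 / 130, 5: 128 / 130}

def pcie_gbps(gen: int, lanes: int) -> float:
    """Usable one-direction bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return GT_PER_LANE[gen] * ENCODING[gen] * lanes / 8  # Gbit/s -> GB/s

print(pcie_gbps(2, 4))  # PCIe 2.0 x4 -> 2.0 GB/s, the slot in question
print(pcie_gbps(4, 8))  # PCIe 4.0 x8 -> ~15.75 GB/s, for comparison
```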

Can anyone comment from personal experience?

platinums99@alien.top, 10 months ago

Depends on the board; check the manual, rtfm :D

And if turboderp is right, it largely doesn't matter.

a_beautiful_rhind@alien.top, 10 months ago

For exllama it doesn't matter much; for other backends a bit more. On llama.cpp I lose about 10% when halving the bandwidth.
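
(A quick way to see what link each card has actually negotiated; a minimal sketch assuming the pynvml bindings for NVIDIA's NVML library are installed:)

```python
import pynvml

pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    # The current link can downshift at idle; max is what the slot supports.
    cur_gen = pynvml.nvmlDeviceGetCurrPcieLinkGeneration(handle)
    cur_width = pynvml.nvmlDeviceGetCurrPcieLinkWidth(handle)
    max_gen = pynvml.nvmlDeviceGetMaxPcieLinkGeneration(handle)
    max_width = pynvml.nvmlDeviceGetMaxPcieLinkWidth(handle)
    print(f"GPU {i}: PCIe gen {cur_gen} x{cur_width} "
          f"(max gen {max_gen} x{max_width})")
pynvml.nvmlShutdown()
```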

NoWarrenty@alien.top, 10 months ago

It is very important if you care about performance. During inference, a lot of data has to go from one card to another. I was using x1 risers and it sucked. If you have two similar NVIDIA cards, you can get around it with an NVLink bridge.

Otherwise you should aim for at least PCIe 4.0 x8 when looking for a motherboard. I sniped an EPYC system from eBay for 1000€ that has six PCIe 4.0 x16 slots, and it handles all four of my 3090s.

https://preview.redd.it/u0bvy2kkzw1c1.jpeg?width=4032&format=pjpg&auto=webp&s=ecb164bbf59504e590c19403554e24df8f9236c8
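
(To put numbers on that card-to-card traffic, a rough timing sketch, assuming PyTorch and at least two CUDA GPUs; the 256 MiB buffer size is arbitrary:)

```python
import torch

assert torch.cuda.device_count() >= 2, "needs at least two GPUs"

size_mib = 256
x = torch.empty(size_mib * 1024 * 1024, dtype=torch.uint8, device="cuda:0")

_ = x.to("cuda:1")  # warm-up so lazy CUDA init doesn't skew the timing
torch.cuda.synchronize()

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()
y = x.to("cuda:1")  # the device-to-device copy we want to time
end.record()
torch.cuda.synchronize()

gb = x.numel() / 1e9            # bytes -> GB
ms = start.elapsed_time(end)    # milliseconds
print(f"{gb * 1000 / ms:.2f} GB/s over the link")
```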

Massive_Robot_Cactus@alien.top, 10 months ago

Was the CPU from eBay too? Any reliability issues? It seems a lot of the cheap ones on eBay are gray market / production candidates.