LocalLLaMA

Community to discuss Llama, the family of large language models created by Meta AI.

Is it possible to run 4*A100 40G cards as one? (alien.top)

submitted 1 year ago by manjimin@alien.top to c/localllama@poweruser.forum

Newbie question, but is there a way to have 4*A100 40G cards run as one, with 160G VRAM in total?

I am not able to load a 70B model even with 4bit quantization because my lab has 40G cards.

edit) If this is possible, can I run 8*3090 24G cards as one also?

[–] yamosin@alien.top 1 points 1 year ago

Newbie question, but is there a way to have 4*A100 40G cards run as one, with 160G VRAM in total?

Yes, that will work, but you will lose some performance.
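One common way to do this (not necessarily what the commenter uses) is to let Hugging Face transformers, with accelerate installed, spread the layers across every visible card. The sketch below is only an assumption about a typical setup: `build_max_memory` and `load_sharded` are hypothetical helper names, and the 38 GiB cap per 40G card is an arbitrary value chosen to leave some headroom.

```python
def build_max_memory(num_gpus, per_gpu_gib):
    """Per-device memory caps in the form accepted by the max_memory argument."""
    return {i: f"{per_gpu_gib}GiB" for i in range(num_gpus)}


def load_sharded(model_id, num_gpus=4, per_gpu_gib=38):
    """Load a model with its layers split across num_gpus cards (hypothetical helper)."""
    # Imported lazily so build_max_memory works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",  # let accelerate place layers across the visible GPUs
        max_memory=build_max_memory(num_gpus, per_gpu_gib),  # assumed cap per card
    )
    return tokenizer, model
```

With this kind of layer-wise split the cards hold different layers and mostly work one after another, so the pooled VRAM makes the model fit without making generation faster, which matches the "lose some performance" remark.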

edit) If this is possible, can I run 8*3090 24G cards as one also?

Yes and no. Yes, you can do this, but don't unless you need to run a 176GB model. Spreading one model over more GPUs only loses performance; it does not increase it.

For example, if I run a 13B GPTQ 4-bit model on 1*3090 I get 45 t/s. If I run it on 2*3090 it slows down to 30 t/s, but with 3*3090 it stays the same.

Also, I have no idea why people always talk about NVLink. 2*3090 on PCIe x16 give the same speed as 2*3090 on PCIe x1. I'm not sure whether that changes with a higher number of cards, but it just doesn't help at all for 2*3090.

So basically, what really matters is which model you want to run, not which or how many GPUs you can use. Run your model on the minimum number of GPUs to get the best performance. If the 3090 had 36G VRAM instead of 24G, 36G*2 would be way faster than 24G*3, even though the total VRAM is the same.
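To put rough numbers on "minimum number of GPUs", here is a back-of-the-envelope sketch. It is an assumption-laden estimate, not a measurement: `weight_gb` counts the weights only (no KV cache or activations), `min_gpus` and `reserve_gb` are made-up names for illustration, and the 60 GB model in the example is hypothetical.

```python
import math


def weight_gb(params_billion, bits_per_weight):
    """Approximate size of the weights alone, in GB (no KV cache, no activations)."""
    return params_billion * bits_per_weight / 8


def min_gpus(model_gb, vram_gb, reserve_gb=0.0):
    """Smallest number of identical cards whose usable VRAM covers model_gb."""
    usable = vram_gb - reserve_gb  # reserve_gb is an assumed per-card overhead
    if usable <= 0:
        raise ValueError("reserve_gb must be smaller than vram_gb")
    return math.ceil(model_gb / usable)


print(weight_gb(70, 4))  # 70B at 4-bit -> 35.0
print(min_gpus(60, 36))  # hypothetical 60 GB model on 36G cards -> 2
print(min_gpus(60, 24))  # same model on 24G cards -> 3
```

The 35 GB for a 70B 4-bit model is the weights alone, so once runtime overhead is added it is plausible that a single 40G card is not enough, as in the original question.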