LocalLLaMA

Community for discussing Llama, the family of large language models created by Meta AI.

Picking out the next GPUs to buy after using P40s to run Llama (alien.top)

submitted 2 years ago by CertainlyBright@alien.top to c/localllama@poweruser.forum


I started out running a quantized 70B model on 6x P40 GPUs, but the performance is noticeably slow. Sure, I'm probably not going to buy a few A100s to replace them, but what about an RTX 8000 or two?

These will be going into a 2U Gigabyte GPU server, so consumer cards won't fit.
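
For a rough sanity check on whether one or two RTX 8000s would even hold the model, here is a minimal sketch of the memory arithmetic. The post doesn't say which quantization is in use, so the 4 bits per weight below is an assumption, and the helper names are made up for illustration. Per-card memory is taken from the published specs (24 GB per P40, 48 GB per RTX 8000). It counts weights only, so KV cache and runtime overhead come on top.

```python
# Rough weights-only VRAM estimate. KV cache, activations and runtime
# overhead are NOT included, so real usage will be higher than this.

def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def total_vram_gib(num_cards: int, gib_per_card: float) -> float:
    """Combined memory across identical cards (nominal figure)."""
    return num_cards * gib_per_card


def weights_fit(params_billion: float, bits_per_weight: float,
                num_cards: int, gib_per_card: float) -> bool:
    """True if the weights alone fit in the combined memory."""
    return weight_gib(params_billion, bits_per_weight) <= total_vram_gib(
        num_cards, gib_per_card
    )


if __name__ == "__main__":
    BITS = 4  # assumption: the post does not state the quantization level
    need = weight_gib(70, BITS)
    print(f"70B at {BITS}-bit, weights only: {need:.1f} GiB")
    for name, cards, per_card in [
        ("6x P40", 6, 24),
        ("1x RTX 8000", 1, 48),
        ("2x RTX 8000", 2, 48),
    ]:
        total = total_vram_gib(cards, per_card)
        ok = weights_fit(70, BITS, cards, per_card)
        print(f"{name}: {total:.0f} GiB total, weights fit: {ok}")
```

This only answers the capacity question. It says nothing about tokens per second, which depends on memory bandwidth, the inference backend, and how the layers are split across cards.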


Tacx79@alien.top 1 point 2 years ago

From what I've seen, the RTX 8000 is a bit slower than the P40 in inference and a bit faster in training. The only speedup would come from running 2 cards instead of 6. Out of curiosity - what speeds were you getting with the P40s?