this post was submitted on 27 Nov 2023
1 points (100.0% liked)

LocalLLaMA

Community to discuss Llama, the family of large language models created by Meta AI.

As the title says, when combining a P40 and an RTX 3090, a few use cases come to mind and I wanted to know whether they can be done. I'd greatly appreciate your help:

First, could you run larger models where the computation happens on the 3090 and the P40 is only used to hold offloaded layers in its VRAM, and would that be faster than offloading to system memory?

Could you compute on both of them in an asymmetric fashion, e.g. putting more layers on the RTX 3090 and fewer on the P40?

Lastly, and this one probably works: could you run two different LLM instances, for example a bigger one on the 3090 and a smaller one on the P40?
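
For the first two questions, the usual way to express this in Python loaders is a per-GPU memory cap, so most layers land on the 3090 and the rest on the P40; keeping those extra layers in the P40's VRAM should generally still beat spilling to system RAM, since even the P40's memory bandwidth is well above typical desktop DDR4. A minimal sketch assuming the Hugging Face transformers + accelerate stack (the model name and memory caps are placeholders, not from this thread):

```python
# Sketch: one model split unevenly across an RTX 3090 (GPU 0) and a P40 (GPU 1).
# Assumes transformers and accelerate are installed; model name and caps are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-13b-hf"  # placeholder model

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",                     # let accelerate place layers automatically
    max_memory={0: "22GiB", 1: "10GiB"},   # most layers on the 3090, fewer on the P40
)

prompt = "Explain mixed-GPU inference in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

llama.cpp-based backends expose the same idea through the --tensor-split option (e.g. weighting the split toward the 3090).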

[–] Hoppss@alien.top 1 points 11 months ago

This is not true; I have split two separate LLM models partially across a 4090 and a 3080 and had them both running inference at the same time.

This can be done in oobabooga's repo with just a little tinkering.
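
For completeness, here is a minimal sketch of the simpler variant from the original question, with each model pinned to its own GPU rather than partially split across both; model names are placeholders and both are assumed to fit entirely in their card's VRAM:

```python
# Sketch: two independent models running side by side, one per GPU.
# Placeholder model names; assumes each model fits entirely in its GPU's VRAM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def load(model_id, device):
    tok = AutoTokenizer.from_pretrained(model_id)
    mdl = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to(device)
    return tok, mdl

big_tok, big_model = load("meta-llama/Llama-2-13b-hf", "cuda:0")     # e.g. the RTX 3090
small_tok, small_model = load("meta-llama/Llama-2-7b-hf", "cuda:1")  # e.g. the P40

for tok, mdl in ((big_tok, big_model), (small_tok, small_model)):
    inputs = tok("Hello from a mixed-GPU box.", return_tensors="pt").to(mdl.device)
    out = mdl.generate(**inputs, max_new_tokens=32)
    print(tok.decode(out[0], skip_special_tokens=True))
```

Note that the P40's FP16 throughput is very weak, so many P40 owners run it through a llama.cpp/GGUF backend rather than fp16 PyTorch for anything compute-heavy.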