For the last few weeks I've been experimenting with LLMs on my personal laptop (I'm rarely at home), but I'll have my PC with me in a few days. When running models (MythoMax 13b, mostly Q6_K and Q5_K_M GGUF) I can definitely feel my laptop not liking it: slowdowns, crashes, service terminations and timeouts.
Now, the situation is this: I've unexpectedly gotten some money which I want to invest in PC parts.
My PC currently has 16GB of DDR5 RAM and a GTX 1070 with 8GB VRAM.
The idea now is to buy a 96GB RAM kit (2x48GB) and Frankenstein the whole PC together with an additional Nvidia Quadro P2200 (5GB VRAM).
Would the whole "machine" suffice to run models like MythoMax 13b, Deepseek Coder 33b and CodeLlama 34b (all GGUF)?
Specs after: 112GB DDR5 (the new 96GB kit plus the existing 16GB), 8GB VRAM and 5GB VRAM, CPU is a Ryzen 5 7500F
And the question I should have asked first: can the GTX 1070 and P2200 setup even work? Like, would text gen webui even detect both cards?
Sorry if that's a dumb question.
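One quick way to check the detection part before touching the webui is to ask the Nvidia driver which cards it sees. The sketch below assumes the Nvidia driver is installed and `nvidia-smi` is on the PATH; the helper names `parse_gpu_list` and `list_gpus` are made up for illustration. If both cards show up here, a llama.cpp-based loader should generally be able to see them too, though that is not guaranteed for every loader.

```python
import subprocess


def parse_gpu_list(csv_text):
    """Parse 'index, name, memory.total' lines as printed by nvidia-smi
    with --format=csv,noheader,nounits (memory is reported in MiB)."""
    gpus = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        # Split off the index first and the memory last, so a comma
        # inside the card name would not break the parsing.
        index, rest = line.split(",", 1)
        name, mem = rest.rsplit(",", 1)
        gpus.append({
            "index": int(index.strip()),
            "name": name.strip(),
            "memory_mib": int(mem.strip()),
        })
    return gpus


def list_gpus():
    """Ask the driver for all GPUs it can see (requires nvidia-smi)."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_list(result.stdout)
```

Calling `list_gpus()` on the finished build should return one entry per card. If only one card is listed, the problem is at the driver or hardware level rather than in the webui.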
I would replace the DDR5 RAM rather than add to it, or your memory will run a lot slower (four DIMMs usually force lower memory clocks than two), and you just don't need the extra 16GB if you're going to use GPUs for inferencing. Also, a P40 is probably money better spent with this config than the P2200.
Thing is, I have the P2200 sitting on my shelf right now from my dad's old workstation, so I wouldn't have to buy it.
13GB of VRAM doesn't get you much, especially when part of it is used for graphics and both cards are old Pascal architecture.
By all means just put the card in and see where it gets you on 13b.
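To put rough numbers on the 13GB point, here is a back-of-the-envelope sketch. The bits-per-weight figures are rounded approximations for each GGUF quant type, and the 1.5GB overhead for context and buffers is a guess, not a measured value. The function names are made up for illustration, and real file sizes will differ somewhat.

```python
# Approximate bits per weight for common GGUF quant types
# (assumed, rounded values; actual files vary a little).
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q5_K_M": 5.7, "Q6_K": 6.56}

# GTX 1070 (8GB) + Quadro P2200 (5GB)
TOTAL_VRAM_GB = 8 + 5


def gguf_size_gb(params_billion, quant):
    """Rough model size in GB: parameters * bits per weight / 8."""
    return params_billion * BITS_PER_WEIGHT[quant] / 8


def fits_in_vram(size_gb, vram_gb, overhead_gb=1.5):
    """True if the weights plus an assumed overhead (KV cache, buffers)
    fit in the given VRAM. Desktop graphics usage is not counted."""
    return size_gb + overhead_gb <= vram_gb


models = [("MythoMax 13b", 13), ("Deepseek Coder 33b", 33), ("CodeLlama 34b", 34)]

for name, params in models:
    for quant in BITS_PER_WEIGHT:
        size = gguf_size_gb(params, quant)
        if fits_in_vram(size, TOTAL_VRAM_GB):
            verdict = "should fit across both cards"
        else:
            verdict = "needs partial CPU offload"
        print(f"{name} {quant}: ~{size:.1f} GB, {verdict}")
```

Under these assumptions the 13b quants squeeze into the combined VRAM, while the 33b and 34b models would have to keep a large share of their layers in system RAM. That still works with GGUF, but it is much slower than running fully on the GPUs.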