Even if you get a 3090 with 24GB of VRAM, you're going to load the biggest model you can and realize it is useless for most tasks. Less than that and I don't even know what you would use it for.
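Back-of-envelope sketch of why (my own numbers, assuming 4-bit quantization and a flat ~20% overhead for KV cache and buffers; real usage varies with context length and quant format):

```python
# Rough VRAM estimate for a quantized model.
# Assumptions: 4-bit weights, ~1.2x overhead for KV cache / framework buffers.
def vram_needed_gb(params_billions, bits_per_weight=4, overhead=1.2):
    weight_gb = params_billions * bits_per_weight / 8  # GB for the weights alone
    return weight_gb * overhead

for size in (7, 13, 34, 70):
    print(f"{size}B model: ~{vram_needed_gb(size):.0f} GB")
# Prints roughly 4, 8, 20, 42 GB -> a 24GB card tops out around the 30B class,
# and a 70B wants ~48GB even at 4-bit.
```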
Ok, you make a really good point. I'm just starting out, and you clearly know a thing or two. How would you start out, knowing what you know now?
The P40 and the 3090, those are your "affordable" 24GB GPUs, unless you want to go AMD or have enough to do 3x16GB or something.
I have a soft spot for AMD and I'm considering the R9 7900 or the (lesser-known) R9 PRO 7945 (it has AVX-512 support, which I understand is beneficial in the ML world) for the CPU, but I've heard AMD GPU support was lacking in some workloads and I really don't need that learning curve.
I can only afford and justify the space for one GPU, so 3x something is out of the question.
I looked at the P100, but read that the 4060 was the better choice. I'm at a bit of a loss. A 3U chassis looks like the way to go, and I have the option to water cool either the CPU or the GPU.
I just got a P100 for like $150; going to test it out and see how its FP16 does vs the P40 for SD and exllama overflow.
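For the overflow side of that, here's a minimal sketch of spilling a model across two mismatched cards. It uses Hugging Face transformers' device_map rather than exllama's own GPU-split option, and the model ID and per-card memory caps are placeholders:

```python
# Sketch: split an fp16 model across two GPUs with per-device memory caps.
# Placeholder model ID and memory limits; adjust for your actual cards.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-13b-hf"      # placeholder
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,              # P100 does full-rate fp16, P40 doesn't
    device_map="auto",                      # let accelerate place layers per device
    max_memory={0: "22GiB", 1: "14GiB"},    # e.g. cap the main card and the P100
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```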
The 4060 is faster, but it's multiple times as expensive. For your sole GPU you really need 24GB+. The AMD cards are becoming somewhat competitive but still come with some hassle and slowness.
A CPU is going to give you maybe 3 t/s; it's not really anywhere near GPU speed, even with the best procs. Sure, get a good one for other things in the system, but don't expect it to help much with ML. I guess newer platforms will get you faster RAM, but it's not enough.
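Rough reasoning behind that number, assuming generation is memory-bandwidth bound (each new token has to stream every weight once; the bandwidth and model-size figures are my own approximations):

```python
# Upper bound on generation speed if every token reads all weights from memory.
def tokens_per_sec(model_size_gb, mem_bandwidth_gb_s):
    return mem_bandwidth_gb_s / model_size_gb

# Approximate figures: dual-channel DDR5 ~80 GB/s, 3090 GDDR6X ~936 GB/s,
# 70B 4-bit quant ~40 GB, 13B 4-bit quant ~8 GB.
print(f"70B on CPU:  ~{tokens_per_sec(40, 80):.0f} t/s")    # ~2 t/s
print(f"13B on CPU:  ~{tokens_per_sec(8, 80):.0f} t/s")     # ~10 t/s
print(f"13B on 3090: ~{tokens_per_sec(8, 936):.0f} t/s")    # ~117 t/s ceiling
```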
That depends. When you say you're building out a new server, are we talking a proper 1U or 2U Dell, HPE, etc. type server? If so, you'll have to contend with the GPU footprint; for example, my 1U servers can only take up to two half-height, half-length GPUs, and they can only be powered by the PCIe slot, so I'm limited to 75W.
In my 2U servers I can get the "GPU enablement kit", which is essentially smaller form-factor heatsinks for the CPUs and some long 8-pin power connectors to go from the mobo to the PCIe riser, allowing many more options. But there are still problems to address with heat, power draw (CPUs are limited to 130W TDP, I believe), the server firmware complaining about the GPU / forcing the system fans to run at an obnoxious level, etc...
If you're homebrewing a 3U, a tower, or using consumer parts, then things change quite a bit.
Home brewing is exactly the way I'm going. A 3U is on the table; I'm looking at the Sliger chassis at the moment, but cooling the space is a big consideration.
To be honest, I had only considered one GPU. I live in the UK and electricity prices are high (and houses are small), so multiple GPUs just don't seem appropriate.
You can cluster 3 16GB Arc A770 GPUs. That's 48GB and modern.
3 x anything is not an option. I’m 1 and done. 👍🏻
It just sucks because the sweet spot is 48GB, but a single 48GB card is at least $3k USD.
At $1k you'll be stuck at 24GB for a single card.