Right now it seems we are once again on the cusp of another round of LLM size upgrades. It appears to me that 24GB of VRAM gets you access to a lot of really great models, and 48GB really opens the door to the impressive 70B models while letting you run the 30B models comfortably. However, I'm seeing more and more 100B+ models being created that push 48GB setups down into lower quants, if they can run the model at all.

This is a big deal in my opinion, because 48GB is currently the magic number for consumer-level cards: 2x 3090s or 2x 4090s. Adding an extra 24GB to a build via consumer GPUs turns into a monumental task, due to either space in the tower or the capabilities of the hardware, AND it would only put you at 72GB of VRAM, right at the very edge of the recommended VRAM for the 120B Q4_K_M models.
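For a rough sanity check on those numbers, here is a minimal napkin-math sketch in Python. The bits-per-weight values are ballpark assumptions for 4-bit and 5-bit K-quants (not exact figures for any specific quant), the `weights_gb` helper is just illustrative, and KV cache plus CUDA context add a few GB on top of what it prints:

```python
# Napkin math: the quantized weights alone take roughly
# params * bits_per_weight / 8 gigabytes; KV cache and CUDA
# context add a few more GB on top of that.

def weights_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (params in billions)."""
    return params_b * bits_per_weight / 8

for params_b in (34, 70, 120):
    for bpw in (4.85, 5.65):  # ballpark effective bits/weight for ~Q4_K_M and ~Q5_K_M
        print(f"{params_b}B @ ~{bpw} bpw: ~{weights_gb(params_b, bpw):.0f} GB of weights")
```

By that estimate, a ~4-bit 70B sits around 42 GB of weights plus cache, which is why it just fits in 48GB, while a ~4-bit 120B lands around 73 GB of weights alone, which is why even 72GB is borderline.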

I genuinely don't know what I'm talking about and I'm just rambling, because I'm trying to wrap my head around HOW to upgrade my VRAM to load the larger models without buying a massively overpriced workstation card. Should I stuff 4 3090s into a large tower? Set up 3 4090s in a rig?

How can the average hobbyist make the jump from 48GB to 72GB+?

Is taking a wait-and-see approach, hoping Nvidia drops new (scalper-priced) high-VRAM cards, feasible? Or do I hope and pray for some kind of technical magic that drops the required VRAM while keeping quality?

The reason I'm stressing about this and asking for advice is that the quality difference between the smaller models and the 70B models is astronomical, and the jump from the 70B models to the 100B+ models is huge too. From my testing, the 100B+ models really turn the "humanization" of the LLM up to the next level, leaving the 70B models sounding like... well... AI.


I am very curious to see where this gets to by the end of 2024, but one thing is for sure... I won't be seeing it on a 48GB VRAM setup.

[–] MindOrbits@alien.top 1 points 11 months ago

Yes.

Workstations are the way to go. There are a few motherboards out there that give you four double-wide slots.

Pro tip: think in PCIe 3.0 terms. x16 lanes of PCIe 3.0 is a sought-after baseline, and x8 lanes often performs at about 80% of x16, because other system limitations are usually the bottleneck, not the PCIe bus.

Depending on the CPU, motherboard chipset, and internal lane connections, you will struggle to find four x16 slots.

PCIe 4.0 adds to the mess, but always in your benefit, just not as much as you might think, depending on the above.

- Older cards: PCIe 3.0
- Most cards you'd consider modern/good or better: PCIe 4.0
- New cards: PCIe 5.0

PCIe 4.0 lanes can be split by chipsets for things like NVMe drives and USB, and each lane has 2x the bandwidth of PCIe 3.0 with supported 4.0 devices (x8 PCIe 4.0 ~ x16 PCIe 3.0). A nice motherboard feature is when x16 of PCIe 4.0 lanes are split into two x16 PCIe 3.0 slots. Chipsets and NVMe drives benefit greatly from PCIe 4.0, which often frees up more PCIe 3.0 lanes for the slots.
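A minimal sketch of the lane-bandwidth arithmetic behind that equivalence. The per-lane figures are the commonly quoted theoretical one-direction numbers after encoding overhead, and `slot_bandwidth_gbps` is just an illustrative helper, not any real tool's API:

```python
# Approximate usable bandwidth per PCIe lane (GB/s), per the usual
# published figures after 128b/130b (or 8b/10b for older gens) encoding.
PER_LANE_GBPS = {"3.0": 0.985, "4.0": 1.969, "5.0": 3.938}

def slot_bandwidth_gbps(gen: str, lanes: int) -> float:
    """Theoretical one-direction bandwidth of a slot in GB/s."""
    return PER_LANE_GBPS[gen] * lanes

print(f"x16 PCIe 3.0: ~{slot_bandwidth_gbps('3.0', 16):.1f} GB/s")
print(f"x8  PCIe 4.0: ~{slot_bandwidth_gbps('4.0', 8):.1f} GB/s")   # roughly equal to x16 PCIe 3.0
print(f"x16 PCIe 4.0: ~{slot_bandwidth_gbps('4.0', 16):.1f} GB/s")
```

That x8 4.0 ~ x16 3.0 equivalence is why dropping to x8 slots hurts less than it sounds, especially since, as noted above, the PCIe bus is rarely the bottleneck for inference.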

So... if you find four double-wide PCIe slots with at least x8 lanes per slot, you're leaving some performance on the table, but you're really not that handicapped by the loss for what you're buying, especially when shopping used.

Really new cards would suffer more from lane saturation, and may not have a favorable cost-to-benefit ratio given newer cards' prices.