[–] DominicanGreg@alien.top 1 points 11 months ago

Parts-wise, a Threadripper + ASUS Pro WS WRX80E-SAGE SE WiFi II is already a $2k price floor.

Each 4090 is $2-2.3k.

Each 3090 is $1-1.5k.

So building a machine from scratch will easily run you $8-10k with 4090s and $6-8k with 3090s. If you already have some GPUs or parts, you'd still probably need 2 or more extra GPUs, plus the space and power to run them.

In my specific situation, I'd have to grab the Threadripper, mobo, a case, RAM, and 2 more cards, so I'm looking at potentially $5-7k worth of damage. OR... pay $8.6k for a Mac Pro M2 and get an entire extra machine to play with.

There's definitely an entire Mac Pro M3 series on the way considering they just released the M3 laptops; it's only a matter of time before they shoot out the announcements. So I'd definitely feel a bit peeved if I bought the M2 tower only for Apple to release the M3 versions a month or two later.

 

Right now it seems we are once again on the cusp of another round of LLM size upgrades. It appears to me that 24GB of VRAM gets you access to a lot of really great models, but 48GB of VRAM really opens the door to the impressive 70B models and lets you run the 30B models comfortably. However, I'm seeing more and more 100B+ models being created that push 48GB setups down into lower quants, if they can run the model at all.

This, in my opinion, is big, because 48GB is currently the magic number for consumer-level cards: 2x 3090s or 2x 4090s. Adding an extra 24GB to a build via consumer GPUs turns into a monumental task due to either space in the tower or the capabilities of the hardware, AND it would only put you at 72GB of VRAM, at the very edge of the recommended VRAM for the 120B Q4_K_M models.
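As a rough sanity check on those numbers, here's my back-of-envelope math. It's only an approximation (real usage shifts with context length, KV cache, and loader overhead), and the ~4.5 bits-per-weight figure for a Q4_K_M-class quant is my assumption:

```python
# Rough VRAM estimate for a quantized LLM: weights ~= params * bits-per-weight / 8,
# plus slack for KV cache and loader overhead. Very approximate.
def est_gb(params_b: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    weights_gb = params_b * bits_per_weight / 8  # 1B params at 8 bpw ~= 1 GB
    return weights_gb * overhead

for params in (30, 70, 120):
    print(f"{params}B @ ~4.5 bpw (Q4_K_M-ish): ~{est_gb(params, 4.5):.0f} GB")
# 30B  -> ~20 GB (fits on one 24GB card)
# 70B  -> ~47 GB (right at the 48GB ceiling, hence 2x 3090/4090)
# 120B -> ~81 GB (past even 72GB, which is why 3 cards is still tight)
```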

I genuinely don't know what I'm talking about and I'm just rambling, because I'm trying to wrap my head around HOW to upgrade my VRAM to load the larger models without buying a massively overpriced workstation card. Should I stuff 4 3090s into a large tower? Set up 3 4090s in a rig?

How can the average hobbyist make the jump from 48GB to 72GB+?

Is taking the wait-and-see approach toward NVIDIA dropping new scalper-priced high-VRAM cards feasible? Hope and pray for some kind of technical magic that drops the required VRAM while simultaneously keeping quality?

The reason I'm stressing about this and asking for advice is that the quality difference between smaller models and 70B models is astronomical, and the difference between the 70B models and the 100B+ models is a HUGE jump too. From my testing, it seems the 100B+ models really turn the "humanization" of the LLM up to the next level, leaving the 70B models to sound like... well... AI.

I am very curious to see where this gets to by the end of 2024, but for sure... I won't be seeing it on a 48GB VRAM setup.

 

Playing around with lzlv-70B Q4_K_M, I'm having a great time with the long-form responses. However, after a while I'm beginning to notice "AI-styled writing." I tried pumping the temperature up to 1.5 and the repetition penalty to 1.3, and even tried mirostat modes 1 and 2 in kobold.cpp.

If the repetition penalty gets too high, the AI gets nonsensical. If the temperature gets too high, the AI starts to blurt out nonsense. While I'm not aiming to have the AI write exactly like a person and be undetectable, after reading it so much it's gotten pretty repetitive with its word usage and sentences.
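For anyone who wants to poke at the same knobs programmatically, here's a minimal sketch against a local kobold.cpp server. The field names follow the KoboldAI-style /api/v1/generate endpoint as I understand it; double-check your build's API, since names and defaults can shift between versions:

```python
# Minimal sketch: hitting a local kobold.cpp server with the sampler settings
# discussed above. Assumes the default port 5001; field names follow the
# KoboldAI-style API and may vary between versions.
import requests

payload = {
    "prompt": "Continue the story in a loose, conversational voice:\n",
    "max_length": 300,        # tokens to generate
    "temperature": 1.1,       # past ~1.5 it started blurting nonsense for me
    "rep_pen": 1.15,          # past ~1.3 it got nonsensical
    "rep_pen_range": 1024,    # how far back the penalty looks
    "mirostat": 2,            # 0 = off, 1/2 = the two mirostat modes
    "mirostat_tau": 5.0,      # target "surprise"; lower = tamer output
    "mirostat_eta": 0.1,      # learning rate of the mirostat controller
}

r = requests.post("http://localhost:5001/api/v1/generate", json=payload, timeout=300)
print(r.json()["results"][0]["text"])
```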

I've tried prompting it with:

"Write in a simple, easy to read way, make your response more conversational and less formal. Aim to increase perplexity by varying your word choices and avoiding predictable phrases"

but it's... limited.

Any tips or methods to make the AI not be so... dry?

[–] DominicanGreg@alien.top 1 points 1 year ago

Shit, tell me about it. I transitioned here from gaming. I already had a 4090 24GB and was pretty happy with it until decent 70B models came out. Then I had to splurge and picked between a 2nd 4090 or a 3090. I went with the 3090 because it's still just 24GB of VRAM and the 4090s are a bit fat.

Well, turns out I needed to upgrade my PSU, as my measly 1kW unit was choking hard, so I upgraded to a 2kW PSU just to have that extra wiggle room. I quickly ran out of space as well from hoarding data, so I picked up a spare NVMe stick. Then I learned that my RAM was too low; somehow running large models (70B) requires more RAM available. I didn't even know RAM was necessary for running LLMs if you were using GPUs (apparently the loader stages the model file through system RAM on its way to VRAM), so I filled up my RAM slots.

So all in, I spent around $4-5k, the original 4090 build being the bulk of it, but the upgrades with the 3090, RAM, PSU, and NVMe weren't cheap.

And now... I keep reading about the 120B Goliath model that's getting rave reviews, and it's out of reach for me with my 48GB of VRAM, i9, and 96GB of RAM. I can't get it to run on ooba, can't get it to run on kobold. And I'm getting real tempted by the new Mac products that just came out, namely the M3 Max versions with 128GB of unified memory, hell, maybe even a Mac Studio for $5k with 192GB of unified memory. I'd even look at the Mac Pro tower, but that's even more expensive.
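From what I can tell, a Q4-ish quant of a 120B model is around 70GB of weights, so it will never fit entirely in 48GB, but GGUF loaders like kobold.cpp can split layers between GPU and CPU. Here's my back-of-envelope sketch of the split; the file size and layer count below are assumptions, so check the model card for the actual numbers:

```python
# Back-of-envelope: how many layers of a big GGUF fit on the GPUs, with the
# rest offloaded to system RAM. File size and layer count below are assumptions;
# check the model card for your exact quant.
def gpu_layer_split(file_size_gb: float, n_layers: int, vram_gb: float,
                    vram_reserve_gb: float = 4.0) -> int:
    """Estimate layers that fit in VRAM, keeping a reserve for the KV cache."""
    per_layer_gb = file_size_gb / n_layers
    return int((vram_gb - vram_reserve_gb) / per_layer_gb)

# Hypothetical numbers for a ~120B model at a Q4-ish quant on 2x 24GB cards:
print(gpu_layer_split(file_size_gb=70.0, n_layers=137, vram_gb=48.0))  # ~86
```

Whatever doesn't fit runs on the CPU, which is slow but at least loads; in kobold.cpp that estimate goes into the GPU layers setting (the --gpulayers flag, if I remember right), so "can't get it to run" might really be "can't run it fully on GPU."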

OR I could buy 2 more 3090s, plus a Threadripper and CPU for it, and squeeze it all into my tower.

Either way, it's very expensive to run it locally. I used to think I was at the peak running 70B models, but now the 120B models are starting to show up and I don't know how to move forward.

[–] DominicanGreg@alien.top 1 points 1 year ago (1 children)

So does this fit in 48GB of VRAM or nah?