2080 Ti, getting 15-30% usage per question. It has been higher on other models, but the latest OpenHermes Mistral seems efficient.
LocalLLaMA
Community to discuss about Llama, the family of large language models created by Meta AI.
A lot of people are trying to run LLM sexbots and don't want to rent hardware on runpod.
I will say the economics around DIY for TRAINING can make quite a bit of sense if you live near cheap electricity. Some people aren't even paying for utilities at all.
As someone who spends a lot of time on a chair in front of a PC, both as a hobby and for work, I treated myself to an early Christmas present with a dual 3090 machine.
I used to game a lot, but those days are over. It's still nice to be able to play the latest games on maximum graphics, but it's also great to have the capability to play around with the big boy LLMs out there.
Right now, I'm experimenting with so much stuff, trying different frameworks like autogen and memgpt. I tinker around without having this nagging thought in the back of my mind saying,
'Man, you're wasting money,' or 'Be more efficient,'
and so on, if you know what I mean. If it were just for the sake of trying LLMs, then definitely not. I would stick to cloud solutions.
Even if you assumed 10h / wk of usage
It's bold of you to assume :P
But seriously, it doesn't make any sense. The same way as it doesn't make sense to buy $1200 guitars, $5000 cameras or anything else people do as a hobby.
Buying used stuff, I paid around 3k for 2 3090s, 3 P40s and a server. I can run 70b, 180b, 120b, whatever, as long as it's quantized. I can run Stable Diffusion and a 70b at the same time, have TTS and STT, and if I had better internet I could serve other people, etc.
Besides the P40s, I can most likely re-sell the server and the 3090s for more than I paid. I could also BE the person you rent from on runpod. People spend this kind of money on eating out and subscription services and don't even think twice.
Growing your own vegetables doesn't make economic sense either. If you spent half the time it took to till, plant, water, weed and harvest working a regular job, you could buy at least twice as many vegetables. Worse if you have to buy or rent the land too. And yet, many people do it.
Humans often don't do the cost-effective thing, preferring a DIY approach because it gives a sense of achievement and independence. It's an attitude more prevalent in the US than in Europe, but I prefer it over the "I own nothing, have no privacy, and life has never been better" Schwabian dystopia.
If you take a really sober look at the numbers, how does running your own system make sense over renting hardware at runpod or a similar service?
To me it doesn't. I use runpod, I'm just on this sub because it's the best place I know to keep up on the latest news in open source / self-hosted LLM stuff. I'm not literally running it "locally."
As far as I can tell there are lots of others like me here on this sub. Of course also many people here run on their own hardware, but it seems to me like the user base here is pretty split. I wonder what a poll would find.
I've used runpod in the past but got a bit frustrated with it when I couldn't have just a desktop to run whatever tools I wanted in the same box. I shifted to using rented VMs instead of runpod, which has been nice for switching between a text generation UI, LM Studio, etc. on the same rented box.
Shit, tell me about it. I transitioned here from gaming. I already had a 4090 24GB and was pretty happy with it until decent 70b models came out. Then I had to splurge and picked between a 2nd 4090 or a 3090. I went with the 3090 because it's still just 24GB of VRAM and the 4090s are a bit fat.
Well, turns out I needed to upgrade my PSU, as my measly 1k PSU was choking hard, so I upgraded to a 2k PSU just to have that extra wiggle room. I quickly ran out of space as well from hoarding data, so I picked up a spare NVMe stick. Then I learned that my RAM was too low: somehow loading large models (70b) requires more system RAM. I didn't even know RAM mattered for running LLMs if you were using GPUs, so I filled up my RAM slots.
So all in I spent around 4-5k, the original 4090 build being the bulk of it, but the upgrades with the 3090, RAM, PSU and NVMe weren't cheap.
And now... I keep reading about the 120b Goliath model that's getting rave reviews, and it's out of reach for me with my 48GB of VRAM, i9, and 96GB of RAM. I can't get it to run on ooba, can't get it to run on kobold. And I'm getting real tempted by the new Mac products that just came out, namely the M3 Max versions with 128GB of unified memory. Hell, maybe even a Mac Studio for 5k with 192GB of unified memory. I would even look at the Mac Pro tower, but that's even more expensive.
OR I could buy 2 more 3090s and a Threadripper CPU for them, and squeeze it all into my tower.
Either way, it's very expensive to run locally. I used to think I was at the peak running 70b models, but now the 120b models are starting to show up and I don't know how to move forward.
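Rough arithmetic shows why 48GB of VRAM doesn't reach Goliath-class models: a quantized model needs roughly params × bits-per-weight / 8 bytes, plus context/KV-cache overhead. A minimal sketch, where the 20% overhead factor is an assumption rather than a measured number:

```python
def quantized_model_gib(params_b: float, bits_per_weight: float,
                        overhead: float = 1.2) -> float:
    """Approximate memory footprint of a quantized model in GiB.

    params_b        -- parameter count in billions (e.g. 70 for a 70b)
    bits_per_weight -- e.g. 4.0 for a ~4-bit quant
    overhead        -- fudge factor for KV cache / activations (assumed)
    """
    total_bytes = params_b * 1e9 * (bits_per_weight / 8) * overhead
    return total_bytes / 2**30

for size in (70, 120, 180):
    print(f"{size}b @ 4-bit: ~{quantized_model_gib(size, 4):.0f} GiB")
```

By this estimate a 4-bit 70b lands around 39 GiB, which squeezes into 2x24GB cards, while a 4-bit 120b is around 67 GiB, which is why it spills past 48GB of VRAM.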
Is Brave running an LLM locally?
This is the cost of freedom I assume.
I still remember how OpenAI censored commentary on the Russia-Ukraine war over a simple comparison of battle force strength, yet when I changed the subject to a hypothetical war in the USA, it happily commented that the USA would split into 4 countries.
Not to mention the political correctness shit when I was just asking for dress ideas for little girls going to a birthday party.
We never know when or where these corporations will use AI to push agendas rather than facts. That's where local LLMs kick in.
As with all things... what are you actually trying to accomplish? That will drive the right answer for you.
I think it really is going to be situation dependent. I think one major advantage to having your own local system is you won't be subject to the whims of corporations or self proclaimed "AI safety experts" who want to limit what you can do.
It's always better to rent space on a cloud provider. Less hassle, no maintenance, and better GPUs.
I wrote about this on the other thread.
Everyone forgets that the value of the hardware you buy doesn't automatically drop to zero the moment you plug in your computer.
But most people kind of pretend that is the case. You buy a computer for $4000 and that buys you xxx months on runpod. Yeah, but at the end of those xxx months of using your own computer, you still have a computer whose value is > $0. At the end of xxx months on runpod you have nothing.
If you buy a used GPU, chances are that after 3 months it will still sell used for the same money. So you've spent nothing except electricity.
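The residual-value point can be put into numbers. A hedged sketch, where the purchase price, resale fraction, rental rate, and power draw are all illustrative assumptions, not real quotes:

```python
def breakeven_hours(purchase: float, resale_fraction: float,
                    rent_per_hour: float,
                    power_kw: float = 0.5, elec_per_kwh: float = 0.12) -> float:
    """Hours of use at which buying becomes cheaper than renting.

    Buying really costs (purchase - resale) up front plus electricity
    per hour; renting costs rent_per_hour. Equate the totals and solve.
    """
    sunk = purchase * (1 - resale_fraction)          # capital you don't recover
    saving_per_hour = rent_per_hour - power_kw * elec_per_kwh
    return sunk / saving_per_hour

# e.g. a $3,000 used rig that resells at 80%, vs an assumed $0.50/hr rental:
hours = breakeven_hours(3000, 0.80, 0.50)
print(f"break-even after ~{hours:.0f} hours")
```

Under those assumptions the break-even is roughly 1,360 hours, or about two and a half years at 10h/wk; with a lower resale fraction or cheaper rentals the picture flips quickly, which is why the answer is so sensitive to what your used hardware actually holds its value at.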
There are some factors that indirectly play into economics.
AKA hiding your trade secrets.
If I just want to try big models, renting is more reasonable. But if I plan to use one daily and over a longer period of time, investing in a GPU makes a lot of sense. Not only can you use the GPU for gaming or other tasks, you also actually own it, and you can probably sell it later for at least half the original price.
Part of it is emotional, too. I personally find it very distracting to have a remote computer system running with a per-hour bill, as I'm focusing on squeezing every penny out of the rental. When I have purchased the device up front it's much more relaxing. You let it run, or you don't, and it doesn't make a difference either way.
Even though I'm certainly spending more money purchasing the hardware up front, I'm enjoying it much more than I otherwise would.
A decent Ryzen with 64 gigs can be had for under $1000 easily. If you just want to play around, you don't even need a GPU at all. CPU is just fine.