I see it being roughly 2GB of memory per 4k tokens of context, going by what llama.cpp spits out. Load a model and read what it puts in the log.
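For a sense of where that number comes from, here is a rough back-of-the-envelope sketch. The parameters below are assumptions for a Llama-2-7B-style model (32 layers, 32 KV heads, head dim 128, f16 cache), not figures from the thread; models with grouped-query attention or a quantized KV cache will use less.

```python
# Rough KV-cache size estimate; exact numbers depend on the model.
# Assumed: Llama-2-7B-style architecture, f16 cache, no grouped-query attention.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elem=2):
    """Keys + values for every layer, KV head and context position."""
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem

size = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, n_ctx=4096)
print(f"{size / 2**30:.1f} GiB")  # ~2.0 GiB for a 4k context under these assumptions
```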
As to Mac vs RTX: you can build a system with the same or a similar amount of VRAM as the Mac for a lower price, but it depends on your skill level and your electricity/space constraints.
If you live in a studio apartment, I don't recommend buying an 8-card inference server, regardless of the couple thousand dollars in either direction or the faster speed.
Thanks. Yes, a 2 kW space heater of a PC would only be welcome in the winter, and could get pricey to run.
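For a rough idea of the running cost, here is a minimal sketch; the power draw, daily usage and electricity price are illustrative assumptions, not figures from the thread.

```python
# Illustrative running-cost estimate; all inputs are assumptions.
power_kw = 2.0        # sustained draw of the inference box
hours_per_day = 8
price_per_kwh = 0.15  # USD; varies a lot by region

monthly_cost = power_kw * hours_per_day * 30 * price_per_kwh
print(f"~${monthly_cost:.0f}/month")  # ~$72/month under these assumptions
```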