this post was submitted on 22 Nov 2023

LocalLLaMA


Community to discuss Llama, the family of large language models created by Meta AI.

 

From what I’ve read, a Mac somehow uses system RAM and Windows uses the GPU? It doesn’t make any sense to me. Any help appreciated.

[–] artelligence_consult@alien.top 1 points 11 months ago

> From what I’ve read mac somehow uses system ram and windows uses the gpu?

Learn to read.

SOME Macs - some very specific models - do not have a GPU in the classical sense but an on-chip GPU and super fast unified RAM. You could essentially say they are a graphics card with CPU functionality and only VRAM - that would come close to the technical implementation.

They are not using "sometimes this, sometimes that"; it's just that SOME models (those with M1, M2, or M3 chips) basically have a GPU that is also the CPU. The negative? Not expandable.

Normal RAM is a LOT - seriously, a LOT - slower than the VRAM or HBM you find on high-end cards. That memory is not only faster per pin (GDDR6/GDDR6X now, while current computers use DDR5) but also not 64 bits wide like a single RAM channel - it's 384 bits or WAY wider (HBM stacks combine to 4096 bits and more), so its transfer speed in GB/s makes normal computers puke.
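
To put rough numbers on that, here's a back-of-envelope sketch (the bus widths and data rates below are ballpark figures picked for illustration, not exact specs for any particular product):

```python
# Peak memory bandwidth ~= bus width (bits) / 8 * data rate (GT/s).
# All figures are approximate examples, not exact product specs.
def bandwidth_gb_s(bus_bits: int, data_rate_gt_s: float) -> float:
    return bus_bits / 8 * data_rate_gt_s

examples = {
    "Dual-channel DDR5-6000 (desktop)": (128, 6.0),   # 2 x 64-bit channels
    "GDDR6X, 384-bit (high-end GPU)":   (384, 21.0),
    "HBM2e, 4096-bit (4 stacks)":       (4096, 3.2),
    "Apple M2 Pro unified memory":      (256, 6.4),   # LPDDR5-6400
}
for name, (bits, rate) in examples.items():
    print(f"{name}: ~{bandwidth_gb_s(bits, rate):.0f} GB/s")
```

That works out to roughly 96 GB/s for the desktop, ~1000 GB/s for the GPU, ~1600 GB/s for the HBM card, and ~200 GB/s for the M2 Pro - which is exactly why token generation speed tracks memory bandwidth.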

That, though, comes at a price. Which, as I pointed out, starts with being inflexible - no way to expand the RAM, it is all soldered on. Some of the fast RAM is reserved for the OS - but essentially on an M2 Pro you get a LOT of RAM usable for the LLM.
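
As a rough illustration of the reserved part (the ratios below are an assumption modeled on macOS's default GPU wired-memory limits, not an official Apple figure):

```python
# Rough estimate of how much unified memory the GPU can use on Apple Silicon.
# macOS reserves a chunk for the OS; roughly 1/3 on smaller machines and
# 1/4 on bigger ones (assumed ratios, for illustration only).
def usable_gpu_memory_gb(total_gb: float) -> float:
    reserved = total_gb / 3 if total_gb <= 36 else total_gb / 4
    return total_gb - reserved

for total in (16, 32, 64, 96):
    print(f"{total} GB unified -> ~{usable_gpu_memory_gb(total):.0f} GB for the model")
```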

You now say you have 64 GB of RAM - unless you put a crappy card into an otherwise modern computer, that means you also have RAM that is way slower than what is normal today. So you are likely stuck with the 12 GB of VRAM to run fast. Models come in layers, and you can offload some to normal RAM, but it is a LOT slower than the VRAM - so it's not good to offload a lot of them.
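
For the layer offloading, here is a minimal sketch using llama-cpp-python (the model path and layer count are hypothetical; raise n_gpu_layers until the model no longer fits in your 12 GB of VRAM, then back off):

```python
# Minimal layer-offloading sketch with llama-cpp-python.
# n_gpu_layers controls how many transformer layers live in fast VRAM;
# whatever doesn't fit stays in (much slower) system RAM.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-13b.Q4_K_M.gguf",  # hypothetical path
    n_gpu_layers=35,  # tune to what 12 GB of VRAM can hold
    n_ctx=4096,
)

out = llm("Q: Why does VRAM bandwidth matter for LLMs? A:", max_tokens=64)
print(out["choices"][0]["text"])
```

Every layer left on the CPU side means its weights stream through the slow system RAM on every token, so generation speed drops fast the more you leave off the GPU.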