LocalLLaMA

14 readers

1 users here now

Community to discuss about Llama, the family of large language models created by Meta AI.

founded 2 years ago

MODERATORS

M1/M2/M3: increase VRAM allocation with sudo sysctl iogpu.wired_limit_mb=12345 (i.e. amount in mb to allocate) (alien.top)

submitted 2 years ago by farkinga@alien.top to c/localllama@poweruser.forum

17 comments fedilink hide all child comments

If you're using Metal to run your llms, you may have noticed the amount of VRAM available is around 60%-70% of the total RAM - despite Apple's unique architecture for sharing the same high-speed RAM between CPU and GPU.

It turns out this VRAM allocation can be controlled at runtime using sudo sysctl iogpu.wired_limit_mb=12345

See here: https://github.com/ggerganov/llama.cpp/discussions/2182#discussioncomment-7698315

Previously, it was believed this could only be done with a kernel patch - and that required disabling a macos security feature ... And tbh that wasn't that great.

Will this make your system less stable? Probably. The OS will need some RAM - and if you allocate 100% to VRAM, I predict you'll encounter a hard lockup, spinning Beachball, or just a system reset. So be careful to not get carried away. Even so, many will be able to get a few more gigs this way, enabling a slightly larger quant, longer context, or maybe even the next level up in parameter size. Enjoy!

you are viewing a single comment's thread
view the rest of the comments

[–] bebopkim1372@alien.top 1 points 2 years ago

Maybe 47536MB is the net model size. For LLM inference, memory for context and optional context cache memory are also needed.