this post was submitted on 25 Nov 2023
LocalLLaMA
Community to discuss Llama, the family of large language models created by Meta AI.
I have the M3 Max with 128GB memory / 40 GPU cores.
By default, macOS only lets the GPU use up to 75% of the total SoC memory (128GB * 0.75 = 96GB); to allocate more than that you have to load a kernel extension. I raised the limit to 90% (115GB) and can run falcon-180b Q4_K_M at 2.5 tokens/s.
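For reference, on recent macOS the limit can reportedly also be raised without a kext via the `iogpu.wired_limit_mb` sysctl — a rough sketch below (the sysctl name is an assumption based on what Sonoma exposes; the setting needs sudo and resets on reboot):

```python
# Rough sketch: raise the share of unified memory the GPU is allowed to wire.
# Assumes the iogpu.wired_limit_mb sysctl that recent macOS (Sonoma) exposes;
# requires sudo, and the value resets on reboot, so it is easy to undo.
import subprocess

TOTAL_MB = 128 * 1024       # 128GB of unified memory on this machine
TARGET_FRACTION = 0.90      # allow ~90% (~115GB) instead of the default 75%

limit_mb = int(TOTAL_MB * TARGET_FRACTION)
subprocess.run(["sudo", "sysctl", f"iogpu.wired_limit_mb={limit_mb}"], check=True)
```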
I ordered the same config. Would you mind telling me what you’ve loved using it for (AI/LLM-wise)? My current laptop can’t do anything, so I haven’t been able to jump into this stuff despite strong interest. It’d be helpful to have a jumping-off point. TIA!
I run a code completion server that works like GitHub Copilot. I'm also working on a Mail labeling system using llama.cpp and AppleScript, but it is very much a work-in-progress.
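If it helps, the glue for the Mail part looks roughly like this — a simplified sketch, not the actual code. It assumes llama.cpp's bundled server is running locally with its OpenAI-compatible endpoint on port 8080, and the label names and Mail AppleScript verbs are just placeholders:

```python
# Simplified sketch of the Mail-labeling idea: ask a local llama.cpp server to
# pick a label for an email, then file the selected Mail.app message via
# AppleScript. Endpoint, labels, and Mail scripting verbs are assumptions.
import json
import subprocess
import urllib.request

LABELS = ["Newsletters", "Receipts", "Personal", "Work"]  # hypothetical labels


def classify(subject: str, body: str) -> str:
    """Ask the local llama.cpp server (OpenAI-compatible API) for one label."""
    prompt = (
        f"Classify this email into exactly one of {LABELS}.\n"
        f"Subject: {subject}\n\n{body}\n\nAnswer with the label only."
    )
    req = urllib.request.Request(
        "http://localhost:8080/v1/chat/completions",
        data=json.dumps({
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0,
        }).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        answer = json.load(resp)["choices"][0]["message"]["content"].strip()
    return answer if answer in LABELS else "Personal"


def file_message(label: str) -> None:
    """Move the currently selected message in Mail.app into a mailbox named
    after the label (the AppleScript below is illustrative, not verified)."""
    script = f'''
    tell application "Mail"
        repeat with m in (get selection)
            move m to mailbox "{label}"
        end repeat
    end tell
    '''
    subprocess.run(["osascript", "-e", script], check=True)


if __name__ == "__main__":
    label = classify("Your order has shipped", "Tracking number enclosed...")
    file_message(label)
```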