this post was submitted on 01 Nov 2023 in Machine Learning

My current PC laptop is about ready to retire after seven years of service. As a replacement I'm considering the new MacBook Pros; it is mainly the battery life that makes me consider Apple. These are my requirements for the laptop:

  • great battery life
  • 16" screen, since I'm old and my eyes have degraded
  • support for dual external monitors
  • software engineering, including running some local Docker images

Then I have two ML requirements which I'm not sure a laptop can fulfill:

  • good performance for working with local LLMs (30B and maybe larger)
  • good performance for ML work with PyTorch, Stable Baselines and scikit-learn

To fulfill the must-have items, I think the following configuration would meet the requirements:

Apple M3 Pro chip with 12‑core CPU, 18‑core GPU, 16‑core Neural Engine
36 GB memory
512 GB SSD
Price: $2899

Question: Do you think I could fulfill the ML requirements with a MacBook Pro M3? Which config would be smart to buy in that case?

Thankful for advice!

top 7 comments
[–] oo_viper_oo@alien.top 1 points 10 months ago (1 children)

Llama 2 and its derivatives seem to run just fine on Apple Silicon: https://github.com/ggerganov/llama.cpp
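For context, a minimal sketch of what local inference can look like from Python via the llama-cpp-python bindings (a separate wrapper around llama.cpp; the GGUF path and prompt below are placeholders):

```python
# Hedged sketch using the llama-cpp-python bindings (pip install llama-cpp-python).
# On Apple Silicon, Metal acceleration is handled by the underlying llama.cpp build.
from llama_cpp import Llama

llm = Llama(model_path="./models/llama-2-13b.Q4_K_M.gguf", n_ctx=2048)  # placeholder path
out = llm("Q: Name the planets in the solar system. A:", max_tokens=64, stop=["Q:"])
print(out["choices"][0]["text"])
```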

[–] drivanova@alien.top 1 points 10 months ago

Yes, but just highlighting that ggml is C++, not PyTorch (which is what OP is asking about).

[–] Ok-Zookeepergame6084@alien.top 1 points 10 months ago (1 children)

From direct experience, Mac M1 and M2 Air and Mini run 7B quantized models OK but barely; they prefer 3B models like Orca Mini. So your goal of running a 30B quantized model is realistic, but only for inference, and I would use Ollama or another model server that exposes an API, then run your interface client separately (Streamlit, Chainlit, et al.); a sketch of that split is below. From a dev's standpoint I prefer Macs just because of the Unix core in macOS. My experience with PC laptop Nvidia GPUs for inference isn't great. If you want a plug-it-in-and-go setup, I recommend a MacBook. Dealing with C compilers is a pain in Windows, which affects llama.cpp, Solidity and various others. Training or fine-tuning on any consumer-level hardware isn't practical to me unless it's a million-parameter model.
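For illustration, a minimal sketch of that model-server-plus-client split, assuming an Ollama server is already running on its default port (11434); the model name is a placeholder for whatever you have pulled:

```python
# Hedged sketch: a thin Python client querying a local Ollama server.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama2:13b",  # placeholder: any model you've pulled locally
        "prompt": "Explain beam search in one short paragraph.",
        "stream": False,        # ask for a single JSON response instead of a stream
    },
    timeout=300,
)
print(resp.json()["response"])
```

A Streamlit or Chainlit front end would just wrap this same HTTP call behind a text box.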

[–] jjaicosmel68@alien.top 1 points 10 months ago

That went over my head! So basically I have an M3 Max and I want to run an LLM on this hunk of junk. How the hell do I even get one loaded onto it? I'm just asking Copilot. I'll let you guys know what actually performs.

[–] progressive-bias@alien.top 1 points 10 months ago (1 children)

I'm an ML scientist with a modest h-index of 18, and I have used both Mac and Nvidia GPUs. Currently I have an M1 Max and a workstation with an RTX 6000 Ada Lovelace (the professional version of the 4090) + Ryzen 9 5950X. The first year was really tough for Apple Silicon, but by now PyTorch and TensorFlow both work very well on it. My lab also has A100 servers. Most of my time nowadays I'm using PyTorch with MPS (Metal Performance Shaders) on the Mac and have almost no issues with it. From my own testing with spatio-temporal series and vision, the M1 Max was about half as fast as the 6000 Ada (which is insane because the M1 Max GPU consumes about 35 watts). We also have an M1 Pro with 16 GB RAM, which could be interesting for comparison with the model you are looking at; it runs the same code about half as fast as the M1 Max, so performance seems to scale with GPU core count or GPU render capability. I'm also looking to upgrade to an M3 Max, and its GPU Metal performance sometimes doubles the M1 Max or matches the M2 Ultra; it would be very interesting to see whether it can match Ada in ML.

Also, we used Apple's powermetrics tool to check power usage: the GPU is used for model training, while the Apple Neural Engine sits at 0 watts during training. I think the ANE is only used for inference in production-level apps like Adobe's or DaVinci Resolve's AI features.

RAM: for the same amount of GPU-accessible RAM, a Mac is more cost-efficient than professional Nvidia cards, and Macs go way higher than what Nvidia cards can touch. My two-year-old M1 Max has 64 GB, and at the time of purchase no PCIe card came even close; the closest was the A6000 Ampere with 48 GB VRAM, and that card alone cost more than the MacBook. Keep in mind that the OS and the dataloader running on the CPU might take 5-8 GB, even if you don't have other RAM-hungry programs running.

PyTorch installation on a MacBook is a single command, and you don't have to deal with cuDNN/CUDA versions; Apple took care of everything around MPS. The bliss of "it just works."
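For illustration, a minimal sanity check after that one-line install (nothing here is Mac-specific beyond the MPS backend calls):

```python
# Minimal sketch: after `pip install torch`, confirm the MPS backend is built and
# usable, then run a tiny matmul on the Apple GPU via Metal.
import torch

print("MPS built:    ", torch.backends.mps.is_built())
print("MPS available:", torch.backends.mps.is_available())

x = torch.ones(1024, 1024, device="mps")
print((x @ x).sum().item())
```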

Sometimes I like to run the same code on both CUDA and MPS; here are a few minor problems I've recently encountered on MPS compared to CUDA:

  1. If you are trying to reproduce someone else's code and they happen to have hard-coded CUDA-specific calls, you'll have to spend some time clearing them out. For anything CUDA-specific there should be a framework-level (TF or Torch) API that is generic across hardware accelerators; see the sketch after this list.
  2. Sometimes, with the same code, MPS runs into weirdly large loss values that I just can't reproduce on CUDA, and sometimes the loss oscillates across epochs more on MPS. Very rarely MPS produces a NaN loss, but it happens once and then the same code won't reproduce it. But yeah, I work at a fairly deep level of ML where we often customize our own loss functions and write our own training loops. I'm guessing it might be because MPS currently only supports float32, while CUDA also supports float64.
  3. Not related to MPS/CUDA but important for general data science: if you want to zip/unzip a large dataset with hundreds of thousands of small files, a Windows PC (with a 16-core Ryzen 9!) takes more than 10 times as long as my MacBook on battery. I do not understand it, but it is true.
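To illustrate point 1, a hedged sketch of the hardware-agnostic device selection that replaces hard-coded `.cuda()` calls; purely illustrative, adapt it to the codebase at hand:

```python
# Hedged sketch: pick CUDA, then MPS, then CPU, so the same training script runs on
# an Nvidia workstation and an Apple Silicon laptop without edits.
import torch

def pick_device() -> torch.device:
    """Prefer CUDA, then MPS, then fall back to CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
model = torch.nn.Linear(32, 1).to(device)     # instead of model.cuda()
batch = torch.randn(8, 32, device=device)     # instead of batch.cuda()
loss = model(batch).pow(2).mean()
loss.backward()
print("ran on", device)
```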

My field is not LLMs, but for the basics like BERT, judging by this post on how much memory it takes on CUDA, a 36 GB Mac shouldn't be a problem: https://huggingface.co/docs/transformers/v4.18.0/en/performance

But I can't say much about more modern LLMs; for Meta's Llama, the small version seems to take about 13 GB for the model weights alone.
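For a rough sanity check, a back-of-the-envelope sketch of the memory taken by the weights alone (activations, KV cache and OS overhead come on top; parameter counts are nominal):

```python
# Back-of-the-envelope sketch: weight memory only, ignoring activations and KV cache.
def weight_gb(params_billion: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in GiB."""
    return params_billion * 1e9 * bytes_per_param / 1024**3

for b in (7, 13, 30):
    print(f"{b}B params: fp16 ≈ {weight_gb(b, 2):.1f} GB, "
          f"4-bit ≈ {weight_gb(b, 0.5):.1f} GB")
```

That puts a 7B model at roughly 13 GB in fp16 (matching the number above) and a 30B model at around 14 GB when 4-bit quantized, which is where a 36 GB Mac becomes comfortable for inference.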

I'm not trying to sell you a Mac. I'd say, given the performance I hinted at, look at what PC system your budget gets you and which GPU it comes with, then look up how that GPU scales against the 6000 Ada. I'd put the M3 Pro 18-core GPU's render performance relative to the M1 Max somewhere around 70% to 80%; we don't have a full picture yet and it depends on the benchmark. So if that PC system's GPU is faster than roughly 35-40% of the 6000 Ada (70-80% of the M1 Max's ~50%), it could outperform the M3 Pro you are looking at, considering only training time.

But in other regards a MacBook is years ahead of other PC laptops (I am a PC gamer too, so this is not a biased opinion). During model training with full GPU utilization, the 16-inch is quieter than normal Windows laptops on standby: I hear my own breath before I can hear the fan from arm's length. The display is gorgeous to look at. The battery lasts forever. It will not auto-update during your unattended overnight training. But again, you don't have many games on a Mac (yet).

Hope this helps.

[–] nizego@alien.top 1 points 10 months ago

> …scientist use day-to-day that doesn't run natively on Apple Silicon now.

Thanks for sharing your perspectives! One thing that makes me listen to the "fearmongering" about ARM is this specific issue, which has been open for a long time: https://github.com/DLR-RM/stable-baselines3/issues/914

That is only an example, but that is the library (in addition to LLMs) I use now :)
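For what it's worth, a minimal sketch of how one might run Stable-Baselines3 on a MacBook today, pinned to the CPU to sidestep the MPS question that issue raises (assumes gymnasium's CartPole-v1 is installed; swap in your own environment):

```python
# Hedged sketch: a tiny Stable-Baselines3 run with the device set explicitly to CPU,
# since GPU/MPS support on Apple Silicon is what the linked issue discusses.
from stable_baselines3 import PPO

model = PPO("MlpPolicy", "CartPole-v1", device="cpu", verbose=0)
model.learn(total_timesteps=10_000)
```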

[–] Mysterious_Can_2399@alien.top 1 points 10 months ago

I just got a MacBook M3 Max and am really psyched to start running some training workloads. I am scoping out which acceleration libraries (such as DeepSpeed) I can use with PyTorch model code on the M3. Would really appreciate it if someone could suggest some.