I'm an ML scientist with a modest h-index of 18, and I have used both Mac and Nvidia GPUs. Currently I have an M1 Max and a workstation with an RTX 6000 Ada (the professional counterpart of the 4090) plus a Ryzen 9 5950X. My lab also has A100 servers. The first year was really tough for Apple silicon, but now PyTorch and TensorFlow both work very well on it. Most of the time nowadays I'm using PyTorch with MPS (Metal Performance Shaders) on the Mac and have almost no issues with it. In my own testing with spatial-temporal series and vision models, the M1 Max was about half as fast as the 6000 Ada (which is insane, because the M1 Max GPU draws about 35 W). We also have an M1 Pro with 16GB RAM, which could be an interesting comparison for the model you are looking at. It runs the same code at about half the speed of the M1 Max, so performance seems to scale with GPU core count or GPU rendering capability. I'm also looking to upgrade to an M3 Max; its Metal GPU performance reportedly sometimes doubles the M1 Max or matches the M2 Ultra, so it would be very interesting to see whether it could match the Ada in ML.
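When comparing backends like this, one detail worth getting right is timing: GPU work on both CUDA and MPS is queued asynchronously, so a timer can stop before the work has actually finished unless you synchronize first. Below is a minimal sketch of a timing helper, not necessarily how the numbers above were measured; the name `time_per_step` and its parameters are made up for illustration.

```python
import time
from statistics import median
from typing import Callable


def time_per_step(step: Callable[[], object],
                  sync: Callable[[], None] = lambda: None,
                  warmup: int = 3,
                  iters: int = 10) -> float:
    """Return the median wall-clock seconds per call of `step`.

    `sync` should block until queued device work has finished, e.g.
    torch.cuda.synchronize or torch.mps.synchronize. The default no-op
    is only correct for synchronous (CPU) work.
    """
    for _ in range(warmup):  # let allocators, caches and kernel compilation settle
        step()
    sync()
    timings = []
    for _ in range(iters):
        start = time.perf_counter()
        step()
        sync()  # wait for the device before stopping the clock
        timings.append(time.perf_counter() - start)
    return median(timings)
```

On a Mac you might pass a lambda that runs one forward/backward pass together with `sync=torch.mps.synchronize` (PyTorch 2.0 or newer), and on the workstation `sync=torch.cuda.synchronize`; the ratio of the two medians gives a "half as fast" style comparison.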
Also, we used Apple's powermetrics tool to check power usage: the GPU is what does the model training, and the Apple Neural Engine (ANE) sits at 0 W during training. I think the ANE is only used for inference, in production-level apps like Adobe's or DaVinci Resolve's AI features.
RAM: for the same amount of GPU-accessible memory, a Mac is more cost-efficient than professional Nvidia cards, and it goes far higher than what Nvidia cards can reach. My 2-year-old M1 Max has 64GB of unified memory, and at the time of purchase no PCIe card came even close; the closest was the A6000 (Ampere) with 48GB of VRAM, and that card alone cost more than the MacBook. Keep in mind that the OS and the dataloader running on the CPU share this same memory and might take 5-8 GB, even if you don't have other RAM-hungry programs running.
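As back-of-the-envelope arithmetic using the 5-8 GB overhead figure above (the function name is illustrative, and the result should be treated as an optimistic upper bound on what the GPU can actually use):

```python
def usable_gpu_memory_gb(total_gb: float, overhead_gb: float) -> float:
    """Unified memory left for the GPU after the OS and CPU-side dataloader.

    On Apple silicon the CPU and GPU share one memory pool, so whatever the
    OS and dataloader use comes out of the same budget the model sees.
    """
    return max(total_gb - overhead_gb, 0.0)


for overhead in (5, 8):
    print(f"64 GB Mac, {overhead} GB overhead -> "
          f"{usable_gpu_memory_gb(64, overhead):.0f} GB for the GPU (vs 48 GB on an A6000)")
```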
Installing PyTorch on a MacBook is a one-line command, and you don't have to deal with cuDNN/CUDA versions; Apple took care of everything for MPS. It just works, and that is bliss.
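After that one-line install (such as `pip3 install torch`), a quick way to confirm the MPS backend is usable is the pair of checks PyTorch exposes under `torch.backends.mps`, assuming PyTorch 1.12 or newer. A small sketch; the `mps_status` wrapper is just an illustrative name:

```python
import importlib.util


def mps_status() -> str:
    """Describe whether PyTorch's MPS (Metal) backend can be used here."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch

    if not torch.backends.mps.is_built():
        # e.g. a Linux or Windows wheel, which is compiled without MPS
        return "this torch build was compiled without MPS support"
    if not torch.backends.mps.is_available():
        # built with MPS, but this machine/OS cannot run it (e.g. macOS older than 12.3)
        return "MPS is built but not available on this machine"
    return "MPS is available"


if __name__ == "__main__":
    print(mps_status())
```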
Sometimes I like to run the same code on both CUDA and MPS. Here are some minor problems I've run into recently on MPS compared to CUDA:
If you are trying to reproduce someone else's code and it happens to have hard-coded CUDA-specific calls, you'll have to spend some time clearing them out. But for anything CUDA-specific there should be a framework-level (TF or torch) API that is generic across hardware accelerators.
Sometimes the same code runs into weirdly large loss values on MPS that I just can't reproduce on CUDA, and sometimes the loss oscillates more across epochs on MPS. Very rarely MPS produces a NaN loss, but it happened about once and the same code would not reproduce it. That said, my work is fairly low-level ML where we often write our own custom loss functions and training loops. My guess is that it may be because MPS currently supports only float32, while CUDA also supports float64 (PyTorch's default dtype is float32 on both, so this would only matter where custom code explicitly uses double precision).
Not related to MPS/CUDA, but important for general data science: if you want to zip/unzip a large dataset with hundreds of thousands of small files, my Windows PC (with a 16-core Ryzen 9!) takes more than 10 times as long as my MacBook running on battery. I don't understand why, but it is true.
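On the hard-coded CUDA point: the usual fix is to choose the device once and pass it around, replacing calls like `.cuda()` with `.to(device)`. A minimal sketch, with the selection logic kept free of torch so it is easy to test; `pick_device` and `detect_device` are hypothetical helper names, not a torch API:

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Choose a torch device string, preferring CUDA, then MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Query the installed torch for available accelerators."""
    import torch  # imported lazily so pick_device works without torch

    return pick_device(torch.cuda.is_available(),
                       torch.backends.mps.is_available())


# Typical use in training code, instead of model.cuda() / x.cuda():
#   device = torch.device(detect_device())
#   model = model.to(device)
#   x, y = x.to(device), y.to(device)
```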
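On the precision guess: MPS has no float64, so any part of a custom loss that relies on double precision on CUDA has to run in float32 on a Mac. Whether that explains the loss differences above is unverified, but the size of the effect is easy to see. This stdlib-only sketch emulates float32 by rounding through `struct` after every addition (a simplified stand-in for real GPU arithmetic, which may also reorder operations) and compares it with Python's float64:

```python
import struct
from typing import Callable, Iterable


def to_f32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def running_sum(values: Iterable[float], round_each_step: Callable[[float], float]) -> float:
    """Accumulate values left to right, rounding the total after every addition."""
    total = 0.0
    for v in values:
        total = round_each_step(total + v)
    return total


values64 = [0.1] * 100_000                 # e.g. many small per-sample loss terms
values32 = [to_f32(v) for v in values64]   # the same terms stored as float32
exact = 10_000.0
err64 = abs(running_sum(values64, float) - exact)
err32 = abs(running_sum(values32, to_f32) - exact)
print(f"float64 error: {err64:.2e}, float32 error: {err32:.2e}")
```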
My field is not LLMs, but for basic models like BERT, going by this page on how much memory they take on CUDA, a 36GB Mac shouldn't be a problem: https://huggingface.co/docs/transformers/v4.18.0/en/performance
But I can't say much about more modern LLMs; for Meta's Llama, even the small version seems to take 13GB for the model alone.
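A rough way to sanity-check figures like these is parameter count × bytes per parameter. The sketch below assumes the small Llama is the ~7B-parameter model (about 6.7 billion parameters) stored in 16-bit precision, and BERT-base at about 110 million parameters. It counts weights (plus, in the last line, gradients and Adam state) only; activations come on top and depend on batch size and sequence length.

```python
def weights_gb(n_params: float, bytes_per_param: float) -> float:
    """Memory for the given number of parameters, in decimal gigabytes."""
    return n_params * bytes_per_param / 1e9


# Assumed parameter counts (approximate, for illustration only).
LLAMA_SMALL = 6.7e9   # ~7B-parameter Llama
BERT_BASE = 110e6     # BERT-base

print(f"Llama ~7B, fp16 weights: {weights_gb(LLAMA_SMALL, 2):.1f} GB")
print(f"BERT-base, fp32 weights: {weights_gb(BERT_BASE, 4):.2f} GB")
# fp32 training with Adam: weights (4) + gradients (4) + two Adam moments (8) bytes/param.
print(f"BERT-base, fp32 + Adam:  {weights_gb(BERT_BASE, 16):.2f} GB")
```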
I'm not trying to sell you a Mac. Given the performance numbers I hinted at, you could price out a PC system at your budget, see which GPU it gets you, and then look up how that GPU's performance compares to the 6000 Ada. For the M3 Pro's 16-core GPU, I'd look at its rendering performance relative to the M1 Max; we don't have a full picture yet, but I've seen numbers in the range of 70% to 80%, depending on the benchmark task. Since the M1 Max was about half as fast as the 6000 Ada, that puts the M3 Pro at roughly 0.5 × 70-80%, or about 35-40% of a 6000 Ada. So if that PC system's GPU delivers more than about 35% of a 6000 Ada's performance, it could outperform the M3 Pro you are looking at, considering training time alone.
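The estimate chains two ratios. As plain arithmetic (the 0.5 and 70-80% figures are the rough numbers from this post, not measurements of any specific PC GPU; the function name is illustrative):

```python
M1_MAX_VS_6000_ADA = 0.5          # M1 Max ran at about half the speed of a 6000 Ada
M3_PRO_VS_M1_MAX = (0.70, 0.80)   # reported M3 Pro GPU range relative to the M1 Max


def m3_pro_vs_6000_ada(m3_vs_m1: float, m1_vs_ada: float = M1_MAX_VS_6000_ADA) -> float:
    """M3 Pro GPU throughput expressed as a fraction of a 6000 Ada."""
    return m3_vs_m1 * m1_vs_ada


low, high = (m3_pro_vs_6000_ada(r) for r in M3_PRO_VS_M1_MAX)
print(f"M3 Pro is roughly {low:.0%} to {high:.0%} of a 6000 Ada")
# A PC GPU that benchmarks above `low` (as a fraction of a 6000 Ada) could
# already beat the M3 Pro on training time; above `high` it very likely would.
```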
But in other respects a MacBook is years ahead of other PCs (I'm a PC gamer too, and this is not a biased opinion). During model training at full GPU utilization, the 16-inch is quieter than a typical Windows laptop on standby: at arm's length I hear my own breathing before I can hear the fan. The display is gorgeous, the battery lasts forever, and it will not auto-update during your unattended overnight training. But again, you don't have many games on Mac (yet).
Hope this helps.