Apple has further segmented the Apple silicon lineup with the M3.
With the M1 & M2 Max, all the GPU variants had the same memory bandwidth (400GB/s for the M2 Max). The top of the line M3 Max (16 CPU/ 40GPU cores) is still limited to 400GB/s max, but now the lower spec variants (14 CPU/30 GPU) are only 300GBs/max.
Inference is generally bound by memory bandwidth, so the M3 generation may not be much of an improvement. Apple claims improvements in the cache, but that may not mean much for inference. We'll know more once people have them in hand, which shouldn't take too long.