From what llama.cpp reports, I see it come out to roughly 2GB per 4k of context. Load a model and read what it prints in the log.
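If you want a ballpark before loading anything, here's a rough back-of-the-envelope sketch of KV-cache size. The layer/head numbers below are just illustrative (a hypothetical 70B-class model with GQA); the real figure is whatever llama.cpp prints for your model and cache type.

```python
# Rough KV-cache size estimate (fp16 K/V, no cache quantization assumed).
# The authoritative number is what llama.cpp logs when it allocates the cache.
def kv_cache_gib(n_layers, n_kv_heads, head_dim, ctx_tokens, bytes_per_elem=2):
    # 2x for the K and V tensors in every layer
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * ctx_tokens * bytes_per_elem
    return total_bytes / (1024 ** 3)

# Hypothetical example: 80 layers, 8 KV heads, head_dim 128, 4k context
print(f"{kv_cache_gib(80, 8, 128, 4096):.2f} GiB")  # ~1.25 GiB per 4k here
```

The per-4k cost swings a lot between models (full MHA vs GQA, layer count, cache quantization), which is why reading the log beats any rule of thumb.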
As for Mac vs RTX: you can build a system with the same or a similar amount of VRAM as the Mac for a lower price, but it depends on your skill level and your electricity/space requirements.
If you live in a studio apartment, I don't recommend buying an 8-card inference server, regardless of the couple thousand dollars saved in either direction or the faster speed.