A big issue for CPU-only setups is prompt processing. They're kind of OK for short chats, but if you give them a full context, the processing time is miserable and the effective speed ends up nowhere close to 5 tok/sec.
There is one exception: the Xeon Max with HBM. It is not cheap.
So if you get a server, at least get a small GPU with it to offload prompt processing.
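To put rough numbers on why prompt processing dominates, here is a back-of-the-envelope sketch. The function names and all of the rates below are made up for illustration, not measurements from any particular CPU; plug in your own benchmark figures.

```python
def time_to_first_token(prompt_tokens: int, pp_rate: float) -> float:
    """Seconds spent processing the prompt before any output appears.

    pp_rate is the prompt-processing speed in tokens/sec.
    """
    return prompt_tokens / pp_rate


def effective_tok_per_sec(prompt_tokens: int, gen_tokens: int,
                          pp_rate: float, tg_rate: float) -> float:
    """Generated tokens divided by total wall time (prompt processing + generation).

    tg_rate is the token-generation speed in tokens/sec.
    """
    total_s = time_to_first_token(prompt_tokens, pp_rate) + gen_tokens / tg_rate
    return gen_tokens / total_s


# Hypothetical rates, chosen only to show the shape of the problem.
PP_RATE = 20.0  # assumed prompt-processing speed, tok/sec
TG_RATE = 5.0   # assumed generation speed, tok/sec

# Short chat (200-token prompt) vs. a full context (32000-token prompt),
# each followed by a 500-token reply.
for prompt_tokens in (200, 32000):
    wait_s = time_to_first_token(prompt_tokens, PP_RATE)
    eff = effective_tok_per_sec(prompt_tokens, 500, PP_RATE, TG_RATE)
    print(f"{prompt_tokens} prompt tokens: wait {wait_s:.0f} s, effective {eff:.2f} tok/sec")
```

With a short prompt the effective speed stays close to the generation rate; with a long prompt the wait before the first token swamps everything else, and that wait is the part a GPU can take over.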