Use the ExLlamaV2 (EXL2) format, which supports variable-bitrate quantization.
Even a single 24GB GPU can run a 70B model if it's quantized aggressively enough.
For example, I haven't tried it, but I'm almost sure the 2.30 bpw (bits per weight) quant fits on a single 24GB GPU: https://huggingface.co/turboderp/Llama2-70B-chat-exl2
I think you can even go a bit higher than 2.30 bpw.
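
As a rough sanity check on the numbers above, here is a back-of-the-envelope sketch of how much memory the quantized weights alone would take. It assumes a round 70e9 parameters for a "70b" model and treats the bitrate as an average bits per weight; the helper name `weight_gib` is made up for this example. It does not count the KV cache, activations, or framework overhead, so the real headroom on a 24GB card is smaller than this suggests.

```python
def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights alone, in GiB."""
    # bits -> bytes (divide by 8), bytes -> GiB (divide by 2**30)
    return n_params * bits_per_weight / 8 / 2**30


N_PARAMS = 70e9  # assumption: round parameter count for a "70b" model

# The bitrates below are illustrative, not a list of published quants.
for bpw in (2.3, 2.5, 3.0):
    print(f"{bpw:.2f} bpw -> ~{weight_gib(N_PARAMS, bpw):.1f} GiB of weights")
```

By this estimate, the weights at 2.30 bits per weight come in under 19 GiB, which leaves a few GiB for context, while 3.0 bits per weight already exceeds 24 GiB on weights alone. That is consistent with "a bit higher than 2.30" being plausible, but not much higher.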