DrVonSinistro
joined 1 year ago
I run 2x P40s with a 70b chat model at 8k ctx and get 7-8 T/s, and I'm very happy with that. Anything above 5 T/s is awesome for me.
Because a model can be divine or crap depending on its settings, I think it's important to specify what I use:
Deepseek 33b q8 gguf with the Min-p setting (I love it very much)
Source of my Min-p settings: "Your settings are (probably) hurting your model - Why sampler settings matter", LocalLLaMA (reddit.com)
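For anyone wondering what Min-p actually does, here is a rough sketch in plain Python, not the code any particular backend runs. The idea as I understand it: keep only the tokens whose probability is at least `min_p` times the probability of the most likely token, renormalize, then sample. The function names, the `min_p=0.05` default, and applying temperature before the filter are all my assumptions for illustration; real backends may order the samplers differently, and the post above does not state the exact values used.

```python
import math
import random


def min_p_filter(logits, min_p=0.05, temperature=1.0):
    """Return renormalized token probabilities after a min-p cutoff.

    min_p and temperature defaults are illustrative assumptions,
    not values taken from the post.
    """
    # Softmax with temperature, shifted by the max for numerical stability.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Cutoff scales with the top token: confident distributions prune hard,
    # flat distributions keep more candidates.
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]

    # Renormalize the survivors so they sum to 1 again.
    kept_total = sum(kept)
    return [p / kept_total for p in kept]


def sample_min_p(logits, min_p=0.05, temperature=1.0, rng=random):
    """Sample one token index from the min-p filtered distribution."""
    probs = min_p_filter(logits, min_p, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Usage would look like `sample_min_p([2.0, 1.0, -3.0], min_p=0.1)`: the third token falls below a tenth of the top token's probability, so only index 0 or 1 can come out.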