this post was submitted on 18 Nov 2023
LocalLLaMA
Community to discuss Llama, the family of large language models created by Meta AI.
You might as well use the cards if you already have them. I'm currently getting around 5-6 tokens per second running Nous-Capybara 34B Q4_K_M split across a 2080 Ti 22GB and a P102 10GB (basically a semi-lobotomized 1080 Ti). The P102 does bottleneck the 2080 Ti, but hey, at least it runs at a near-usable speed! If I run on CPU instead (a Ryzen 9 3900), I get closer to 1 token per second.
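For anyone wanting to try a similar split: here's a minimal sketch using llama-cpp-python, assuming a CUDA build of the library. The model filename and the split ratio are placeholder assumptions on my part; tune the ratio to roughly match each card's VRAM (about 22 GB vs 10 GB in my case).

```python
# Minimal sketch: splitting a GGUF model across two mismatched GPUs with
# llama-cpp-python. Filename and split ratio are assumptions, not a recipe.
from llama_cpp import Llama

llm = Llama(
    model_path="nous-capybara-34b.Q4_K_M.gguf",  # hypothetical local path
    n_gpu_layers=-1,            # offload every layer to the GPUs
    tensor_split=[0.69, 0.31],  # ~69% of weights on the 22GB card, rest on the 10GB card
    n_ctx=2048,                 # modest context to leave some VRAM headroom
)

out = llm("Q: Why is the sky blue? A:", max_tokens=64)
print(out["choices"][0]["text"])
```

tensor_split takes relative proportions, so [0.69, 0.31] just means roughly two thirds of the model lands on the first card; exact values depend on how much VRAM each card actually has free.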
How did you get your 2080 Ti to 22GB of VRAM?
Modded cards are quite easy to obtain in China.