Wow, you have one of the rare 2060 12 GB models. My best guess would be a GGUF version; try Q4 with maybe 25 layers offloaded to the GPU. Make sure to close any other apps, as you're going to be really close to running out of RAM.
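For reference, a partial offload like that would look roughly like this with the llama-cpp-python bindings (just a minimal sketch, not the exact setup from this thread; the model filename is hypothetical and 25 layers is only the ballpark suggested above):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="model-Q4_K_M.gguf",  # hypothetical Q4 GGUF filename
    n_gpu_layers=25,                 # offload ~25 layers to the 12 GB GPU, rest stays in system RAM
    n_ctx=4096,                      # context length; lower it if RAM gets tight
)

out = llm("Q: What is the capital of France? A:", max_tokens=32)
print(out["choices"][0]["text"])
```

If it still OOMs, drop `n_gpu_layers` a few at a time until it fits.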
As a reference point, the Exllama2 4BPW model (roughly a Q4 equivalent) requires around 23 GB of VRAM.
Hmm, I might consider switching out for 2 sticks of 32 GB. That should make things easier. I usually need about 16 GB in use at all times for other things anyway.