this post was submitted on 17 Nov 2023
LocalLLaMA
The Airoboros-Yi 34B model seems to be the best one right now, even better than the 70B models.
It is creative, and its stories are quite diverse.
How do I run this on my GPU and CPU? I have an RTX 2060 with 12 GB of VRAM, and I also have 32 GB of RAM available. Is that enough to run it?
Wow, you have one of the rare 2060 12 GB models. My best guess would be the GGUF version. Try Q4 with maybe 25 layers offloaded to the GPU. Make sure to close any other apps, as you are going to be really close to running out of RAM.
As a reference point, the ExLlamaV2 4 bpw model (roughly equivalent to Q4) requires around 23 GB of VRAM.
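If it helps to sanity-check the "25 layers on the GPU" guess, here is a rough back-of-the-envelope sketch. The numbers are assumptions, not measurements: about 4.5 bits per weight for a Q4-style GGUF quant, 60 transformer layers for a Yi-34B-based model, and weights spread evenly across layers. It also ignores the KV cache and other runtime overhead, so treat the result as a lower bound. The function names are made up for illustration.

```python
# Rough estimate of how a quantized model splits between VRAM and RAM.
# All constants are assumptions for illustration, not measured values.

def model_size_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return n_params_billion * bits_per_weight / 8


def split_layers(total_gb: float, n_layers: int, n_gpu_layers: int):
    """Assume weights are spread evenly over layers; return (gpu_gb, cpu_gb)."""
    per_layer_gb = total_gb / n_layers
    gpu_gb = per_layer_gb * n_gpu_layers
    return gpu_gb, total_gb - gpu_gb


N_PARAMS_B = 34.0       # "34B" from the thread
BITS_PER_WEIGHT = 4.5   # assumption: typical for a Q4-style quant
N_LAYERS = 60           # assumption: layer count of a Yi-34B-based model
N_GPU_LAYERS = 25       # the guess from the comment above

total = model_size_gb(N_PARAMS_B, BITS_PER_WEIGHT)
gpu, cpu = split_layers(total, N_LAYERS, N_GPU_LAYERS)
print(f"weights total: {total:.1f} GB")
print(f"on GPU ({N_GPU_LAYERS} layers): {gpu:.1f} GB")
print(f"in system RAM: {cpu:.1f} GB")
```

Under these assumptions the GPU share stays well under 12 GB, which leaves some headroom for context, while the rest sits in system RAM. In llama.cpp the layer count is set with the -ngl (--n-gpu-layers) option, so you can nudge it up or down from 25 and watch VRAM usage.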
Hmm, I might consider switching out for 2 sticks of 32 GB. That should make things easier. I usually need about 16 GB at all times for other things, so
Link? Not finding it.