Speed+quality = Nous-Capybara 34b. Offload 13 layers to system and get a Q5_K_M. If you have enough system RAM and a decent CPU you wont even feel it. Just quality, Euryale 1.3 70b. It will be slow - up to 200 seconds for a single message at Q5_K_M - but it will deliver.
this post was submitted on 16 Nov 2023
1 points (100.0% liked)
LocalLLaMA
3 readers
1 users here now
Community to discuss about Llama, the family of large language models created by Meta AI.
founded 1 year ago
MODERATORS