Interesting that everyone is suggesting 7B models, but you can run much better models by using more than just your GPU memory (offloading part of the model to system RAM). I would highly recommend MXLewd-L2-20B; it's very smart and fantastic for writing scenes and such.
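For anyone wondering how to actually run a 20B model that doesn't fit entirely in VRAM, here's a minimal sketch using llama-cpp-python with a GGUF quant, keeping some layers on the GPU and the rest in system RAM. The file path and layer count are placeholders; tune n_gpu_layers to whatever fits your card.

```python
# Minimal sketch: partial GPU offload with llama-cpp-python.
# Assumes a local GGUF quant of MXLewd-L2-20B; the path and layer count
# are placeholders -- adjust n_gpu_layers to fit your VRAM.
from llama_cpp import Llama

llm = Llama(
    model_path="models/mxlewd-l2-20b.Q4_K_M.gguf",  # hypothetical local path
    n_gpu_layers=40,   # layers offloaded to the GPU; the rest stay in system RAM
    n_ctx=4096,        # context window
)

out = llm(
    "Write a short scene set in a rainy harbor town.",
    max_tokens=256,
    temperature=0.8,
)
print(out["choices"][0]["text"])
```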
At 20 words per minute... Oh, the joys of CPU inference.
I personally like and use Echidna-Tiefigther-25. Another good one is OpenHermes-2.5-Mistral.
If you want speed, use Mistral-7B-OpenOrca-GPTQ with ExLlamaV2; that will give you around 40-45 tokens per second. Use TheBloke/Xwin-MLewd-13B-v0.2-GGUF with llama.cpp if you're willing to trade some speed for quality.
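In case it helps, here's a rough sketch of loading a GPTQ model with the exllamav2 Python package, modeled on its example scripts; the model directory is a placeholder and exact API details may differ between package versions.

```python
# Rough sketch: running a GPTQ model with exllamav2 (based on its example
# scripts; exact API may vary by version). The model directory is a placeholder.
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "models/Mistral-7B-OpenOrca-GPTQ"  # hypothetical local path
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)            # spread the weights across available VRAM
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8
settings.top_p = 0.9

# Third argument is the number of new tokens to generate.
print(generator.generate_simple("Write a short scene set in a rainy harbor town.",
                                settings, 200))
```

For the GGUF route with llama.cpp, the same llama-cpp-python pattern from the snippet earlier in the thread applies; a 13B quant just needs fewer layers kept on the CPU than a 20B one.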