Interesting that everyone is suggesting 7B models, but you can run much better models by using more than just your GPU memory (offloading part of the model to system RAM). I would highly recommend mxlewd-l2-20b; it's very smart and fantastic for writing scenes and such.
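To make the "more than just your GPU memory" point concrete: backends like llama.cpp let you put only some transformer layers on the GPU and keep the rest in system RAM. Here's a rough sketch of the budgeting involved; all the sizes (per-layer weight size, overhead, layer count) are illustrative assumptions, not measured values for any particular model.

```python
# Sketch: estimate how many transformer layers fit in VRAM so the rest
# can stay in system RAM (the idea behind llama.cpp's n_gpu_layers/-ngl).
# Numbers below are illustrative assumptions, not measurements.

def gpu_layers(vram_gb, layer_gb, overhead_gb=1.5, total_layers=64):
    """Return how many of `total_layers` fit in `vram_gb` of VRAM,
    after reserving `overhead_gb` for KV cache and scratch buffers."""
    usable = vram_gb - overhead_gb
    if usable <= 0:
        return 0
    return min(total_layers, int(usable // layer_gb))

# A ~20B model at ~4-bit quantization is very roughly 12 GB of weights;
# spread over 64 layers that's ~0.19 GB per layer (assumed numbers).
print(gpu_layers(vram_gb=8, layer_gb=0.19))
```

On an 8 GB card this puts roughly half the layers on the GPU and leaves the rest to the CPU, which is slower than full offload but lets you run a model that would never fit in VRAM alone.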
this post was submitted on 17 Nov 2023
LocalLLaMA
At 20 words per minute... Oh, the joys of CPU inference.
I personally like and use echidna-tiefigther-25. OpenHermes-2.5-Mistral is another good one.
If you want speed, use Mistral-7B-OpenOrca-GPTQ with ExLlama v2; that'll give you around 40-45 tokens per second. To trade speed for quality, use TheBloke/Xwin-MLewd-13B-v0.2-GGUF with llama.cpp.
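For a feel of what those rates mean next to the 20 words per minute quoted above for CPU inference, here's a quick conversion from tokens per second to words per minute, assuming the common rule of thumb of roughly 0.75 English words per token (an assumption, not a measured ratio):

```python
# Convert token throughput to words per minute, assuming ~0.75 words
# per token on average for English text (rule-of-thumb assumption).
WORDS_PER_TOKEN = 0.75

def words_per_minute(tokens_per_second):
    return tokens_per_second * WORDS_PER_TOKEN * 60

print(words_per_minute(40))  # the ~40 tok/s ExLlama v2 figure quoted above
```

At 40 tokens per second that works out to about 1800 words per minute, versus the ~20 words per minute of pure CPU inference mentioned earlier in the thread.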