these are my suggestions.
https://huggingface.co/TheBloke/cat-v1.0-13B-GPTQ
https://huggingface.co/TheBloke/Augmental-Unholy-13B-GPTQ
https://huggingface.co/TheBloke/HornyEchidna-13B-v0.1-GPTQ
and the one i keep coming back to but can barely run.
Here's a link to an up-to-date ranking of models for RP. Currently 400+ models ranked.
What about Cat 13b 1.0? It slipped through here without much attention but it looks really good; with 16gb you could run q8.
with 16gb you could run q8
Not really though. Any kind of context will push you over 16gb. Or I'm doing something wrong.
GGUF? Even on a GTX 1080 you get like 4 t/s with q8, which is almost as fast as the average person's reading speed; with 16gb it should be 4-5x faster.
Hadn't thought of that. I have 24gb so I've always used GPTQ and with that, you really need more than 16gb.
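For anyone wondering why a q8 13B spills past 16gb once context fills up, here's a rough back-of-the-envelope sketch. The layer count and hidden size are assumed Llama-2 13B numbers, and real loaders add extra overhead on top (scratch buffers, quantization metadata), so treat this as an estimate, not an exact figure:

```python
# Rough VRAM estimate: quantized weights + fp16 KV cache.
# Architecture numbers (40 layers, hidden size 5120) are assumptions
# for a Llama-2 13B; actual loaders need additional overhead.

def vram_estimate_gib(n_params, bits_per_weight, n_layers, hidden_size,
                      context_len, kv_bytes=2):
    """Return (weights_gib, kv_cache_gib, total_gib)."""
    weights = n_params * bits_per_weight / 8
    # K and V caches: two tensors per layer, hidden_size values per token
    kv_cache = 2 * n_layers * hidden_size * context_len * kv_bytes
    gib = 2 ** 30
    return weights / gib, kv_cache / gib, (weights + kv_cache) / gib

w, kv, total = vram_estimate_gib(
    n_params=13e9, bits_per_weight=8,    # q8 quantization
    n_layers=40, hidden_size=5120,       # assumed Llama-2 13B shape
    context_len=4096, kv_bytes=2)        # full 4k context, fp16 cache
print(f"weights ~{w:.1f} GiB, KV cache ~{kv:.1f} GiB, total ~{total:.1f} GiB")
```

That comes out to roughly 12 GiB of weights plus about 3 GiB of KV cache at full 4k context, so before any runtime overhead you're already at the edge of a 16gb card, which matches the experience above.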
Chat/RP is one of my main use cases so I test for that specifically - check out my latest LLM Comparison/Test which includes links to my previous tests.
I really liked Echidna-Tiefighter. Characters act way more natural than with any other 13B model I tried.