Interesting that everyone is suggesting 7B models, but you can run much better models by using more than just your GPU memory (offloading part of the model to system RAM). I would highly recommend mxlewd-l2-20b; it's very smart and fantastic for writing scenes and such.
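To make the "more than just your GPU memory" point concrete: backends like llama.cpp let you put only some transformer layers on the GPU and keep the rest in system RAM. Here's a rough sketch of the budgeting involved; all the sizes (per-layer weight size, overhead, layer count) are illustrative assumptions, not measured values for any particular model.

```python
# Sketch: estimate how many transformer layers fit in VRAM so the rest
# can stay in system RAM (the idea behind llama.cpp's n_gpu_layers/-ngl).
# Numbers below are illustrative assumptions, not measurements.

def gpu_layers(vram_gb, layer_gb, overhead_gb=1.5, total_layers=64):
    """Return how many of `total_layers` fit in `vram_gb` of VRAM,
    after reserving `overhead_gb` for KV cache and scratch buffers."""
    usable = vram_gb - overhead_gb
    if usable <= 0:
        return 0
    return min(total_layers, int(usable // layer_gb))

# A ~20B model at ~4-bit quantization is very roughly 12 GB of weights;
# spread over 64 layers that's ~0.19 GB per layer (assumed numbers).
print(gpu_layers(vram_gb=8, layer_gb=0.19))
```

On an 8 GB card this puts roughly half the layers on the GPU and leaves the rest to the CPU, which is slower than full offload but lets you run a model that would never fit in VRAM alone.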
this post was submitted on 17 Nov 2023
LocalLLaMA
At 20 words per minute... Oh, the joys of CPU inference.
I personally like and use echidna-tiefigther-25. OpenHermes-2.5-Mistral is another good one.
If you want speed, use Mistral-7B-OpenOrca-GPTQ with ExLlama v2; that'll give you around 40-45 tokens per second. To trade speed for quality, use TheBloke/Xwin-MLewd-13B-v0.2-GGUF with llama.cpp.
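For a feel of what those rates mean next to the 20 words per minute quoted above for CPU inference, here's a quick conversion from tokens per second to words per minute, assuming the common rule of thumb of roughly 0.75 English words per token (an assumption, not a measured ratio):

```python
# Convert token throughput to words per minute, assuming ~0.75 words
# per token on average for English text (rule-of-thumb assumption).
WORDS_PER_TOKEN = 0.75

def words_per_minute(tokens_per_second):
    return tokens_per_second * WORDS_PER_TOKEN * 60

print(words_per_minute(40))  # the ~40 tok/s ExLlama v2 figure quoted above
```

At 40 tokens per second that works out to about 1800 words per minute, versus the ~20 words per minute of pure CPU inference mentioned earlier in the thread.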