this post was submitted on 18 Nov 2023

LocalLLaMA


A community to discuss Llama, the family of large language models created by Meta AI.


Looking for any model that can run with 20 GB VRAM. Thanks!

drifter_VR@alien.top · 10 months ago

A 34B model is the best fit for a 24 GB GPU right now: good speed and a huge context window.
nous-capybara-34b is a good starting point.
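Rough back-of-envelope math (a sketch only; the Yi-34B-style architecture numbers and the ~4.8 bits/weight figure for Q4_K_M are assumptions) shows why a quantized 34B just squeezes into 24 GB:

```python
# Back-of-envelope VRAM estimate: quantized weights + fp16 KV cache.
# Architecture values are assumed (Yi-34B-style: 60 layers, GQA with 8 KV heads,
# head_dim 128); real usage also includes loader/runtime overhead.

def estimate_vram_gb(n_params_b, bits_per_weight, ctx, n_layers, n_kv_heads, head_dim):
    weight_bytes = n_params_b * 1e9 * bits_per_weight / 8
    # K and V caches: 2 tensors * ctx * layers * kv_heads * head_dim * 2 bytes (fp16)
    kv_cache_bytes = 2 * ctx * n_layers * n_kv_heads * head_dim * 2
    return (weight_bytes + kv_cache_bytes) / 1e9

# Q4_K_M averages roughly 4.8 bits per weight
print(round(estimate_vram_gb(34, 4.8, 8192, 60, 8, 128), 1))  # ~22.4 GB -> tight but fits on a 24 GB card
```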

GoofAckYoorsElf@alien.top · 10 months ago

nous-capybara-34b

I haven't been able to get that running on my 3090 Ti yet. I tried TheBloke's GPTQ and GGUF (4-bit) versions. The first runs into memory issues; the second, loaded with llama.cpp (which it seems to be built for), loads but is excruciatingly slow (about 0.07 t/s).

I must admit that I am a complete noob regarding all the different variants and model loaders.
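Edit: from what I've gathered since posting, ~0.07 t/s usually means every layer is still running on the CPU; llama.cpp-based loaders only use the GPU if you ask them to offload layers. Something like this llama-cpp-python sketch is what people seem to suggest (the path and settings are just my guesses, not anything from TheBloke's model card):

```python
# Minimal llama-cpp-python sketch (needs a CUDA-enabled build of llama-cpp-python).
# The key knob is n_gpu_layers: the default of 0 keeps the whole model on the CPU,
# which is what produces speeds like 0.07 t/s on a 34B.
from llama_cpp import Llama

llm = Llama(
    model_path="nous-capybara-34b.Q4_K_M.gguf",  # local path to the GGUF file
    n_gpu_layers=-1,  # -1 = offload every layer to the GPU
    n_ctx=8192,       # context window; larger values use more VRAM
)

out = llm("USER: Hello!\nASSISTANT:", max_tokens=64)
print(out["choices"][0]["text"])
```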

drifter_VR@alien.top · 10 months ago

Koboldcpp is the easiest way.
Get nous-capybara-34b.Q4_K_M.gguf (it just fits into 24 GB VRAM with 8K context).
Here are my Koboldcpp settings (not sure if they're optimal, but they work):

https://preview.redd.it/dco0bokvic1c1.jpeg?width=540&format=pjpg&auto=webp&s=bf188ea61481a9464593db79d690b26eb7989883
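Once it's running, you can also script against koboldcpp's local HTTP API instead of using the web UI. A small sketch (the default port 5001 and the /api/v1/generate endpoint are from memory, so treat them as assumptions):

```python
# Query a running koboldcpp instance through its local Kobold-style API.
# Port and endpoint are koboldcpp defaults as I recall them (assumptions).
import requests

resp = requests.post(
    "http://localhost:5001/api/v1/generate",
    json={
        "prompt": "USER: Summarise the Llama model family in one line.\nASSISTANT:",
        "max_length": 120,            # tokens to generate
        "max_context_length": 8192,   # matches the 8K context above
    },
    timeout=600,
)
print(resp.json()["results"][0]["text"])
```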
