this post was submitted on 18 Nov 2023

LocalLLaMA


A community to discuss Llama, the family of large language models created by Meta AI.


Looking for any model that can run with 20 GB VRAM. Thanks!

drifter_VR@alien.top · 10 months ago

A 34B model is the best fit for a 24 GB GPU right now: good speed and a huge context window.
nous-capybara-34b is a good starting point.
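Rough back-of-envelope math (a sketch only; the Yi-34B-style architecture numbers and the ~4.8 bits/weight figure for Q4_K_M are assumptions) shows why a quantized 34B just squeezes into 24 GB:

```python
# Back-of-envelope VRAM estimate: quantized weights + fp16 KV cache.
# Architecture values are assumed (Yi-34B-style: 60 layers, GQA with 8 KV heads,
# head_dim 128); real usage also includes loader/runtime overhead.

def estimate_vram_gb(n_params_b, bits_per_weight, ctx, n_layers, n_kv_heads, head_dim):
    weight_bytes = n_params_b * 1e9 * bits_per_weight / 8
    # K and V caches: 2 tensors * ctx * layers * kv_heads * head_dim * 2 bytes (fp16)
    kv_cache_bytes = 2 * ctx * n_layers * n_kv_heads * head_dim * 2
    return (weight_bytes + kv_cache_bytes) / 1e9

# Q4_K_M averages roughly 4.8 bits per weight
print(round(estimate_vram_gb(34, 4.8, 8192, 60, 8, 128), 1))  # ~22.4 GB -> tight but fits on a 24 GB card
```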

GoofAckYoorsElf@alien.top · 10 months ago

nous-capybara-34b

I haven't been able to get that running on my 3090 Ti yet. I tried TheBloke's GPTQ and GGUF (4-bit) versions. The first runs into memory issues; the second, loaded with llama.cpp (which it seems to be built for), loads but is excruciatingly slow (about 0.07 t/s).

I must admit that I am a complete noob regarding all the different variants and model loaders.
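Edit: from what I've gathered since posting, ~0.07 t/s usually means every layer is still running on the CPU; llama.cpp-based loaders only use the GPU if you ask them to offload layers. Something like this llama-cpp-python sketch is what people seem to suggest (the path and settings are just my guesses, not anything from TheBloke's model card):

```python
# Minimal llama-cpp-python sketch (needs a CUDA-enabled build of llama-cpp-python).
# The key knob is n_gpu_layers: the default of 0 keeps the whole model on the CPU,
# which is what produces speeds like 0.07 t/s on a 34B.
from llama_cpp import Llama

llm = Llama(
    model_path="nous-capybara-34b.Q4_K_M.gguf",  # local path to the GGUF file
    n_gpu_layers=-1,  # -1 = offload every layer to the GPU
    n_ctx=8192,       # context window; larger values use more VRAM
)

out = llm("USER: Hello!\nASSISTANT:", max_tokens=64)
print(out["choices"][0]["text"])
```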

drifter_VR@alien.top · 10 months ago

Koboldcpp is the easiest way.
Get nous-capybara-34b.Q4_K_M.gguf (it just fits into 24 GB VRAM with 8K context).
Here are my Koboldcpp settings (not sure if they're optimal, but they work):

https://preview.redd.it/dco0bokvic1c1.jpeg?width=540&format=pjpg&auto=webp&s=bf188ea61481a9464593db79d690b26eb7989883
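Once it's running, you can also script against koboldcpp's local HTTP API instead of using the web UI. A small sketch (the default port 5001 and the /api/v1/generate endpoint are from memory, so treat them as assumptions):

```python
# Query a running koboldcpp instance through its local Kobold-style API.
# Port and endpoint are koboldcpp defaults as I recall them (assumptions).
import requests

resp = requests.post(
    "http://localhost:5001/api/v1/generate",
    json={
        "prompt": "USER: Summarise the Llama model family in one line.\nASSISTANT:",
        "max_length": 120,            # tokens to generate
        "max_context_length": 8192,   # matches the 8K context above
    },
    timeout=600,
)
print(resp.json()["results"][0]["text"])
```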
