this post was submitted on 20 Nov 2023
LocalLLaMA
Community to discuss Llama, the family of large language models created by Meta AI.
I haven't tried to run a model that big on CPU RAM alone, but running a Q4_0 GGUF of CausalLM 14B was already mind-numbingly slow on my rig.
As a general rule of thumb, keep as much of the model in VRAM (GPU memory) as possible, since inference from CPU RAM is orders of magnitude slower. I'm guessing your connection timed out because it simply took too long to load and run.
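If you're loading a GGUF through llama-cpp-python, here's a minimal sketch of what offloading to the GPU looks like (the model path is just a placeholder, and you'd dial `n_gpu_layers` down if the full model doesn't fit in your VRAM):

```python
# Minimal sketch, assuming llama-cpp-python is installed with CUDA support.
# The model path below is a hypothetical placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/lzlv-70b.Q4_K_M.gguf",  # placeholder path to your GGUF
    n_gpu_layers=-1,  # -1 = offload every layer to VRAM; lower this if you run out
    n_ctx=4096,       # context window size
)

out = llm("Q: Why is CPU-only inference so slow? A:", max_tokens=64)
print(out["choices"][0]["text"])
```

The key knob is `n_gpu_layers`: leave it at 0 and everything runs from CPU RAM; push it as high as your VRAM allows and the token rate improves dramatically.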
With a 4090, you can actually run lzlv 70B entirely in your 24GB of VRAM. Let's not let that amazing GPU go to waste! Try these steps and let me know how it works out for you: