this post was submitted on 20 Nov 2023
LocalLLaMA
Community to discuss Llama, the family of large language models created by Meta AI.
I haven't tried to run a model that big on CPU RAM alone, but running a Q4_0 GGUF of CausalLM 14B was already mind-numbingly slow on my rig.
As a general rule of thumb, keep as much of the model in VRAM (GPU memory) as possible, since inference from CPU RAM is orders of magnitude slower. I'm guessing your connection timed out because it simply took too long to load and run.
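If you're loading a GGUF through llama-cpp-python, here's a minimal sketch of what offloading to the GPU looks like (the model path is just a placeholder, and you'd dial `n_gpu_layers` down if the full model doesn't fit in your VRAM):

```python
# Minimal sketch, assuming llama-cpp-python is installed with CUDA support.
# The model path below is a hypothetical placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/lzlv-70b.Q4_K_M.gguf",  # placeholder path to your GGUF
    n_gpu_layers=-1,  # -1 = offload every layer to VRAM; lower this if you run out
    n_ctx=4096,       # context window size
)

out = llm("Q: Why is CPU-only inference so slow? A:", max_tokens=64)
print(out["choices"][0]["text"])
```

The key knob is `n_gpu_layers`: leave it at 0 and everything runs from CPU RAM; push it as high as your VRAM allows and the token rate improves dramatically.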
With a 4090, you can actually run lzlv 70B entirely in your 24GB of VRAM. Let's not let that amazing GPU go to waste! Try these steps and let me know how it works out for you: