this post was submitted on 20 Nov 2023

LocalLLaMA


Community to discuss about Llama, the family of large language models created by Meta AI.


I want to run a 70B LLM locally with more than 1 T/s. I have a 3090 with 24GB VRAM and 64GB RAM on the system.

What I managed so far:

  • Found instructions for running a 70B entirely in VRAM using a ~2.5-bit quant. It ran fast, but the perplexity was unbearable; the LLM was barely coherent.
  • I somehow got a 70B running with a mix of RAM/VRAM offloading, but it only managed 0.1 T/s.

I saw people claiming reasonable T/s speeds. Since I am a newbie, I can barely speak the domain language, and most instructions I found assume implicit knowledge I don't have.

I need explicit instructions: exactly which 70B model to download, which model loader to use, and how to set the parameters that matter in this context.
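For context on the VRAM/RAM split described above: llama.cpp offloads whole transformer layers to the GPU, so the practical question is how many of a 70B model's ~80 layers fit in 24 GB of VRAM. A rough back-of-the-envelope sketch (my own numbers and overhead assumption, not from the thread):

```python
# Rough estimate of how many of a 70B model's ~80 layers fit in VRAM,
# given the GGUF file size. Assumes a uniform size per layer and
# reserves some VRAM for KV cache and scratch buffers.
def gpu_layers(file_gb, vram_gb=24.0, n_layers=80, overhead_gb=2.0):
    per_layer_gb = file_gb / n_layers          # crude uniform split
    usable = max(vram_gb - overhead_gb, 0.0)   # leave headroom for KV cache etc.
    return min(n_layers, int(usable / per_layer_gb))

print(gpu_layers(41.42))  # Q4_K_M file size from the table below -> ~42 layers
```

With roughly half the layers on the 3090 and the rest in system RAM, generation is much faster than pure CPU but well short of full-GPU speed.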

[–] BlueMetaMind@alien.top 1 points 11 months ago (1 children)

Thank you. What does "5_K_M" mean?
Can I use the text web UI with llama.cpp as the model loader, or is that too much overhead?

[–] mrjackspade@alien.top 1 points 11 months ago

I actually don't know how much overhead that adds. I'd start by just kicking it off on the command line as a proof of concept; it's super easy.
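A sketch of that command-line proof of concept (the model path and layer count are placeholders to adjust for your setup; `-ngl` is llama.cpp's GPU-offload flag):

```shell
# Hypothetical llama.cpp invocation; the model filename is a placeholder.
# -ngl controls how many layers are offloaded to the 3090's VRAM --
# lower it if you run out of memory.
./main -m ./models/goat-70b-storytelling.Q4_K_M.gguf \
       -ngl 42 \
       -c 4096 \
       -p "Once upon a time"
```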

5_K_M is just the quantization I use. There's almost no loss of perplexity with 5_K_M, but it's also larger than a 4-bit quant, which is what most people use.
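As a rough sanity check on why a ~5-bit quant of a 70B model lands near 49 GB (my arithmetic, ignoring file metadata and the fact that some tensors are stored at higher precision):

```python
# Approximate GGUF file size: parameter count times bits per weight.
# Q5_K_M averages roughly 5.5 bits per weight across its mixed tensors.
def approx_size_gb(n_params_b=70, bits_per_weight=5.5):
    # size in GB ~= (params in billions) * bits / 8
    return n_params_b * bits_per_weight / 8

print(approx_size_gb())  # ~48 GB, close to the 48.75 GB listed for Q5_K_M below
```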

| Name | Quant method | Bits | Size | Max RAM required | Use case |
| --- | --- | --- | --- | --- | --- |
| goat-70b-storytelling.Q2_K.gguf | Q2_K | 2 | 29.28 GB | 31.78 GB | smallest, significant quality loss - not recommended for most purposes |
| goat-70b-storytelling.Q3_K_S.gguf | Q3_K_S | 3 | 29.92 GB | 32.42 GB | very small, high quality loss |
| goat-70b-storytelling.Q3_K_M.gguf | Q3_K_M | 3 | 33.19 GB | 35.69 GB | very small, high quality loss |
| goat-70b-storytelling.Q3_K_L.gguf | Q3_K_L | 3 | 36.15 GB | 38.65 GB | small, substantial quality loss |
| goat-70b-storytelling.Q4_0.gguf | Q4_0 | 4 | 38.87 GB | 41.37 GB | legacy; small, very high quality loss - prefer using Q3_K_M |
| goat-70b-storytelling.Q4_K_S.gguf | Q4_K_S | 4 | 39.07 GB | 41.57 GB | small, greater quality loss |
| goat-70b-storytelling.Q4_K_M.gguf | Q4_K_M | 4 | 41.42 GB | 43.92 GB | medium, balanced quality - recommended |
| goat-70b-storytelling.Q5_0.gguf | Q5_0 | 5 | 47.46 GB | 49.96 GB | legacy; medium, balanced quality - prefer using Q4_K_M |
| goat-70b-storytelling.Q5_K_S.gguf | Q5_K_S | 5 | 47.46 GB | 49.96 GB | large, low quality loss - recommended |
| goat-70b-storytelling.Q5_K_M.gguf | Q5_K_M | 5 | 48.75 GB | 51.25 GB | large, very low quality loss - recommended |
| goat-70b-storytelling.Q6_K.gguf | Q6_K | 6 | 56.59 GB | 59.09 GB | very large, extremely low quality loss |
| goat-70b-storytelling.Q8_0.gguf | Q8_0 | 8 | 73.29 GB | 75.79 GB | very large, extremely low quality loss - not recommended |
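Reading the "Max RAM required" column against the poster's 64 GB of system RAM, a quick sketch of which quants are even loadable for a pure-CPU run (numbers copied from the table; offloading layers to VRAM reduces the RAM needed accordingly):

```python
# Which quants from the table fit in the poster's 64 GB of system RAM
# for a CPU-only run, per the "Max RAM required" column.
max_ram = {
    "Q2_K": 31.78, "Q3_K_S": 32.42, "Q3_K_M": 35.69, "Q3_K_L": 38.65,
    "Q4_0": 41.37, "Q4_K_S": 41.57, "Q4_K_M": 43.92, "Q5_0": 49.96,
    "Q5_K_S": 49.96, "Q5_K_M": 51.25, "Q6_K": 59.09, "Q8_0": 75.79,
}

def fits(ram_gb=64.0):
    return [name for name, need in max_ram.items() if need <= ram_gb]

print(fits())  # everything up to Q6_K fits; Q8_0 does not
```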