TuuNo_

joined 1 year ago

How to run 70B on 24GB VRAM? in c/localllama@poweruser.forum

TuuNo_@alien.top 1 points 11 months ago (1 children)

Well, I have never used Linux before, since I mainly use my PC for gaming. But I have heard that running LLMs on Linux is faster overall.

How to run 70B on 24GB VRAM? in c/localllama@poweruser.forum

TuuNo_@alien.top 1 points 11 months ago (5 children)

I would suggest using Koboldcpp with a GGUF model. A 70B Q5 model with around 40 layers offloaded to the GPU should run at more than 1 t/s. In my case, I got 1.5 t/s with a 4090 and 64GB of RAM using Q5_K_M.
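
To get a rough idea of how many layers to offload, here is a minimal Python sketch. It assumes every layer takes an equal share of the GGUF file and that some VRAM has to stay free for the context and scratch buffers. The function name, the 2 GB reserve, and the example file size are my own assumptions for illustration, not measured values, so treat the result as a starting point and adjust it by trial.

```python
def layers_that_fit(model_size_gb: float, n_layers: int,
                    vram_gb: float, reserved_gb: float = 2.0) -> int:
    """Estimate how many layers fit in VRAM.

    Assumption: the model file is split evenly across layers, which
    ignores the embedding and output tensors. reserved_gb is a guess
    at the VRAM needed for the KV cache and scratch buffers.
    """
    if n_layers <= 0 or model_size_gb <= 0:
        raise ValueError("model_size_gb and n_layers must be positive")
    per_layer_gb = model_size_gb / n_layers
    usable_gb = max(vram_gb - reserved_gb, 0.0)
    # Never report more layers than the model actually has.
    return min(n_layers, int(usable_gb // per_layer_gb))


if __name__ == "__main__":
    # Hypothetical inputs: a 70B Q5_K_M file of roughly 48.8 GB with
    # 80 layers, on a 24 GB card. Replace with your own file size.
    print(layers_that_fit(48.8, 80, 24.0))
```

The number it prints is what you would try first for the GPU layers setting (`--gpulayers` in Koboldcpp). If you run out of memory, lower it by a few layers.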

What is considered the best uncensored LLM right now? in c/localllama@poweruser.forum

TuuNo_@alien.top 1 points 11 months ago

https://github.com/ggerganov/llama.cpp/pull/1684 A higher parameter count should always be better.
