TuuNo_

joined 1 year ago
[–] TuuNo_@alien.top 1 points 11 months ago (1 children)

Well, I have never used Linux before, since my PC is mainly for gaming. But I have heard that running LLMs on Linux is faster overall.

[–] TuuNo_@alien.top 1 points 11 months ago (5 children)

I would suggest using Koboldcpp with a GGUF model. A 70B Q5 model with around 40 layers offloaded to the GPU should run at more than 1 t/s. For me, at least, Q5_K_M gets 1.5 t/s on a 4090 with 64 GB of RAM.
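
If you want a rough starting point for how many layers to offload, here is a minimal back-of-the-envelope sketch. It assumes every layer takes an equal share of the GGUF file and that you keep a couple of GB of VRAM free for context and scratch buffers. The function name, the `reserve_gb` default, and the example figures (a roughly 48 GB file, 80 layers, 24 GB of VRAM) are illustrative assumptions, not measured values, so treat the result as a first guess and adjust it up or down from there.

```python
def estimate_gpu_layers(model_size_gb, n_layers, vram_gb, reserve_gb=2.0):
    """Rough guess at how many layers fit in VRAM.

    Assumes all layers are the same size (model file size / layer count)
    and that reserve_gb of VRAM stays free for context and buffers.
    Both are simplifying assumptions, not exact figures.
    """
    if model_size_gb <= 0 or n_layers <= 0:
        raise ValueError("model_size_gb and n_layers must be positive")

    per_layer_gb = model_size_gb / n_layers        # average size of one layer
    usable_gb = max(vram_gb - reserve_gb, 0.0)     # VRAM left after the reserve
    fit = int(usable_gb // per_layer_gb)           # whole layers that fit
    return min(n_layers, fit)                      # never more than the model has


if __name__ == "__main__":
    # Illustrative numbers only: ~48 GB file, 80 layers, 24 GB card.
    print(estimate_gpu_layers(model_size_gb=48, n_layers=80, vram_gb=24))
```

The estimate is deliberately conservative. If the model loads with VRAM to spare, raise the layer count a few at a time; if it fails to load or slows to a crawl, lower it.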

[–] TuuNo_@alien.top 1 points 11 months ago