candre23

joined 1 year ago
[–] candre23@alien.top 1 points 1 year ago

> GGUF I get like tops 4-5 t/s.

You're doing something very wrong. I get better speeds than that on P40s with low context. Are you not using cuBLAS?
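
A minimal sketch of what that looks like in practice, assuming the llama-cpp-python bindings built with cuBLAS/CUDA support (the model path and layer count are placeholders, not values from this thread):

```python
# Minimal GPU-offloaded GGUF inference via llama-cpp-python (assumed setup).
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-2-13b.Q4_K_M.gguf",  # placeholder GGUF file
    n_gpu_layers=40,   # offload as many layers as fit in VRAM; 0 = CPU-only
    n_ctx=2048,        # modest context keeps older cards like the P40 quick
)

out = llm("Explain what cuBLAS offloading does in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```

If `n_gpu_layers` is left at its default of 0, generation runs entirely on the CPU, which is the usual cause of low single-digit t/s with GGUF models.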

[–] candre23@alien.top 1 points 1 year ago

The best noob-accessible explanation of LLMs I've found so far: https://blog.rfox.eu/en/Programming/How_to_run_your_own_LLM_GPT.html

The most entertaining (IMHO) explanation, which is (at best) 60% accurate: https://www.reddit.com/r/LocalLLaMA/comments/12ld62s/the_state_of_llm_ais_as_explained_by_somebody_who/

[–] candre23@alien.top 1 points 1 year ago

The 3090 will outperform the 4060 several times over. It's not even a competition - it's a slaughter.

As soon as you have to offload even a single layer to system memory (regardless of how fast that memory is), you cut your performance by an order of magnitude. I don't care if you have screaming-fast DDR5 in 8 channels and a pair of the beefiest Xeons money can buy - your performance will fall off a cliff the minute you start offloading. If a 3090 is within your budget, that is the unambiguous answer.
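
As a back-of-envelope sketch of why the cliff is so steep, assuming token generation is memory-bandwidth bound (the figures below are rough spec-sheet approximations, not measurements):

```python
# Rough bandwidth-bound ceiling: each generated token streams the weights once,
# so tokens/s is roughly memory bandwidth divided by model size.
model_size_gb = 20   # e.g. a ~30B model at 4-5 bits per weight (illustrative)
gpu_bw_gbps = 936    # RTX 3090 GDDR6X spec-sheet bandwidth
sys_bw_gbps = 80     # typical dual-channel DDR5 desktop

print(f"all layers in VRAM:       ~{gpu_bw_gbps / model_size_gb:.0f} t/s ceiling")
print(f"all layers in system RAM: ~{sys_bw_gbps / model_size_gb:.0f} t/s ceiling")
# A partial offload drags the whole generation loop toward the slower figure,
# because every token still has to wait on the layers living in system memory.
```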
