candre23

joined 1 year ago
[–] candre23@alien.top 1 points 1 year ago

> GGUF I get like tops 4-5 t/s.

You're doing something very wrong. I get better speeds than that on P40s with low context. Are you not using cuBLAS?
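
A minimal sketch of what that looks like in practice, assuming the llama-cpp-python bindings built with cuBLAS/CUDA support (the model path and layer count are placeholders, not values from this thread):

```python
# Minimal GPU-offloaded GGUF inference via llama-cpp-python (assumed setup).
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-2-13b.Q4_K_M.gguf",  # placeholder GGUF file
    n_gpu_layers=40,   # offload as many layers as fit in VRAM; 0 = CPU-only
    n_ctx=2048,        # modest context keeps older cards like the P40 quick
)

out = llm("Explain what cuBLAS offloading does in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```

If `n_gpu_layers` is left at its default of 0, generation runs entirely on the CPU, which is the usual cause of low single-digit t/s with GGUF models.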

[–] candre23@alien.top 1 points 1 year ago

The best noob-accessible explanation of LLMs I've found so far: https://blog.rfox.eu/en/Programming/How_to_run_your_own_LLM_GPT.html

The most entertaining (IMHO) explanation, which is (at best) 60% accurate: https://www.reddit.com/r/LocalLLaMA/comments/12ld62s/the_state_of_llm_ais_as_explained_by_somebody_who/

[–] candre23@alien.top 1 points 1 year ago

The 3090 will outperform the 4060 several times over. It's not even a competition - it's a slaughter.

As soon as you have to offload even a single layer to system memory (regardless of how fast that memory is), you cut your performance by an order of magnitude. I don't care if you have screaming-fast DDR5 in 8 channels and a pair of the beefiest Xeons money can buy - your performance will fall off a cliff the minute you start offloading. If a 3090 is within your budget, that is the unambiguous answer.
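
As a back-of-envelope sketch of why the cliff is so steep, assuming token generation is memory-bandwidth bound (the figures below are rough spec-sheet approximations, not measurements):

```python
# Rough bandwidth-bound ceiling: each generated token streams the weights once,
# so tokens/s is roughly memory bandwidth divided by model size.
model_size_gb = 20   # e.g. a ~30B model at 4-5 bits per weight (illustrative)
gpu_bw_gbps = 936    # RTX 3090 GDDR6X spec-sheet bandwidth
sys_bw_gbps = 80     # typical dual-channel DDR5 desktop

print(f"all layers in VRAM:       ~{gpu_bw_gbps / model_size_gb:.0f} t/s ceiling")
print(f"all layers in system RAM: ~{sys_bw_gbps / model_size_gb:.0f} t/s ceiling")
# A partial offload drags the whole generation loop toward the slower figure,
# because every token still has to wait on the layers living in system memory.
```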
