this post was submitted on 27 Nov 2023

LocalLLaMA

So, my rig (Ryzen 7 3700X, 64 GB RAM, RTX 3070, Intel Arc A380) can run up to 70B parameter models... but they run at a snail's pace. Furthermore, I honestly don't see that big of an improvement on regular chat tasks from a 70B model vs a 13B model. Don't get me wrong... there is sometimes an improvement in adherence, it's just not the GIANT leap forward I expected. Especially with the 30B-ish models; there's basically no difference between 30B and 70B. I run everything at Q5.

Here is my question... would running a 70B at Q2 be better than a 7B or 13B at Q5? Would speed improve?

Also, I notice that Mistral models run faster on my machine than LLaMA models at the same parameter count... anyone know why?

I know I could theoretically run all these tests myself, but there is just so much to test and so little time. I figured I'd ask around and see if someone else has done it first.

[–] Brave-Decision-1944@alien.top 1 points 9 months ago

The Q (quantization level) matters a lot; check the model cards, since many include recommendations on which quant works best. Lower Q is faster but less accurate. That said, the sweet spots are usually the quants marked with K and S/M/L (e.g., Q4_K_M). I downloaded and tried every quant of the same model to compare them, and I recommend you do the same. Also look up what K, S, M, and L actually stand for.
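
If you want to make that comparison concrete, here's a minimal sketch of a timing loop using llama-cpp-python. The GGUF file names are hypothetical placeholders for the quants you actually download, and n_gpu_layers would need tuning for an 8 GB RTX 3070:

```python
# Sketch: compare generation speed across quants of the same model.
# Assumes llama-cpp-python is installed and the GGUF files exist locally.
import time
from llama_cpp import Llama

QUANTS = [
    "llama-2-13b.Q2_K.gguf",    # hypothetical file names;
    "llama-2-13b.Q4_K_M.gguf",  # substitute whatever quants
    "llama-2-13b.Q5_K_M.gguf",  # you downloaded
]
PROMPT = "Explain the difference between RAM and VRAM in one paragraph."

for path in QUANTS:
    # n_gpu_layers controls how many layers get offloaded to the GPU;
    # 35 is a guess, adjust to fit your VRAM.
    llm = Llama(model_path=path, n_ctx=2048, n_gpu_layers=35, verbose=False)
    start = time.perf_counter()
    out = llm(PROMPT, max_tokens=128)
    elapsed = time.perf_counter() - start
    tokens = out["usage"]["completion_tokens"]
    print(f"{path}: {tokens / elapsed:.1f} tok/s")
    del llm  # free the model before loading the next quant
```

Note this times the whole call, so prompt evaluation is included; with a short prompt it's still a reasonable proxy for generation speed. For quality rather than speed, you'd feed each quant the same prompts and eyeball the outputs side by side.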