this post was submitted on 14 Nov 2023
LocalLLaMA
Community to discuss about Llama, the family of large language models created by Meta AI.
you are viewing a single comment's thread
I imagine it's pretty solid.
I've tested around with the q4_K_M and the q8 on my Mac Studio, and the q4 is pretty darn good. There's some difference in that the q4 sometimes seems to get confused when I talk to it, whereas the q8 seems unshakeable in its quality, but honestly the q4 still feels better than almost any other model I've ever used.
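If anyone wants to run a similar side-by-side comparison, something along these lines with llama-cpp-python should work; the model filenames below are just placeholders, not the exact files I used:

```python
# Rough sketch: load two quants of the same model with llama-cpp-python
# and compare their answers to the same prompt. Paths are placeholders.
from llama_cpp import Llama

prompt = "Explain the trade-off between q4_K_M and q8_0 quantization in one paragraph."

for path in ["llama-2-70b.Q4_K_M.gguf", "llama-2-70b.Q8_0.gguf"]:
    # n_gpu_layers=-1 offloads all layers (Metal on a Mac Studio)
    llm = Llama(model_path=path, n_ctx=4096, n_gpu_layers=-1, verbose=False)
    out = llm(prompt, max_tokens=256, temperature=0.7)
    print(f"--- {path} ---")
    print(out["choices"][0]["text"].strip())
```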
What's the tok/s for each of those models on that system?
Edit: also, if you don't mind my asking, how much context are you able to use before inference degrades?
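In case it helps, one simple way to get a tok/s number is to time a fixed generation and divide; a minimal sketch (again llama-cpp-python, path is a placeholder):

```python
# Minimal tok/s check: time a generation and divide the completion
# token count by elapsed wall-clock time. Model path is a placeholder.
import time
from llama_cpp import Llama

llm = Llama(model_path="llama-2-70b.Q4_K_M.gguf", n_ctx=8192, n_gpu_layers=-1, verbose=False)

start = time.perf_counter()
out = llm("Write a short story about a robot learning to paint.", max_tokens=200)
elapsed = time.perf_counter() - start

n_tokens = out["usage"]["completion_tokens"]
print(f"{n_tokens} tokens in {elapsed:.1f}s -> {n_tokens / elapsed:.2f} tok/s")
```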
For comparison's sake, the EXL2 4.85bpw version runs at around 6-8 t/s on 4x3090s at 8k context, and that's on the lower end.