this post was submitted on 27 Nov 2023

LocalLLaMA


A community to discuss Llama, the family of large language models created by Meta AI.

The title, pretty much.

I'm wondering whether a 70B model quantized to 4-bit would perform better than a 7B/13B/34B model at FP16. It would be great to get some insights from the community.
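
As a rough sanity check on memory alone, weight footprint is approximately parameter count times bits per weight, divided by 8. Here's a minimal back-of-the-envelope sketch in Python (it ignores KV cache, activations, and per-group quantization metadata, all of which add real memory on top):

```python
# Back-of-the-envelope weight memory: params * bits_per_weight / 8 bytes.
# Ignores KV cache, activations, and per-group quantization metadata.
def weight_gb(params_billions: float, bits: float) -> float:
    return params_billions * 1e9 * bits / 8 / 1e9

for params, bits in [(7, 16), (13, 16), (34, 16), (70, 16), (70, 4)]:
    print(f"{params}B @ {bits}-bit: ~{weight_gb(params, bits):.0f} GB of weights")
```

By this rough math, a 4-bit 70B (~35 GB of weights) lands between a 16-bit 13B (~26 GB) and a 16-bit 34B (~68 GB), which is what makes the comparison interesting in the first place.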

daHaus@alien.top 11 months ago

This seems difficult to predict, considering how fundamental the thing you're changing is. The quantization method you use, and how refined it is, also matter a great deal.
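
For anyone who would rather measure this than predict it, here's a minimal sketch (assuming the Hugging Face transformers and bitsandbytes libraries are installed; the model id is a placeholder) that loads the same checkpoint at FP16 and in 4-bit NF4 so the two can be compared on identical prompts:

```python
# Minimal sketch: load the same checkpoint at FP16 and in 4-bit NF4 so both
# can be run on the same prompts. Assumes `transformers` and `bitsandbytes`
# are installed; the model id below is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder; swap in the model under test

tokenizer = AutoTokenizer.from_pretrained(model_id)

# FP16 baseline.
model_fp16 = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# 4-bit NF4 variant; the quant type and compute dtype both affect quality.
# (If VRAM is tight, load one model at a time instead of both.)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model_4bit = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

# Compare both variants on the same prompt.
prompt = "Explain the tradeoff between model size and quantization."
inputs = tokenizer(prompt, return_tensors="pt")
for name, model in [("fp16", model_fp16), ("4-bit", model_4bit)]:
    out = model.generate(**inputs.to(model.device), max_new_tokens=64)
    print(name, tokenizer.decode(out[0], skip_special_tokens=True))
```

Note that bitsandbytes-style on-the-fly NF4 is just one scheme; offline methods like GPTQ, AWQ, or GGUF k-quants take different approaches to grouping and rounding the weights, which is part of why the method and how refined it is matter so much.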