this post was submitted on 27 Nov 2023

LocalLLaMA


Community for discussing Llama, the family of large language models created by Meta AI.


The title, pretty much.

I'm wondering whether a 70B model quantized to 4-bit would perform better than a 7B/13B/34B model at fp16. It would be great to get some insights from the community.
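
To put rough numbers on the memory side of the question, here's a back-of-the-envelope sketch of the weight footprint for each option (my own assumption: weights dominate, so KV cache, activations, and framework overhead are ignored, and 4-bit is counted as roughly 4.5 bits/weight once quantization scales are included):

```python
# Rough VRAM needed just for the weights; real usage adds several GB
# for KV cache, activations, and framework overhead.
def weight_vram_gb(params_billion: float, bits_per_weight: float) -> float:
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1024**3

for name, params, bits in [
    ("7B fp16", 7, 16),
    ("13B fp16", 13, 16),
    ("34B fp16", 34, 16),
    ("70B 4-bit", 70, 4.5),  # assumed ~4.5 bits/weight incl. quant scales
]:
    print(f"{name:>10}: ~{weight_vram_gb(params, bits):.0f} GB")
```

That works out to roughly 13/24/63 GB for the fp16 models and ~37 GB for the 4-bit 70B, which is why a quantized 70B just misses a single 24 GB card but fits across two.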

[–] Dry-Vermicelli-682@alien.top 1 points 11 months ago (1 children)

44GB of GPU VRAM? WTH GPU has 44GB other than stupidly expensive ones? Are average folks running $25K GPUs at home? Or are those running these working for companies with lots of money and building small GPU servers to run them?

[–] MiniEval_@alien.top 1 points 11 months ago (1 children)

Dual 3090/4090s. Still pricey as hell, but not out of reach for some folks.
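
In case it helps anyone setting this up: two 24 GB cards give you 48 GB total, and a minimal sketch of loading a 4-bit 70B sharded across both might look like this (assuming transformers, bitsandbytes, and accelerate are installed; the Llama 2 70B model ID is just an example and that repo is gated):

```python
# Minimal sketch: load a 70B checkpoint in 4-bit and let accelerate
# shard the layers across both GPUs via device_map="auto".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-70b-hf"  # example gated repo; swap in your own
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # splits the weights across the two cards
)
```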

[–] Dry-Vermicelli-682@alien.top 1 points 11 months ago (1 children)

So anyone wanting to play around with this at home has to expect to drop about $4K or so for GPUs and a setup?

[–] drifter_VR@alien.top 1 points 11 months ago

I can get two 3090s for 1200€ here on the second-hand market.