LocalLLaMA

Community to discuss Llama, the family of large language models created by Meta AI.

Quantizing 70b models to 4-bit, how much does performance degrade? (alien.top)

submitted 2 years ago by ae_dataviz@alien.top to c/localllama@poweruser.forum

The title, pretty much.

I'm wondering whether a 70b model quantized to 4-bit would perform better than a 7b/13b/34b model at fp16. It would be great to get some insights from the community.

[–] yeawhatever@alien.top 1 points 2 years ago (2 children)

about 44 GB

[–] harrro@alien.top 1 points 2 years ago (1 children)

Using Q3, you can fit it in 36GB (I have a weird combo of an RTX 3060 with 12GB and a P40 with 24GB, and I can run a 70B at 3-bit fully on GPU).
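For anyone wanting to sanity-check these figures, here is a rough back-of-the-envelope sketch. It only counts the weights (parameters times bits per weight), and the function name `weight_gb` is made up for illustration. Real quantized files store scales and other metadata, so the effective bits per weight is somewhat above the nominal 3 or 4, and the KV cache and runtime buffers come on top of that, which is why the numbers quoted in this thread are higher than the raw estimate.

```python
def weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (1 GB = 1e9 bytes).

    Assumption: every parameter is stored at exactly `bits_per_weight` bits.
    Quantization metadata, KV cache and runtime buffers are NOT included.
    """
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # Nominal parameter counts; treat these as round-number assumptions.
    for name, params, bits in [
        ("70B @ 4-bit", 70e9, 4),
        ("70B @ 3-bit", 70e9, 3),
        ("34B @ fp16", 34e9, 16),
        ("13B @ fp16", 13e9, 16),
        ("7B @ fp16", 7e9, 16),
    ]:
        print(f"{name}: ~{weight_gb(params, bits):.1f} GB of weights")
```

By this estimate a 4-bit 70B needs less memory for its weights than a 34B at fp16, though that says nothing about output quality, which is the other half of the original question.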

[–] Dry-Vermicelli-682@alien.top 1 points 2 years ago (1 children)

So you have 2 GPUs on a single motherboard, and llama.cpp knows to use both? Does this work with AMD GPUs too?

[–] harrro@alien.top 1 points 2 years ago

Yes, llama.cpp will automatically split the model to work across GPUs. You can also specify how much of the full model should be on each GPU.

Not sure about AMD support, but for NVIDIA it's pretty easy to do.
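As a sketch of the "how much on each GPU" part: llama.cpp has a `--tensor-split` option that takes comma-separated proportions, one per GPU. One simple way to choose them is in proportion to each card's VRAM. The helper names below are made up for illustration, and the exact flag spelling and behavior can differ between llama.cpp versions, so check `--help` on your build before relying on this.

```python
def split_fractions(vram_gb: list[float]) -> list[float]:
    """Share of the model for each GPU, proportional to its VRAM."""
    total = sum(vram_gb)
    if total <= 0:
        raise ValueError("total VRAM must be positive")
    return [v / total for v in vram_gb]


def tensor_split_arg(vram_gb: list[float]) -> str:
    """Format the fractions as a comma-separated string for --tensor-split."""
    return ",".join(f"{f:.2f}" for f in split_fractions(vram_gb))


if __name__ == "__main__":
    # The 12GB RTX 3060 + 24GB P40 combination mentioned earlier in the thread.
    cards = [12, 24]
    print("--tensor-split", tensor_split_arg(cards))
```

Splitting purely by VRAM ignores that one card may also be driving a display or holding extra buffers, so you may need to shift a bit more onto the other card.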

[–] Dry-Vermicelli-682@alien.top 1 points 2 years ago (1 children)

44GB of GPU VRAM? WTH GPU has 44GB other than stupid expensive ones? Are average folks running $25K GPUs at home? Or are those running these working for companies with lots of money and building small GPU servers to run them?

[–] MiniEval_@alien.top 1 points 2 years ago (1 children)

Dual 3090/4090s. Still pricey as hell, but not out of reach for some folks.

[–] Dry-Vermicelli-682@alien.top 1 points 2 years ago (1 children)

So anyone wanting to play around with this at home has to expect to drop about 4K or so on GPUs and a setup?

[–] drifter_VR@alien.top 1 points 2 years ago

I can get two 3090s for 1200€ here on the second-hand market.