this post was submitted on 27 Nov 2023

LocalLLaMA


Community to discuss Llama, the family of large language models created by Meta AI.

founded 1 year ago

The title, pretty much.

I'm wondering whether a 70B model quantized to 4-bit would perform better than a 7B/13B/34B model at fp16. Would be great to get some insights from the community.

[–] AnOnlineHandle@alien.top 1 points 11 months ago (10 children)

What sort of VRAM is needed to run a 4-bit 70B model?

[–] Dry-Vermicelli-682@alien.top 1 points 11 months ago (3 children)

44GB of GPU VRAM? WTH GPU has 44GB other than stupid expensive ones? Are average folks running $25K GPUs at home? Or are those running these working for companies with lots of money, building small GPU servers to run them?
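Rough math behind a ~44 GB figure, as a sketch (the effective bits per parameter and the overhead allowance are my assumptions, not numbers from the thread):

```python
# Back-of-envelope VRAM estimate for a 4-bit quantized 70B model.
params = 70e9          # 70 billion parameters
bits_per_param = 4.5   # ~4-bit weights plus quantization metadata (scales/zero points)
weights_gb = params * bits_per_param / 8 / 1e9  # bits -> bytes -> GB
overhead_gb = 5        # rough allowance for KV cache and activations at modest context
print(f"weights: ~{weights_gb:.0f} GB, total: ~{weights_gb + overhead_gb:.0f} GB")
# prints: weights: ~39 GB, total: ~44 GB
```

The weights alone land just under 40 GB, which is why a single 24 GB card doesn't cut it but two of them do.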

[–] MiniEval_@alien.top 1 points 11 months ago (1 children)

Dual 3090/4090s. Still pricey as hell, but not out of reach for some folks.

[–] Dry-Vermicelli-682@alien.top 1 points 11 months ago (1 children)

So anyone wanting to play around with this at home has to expect to drop about $4K or so for GPUs and a setup?

[–] drifter_VR@alien.top 1 points 11 months ago

I can get two 3090s for €1200 here on the second-hand market.
