Tight_Range_5690

Tight_Range_5690@alien.top 1 points 1 year ago

looking at huggingface models, a raw (fp16) 20b is ~42gb, so that's not a lot of space for big model quants. a Q4_K_M of 70b llama fits in that (q2 is ~30gb), but the smallest falcon 180b quantization is 74gb, which wouldn't fit.
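rough napkin math for where those sizes come from, assuming file size ≈ params × bits-per-weight / 8, with ballpark llama.cpp-style effective bpw values (my estimates, not exact figures):

```python
# napkin math: quantized file size ≈ params * bits_per_weight / 8
# bpw values below are rough estimates of effective llama.cpp rates, not exact
def est_size_gb(params_billion: float, bpw: float) -> float:
    """Approximate model file size in (decimal) GB for a given bits-per-weight."""
    return params_billion * 1e9 * bpw / 8 / 1e9

for name, params, bpw in [
    ("20b fp16",   20,  16.0),   # raw, unquantized
    ("70b Q4_K_M", 70,  4.85),
    ("70b Q2_K",   70,  3.35),
    ("180b Q2_K",  180, 3.35),
]:
    print(f"{name}: ~{est_size_gb(params, bpw):.0f} GB")
# prints ~40, ~42, ~29, ~75 GB -- close to the sizes quoted above
```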

that would make more sense while still being really impressive. not sure if someone wants to math it out, but what's the biggest B model that would fit in that at the lowest quants (q2-q3)?
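to take a stab at the math myself: invert the same size ≈ params × bpw / 8 estimate and solve for params at a ~42gb budget (again, the bpw values are rough assumptions):

```python
# invert the napkin math: how many billion params fit in a given file-size budget?
def max_params_billion(budget_gb: float, bpw: float) -> float:
    return budget_gb * 8 / bpw  # budget in decimal GB, bpw = effective bits per weight

for quant, bpw in [("q2 (~3.35 bpw)", 3.35), ("q3 (~3.9 bpw)", 3.9)]:
    print(f"42 GB at {quant}: ~{max_params_billion(42, bpw):.0f}B")
# roughly 85-100B, so something around a ~100b model at q2 would just squeeze in
```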

disclaimer: bees (parameter counts) aren't everything; maybe they have a great dataset/money/lies