QuIP#: SOTA 2-bit quantization method, now implemented in text-generation-webui (experimental) (github.com)

submitted 2 years ago by oobabooga4@alien.top to c/localllama@poweruser.forum

llama_in_sunglasses@alien.top 1 point 2 years ago

With Llama-2-70b-chat-E8P-2Bit from their model zoo, QuIP# seems fairly promising. I'd have to try l2-70b-chat in exl2 at 2.4 bpw to compare, but so far this model does not really feel like a 2-bit model. I'm impressed.

a_beautiful_rhind@alien.top 1 point 2 years ago

According to the issue about this in the exllamav2 repo, QuIP# was using more memory and was slower than exl2. How much context can you fit?