this post was submitted on 04 Dec 2023

LocalLLaMA


Community to discuss about Llama, the family of large language models created by Meta AI.

[–] llama_in_sunglasses@alien.top 1 points 2 years ago (1 child)

With Llama-2-70b-chat-E8P-2Bit from their model zoo, QuIP# seems fairly promising. I'd have to try Llama-2-70b-chat in exl2 at 2.4 bpw to compare, but this model does not really feel like a 2-bit model so far; I'm impressed.
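For a rough sense of what that bpw difference means, here's a back-of-envelope sketch of weight-only footprint, assuming roughly 70e9 parameters for Llama-2-70B and ignoring embeddings, quantization metadata, and runtime overhead (so the real numbers will be somewhat higher):

```python
# Rough weight-only footprint at a given average bits per weight (bpw).
# Assumes ~70e9 parameters; ignores quantization metadata and overhead.
PARAMS = 70e9

def weight_gib(bpw):
    """Weights-only size in GiB at the given average bits per weight."""
    return PARAMS * bpw / 8 / 2**30

for bpw in (2.0, 2.4):
    print(f"{bpw} bpw -> {weight_gib(bpw):.1f} GiB")
# 2.0 bpw -> ~16.3 GiB, 2.4 bpw -> ~19.6 GiB
```

So on paper the 2.4 bpw exl2 quant carries about 3 GiB more of weights than a true 2-bit one, before either loader's overhead is counted.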

[–] a_beautiful_rhind@alien.top 1 points 2 years ago

From the issue about this in the exllamav2 repo, QuIP# was using more memory and running slower than exl2. How much context can you fit?
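The context question mostly comes down to KV-cache size. A back-of-envelope sketch, assuming an fp16 cache and Llama-2-70B's published config (80 layers, 8 key/value heads via grouped-query attention, head dim 128):

```python
# KV-cache estimate for Llama-2-70B (GQA config, fp16 cache).
LAYERS = 80      # num_hidden_layers
KV_HEADS = 8     # num_key_value_heads (grouped-query attention)
HEAD_DIM = 128   # hidden_size 8192 / 64 attention heads
BYTES = 2        # fp16

def kv_bytes_per_token():
    # One K and one V tensor per layer.
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES

def max_context(free_vram_gib):
    # Tokens of cache that fit in the VRAM left over after the weights.
    return int(free_vram_gib * 2**30 / kv_bytes_per_token())

print(kv_bytes_per_token() // 1024, "KiB per token")  # 320 KiB
print(max_context(4), "tokens in 4 GiB of spare VRAM")
```

So each extra GiB the loader wastes on weights or overhead costs roughly 3k tokens of context at fp16.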