LocalLLaMA

11 readers

4 users here now

Community to discuss about Llama, the family of large language models created by Meta AI.

founded 2 years ago

MODERATORS

communick@poweruser.forum

ExLlamaV2: The Fastest Library to Run LLMs (towardsdatascience.com)

submitted 2 years ago by alchemist1e9@alien.top to c/localllama@poweruser.forum

22 comments fedilink hide all child comments

Is this accurate?

you are viewing a single comment's thread
view the rest of the comments

[–] llama_in_sunglasses@alien.top 1 points 2 years ago

I've tested pretty much all of the available quantization methods and I prefer exllamav2 for everything I run on GPU, it's fast and gives high quality results. If anyone wants to experiment with some different calibration parquets, I've taken a portion of the PIPPA data and converted it into various prompt formats, along with a portion of the synthia instruction/response pairs that I've also converted into different prompt formats. I've only tested them on OpenHermes, but they did make coherent models that all produce different generation output from the same prompt.

https://desync.xyz/calsets.html