this post was submitted on 09 Nov 2023
1 points (100.0% liked)
LocalLLaMA
3 readers
1 users here now
Community to discuss about Llama, the family of large language models created by Meta AI.
founded 1 year ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
view the rest of the comments
I haven't used gptq in a while, but i can say that gguf has 8 bit quantization, which you can use with llamacpp. Furthermore, if you use the original huggingface models, the ones which you load using the transformers loader, you have options in there to load in either 8 or 4bit.
thanks!