this post was submitted on 22 Nov 2023

LocalLLaMA


Community for discussing Llama, the family of large language models created by Meta AI.


Will there be a significant difference in speed and quality between Llama 2 GPTQ running on ExLlamaV2 and Llama 2 GGUF running on llama.cpp with all layers offloaded to the GPU?
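For context, a minimal sketch of what "offloading all the layers" looks like through the llama-cpp-python bindings; the model filename is a placeholder, and `n_gpu_layers=-1` is the library's convention for requesting full GPU offload:

```python
# Hedged sketch: running a Llama 2 GGUF model via llama-cpp-python
# with every layer offloaded to the GPU. The model path is hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="/models/llama-2-13b.Q4_K_M.gguf",  # hypothetical GGUF file
    n_gpu_layers=-1,  # -1 offloads all layers to the GPU
    n_ctx=4096,       # context window
)

out = llm("Q: What is the capital of France? A:", max_tokens=32)
print(out["choices"][0]["text"])
```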

1 comment
[–] Maykey@alien.top 1 point 11 months ago

Yes. ExLlamaV2 is much faster in my experience. It also supports an 8-bit KV cache to save even more VRAM (I don't know whether llama.cpp has an equivalent).
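For illustration, a minimal sketch of enabling that 8-bit cache with the exllamav2 Python API (as of late 2023); the model path is a placeholder, and the class and parameter names are assumptions based on the library's bundled examples:

```python
# Hedged sketch: loading a quantized Llama 2 model with ExLlamaV2 and
# using its 8-bit KV cache instead of the default FP16 cache.
# Model path is hypothetical; API follows exllamav2's example scripts.
from exllamav2 import (
    ExLlamaV2,
    ExLlamaV2Config,
    ExLlamaV2Cache_8bit,
    ExLlamaV2Tokenizer,
)
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/models/Llama-2-13B-GPTQ"  # hypothetical path
config.prepare()

model = ExLlamaV2(config)
tokenizer = ExLlamaV2Tokenizer(config)

# 8-bit cache roughly halves KV-cache VRAM versus the FP16 default
cache = ExLlamaV2Cache_8bit(model, lazy=True)
model.load_autosplit(cache)  # split layers across available GPUs

generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8

print(generator.generate_simple("The capital of France is", settings, 32))
```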