LocalLLaMA


A community for discussing Llama, the family of large language models created by Meta AI.


Exllama v2 vs. llama.cpp (All layers offloaded to GPU) (alien.top)

submitted 11 months ago by WinterUsed1120@alien.top to c/localllama@poweruser.forum


Will there be a significant difference in speed and quality between Llama 2 GPTQ running on ExLlamaV2 and Llama 2 GGUF running on llama.cpp with all layers offloaded to the GPU?


Maykey@alien.top 1 points 11 months ago

Yes. ExLlamaV2 is much faster IME. It also supports an 8-bit cache to save even more VRAM (I don't know if llama.cpp has it).
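
To give a rough sense of what an 8-bit cache can save, here is a back-of-the-envelope sketch of the key/value cache size for a Llama 2 7B-shaped model (32 layers, 32 KV heads, head dimension 128) at a 4096-token context. The helper `kv_cache_bytes` is a hypothetical name for illustration, not part of ExLlamaV2 or llama.cpp. It assumes batch size 1 and exactly one byte per value for the 8-bit case, and ignores any extra metadata a real implementation might store.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_value):
    """Approximate KV cache size in bytes for a single sequence (batch size 1)."""
    # Both keys and values are cached for every layer, hence the factor of 2.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


# Llama 2 7B shape at its full 4096-token context.
LLAMA2_7B = dict(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=4096)

fp16 = kv_cache_bytes(**LLAMA2_7B, bytes_per_value=2)  # 16-bit cache
int8 = kv_cache_bytes(**LLAMA2_7B, bytes_per_value=1)  # 8-bit cache (assumed 1 byte/value)

GIB = 1024 ** 3
print(f"FP16 cache:  {fp16 / GIB:.2f} GiB")   # FP16 cache:  2.00 GiB
print(f"8-bit cache: {int8 / GIB:.2f} GiB")   # 8-bit cache: 1.00 GiB
```

Under these assumptions the cache drops from about 2 GiB to about 1 GiB at full context, which matters most on cards where the quantized weights already fill most of the VRAM.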
