WinterUsed1120


Exllama v2 vs. llama.cpp (All layers offloaded to GPU) (alien.top)

submitted 11 months ago by WinterUsed1120@alien.top to c/localllama@poweruser.forum


Will there be a significant difference in speed and output quality between running Llama 2 as a GPTQ model with Exllama v2 and as a GGUF model with llama.cpp, with all layers offloaded to the GPU?
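One way to answer the speed half of the question on your own hardware is to time both backends with the same prompt and compare tokens per second. Below is a minimal, backend-agnostic sketch. It assumes you write a small wrapper (called `generate` here, a hypothetical name) around each backend that takes a prompt and returns the generated token ids. The timing includes prompt processing, so use the same prompt and generation length for both runs. It says nothing about output quality, which would need a separate comparison such as perplexity on the same text.

```python
import time
from typing import Callable, Sequence


def tokens_per_second(
    generate: Callable[[str], Sequence[int]], prompt: str, runs: int = 3
) -> float:
    """Average generation throughput over `runs` calls.

    `generate` is a hypothetical wrapper you supply around one backend
    (e.g. an Exllama v2 GPTQ model or a llama.cpp GGUF model); it must
    return the generated token ids. Timing covers the whole call,
    including prompt processing.
    """
    if runs < 1:
        raise ValueError("runs must be >= 1")
    total_tokens = 0
    total_seconds = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        tokens = generate(prompt)
        total_seconds += time.perf_counter() - start
        total_tokens += len(tokens)
    # Guard against a zero-duration measurement on a trivially fast stub.
    return total_tokens / total_seconds if total_seconds > 0 else float("inf")


if __name__ == "__main__":
    # Stand-in backend for illustration only: "generates" 50 tokens in ~10 ms.
    def fake_generate(prompt: str) -> list[int]:
        time.sleep(0.01)
        return list(range(50))

    rate = tokens_per_second(fake_generate, "Hello", runs=3)
    print(f"{rate:.0f} tokens/s")
```

Swapping `fake_generate` for one wrapper per backend and running both on the same GPU gives a like-for-like throughput number; averaging over several runs smooths out warm-up effects on the first call.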