What are you using to run them?
In any case, larger context models require *a lot* more RAM/VRAM.
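For a rough sense of scale: the KV cache alone grows linearly with context length. A minimal back-of-envelope sketch, assuming a Llama-7B-like layout (32 layers, 32 KV heads, head dim 128, fp16 cache) — the exact numbers vary by model and quantization, so treat this as illustrative only:

```python
# Rough KV-cache size estimate. The defaults assume a Llama-7B-like
# architecture (32 layers, 32 KV heads, head dim 128, fp16 cache);
# they are illustrative assumptions, not exact for any specific build.

def kv_cache_bytes(ctx_len, n_layers=32, n_kv_heads=32,
                   head_dim=128, bytes_per_elem=2):
    # 2x for separate K and V tensors, one entry per layer per token
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

for ctx in (2048, 4096, 8192, 32768):
    print(f"{ctx:>6} tokens -> {kv_cache_bytes(ctx) / 2**30:.1f} GiB KV cache")
```

That comes out to about 1 GiB at 2k context but 16 GiB at 32k, on top of the model weights themselves.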
I'm using ooba; I haven't bothered much with KoboldCPP because I'm not really running GGUF models.
What kind of performance do you get on this rig with a 7B 8-bit model like Mistral?