My office PC has an RTX 2060 12GB and it runs 13B models at 4-bit no problem.
That's pretty much its limit, though: 13B at 4-bit with a 4096 context maxes out the VRAM, but it's stable.
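For anyone wondering why 12GB is right at the edge, here's a rough back-of-envelope sketch. It assumes Llama-2-13B shapes (40 layers, hidden size 5120) and an fp16 KV cache; the numbers are ballpark, not measurements, and real usage adds activation and runtime overhead on top.

```python
# Rough VRAM estimate: quantized weights + KV cache for a 13B Llama-style model.
# Architecture numbers are assumed from Llama-2-13B; overhead is not included.

GIB = 1024**3

def estimate_vram_gib(params=13e9, bits_per_weight=4,
                      n_layers=40, hidden=5120,
                      ctx=4096, kv_bytes=2):
    weights = params * bits_per_weight / 8             # 4-bit weights ~= 6.1 GiB
    kv_cache = 2 * n_layers * ctx * hidden * kv_bytes  # K and V tensors per layer
    return (weights + kv_cache) / GIB

print(round(estimate_vram_gib(), 1))  # prints 9.2
```

That ~9.2 GiB is before activations, scratch buffers, and whatever the desktop itself is using, which is why a full 4096 context lands right at the ceiling of a 12GB card.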
thanks