Try a different version of the model.
What is the performance of a GGUF Q4-Q6 quant on a single card?
Sorry for the late reply. I already tested that; it is better than the CodeLlama 13B model, but only about 30 tokens/s.
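For anyone wanting to reproduce a tokens/s figure like the one above, here is a minimal sketch of how decode throughput can be measured. The `generate` callable is a hypothetical stand-in for whatever produces one token per call (e.g. a llama.cpp binding's decode step); only the timing logic is shown.

```python
import time

def tokens_per_second(generate, n_tokens):
    """Measure decode throughput of a token generator.

    `generate` is any callable that produces one token per call.
    In practice this would wrap a real model's token-by-token
    decode loop (hypothetical here, for illustration only).
    """
    start = time.perf_counter()
    for _ in range(n_tokens):
        generate()
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

# Stub generator standing in for a real quantized model's decode step.
rate = tokens_per_second(lambda: None, 1000)
```

With a real model plugged in, `rate` is the tokens/s number people quote in benchmarks; with the stub it just exercises the timing code.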