Most people only need a few languages, such as en, cn, and jp. If versions with different language combinations were available, I would use one to develop my own translation application.
Serious-Commercial10
If you want to run it reliably, it's best to check out the PR linked below and compile it yourself. Quantizing to GGUF yourself is actually quite fast.
https://github.com/ggerganov/llama.cpp/pull/4070
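For anyone who wants the steps spelled out, here is a rough sketch in Python that just builds the command lines and prints them (dry run by default). Some of it is my own assumption and not from the PR: the `pr-4070` branch name, the `model-f16.gguf` / `model-q4_k_m.gguf` file names, building with plain `make`, and the `quantize` binary name with the `Q4_K_M` type. Tool names have changed between llama.cpp versions, so check the README of the checkout you actually build.

```python
import shlex
import subprocess

REPO_URL = "https://github.com/ggerganov/llama.cpp"


def pr_build_commands(pr_number, workdir="llama.cpp"):
    """Commands to clone the repo, check out a GitHub PR, and build it.

    GitHub exposes each pull request as the ref pull/<N>/head.
    The local branch name "pr-<N>" is an arbitrary choice (assumption).
    """
    branch = f"pr-{pr_number}"
    return [
        ["git", "clone", REPO_URL, workdir],
        ["git", "-C", workdir, "fetch", "origin", f"pull/{pr_number}/head:{branch}"],
        ["git", "-C", workdir, "checkout", branch],
        # Assumption: this checkout builds with plain `make`.
        ["make", "-C", workdir],
    ]


def quantize_command(src, dst, qtype="Q4_K_M", workdir="llama.cpp"):
    """Command to quantize an existing GGUF file.

    Assumption: the build produces a `quantize` binary in the repo root;
    newer versions may name it differently.
    """
    return [f"{workdir}/quantize", src, dst, qtype]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmds = pr_build_commands(4070)
    # Hypothetical file names, replace with your own model files.
    cmds.append(quantize_command("model-f16.gguf", "model-q4_k_m.gguf"))
    run_all(cmds)  # dry run: only prints the commands
```

Pass `dry_run=False` to `run_all` once the printed commands look right for your setup.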
I've never successfully run the AutoAWQ model on a 3090, and I won't be trying it again!