Serious-Commercial10

If you want to run it reliably, it's best to clone the PR linked below and compile it yourself. Quantizing a GGUF yourself is actually quite fast; see the sketch after the link.

https://github.com/ggerganov/llama.cpp/pull/4070
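A rough sketch of the whole workflow, assuming a Linux box with CUDA and the llama.cpp tooling of that era (`make`, `convert.py`, `quantize`); the model paths and the Q4_K_M quant type are placeholders, and some architectures need `convert-hf-to-gguf.py` instead of `convert.py`:

```sh
# Clone llama.cpp and check out the PR (GitHub exposes every PR as pull/<n>/head)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
git fetch origin pull/4070/head:pr-4070
git checkout pr-4070

# Build with CUDA support (useful on a card like a 3090)
make LLAMA_CUBLAS=1

# Convert the HF checkpoint to an f16 GGUF, then quantize it yourself
python convert.py /path/to/hf-model --outtype f16 --outfile model-f16.gguf
./quantize model-f16.gguf model-Q4_K_M.gguf Q4_K_M
```

The quantize step is CPU-bound and usually finishes in minutes even for a 7B model, which is why doing it yourself beats waiting for someone to upload a GGUF that matches the PR.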

I've never successfully run the AutoAWQ model on a 3090, and I won't be trying it again!

[–] Serious-Commercial10@alien.top 1 points 1 year ago (1 children)

Most people only need a few languages, such as English, Chinese, and Japanese (en/cn/jp). If multiple language-combination versions were released, I would use one to build my own translation application.