this post was submitted on 30 Oct 2023

LocalLLaMA

Hi, I have searched for a long time on this subreddit, in Ooba's documentation, Mistral's documentation and everything, but I just can't find what I am looking for.

I see everyone claiming Mistral can handle up to a 32k context size. However, while it technically won't refuse to generate anything above roughly 8k, the output is just not good. I have it loaded in Oobabooga's text-generation-webui and am using the API through SillyTavern. I loaded the normal Mistral 7B just to check, but with my current 12k story, all it can generate is gibberish if I give it the full context. I also checked using other fine-tunes of Mistral.

What am I doing wrong? I am using the GPTQ version on my RX 7900 XTX. Is the 32k figure just advertising that it won't crash until 32k or something, or am I doing something wrong, given that I can't get coherent output above 8k? I did mess with the alpha values, and while doing so does eliminate the gibberish, I get the impression that the quality suffers somehow.

mll59@alien.top, 10 months ago:

Thanks for your reply. In this case I think it's not a bug in llama.cpp but a problem in the parameters of the Mistral models. The original Mistral models were trained with an 8K context size; see the Mistral AI product page ("Product | Mistral AI | Open source models").

But when I load a Mistral model, or a finetune of a Mistral model, koboldcpp always reports a trained context size of 32768, like this:

llm_load_print_meta: n_ctx_train = 32768

So llama.cpp (or koboldcpp) just assumes that no NTK scaling is needed up to a context size of 32768 and leaves the rope freq base at 10000, which I think is correct. I don't know why the model has this n_ctx_train parameter set to 32768 instead of 8192. Maybe a mistake?
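
To make the effect concrete, here is a rough Python sketch of the decision as I understand it. This is not the actual llama.cpp or koboldcpp code: the function name `ntk_rope_freq_base` is made up, and taking alpha as simply n_ctx / n_ctx_train is my own assumption. The exponent is the commonly used NTK-aware formula, which for a head dimension of 128 works out to the same base = 10000 * alpha ^ (64/63) relation that text-generation-webui documents for its alpha value.

```python
def ntk_rope_freq_base(n_ctx, n_ctx_train, base=10000.0, head_dim=128):
    """Hypothetical helper (not a llama.cpp/koboldcpp function).

    Returns the RoPE frequency base to use for a requested context size,
    applying NTK-aware scaling only beyond the reported training context.
    """
    if n_ctx <= n_ctx_train:
        # Requested context fits inside the reported training length,
        # so the base is left untouched.
        return base
    # Assumption: alpha is the ratio of requested to trained context.
    alpha = n_ctx / n_ctx_train
    # NTK-aware scaling: base * alpha^(d / (d - 2)); 128/126 == 64/63.
    return base * alpha ** (head_dim / (head_dim - 2))


story_ctx = 12 * 1024  # the 12k story from the original post

# With the metadata as shipped (n_ctx_train = 32768): no scaling at all.
print(ntk_rope_freq_base(story_ctx, 32768))

# If the metadata said 8192 instead, the base would be raised.
print(ntk_rope_freq_base(story_ctx, 8192))
```

With n_ctx_train = 32768 the 12k story gets the unscaled base of 10000, whereas with n_ctx_train = 8192 the same request would get a base of roughly 15100 under these assumptions. That would fit the observation that raising alpha by hand removes the gibberish.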