Hi, I have searched for a long time on this subreddit, in Ooba's documentation, in Mistral's documentation, and elsewhere, but I just can't find what I am looking for.
I see everyone claiming Mistral can handle up to a 32k context size. While it technically won't refuse to generate anything above roughly 8k tokens, the output is just not good. I have it loaded in Oobabooga's text-generation-webui and am using the API through SillyTavern. I loaded the plain Mistral 7B just to check, but with my current 12k-token story, all it generates is gibberish if I give it the full context. I also get the same result with other fine-tunes of Mistral.
What am I doing wrong? I am using the GPTQ version on my RX 7900 XTX. Is the 32k figure just saying it won't crash before that length, or is something on my end preventing coherent output above 8k? I did mess with the alpha value, and while raising it does eliminate the gibberish, I get the impression the quality suffers somehow (a rough sketch of what I think alpha is doing is below).
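For anyone curious, my understanding is that the "alpha" slider in the ExLlama loaders applies NTK-style RoPE scaling: it raises the rotary embedding base so longer contexts fit, at some cost in quality. This is only a minimal sketch of the usual formula as I understand it, not Ooba's actual code, and the function name is mine:

```python
# Rough illustration of NTK-aware RoPE "alpha" scaling (assumed formula,
# not Ooba's internals). Mistral 7B uses a rotary base of 10000 and a
# head dimension of 128.
def scaled_rope_base(base: float, alpha: float, head_dim: int) -> float:
    """Raise the rotary base so positions are 'stretched' over a longer context."""
    return base * alpha ** (head_dim / (head_dim - 2))

for alpha in (1.0, 2.0, 4.0):
    print(f"alpha={alpha}: base={scaled_rope_base(10000.0, alpha, 128):.1f}")
```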
So I did some research, and after a while down the rabbit hole I think sliding window attention is not implemented in ExLlama (or ExLlamaV2) yet, and it is not in the AMD ROCm fork of Flash Attention yet either (a toy sketch of what sliding window attention does is below).
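As far as I can tell, Mistral's long-context claim relies on sliding window attention, where each token only attends to the last few thousand tokens (the config uses a 4096 window) instead of the whole sequence. If the backend doesn't implement that mask, I assume it just runs full causal attention, which would explain things falling apart past the trained range. Here is a toy sketch of the masking idea, with a tiny window so the output is readable (my own illustration, not ExLlama or Mistral code):

```python
# Toy sliding-window attention mask: window of 4 tokens here,
# versus 4096 in Mistral's config.
import torch

def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    """True where query position i may attend to key position j."""
    i = torch.arange(seq_len).unsqueeze(1)
    j = torch.arange(seq_len).unsqueeze(0)
    causal = j <= i                  # no attending to future tokens
    in_window = (i - j) < window     # only the most recent `window` tokens
    return causal & in_window

print(sliding_window_mask(6, 4).int())
```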
I think that means it's just unsupported right now. Very unfortunate, but I guess I'll have to wait. Waiting for support is the price I pay for saving 900 euros by buying a 7900 XTX instead of a 4090. I'm fine with that.