this post was submitted on 30 Oct 2023

LocalLLaMA


Community for discussing Llama, the family of large language models created by Meta AI.


Hi, I have searched for a long time on this subreddit, in Ooba's documentation, in Mistral's documentation, and elsewhere, but I just can't find what I am looking for.

I see everyone claiming Mistral can handle up to a 32k context size. While it technically won't refuse to generate anything above roughly 8k, the output is just not good. I have it loaded in Oobabooga's text-generation-webui and am using the API through SillyTavern. I loaded the plain Mistral 7B just to check, but with my current 12k-token story, all it generates is gibberish if I give it the full context. I saw the same behavior with other fine-tunes of Mistral.

What am I doing wrong? I am using the GPTQ version on my RX 7900 XTX. Is the 32k figure just advertising that it won't crash before then, or am I doing something wrong that prevents coherent output above 8k? I did experiment with the alpha value, and while that eliminates the gibberish, I get the impression the quality suffers somehow.
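For reference, the "alpha" knob in most loaders applies NTK-aware RoPE scaling, which raises the rotary embedding frequency base so positions beyond the trained context interpolate more smoothly. A minimal sketch of the commonly used formula (this assumes the exllama-style convention and Mistral 7B's head dimension of 128; the exact implementation in your loader may differ):

```python
def ntk_scaled_rope_base(base: float = 10000.0, alpha: float = 1.0, head_dim: int = 128) -> float:
    # NTK-aware "alpha" scaling: stretch the RoPE frequency base so the
    # rotary embeddings cover a longer context. The exponent form below
    # is the exllama-style convention (an assumption here, not verified
    # against Ooba's exact source); head_dim=128 matches Mistral 7B.
    return base * alpha ** (head_dim / (head_dim - 2))
```

With alpha = 1 the base is unchanged; alpha = 2 roughly doubles the usable context at a small cost in positional precision, which would be consistent with the quality drop you noticed.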

4onen@alien.top, 10 months ago
  • I did my best to explain Sliding Window Attention briefly there, so do let me know where my explanation is deficient.
  • No, you cannot set the window size, and no, it's not a setting in Oobabooga/text-generation-webui. It's trained into the model.
  • Well, good luck. AMD doesn't even support their own cards properly for AI (ROCm support skipped my last card's generation, and the generation before that only ever had beta support), which is why I finally gave up and switched to team green last year.
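To make the sliding-window point concrete: Mistral 7B was trained with a 4096-token attention window, so each token attends directly only to the previous 4096 positions, and information further back has to propagate layer by layer. A minimal, dependency-free sketch of the mask pattern (illustrative only; real implementations use fused kernels and a rolling KV cache):

```python
def sliding_window_causal_mask(seq_len: int, window: int = 4096) -> list[list[bool]]:
    # Position i may attend to position j only if j <= i (causal)
    # and j > i - window (sliding window). Mistral 7B bakes window=4096
    # into training; it is not a runtime setting.
    return [
        [(j <= i) and (j > i - window) for j in range(seq_len)]
        for i in range(seq_len)
    ]

# Tiny illustration: with window=3, row i is True only at
# positions max(0, i - 2) through i.
mask = sliding_window_causal_mask(8, window=3)
```

Each row has at most `window` True entries, which is why attention alone cannot look 12k tokens back even though generation doesn't crash.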