this post was submitted on 01 Nov 2023

LocalLLaMA


Community to discuss about Llama, the family of large language models created by Meta AI.


https://huggingface.co/TheBloke/MistralLite-7B-GGUF

This is supposed to be a 32k-context finetune of Mistral. I've tried the recommended Q5 version in both GPT4All and LM Studio, and it works for normal short prompts, but it hangs and produces no output when I crank the context length up to 8k+ for data cleaning. I tried it CPU-only (the machine has 32 GB of RAM, so that should be plenty) and hybrid, with the same bad outcome. Curious if there are some undocumented RoPE settings that need to be adjusted.
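For reference, here's a minimal sketch of how I'd try forcing the RoPE settings explicitly with llama-cpp-python instead of going through the GUI frontends, in case they're picking up bad defaults from the GGUF metadata. The rope_freq_base value is a guess based on MistralLite's reported raised rope theta, and the filename is just a placeholder for the local file:

```python
# Minimal sketch (not a verified fix) for loading the GGUF with explicit
# RoPE overrides via llama-cpp-python. Values marked "assumed" are guesses,
# not confirmed settings for this model.
from llama_cpp import Llama

llm = Llama(
    model_path="mistrallite.Q5_K_M.gguf",  # hypothetical local filename
    n_ctx=32768,               # request the full advertised 32k window
    rope_freq_base=1000000.0,  # assumed: MistralLite's raised rope theta
    rope_freq_scale=1.0,       # no linear scaling; finetune targets 32k natively
    n_gpu_layers=0,            # CPU-only, matching my setup above
)

out = llm("<long data-cleaning prompt here>", max_tokens=512)
print(out["choices"][0]["text"])
```

If it answers at 8k+ loaded this way, the frontends' defaults were the problem; if it still hangs, the GGUF itself is more likely at fault.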

Anyone get this to work with long prompts? Otherwise, what do y’all recommend for 32k+ context with good performance on data augmentation/cleaning, with <20B params for speed?

top 2 comments
[–] Chromix_@alien.top 1 points 10 months ago

You wrote that it works for short prompts. Did you also try slightly longer prompts (up to 4k tokens)? That range doesn't hit the sliding window yet, but it still produced very little useful output for me and some others.
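If you want to pin down exactly where it falls apart, a quick length sweep works. A rough sketch with llama-cpp-python (the filler text and target lengths are arbitrary, and the filename is a placeholder):

```python
# Rough sketch: sweep prompt lengths and eyeball where completions degrade.
# Assumes llama-cpp-python; model filename is hypothetical.
from llama_cpp import Llama

llm = Llama(model_path="mistrallite.Q5_K_M.gguf", n_ctx=32768)

filler = "The quick brown fox jumps over the lazy dog. "
per_filler = len(llm.tokenize(filler.encode("utf-8")))  # tokens per filler chunk

for target in (1024, 2048, 4096, 8192):
    # Build a prompt of roughly `target` tokens, then ask for a summary.
    prompt = filler * (target // per_filler)
    prompt += "\n\nSummarize the text above in one sentence:"
    out = llm(prompt, max_tokens=64)
    print(target, repr(out["choices"][0]["text"]))
```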

[–] Ok_Neck_@alien.top 1 points 9 months ago

You can try our hosted version and see if you get better results out of it.