LocalLLaMA

14 readers

1 users here now

Community to discuss about Llama, the family of large language models created by Meta AI.

founded 2 years ago

MODERATORS

communick@poweruser.forum

7B models keep repeating/glitching after certain number of tokens (alien.top)

submitted 2 years ago by GustavoToyota@alien.top to c/localllama@poweruser.forum

4 comments fedilink hide all child comments

I'm using ollama and I have a RTX 3060 TI. Using only 7B models.

I tested with Mistral 7B, Mistral-OpenOrca and Zephyr, they all had the same problem where they kept repeating or speaking randomly after some amount of chatting.

What could it be? Temperature? VRAM? ollama?

top 4 comments

sorted by: hot top controversial new old

[–] RayIsLazy@alien.top 1 points 2 years ago

I had this using other clients, try lm studio with a gguf and chatml, works well for me.

[–] ntn8888@alien.top 1 points 2 years ago

I've noticed this extensively when running locally on my 8gb rx580. And the issue is pretty bad.. I've run exactly the models you stated.

But when I run on (big) cloud GPU on vast.ai (eg on rtx 3090 or A6000) the problem vanishes..

vast.ai is pretty cheap ($10 deposit)you can experiment on there and see.

[–] LienniTa@alien.top 1 points 2 years ago

goliath 120b would fit in 64 ram, tho. It doesnt have repeating problem...

[–] LocoLanguageModel@alien.top 1 points 2 years ago

I just posted this somewhere else but it seems relevant, try KoboldCPP, it has this feature enabled by default:

Context Shifting is a better version of Smart Context that only works for GGUF models. This feature utilizes KV cache shifting to automatically remove old tokens from context and add new ones without requiring any reprocessing. So long as you use no memory/fixed memory and don't use world info, you should be able to avoid almost all reprocessing between consecutive generations even at max context. This does not consume any additional context space, making it superior to SmartContext. Context Shifting is enabled by default, and will override smartcontext if both are enabled. Your outputs may be different with shifting enabled, but both seem equally coherent. To disable Context Shifting, use the flag --noshift.