Switching to YaRN is the best option I'm aware of at the moment.
YaRN is basically dynamic alpha scaling with extra steps; it holds up better without fine-tuning and also benefits from fine-tuning.
https://private-user-images.githubusercontent.com/567732/276779985-6b37697c-896e-4199-a541-a489b6fad213.png
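Roughly the idea, as a simplified numpy sketch (not the actual YaRN implementation — the helper names, the beta defaults, and the linear ramp here are just illustrative): alpha/NTK scaling stretches the RoPE base for every dimension, while YaRN interpolates the frequencies per dimension, leaving the fast-rotating ones alone and squeezing only the slow ones.

```python
# Simplified sketch of RoPE frequency scaling. Assumes standard RoPE with an
# even head dim and base 10000; NOT the exact YaRN code, just the
# "NTK-by-parts" idea from the paper with illustrative defaults.
import numpy as np

def rope_inv_freq(dim, base=10000.0):
    """Standard RoPE inverse frequencies for an even head dimension."""
    return base ** (-np.arange(0, dim, 2) / dim)

def ntk_alpha_inv_freq(dim, alpha, base=10000.0):
    """'Alpha' / NTK-aware scaling: stretch the base instead of the positions."""
    return rope_inv_freq(dim, base * alpha ** (dim / (dim - 2)))

def yarn_inv_freq(dim, scale, orig_ctx, base=10000.0, beta_fast=32, beta_slow=1):
    """YaRN-style scaling: interpolate low-frequency dims by `scale`, keep
    high-frequency dims as-is, with a linear ramp in between."""
    inv_freq = rope_inv_freq(dim, base)
    wavelength = 2 * np.pi / inv_freq     # positions per full rotation
    rotations = orig_ctx / wavelength     # rotations inside the original context
    # ramp weight: 0 -> keep original frequency, 1 -> divide by scale
    w = np.clip((beta_fast - rotations) / (beta_fast - beta_slow), 0.0, 1.0)
    return inv_freq * (1 - w) + (inv_freq / scale) * w

# e.g. stretching a 4k-context model to 16k (scale = 4):
print(yarn_inv_freq(dim=128, scale=4.0, orig_ctx=4096)[:4])
# The paper additionally scales attention by roughly (0.1 * ln(scale) + 1),
# which is omitted here.
```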
I've seen a couple of YaRN models, but I honestly have no idea how to use them lol. Same with the Mistral models: they always want to load at 32k tokens, but coherency just dies after 5k. I can't find really clear instructions on what's needed to get the maximum context out of either, so I tend to just avoid using them at high context.
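For what it's worth, the YaRN finetunes on Hugging Face generally advertise their intended context in config.json via a rope_scaling block. A small sketch, assuming that block carries "factor" and "original_max_position_embeddings" (field names can differ per model, so treat this as illustrative):

```python
# Hedged sketch: estimate a YaRN finetune's advertised context from its
# config.json. Assumes a "rope_scaling" block with "factor" and
# "original_max_position_embeddings"; adjust the keys if the model differs.
import json

with open("config.json") as f:
    cfg = json.load(f)

scaling = cfg.get("rope_scaling") or {}
orig_ctx = scaling.get("original_max_position_embeddings",
                       cfg.get("max_position_embeddings"))
factor = scaling.get("factor", 1.0)

print(f"scaling type: {scaling.get('type')}")
print(f"original context: {orig_ctx}")
print(f"advertised max context: ~{int(orig_ctx * factor)} tokens")
```

If the backend doesn't actually apply those rope settings when it loads the model, it will degrade well before the advertised context, which may be what's happening at 5k.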