Does anyone have any hints on how to use exllamav2 with extended context length when loading GPTQ weights?
this post was submitted on 12 Nov 2023
LocalLLaMA
If you never set the RoPE base (or alpha) higher, the model will just have its stock context length.
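In exllamav2's Python API this is set on the model config before loading. A minimal sketch, assuming the `ExLlamaV2Config` field `scale_alpha_value` and the lazy-cache autosplit loading pattern from the library's examples (the model path is a placeholder — verify the attribute names against your installed version):

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config

config = ExLlamaV2Config()
config.model_dir = "/path/to/model"   # placeholder: directory with the quantized weights
config.prepare()                      # reads config.json from model_dir

config.max_seq_len = 8192             # extended context target
config.scale_alpha_value = 2.65       # NTK RoPE alpha; leave at 1.0 for stock context

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)           # splits layers across available GPUs
```

Without the `scale_alpha_value` line, generation will degrade past the model's native context, which matches the "stock context" behavior described above.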
I'm wondering too. OpenHermes 2.5 works fine for me in Oobabooga, but it just stops outputting tokens once it reaches 4k context, despite everything being set for 8k (I'm running GGUF offloaded to GPU).
For Llama 2 models, set your alpha to 2.65 when loading them at 8k context.
The general suggestion is 2.5, but if you plot the fitted formula on a graph, 8192 context aligns with about 2.642, so 2.65 is more accurate than 2.5.
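For reference, the alpha value feeds into NTK-aware RoPE scaling, which raises the effective rotary base as base' = base · alpha^(d/(d−2)), where d is the attention head dimension (128 for Llama/Llama 2). A small sketch of that mapping (the context-length-to-alpha curve mentioned above is a separate empirical fit, not reproduced here):

```python
def ntk_rope_base(alpha: float, base: float = 10000.0, head_dim: int = 128) -> float:
    """Effective RoPE base under NTK-aware scaling: base * alpha ** (d / (d - 2))."""
    return base * alpha ** (head_dim / (head_dim - 2))

# alpha = 1.0 leaves the base untouched; alpha = 2.65 raises it
# from 10000 to roughly 26900, stretching usable context.
print(round(ntk_rope_base(2.65)))
```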