Switching to YaRN is the best option I'm aware of at the moment.
YaRN is basically dynamic alpha scaling with extra steps; it holds up better without fine-tuning and also benefits from fine-tuning.
https://private-user-images.githubusercontent.com/567732/276779985-6b37697c-896e-4199-a541-a489b6fad213.png
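Roughly the idea, as a simplified numpy sketch (not the actual YaRN implementation — the helper names, the beta defaults, and the linear ramp here are just illustrative): alpha/NTK scaling stretches the RoPE base for every dimension, while YaRN interpolates the frequencies per dimension, leaving the fast-rotating ones alone and squeezing only the slow ones.

```python
# Simplified sketch of RoPE frequency scaling. Assumes standard RoPE with an
# even head dim and base 10000; NOT the exact YaRN code, just the
# "NTK-by-parts" idea from the paper with illustrative defaults.
import numpy as np

def rope_inv_freq(dim, base=10000.0):
    """Standard RoPE inverse frequencies for an even head dimension."""
    return base ** (-np.arange(0, dim, 2) / dim)

def ntk_alpha_inv_freq(dim, alpha, base=10000.0):
    """'Alpha' / NTK-aware scaling: stretch the base instead of the positions."""
    return rope_inv_freq(dim, base * alpha ** (dim / (dim - 2)))

def yarn_inv_freq(dim, scale, orig_ctx, base=10000.0, beta_fast=32, beta_slow=1):
    """YaRN-style scaling: interpolate low-frequency dims by `scale`, keep
    high-frequency dims as-is, with a linear ramp in between."""
    inv_freq = rope_inv_freq(dim, base)
    wavelength = 2 * np.pi / inv_freq     # positions per full rotation
    rotations = orig_ctx / wavelength     # rotations inside the original context
    # ramp weight: 0 -> keep original frequency, 1 -> divide by scale
    w = np.clip((beta_fast - rotations) / (beta_fast - beta_slow), 0.0, 1.0)
    return inv_freq * (1 - w) + (inv_freq / scale) * w

# e.g. stretching a 4k-context model to 16k (scale = 4):
print(yarn_inv_freq(dim=128, scale=4.0, orig_ctx=4096)[:4])
# The paper additionally scales attention by roughly (0.1 * ln(scale) + 1),
# which is omitted here.
```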
I've seen a couple of YaRN models, but I honestly have no idea how to use them lol. Same with the Mistral models: they always want to load at 32k tokens, but coherency just dies after 5k. I can't find really clear instructions on what's needed to get the maximum context out of either, so I tend to just avoid using them at high context.
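For what it's worth, the YaRN finetunes on Hugging Face generally advertise their intended context in config.json via a rope_scaling block. A small sketch, assuming that block carries "factor" and "original_max_position_embeddings" (field names can differ per model, so treat this as illustrative):

```python
# Hedged sketch: estimate a YaRN finetune's advertised context from its
# config.json. Assumes a "rope_scaling" block with "factor" and
# "original_max_position_embeddings"; adjust the keys if the model differs.
import json

with open("config.json") as f:
    cfg = json.load(f)

scaling = cfg.get("rope_scaling") or {}
orig_ctx = scaling.get("original_max_position_embeddings",
                       cfg.get("max_position_embeddings"))
factor = scaling.get("factor", 1.0)

print(f"scaling type: {scaling.get('type')}")
print(f"original context: {orig_ctx}")
print(f"advertised max context: ~{int(orig_ctx * factor)} tokens")
```

If the backend doesn't actually apply those rope settings when it loads the model, it will degrade well before the advertised context, which may be what's happening at 5k.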