Random update on this: I did some more experimenting on the start of a story (with the LimaRP and Petrol LoRAs), and the 4K model seems... fine? So does the 200K.
I don't know how to stretch out the base model, though. Their page claims it supports 32K, but the config specifies a 4K context and has no RoPE scaling section, just a high rope_theta.
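For anyone who wants to poke at this themselves, here's a rough sketch of how you could check those config fields and force linear RoPE scaling through transformers. The model ID is a placeholder, and the 8x factor is just my assumption for stretching 4K to 32K (not something the model card confirms), so expect quality to degrade without fine-tuning at that length:

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Placeholder repo ID -- substitute whichever base model you're inspecting.
model_id = "your/base-model"

# Inspect what the config actually declares.
config = AutoConfig.from_pretrained(model_id)
print("max_position_embeddings:", config.max_position_embeddings)   # e.g. 4096
print("rope_theta:", getattr(config, "rope_theta", None))           # the high theta
print("rope_scaling:", getattr(config, "rope_scaling", None))       # None if absent

# One way to stretch a 4K model toward 32K: override the config with
# linear RoPE scaling. The 8.0 factor (32K / 4K) is an assumption.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    rope_scaling={"type": "linear", "factor": 8.0},
)
```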
The one difference I did notice is that the 200K model really likes to summarize and reference previous parts of the story. Maybe it was trained on retrieval or summarization examples.