this post was submitted on 15 Nov 2023
LocalLLaMA
Community to discuss about Llama, the family of large language models created by Meta AI.
founded 10 months ago
> But "true" 16K-32K models like MistralLite seem to perform much better at long context than the default Mistral config.
There is nothing "true" about MistralLite's context length. By doing what Amazon did (or what YaRN does), you are essentially just removing the sliding window.
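To make the point concrete, here is a minimal sketch (my own illustration, not Mistral's or MistralLite's actual code) of how a sliding window constrains the causal attention mask, and what "removing" it changes. With Mistral's sliding-window attention, position i can only attend to the last `window` positions; drop the window and you are back to full causal attention over the whole context.

```python
def causal_mask(seq_len, sliding_window=None):
    """Return mask[i][j] = True if position i may attend to position j.

    sliding_window=None reproduces full causal attention; an integer
    window reproduces Mistral-style sliding-window attention.
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            visible = j <= i  # causal constraint: no attending to the future
            if sliding_window is not None:
                # sliding-window constraint: only the last `window` tokens
                visible = visible and (i - j < sliding_window)
            row.append(visible)
        mask.append(row)
    return mask

# With a window of 4, token 7 cannot see token 0;
# with the window removed, it can.
windowed = causal_mask(8, sliding_window=4)
full = causal_mask(8, sliding_window=None)
```

So a "32K" finetune of this kind isn't unlocking some hidden native context; it is trading the sliding window for full attention (with the quadratic cost that implies) and finetuning the model to cope with the longer positions.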
https://preview.redd.it/rqe1hwc1vr0c1.png?width=256&format=png&auto=webp&s=79f14a98c097d2e8fb5718ffa4d524353b059a10