There is no "true" context length to MistralLite. You are essentially removing the sliding window, which is the same thing Amazon or YaRN is doing.
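To illustrate what "removing the sliding window" means, here is a toy sketch (not MistralLite's or anyone's actual code): a sliding-window causal mask versus the plain causal mask you get once the window is dropped.

```python
# Toy illustration: sliding-window attention vs. full causal attention.
def causal_mask(seq_len, window=None):
    """mask[i][j] is True where position i may attend to position j."""
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            visible = j <= i  # causal: no attending to the future
            if window is not None:
                visible = visible and (i - j < window)  # window limit
            row.append(visible)
        mask.append(row)
    return mask

windowed = causal_mask(6, window=3)
full = causal_mask(6)  # "window removed" -> every past token is visible
print(windowed[5])  # [False, False, False, True, True, True]
print(full[5])      # [True, True, True, True, True, True]
```

Even with the window in place, information still flows beyond it across stacked layers; removing it just lets every layer attend to the full past directly.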
FYI, AutoAWQ released 0.1.7, which fixes multi-GPU support. It should alleviate the multi-GPU OOM issues that appeared with newer versions of the Hugging Face libraries.
https://github.com/casper-hansen/AutoAWQ/releases/tag/v0.1.7
I’m not sure who told who that Mistral models are only 8k or 4k. The sliding window is not the context size; it is the position embeddings that determine the context size, which is 32k.
It has 32k; they state it in their config: "max_position_embeddings": 32768. That is the sequence length.
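A minimal sketch of reading those two fields, using the values as published in Mistral-7B's config.json (parsed here from a literal rather than fetched from the Hub):

```python
import json

# The two context-related fields from Mistral-7B's config.json.
config = json.loads("""
{
  "max_position_embeddings": 32768,
  "sliding_window": 4096
}
""")

# max_position_embeddings is the model's sequence length (32k);
# sliding_window only bounds how far each attention layer looks back.
print(config["max_position_embeddings"])  # 32768
print(config["sliding_window"])           # 4096
```

The same fields are exposed as attributes if you load the config with `transformers.AutoConfig.from_pretrained`.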
https://preview.redd.it/5r2c9592vr0c1.png?width=256&format=png&auto=webp&s=be88f25168e3cec16cbe7f9aad15f678edf97e99