this post was submitted on 15 Nov 2023
1 points (100.0% liked)
LocalLLaMA
Community to discuss about Llama, the family of large language models created by Meta AI.
I’m not sure who told whom that Mistral models are only 8k or 4k. The sliding window is not the context size; the maximum position embeddings determine the context size, and that is 32k.
The official Mistral product information.
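To make the distinction concrete, here is a minimal sketch (plain NumPy, with toy sizes I picked for illustration; Mistral's actual values are 32768 positions and a 4096-token window) of the difference between the sliding-window attention mask and the positional limit:

```python
# Toy illustration: the sliding window limits how far back a token attends in a
# single layer; max_position_embeddings limits how long the sequence itself can be.
import numpy as np

seq_len, window = 8, 3  # toy values; Mistral's config uses 32768 and 4096

row = np.arange(seq_len)[:, None]        # query position i
col = np.arange(seq_len)[None, :]        # key position j
causal = col <= row                      # ordinary causal mask: attend to all past tokens
sliding = causal & (col > row - window)  # additionally restrict to the last `window` tokens

print(sliding.astype(int))
# Information still propagates beyond the window through stacked layers, so the
# window size is not the same thing as the usable context length.
```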
Does Mistral itself actually mention 32k anywhere?
It has 32k; they state it in their config: "max_position_embeddings": 32768. That is the sequence length.
https://preview.redd.it/5r2c9592vr0c1.png?width=256&format=png&auto=webp&s=be88f25168e3cec16cbe7f9aad15f678edf97e99
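For anyone who wants to check the config directly instead of the screenshot, a minimal sketch (assuming the Hugging Face transformers library and the public mistralai/Mistral-7B-v0.1 checkpoint):

```python
from transformers import AutoConfig

# Both values live in the same config file on the Hub.
config = AutoConfig.from_pretrained("mistralai/Mistral-7B-v0.1")
print(config.max_position_embeddings)  # 32768 -> the context / sequence length
print(config.sliding_window)           # 4096  -> the per-layer attention window
```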
But "true" 16K-32K models like MistralLite seem to perform much better at long context than the default Mistral config.
There is nothing more "true" about MistralLite's context length. What Amazon (and YaRN) did essentially just removes the sliding window.
https://preview.redd.it/rqe1hwc1vr0c1.png?width=256&format=png&auto=webp&s=79f14a98c097d2e8fb5718ffa4d524353b059a10
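As a rough sketch of what "removing the sliding window" amounts to, not the actual MistralLite or YaRN recipe (those involve further finetuning), the window is just a config knob rather than a hard context limit:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("mistralai/Mistral-7B-v0.1")
# Widening the window to the full positional range makes the sliding-window mask
# equivalent to an ordinary causal mask, i.e. full attention over the sequence.
config.sliding_window = config.max_position_embeddings
print(config.sliding_window)  # 32768
```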