this post was submitted on 15 Nov 2023

Brand New Mistral 16k Context Size Models got released last night from NurtureAI! (alien.top)

submitted 2 years ago by perlthoughts@alien.top to c/localllama@poweruser.forum

In no particular order! Don't forget to use each of their specific prompts for the best generations!

AWQ and GGUF versions are also available.

https://huggingface.co/NurtureAI/zephyr-7b-beta-16k
https://huggingface.co/NurtureAI/neural-chat-7b-v3-16k
https://huggingface.co/NurtureAI/neural-chat-7b-v3-1-16k
https://huggingface.co/NurtureAI/SynthIA-7B-v2.0-16k

Have fun LocalLLaMA fam <3 ! Let us know what you find! <3

top 12 comments

[–] perlthoughts@alien.top 1 points 2 years ago

I also released an AWQ version of chupacabra 7b to get extra crispy.

[–] mll59@alien.top 1 points 2 years ago

First, thank you for sharing. However, I was a bit puzzled by these finetunes since many finetunes based on Mistral can simply support longer context out of the box by using NTK scaling, see here. Alas, I couldn't find any information about what NurtureAI did to extend the context in their model cards.

I've tested NurtureAI's synthia-7b-v2-16k-q8_0.gguf in koboldcpp v1.49 with the model's native rope configuration (a rope base freq of 1000000). In an existing conversation of 14971 tokens, I asked it to generate a standup comedy about the preceding conversation, and it produced incoherent babbling. Using the original model synthia-7b-v2.0.Q8_0.gguf (which has a rope base freq of 10000) with --ropeconfig 1.0 45000 gives me a coherent standup comedy that makes sense.

How well NTK scaling works on Mistral-based finetunes depends on the finetune; for some it works better than for others. For example, when I ask the original zephyr-7b-beta.Q8_0.gguf finetune, in an existing conversation of 25872 tokens, to produce a rhyming poem about the preceding conversation, the resulting poem actually mostly rhymes. Other original finetunes, like synthia-7b-v2.0.Q8_0.gguf, still seem coherent at this context size but can no longer produce rhyming poems.

Anyway, based on my experiments, these extended context models by NurtureAI do not work for me and just using NTK scaling on original Mistral-based finetunes does.
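
For anyone unfamiliar with the knob being turned here, a rough sketch of what the rope base frequency changes. This is not koboldcpp's code. It assumes the standard rotary-embedding frequency formula, a head dimension of 128 (a 4096 hidden size split over 32 heads), and the commonly cited NTK-aware rule base' = base * alpha^(d/(d-2)). The function names and the alpha value are made up for illustration.

```python
import math

def rope_inv_freq(base: float, head_dim: int = 128) -> list[float]:
    """Rotation frequency of each dimension pair in a rotary embedding."""
    return [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]

def longest_wavelength(base: float, head_dim: int = 128) -> float:
    """Positions needed for the slowest-rotating pair to complete one turn."""
    return 2 * math.pi / rope_inv_freq(base, head_dim)[-1]

def ntk_scaled_base(base: float, alpha: float, head_dim: int = 128) -> float:
    """NTK-aware rule of thumb: raise the base instead of squeezing positions.

    alpha is a hypothetical stretch factor chosen by the user.
    """
    return base * alpha ** (head_dim / (head_dim - 2))

# Bases mentioned in the comment above: original, --ropeconfig value, NurtureAI config.
for base in (10_000, 45_000, 1_000_000):
    print(f"base={base:>9,}: longest wavelength ~ {longest_wavelength(base):,.0f} positions")

print(f"alpha=4.0 applied to base 10000 -> {ntk_scaled_base(10_000, 4.0):,.0f}")
```

A larger base slows every rotation down, so distant positions stay distinguishable over a longer span. Whether a given finetune copes with the changed frequencies is exactly what the tests above are probing.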

[–] permalip@alien.top 0 points 2 years ago (2 children)

I’m not sure who told who that Mistral models are only 8k or 4k. The sliding window is not the context size. The context size is the number of position embeddings, which is 32k.
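
To make the window-versus-context distinction concrete, here is a toy sketch of a causal sliding-window attention mask. The convention used (a token sees itself plus the previous window - 1 tokens) is one common choice and an assumption here, and the tiny sizes are only for display.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True when query position i may attend to key position j."""
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]

# Toy sizes: an 8-token sequence with a 3-token window.
mask = sliding_window_mask(seq_len=8, window=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Each row has at most `window` marks, yet the sequence is still 8 positions long. The window limits what a single layer looks at directly, while the sequence length is set separately.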

[–] TeamPupNSudz@alien.top 1 points 2 years ago (1 children)

I’m not sure who told who that Mistral models are only 8k

The official Mistral product information.

Our very first foundational model: 7B parameters, fast-deployed and easily customisable. Small, yet powerful for a variety of use cases. Supports English and code, and a 8k context length. link

Does Mistral themselves actually mention 32k anywhere?

[–] permalip@alien.top 1 points 2 years ago

It has 32k. They state it in their config: "max_position_embeddings": 32768. This is the sequence length.

https://preview.redd.it/5r2c9592vr0c1.png?width=256&format=png&auto=webp&s=be88f25168e3cec16cbe7f9aad15f678edf97e99
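
A small sketch of how to check these fields yourself. The JSON below is an excerpt typed from memory rather than copied from the repository, so every value except max_position_embeddings (quoted above) should be treated as an assumption and compared against the real config.json.

```python
import json

# Hypothetical excerpt of a Mistral-style config.json; verify against the real file.
CONFIG_TEXT = """
{
  "hidden_size": 4096,
  "max_position_embeddings": 32768,
  "num_attention_heads": 32,
  "rope_theta": 10000.0,
  "sliding_window": 4096
}
"""

config = json.loads(CONFIG_TEXT)

# Print the fields that keep getting mixed up in this thread.
for key in ("max_position_embeddings", "sliding_window", "rope_theta", "hidden_size"):
    print(f"{key:>24}: {config.get(key)}")
```

To run it against a downloaded model, replace CONFIG_TEXT with the contents of that model's own config.json.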

[–] mcmoose1900@alien.top 0 points 2 years ago (1 children)

But "true" 16K-32K models like MistralLite seem to perform much better at long context than the default Mistral config.

[–] permalip@alien.top 1 points 2 years ago

There is nothing "true" about MistralLite's context length. You are essentially removing the sliding window by doing what Amazon or Yarn are doing.

https://preview.redd.it/rqe1hwc1vr0c1.png?width=256&format=png&auto=webp&s=79f14a98c097d2e8fb5718ffa4d524353b059a10

[–] vasileer@alien.top 0 points 2 years ago (1 children)

is this a scam or what? none of the models above are from NurtureAI:

- zephyr-beta is trained by HuggingFace and is 32K by default

- neural-chat is from Intel

- synthia is from migtissera

Original links:

https://huggingface.co/HuggingFaceH4/zephyr-7b-beta

https://huggingface.co/Intel/neural-chat-7b-v3-1

https://huggingface.co/migtissera/SynthIA-7B-v2.0

[–] MugosMM@alien.top 0 points 2 years ago (1 children)

NurtureAI extended the context size to 16k

[–] vasileer@alien.top 0 points 2 years ago (1 children)

the context was already 32K

https://preview.redd.it/5jl7c7a53i0c1.png?width=958&format=png&auto=webp&s=ae51ae2b52717bb5ab14bed76580e7e0a45075ed

[–] MINIMAN10001@alien.top 0 points 2 years ago (1 children)

So, assuming this release does anything at all, the only thing I can think of is that instead of a "hidden size" of 4k giving a 4k sliding window into the 32k context, it would be a "hidden size" of 16k giving a 16k window into the 32k context.

However that's just speculation on my part because... Otherwise the release means nothing... Which would be weird.

[–] Flag_Red@alien.top 1 points 2 years ago

That's not what hidden size does.
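
A quick illustration of the difference, as a sketch under assumed numbers (4096 hidden size, 32 heads) and with a made-up helper name: hidden size is the width of each token's vector, while sequence length is how many tokens there are.

```python
def shapes(seq_len: int, hidden_size: int, num_heads: int) -> dict:
    """Tensor shapes for one attention layer, batch dimension omitted."""
    assert hidden_size % num_heads == 0, "heads must divide hidden_size evenly"
    return {
        "hidden_states": (seq_len, hidden_size),            # one vector per token
        "per_head_dim": hidden_size // num_heads,           # width of each head's slice
        "attention_scores": (num_heads, seq_len, seq_len),  # grows with token count only
    }

# Same model width, two different sequence lengths.
for seq_len in (4_096, 16_384):
    print(seq_len, shapes(seq_len, hidden_size=4096, num_heads=32))
```

Changing the sequence length changes how many rows there are, not how wide each row is, so a 4096 hidden size says nothing about a 4k window.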