this post was submitted on 25 Nov 2023
1 points (100.0% liked)

LocalLLaMA

SauerkrautLM-7b-HerO

🎉 Exciting news in the world of AI language models! Introducing SauerkrautLM-7b-HerO, a groundbreaking German language model that's set to redefine bilingual language processing.

Find all the details on Huggingface: https://huggingface.co/VAGOsolutions/SauerkrautLM-7b-HerO

Developed by merging Teknium's OpenHermes-2.5-Mistral-7B and Open-Orca's Mistral-7B-OpenOrca, this model isn't just any ordinary merged language model. It's been uniquely fine-tuned using the Sauerkraut dataset, a rich and varied source of German language data.

What makes SauerkrautLM-7b-HerO stand out? Here's the scoop:

  • Optimal Balance: By integrating extensive German data with essential international sources, we've created a model that excels in understanding the nuances of the German language without compromising its global capabilities.
  • Innovative Technology: Utilizing the gradient SLERP method from MergeKit, we've seamlessly fused two of the most advanced 7B models based on the Mistral framework (see the sketch after this list). This blend brings together the best features of both models, creating an unmatched synergy.
  • Cultural and Linguistic Mastery: The incorporation of the German Sauerkraut dataset, a unique mix of augmented and translated data, empowers the model to master the intricacies of the German language. This was achieved without the usual loss of core competencies that often comes with fine-tuning non-German models in German.
  • Bilingual Proficiency: Our approach ensures that SauerkrautLM-7b-HerO not only retains its original strengths but also gains a profound understanding of German. This sets a new benchmark in bilingual language model proficiency.
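For anyone curious what the gradient SLERP merge boils down to, here is a minimal sketch of spherical linear interpolation between two weight tensors. This is only an illustration of the general idea, not MergeKit's actual code; the interpolation factor t and the usage at the end are placeholders, and "gradient" refers to varying t across layers rather than using a single constant.

```python
# Minimal sketch of SLERP between two weight tensors (illustration only,
# not MergeKit's actual implementation).
import torch

def slerp(t: float, a: torch.Tensor, b: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Interpolate between tensors a and b along the arc connecting them."""
    a_flat = a.flatten().float()
    b_flat = b.flatten().float()
    # Angle between the two (normalized) weight vectors
    dot = torch.dot(a_flat / (a_flat.norm() + eps), b_flat / (b_flat.norm() + eps))
    omega = torch.arccos(dot.clamp(-1.0, 1.0))
    if omega.abs() < 1e-4:
        # Nearly parallel vectors: plain linear interpolation is numerically safer
        merged = (1.0 - t) * a_flat + t * b_flat
    else:
        sin_omega = torch.sin(omega)
        merged = (torch.sin((1.0 - t) * omega) / sin_omega) * a_flat + \
                 (torch.sin(t * omega) / sin_omega) * b_flat
    return merged.reshape(a.shape).to(a.dtype)

# Hypothetical usage: merge matching parameters of two compatible checkpoints,
# optionally with a per-layer "gradient" of t values instead of a single 0.5.
# state_a, state_b = model_a.state_dict(), model_b.state_dict()
# merged = {name: slerp(0.5, state_a[name], state_b[name]) for name in state_a}
```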

This isn't just a step forward in language modeling; it's a leap into a future where AI understands and communicates in German as naturally as it does in English, without the need for resource-intensive German foundation models.

๐Ÿ” What are your thoughts on this new development? Let's discuss in the comments!

A brief review of relevant benchmarks performed with the new SauerkrautLM-7b-HerO model (more benchmarks on huggingface):

[Figure: MT-Bench German results]

[Figure: MT-Bench English results]

top 11 comments
[–] yahma@alien.top 1 points 11 months ago (1 children)

Very exciting for multi-lingual models. I really hope this one performs as well as the benchmarks suggest.

[–] AffectionateCan2342@alien.top 1 points 11 months ago

Yes, we hope so too ;-) At least our first tests in real-world use have shown quite good results. It should be noted, though, that even if the benchmark results look very promising, this is still a 7B model that was pre-trained in English.

Although the model can respond very well in German thanks to our fine-tuning with German data, slight grammatical errors can still slip in here and there, especially if the sampling parameters for inference are set too high. This is currently hard to avoid with smaller models, but we are already working on a solution.

There is always a fine line between keeping the intelligence of the original English-language model and teaching it just enough that it can "speak" German well.
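As a rough illustration of what "not set too high" means in practice, here is what conservative sampling settings look like with the transformers library. The concrete values are just assumptions on my part, not an official recommendation, and the chat template is omitted for brevity.

```python
# Sketch of conservative sampling settings; the concrete values are
# illustrative assumptions, not official recommendations.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "VAGOsolutions/SauerkrautLM-7b-HerO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# "Briefly explain what a language model is."
prompt = "Erkläre kurz, was ein Sprachmodell ist."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,        # higher temperatures tend to produce more German grammar slips
    top_p=0.9,
    repetition_penalty=1.1,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```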

[–] No-Link-2778@alien.top 1 points 11 months ago (1 children)

Do you think there is any scientific basis for the merge? This looks like medieval alchemy again. I also hope you can make public some of the data that you, as native speakers, have vetted; that would be more useful for open research than merging without a theoretical basis just to improve benchmark scores.

[–] AffectionateCan2342@alien.top 1 points 11 months ago

One could at least argue that a scientific basis for merging is provided by the published papers in this area. Here are a few examples: https://arxiv.org/abs/2306.01708 https://arxiv.org/abs/2203.05482 https://arxiv.org/abs/2204.03044
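To make the simplest of these ideas concrete: the second paper ("model soups") essentially just averages the weights of several fine-tuned checkpoints that share the same architecture. A minimal sketch, with placeholder checkpoint names:

```python
# Minimal sketch of uniform weight averaging ("model soup"); the checkpoint
# names are placeholders for fine-tunes of the same base architecture.
import torch
from transformers import AutoModelForCausalLM

checkpoints = ["finetune-a", "finetune-b"]  # hypothetical paths / repo ids
models = [AutoModelForCausalLM.from_pretrained(c) for c in checkpoints]
reference = models[0].state_dict()

# Average each parameter across checkpoints, then cast back to the original dtype
averaged = {
    name: torch.stack([m.state_dict()[name].float() for m in models]).mean(dim=0).to(tensor.dtype)
    for name, tensor in reference.items()
}

soup = AutoModelForCausalLM.from_pretrained(checkpoints[0])
soup.load_state_dict(averaged)
```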

Nevertheless, it must be admitted that some merges that should achieve good results on paper produce only gibberish in practice, and vice versa. So you probably need a bit of luck ;-)

For the German-speaking world, however, I can definitely say that we are not primarily interested in better numbers, but in making English-language models accessible in German, at least to some extent, without completely eliminating their cleverness. So the more intelligent the original English model is before it is fine-tuned with German data, the less stupid it will be in German, and that is our goal as long as there are no German pre-trained models.

[–] EnnioEvo@alien.top 1 points 11 months ago

It would be awesome if you could release some info on how to reproduce this for other languages.

[–] yahma@alien.top 1 points 11 months ago (1 children)

Has anyone tested this yet? We have a use case for our European partners from German-speaking countries. I'd like to know what other people's experiences are.

[–] Traditional-Plate642@alien.top 1 points 11 months ago (1 children)

I think everyone is waiting for TheBloke :D

[–] Ion_GPT@alien.top 1 points 11 months ago

Quantization will greatly reduce the model's multilingual capabilities.

[–] GlitteringCheetah707@alien.top 1 points 11 months ago

Hey folks, we will reply to your comments in the next few days. Sorry for being a little inactive; the Sauerkraut team has a lot going on at the moment.

[–] Ion_GPT@alien.top 1 points 11 months ago (1 children)

Do you have a prompt for translating?

[–] AffectionateCan2342@alien.top 1 points 11 months ago

Try a few different prompts and let us know what works for you. For shorter translations, it should definitely be enough to keep the system prompt and put the translation instruction in the user prompt.
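As a rough sketch of what that could look like with the transformers library: the chat roles below assume the ChatML-style template inherited from the merged models, and the German system prompt text is only an illustration, not an official one.

```python
# Sketch: keep a system prompt, put the translation instruction in the user turn.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "VAGOsolutions/SauerkrautLM-7b-HerO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    # "You are a helpful assistant."
    {"role": "system", "content": "Du bist ein hilfreicher Assistent."},
    # "Translate the following text into German: ..."
    {"role": "user", "content": "Übersetze den folgenden Text ins Deutsche:\n\n"
                                "The weather in Berlin is sunny today."},
]

input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```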