AffectionateCan2342


Try a few different prompts and let us know what worked for you. For shorter translations, however, it should be sufficient to keep the system prompt and put the translation instruction in the user prompt.
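
For illustration, a minimal sketch of what that could look like with the Hugging Face transformers chat template. The model ID is the one from this thread, but the prompt wording and settings are just examples, not an official recipe, and they assume the tokenizer ships a chat template:

```python
# Sketch only: keep the system prompt and put the translation instruction
# into the user turn. Prompt wording and settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "VAGOsolutions/SauerkrautLM-7b-HerO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "system", "content": "Du bist ein hilfreicher Assistent."},  # "You are a helpful assistant."
    {"role": "user", "content": "Übersetze den folgenden Satz ins Deutsche: "
                                "'The weather is lovely today.'"},  # translation instruction in the user prompt
]

inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```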

One could at least argue that there is a scientific basis for merging, given the papers published in this area. Here are a few examples: https://arxiv.org/abs/2306.01708 https://arxiv.org/abs/2203.05482 https://arxiv.org/abs/2204.03044

Nevertheless, it must be admitted that some merges that should achieve good results on paper only produce gibberish in practice or vice versa. So you probably need a bit of luck ;-)

For the German-speaking world, however, I can definitely say that we are not primarily interested in getting better numbers, but in making the English-language models accessible in German, at least to some extent, without completely eliminating their cleverness. So the more intelligent the original English model is before it is fine-tuned with German data, the less stupid the model will be in German, and that is our goal as long as there are no German pretrained models.

Yes, we hope so too ;-) At least our first tests in real-world operation have shown quite good results. However, it should be noted that even if the benchmark results sound very promising, it is still a 7b model that has been pre-trained in English.

Although the model can respond very well in German thanks to our fine-tuning with German data, slight grammatical errors can still occur here and there, especially if the inference parameters are set too high. This is currently difficult to avoid, especially with smaller models. But we are already working on a solution.
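
Purely as an illustration of what "parameters set too high" refers to (sampling settings, not anything official from the SauerkrautLM team), a conservative generation config in transformers might look like this:

```python
from transformers import GenerationConfig

# Illustrative values only - not official recommendations for SauerkrautLM.
conservative = GenerationConfig(
    do_sample=True,
    temperature=0.6,         # lower temperature -> less erratic word choice
    top_p=0.9,               # nucleus sampling trims unlikely tokens
    repetition_penalty=1.1,  # mild penalty against loops
    max_new_tokens=256,
)
# Later: model.generate(input_ids, generation_config=conservative)
```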

There is always a fine line between keeping the intelligence of the original English-language model and teaching it just enough so that it can "speak" German well.



SauerkrautLM-7b-HerO

πŸŽ‰ Exciting news in the world of AI language models! Introducing SauerkrautLM-7b-HerO, a groundbreaking German language model that's set to redefine bilingual language processing.

Find all the details on Huggingface: https://huggingface.co/VAGOsolutions/SauerkrautLM-7b-HerO

Developed by merging Teknium's OpenHermes-2.5-Mistral-7B and Open-Orca's Mistral-7B-OpenOrca, this model isn't just any ordinary merged language model. It's been uniquely fine-tuned using the Sauerkraut dataset, a rich and varied source of German language data.

What makes SauerkrautLM-7b-HerO stand out? Here's the scoop:

  • Optimal Balance: By integrating extensive German data with essential international sources, we've created a model that excels in understanding the nuances of the German language without compromising its global capabilities.
  • Innovative Technology: Utilizing the gradient SLERP method from MergeKit, we've seamlessly fused two of the most advanced 7B models based on the Mistral framework. This blend brings together the best features of both models, creating an unmatched synergy (a simplified sketch of the SLERP idea follows this list).
  • Cultural and Linguistic Mastery: The incorporation of the German Sauerkraut dataset, a unique mix of augmented and translated data, empowers the model to master the intricacies of the German language. This was achieved without the usual loss of core competencies that often comes with fine-tuning non-German models in German.
  • Bilingual Proficiency: Our approach ensures that SauerkrautLM-7b-HerO not only retains its original strengths but also gains a profound understanding of German. This sets a new benchmark in bilingual language model proficiency.
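
For readers wondering what spherical linear interpolation of weights actually does, here is a deliberately simplified sketch of the core idea. It is not MergeKit's implementation (the "gradient" variant varies the interpolation factor across layers and handles many edge cases), and the fixed factor t=0.5 is just an example:

```python
import torch

def slerp(w_a: torch.Tensor, w_b: torch.Tensor, t: float, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors (simplified)."""
    a, b = w_a.flatten().float(), w_b.flatten().float()
    a_n, b_n = a / (a.norm() + eps), b / (b.norm() + eps)
    # Angle between the two (normalized) weight vectors.
    omega = torch.acos(torch.clamp(torch.dot(a_n, b_n), -1.0, 1.0))
    if omega.abs() < 1e-4:
        # Nearly parallel weights: plain linear interpolation is numerically safer.
        merged = (1 - t) * a + t * b
    else:
        so = torch.sin(omega)
        merged = (torch.sin((1 - t) * omega) / so) * a + (torch.sin(t * omega) / so) * b
    return merged.view_as(w_a).to(w_a.dtype)

# Example: merge every parameter of two checkpoints with a fixed factor.
# merged_state = {k: slerp(sd_a[k], sd_b[k], t=0.5) for k in sd_a}
```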

This isn't just a step forward in language modeling; it's a leap into a future where AI understands and communicates in German as naturally as it does in English, without the need for resource-intensive German foundation models.

πŸ” What are your thoughts on this new development? Let's discuss in the comments!

A brief review of relevant benchmarks performed with the new SauerkrautLM-7b-HerO model (more benchmarks on huggingface):


[Image: MT-Bench German results]


[Image: MT-Bench English results]

[–] AffectionateCan2342@alien.top 1 points 10 months ago (1 children)

Hey, David from SauerkrautLM here :)

First of all, thank you so much for your great work, u/WolframRavenwolf!!

This is quite interesting, and we had already taken note of your tests for 7/13B models! Let me try to explain SauerkrautLM's results in your great benchmark:

I tested all the English-language models for a long time, and they all had extreme problems producing correct German. Often it was just articles that were used incorrectly, but there were also wrong grammatical cases and poor sentence structures that simply reflected very bad German. It was also a great challenge to get the models to answer exclusively in German. We had to specify at several points in both the system prompt and the user prompt that the model should respond only in German, and even that never worked reliably.

We chose MT-Bench as the evaluation reference. In particular, we repeatedly noticed that the majority of the English base models answered our German MT-Bench questions almost entirely in English, or switched from German to English in the middle of a sentence. So our aim with SauerkrautLM was in particular to improve the quality of the answers in German in terms of grammar and spelling compared to English models. To achieve this, we naturally had to make some compromises.

In our many training trials before we were able to publish SauerkrautLM, we of course tried out a lot. As u/WolframRavenwolf has already suggested, we also carried out training with a multilingual dataset. However, this led to a decrease in performance in both English and German. We also tried training with different ratios of German and English datasets, and here too the model's performance decreased significantly in both English and German. However, our first tests with only German training data showed that we were able to achieve a significant improvement on the German MT-Bench.
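
To make the "different ratios" experiments concrete, here is a small sketch of how such a mix can be set up with the Hugging Face datasets library. The dataset names are placeholders, not the actual Sauerkraut training data, and the 80/20 split is just an example:

```python
from datasets import load_dataset, interleave_datasets

# Placeholder dataset names - not the actual Sauerkraut training data.
german = load_dataset("my-org/german-instructions", split="train")
english = load_dataset("my-org/english-instructions", split="train")

# Sample roughly 80% German / 20% English examples during training;
# vary the probabilities to compare the resulting German and English scores.
mixed = interleave_datasets(
    [german, english],
    probabilities=[0.8, 0.2],
    seed=42,
    stopping_strategy="all_exhausted",
)
```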

This naturally means that the model's skills in English have decreased. But our priority was to improve the model's German language skills through fine-tuning and we achieved this. But here we also come to an important point: We did not train a German foundation model here, but rather fine-tuned a foundation model that had been trained almost exclusively in English. In my opinion, it will be (almost) impossible to fine-tune an English foundation model in German and then achieve the same results as an English foundation model that has been fine-tuned with English data.

And here, too, I would like to be more specific about the training data we used: u/WolframRavenwolf made the suggestion that we should simply translate the strong English datasets into German and then train them. Believe me, we tested for a long time until we had a fairly strong dataset that we could then use to train our models. And as described in the Huggingface Modelcard, we used a mixture of translated and augmented data.

Why didn't we just use translated data? There are simply too many cases in which the translation of English sentences into German does not work correctly. Likewise, GPT, for example, is not always able to deliver grammatically correct translations. We have already tested quite a few things with purely translated data, and it simply leads to too many errors in the German version of the model. So it made sense to augment certain datasets that were quite complex in English, in order to retain the meaning of the data while ensuring more correct German.
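
To illustrate the point, a naive translation step like the following (the model choice is just an example, not the tooling used for the Sauerkraut dataset) is exactly the kind of pipeline that still lets grammatical errors slip through and therefore needs augmentation or curation on top:

```python
from transformers import pipeline

# Naive machine translation of English instruction data into German.
# Example MT model - not what was actually used for the Sauerkraut dataset.
translator = pipeline("translation_en_to_de", model="Helsinki-NLP/opus-mt-en-de")

english_samples = [
    "Explain the difference between supervised and unsupervised learning.",
]
german_samples = [t["translation_text"] for t in translator(english_samples)]
print(german_samples)  # raw output would still need filtering/augmentation
```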

So you can be sure that we already use very strong English datasets in German form, but we also had to augment some of them in order to reduce errors in the German language.

Also, the note in your benchmark that the questions were in German but the character cards were in English does not immediately suggest to me that the German language models are extremely favoured here, but of course I can't assess the ratio of English to German data in the test. In my opinion, it was not so much the German language that was tested here, but rather the reasoning abilities of the models. I would be curious to see a test where the generated German answers of the language models are evaluated. It should be obvious that the SauerkrautLM models are better at formulating German and pay more attention to sentence structure and the like than English models.


To summarise again:

  1. I have tested many English models and was extremely disappointed with the German output of the models.

  2. In order to improve the German language skills of a model, in my opinion you have to fine-tune with almost exclusively German data.

  3. English foundation models that are fine-tuned in German can never reach the capabilities of English fine-tuned models or of (fine-tuned) German foundation models.

  4. Training with German datasets of course leads to a certain decrease in performance in categories that were trained in English. (You can actually see this clearly in the MT-Bench results: the scores on the German MT-Bench are consistently about 1.0 lower than on the English MT-Bench.)

  5. From our experience, the best German dataset resulted from merging translated and augmented data (to preserve the quality of the existing English datasets while also achieving strong German language results).

Now the answer has become quite long :D but I hope I was able to provide a little more clarity about the results (from our perspective) and our approach.

We are already testing local LLMs with Unreal Engine 5 for educational purposes (digital twin), combining them with RAG and faster-whisper. Still in the testing phase, but it seems really promising: https://vm.tiktok.com/ZGe1fstF9/ (starting at 0:40)
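
For anyone curious how those pieces fit together, here is a rough sketch of the speech-to-answer loop (transcription with faster-whisper, then retrieval, then a local LLM). The retrieve() and llm() callables are placeholders for whatever vector store and inference backend you use; none of this is the exact setup shown in the video:

```python
from faster_whisper import WhisperModel

# Speech-to-text front end; model size and precision are example values.
stt = WhisperModel("small", device="cuda", compute_type="float16")

def transcribe(audio_path: str) -> str:
    segments, _info = stt.transcribe(audio_path, language="de")
    return " ".join(segment.text for segment in segments)

def answer(question: str, retrieve, llm) -> str:
    context = retrieve(question)  # e.g. a vector-store lookup (the RAG part)
    prompt = (
        f"Beantworte die Frage anhand des folgenden Kontexts:\n{context}\n\n"  # "Answer the question using the context below"
        f"Frage: {question}"
    )
    return llm(prompt)  # e.g. a llama.cpp or transformers call to the local model

# question = transcribe("mic_recording.wav")
# print(answer(question, retrieve=my_vector_store.search, llm=my_local_llm))
```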