Hi everyone, I'd like to share something that I've been working on for the past few days: https://huggingface.co/nsfwthrowitaway69/Venus-120b-v1.0
This model is the result of interleaving layers from three different models: Euryale-1.3-L2-70B, Nous-Hermes-Llama2-70b, and SynthIA-70B-v1.5, resulting in a model that is larger than any of the three used for the merge. I have branches on the repo for exl2 quants at 3.0 and 4.85 bpw, which will allow the model to run in 48GB or 80GB of VRAM, respectively.
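For anyone curious what "interleaving layers" looks like in practice, here's a rough Python sketch of the idea. The slice boundaries below are invented purely for illustration (they're not the actual Venus-120b recipe): each slice copies a contiguous run of decoder layers from one donor model, and stacking overlapping slices is what pushes the merged stack well past a single 70b's 80 layers.

```python
# Hypothetical illustration of a layer-interleaved ("frankenmerge") stack.
# These layer ranges are made up for the example, not the real Venus-120b recipe.
slices = [
    ("Euryale-1.3-L2-70B",     range(0, 35)),
    ("SynthIA-70B-v1.5",       range(20, 55)),
    ("Nous-Hermes-Llama2-70b", range(40, 75)),
    ("Euryale-1.3-L2-70B",     range(60, 80)),
]

# Each entry records which donor model a given decoder layer is copied from.
merged_layers = [(name, i) for name, layers in slices for i in layers]

print(f"{len(merged_layers)} decoder layers in the merged stack "
      f"(each 70b donor only has 80)")
```

As a rough sanity check on the quant sizes: at 3.0 bpw, ~120 billion weights come out to about 45 GB, and at 4.85 bpw about 73 GB, which lines up with the 48GB and 80GB figures once you leave some headroom for context.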
I love using LLMs for RPs and ERPs, and so my goal was to create something similar to Goliath, which is honestly the best roleplay model I've ever used. I've done some initial testing with it and so far the results seem encouraging. I'd love to get some feedback on this from the community! Going forward, my plan is to do more experiments with merging models together, possibly going even larger than 120b parameters to see where the gains stop.
Huh, interesting weave. It did feel like it made fewer spelling and other simple errors compared to Goliath.
Once again Euryale's included. The lack of Xwin makes it better imo; Xwin may be smart, but it has repetition issues at long context. That's just my opinion.
I'd honestly scale it down; there's really no need to go to 120b. From testing a while back, ~90-100b frankenmerges have the same effect.
Goliath makes spelling errors?
I've only used a handful of Mistral 7Bs due to constraints, but I've never seen them make any spelling errors.
Is that a side effect of merging?
I have noticed that too: Goliath makes spelling errors somewhat frequently, more often than other models.
It doesn't seem to affect the "smarts" part as much, though. It otherwise still produces high-quality text.