Hi everyone, I'd like to share something that I've been working on for the past few days: https://huggingface.co/nsfwthrowitaway69/Venus-120b-v1.0

This model is the result of interleaving layers from three different models: Euryale-1.3-L2-70B, Nous-Hermes-Llama2-70b, and SynthIA-70B-v1.5, resulting in a model that is larger than any of the three used for the merge. I have branches on the repo for exl2 quants at 3.0 and 4.85 bpw, which will allow the model to run in 48GB or 80GB of VRAM, respectively.
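
If you're wondering what "interleaving" actually means here, the sketch below shows the general shape of a layer map. The slice boundaries and ordering are illustrative placeholders rather than the exact recipe:

```python
# Rough sketch of how a layer-interleave ("frankenmerge") stack is assembled.
# The slice boundaries below are made up for illustration, NOT the actual
# Venus-120b recipe.

donors = {
    "euryale": "Euryale-1.3-L2-70B",   # each donor is a Llama-2-70B with 80 layers
    "hermes": "Nous-Hermes-Llama2-70b",
    "synthia": "SynthIA-70B-v1.5",
}

# Each entry takes a contiguous slice of layers [start, end) from one donor.
# Overlapping the slices is what makes the merged stack deeper than any donor.
slices = [
    ("euryale", 0, 35),
    ("hermes", 17, 52),
    ("synthia", 35, 70),
    ("euryale", 45, 80),
]

layer_map = [(donors[name], i) for name, start, end in slices for i in range(start, end)]
print(f"merged depth: {len(layer_map)} layers")  # 140 here, vs 80 in a single 70B
```

The embeddings and output head typically come from a single donor; only the transformer blocks get stacked, which is why the parameter count grows roughly in proportion to the layer count.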

I love using LLMs for RPs and ERPs, so my goal was to create something similar to Goliath, which is honestly the best roleplay model I've ever used. I've done some initial testing and so far the results seem encouraging. I'd love to get some feedback on this from the community! Going forward, my plan is to do more experiments with merging models, possibly going even larger than 120b parameters to see where the gains stop.

[–] Saofiqlord@alien.top 1 points 11 months ago (1 children)

Huh, interesting weave. It did feel like it made fewer spelling and simple errors compared to Goliath.

Once again Euryale's included. The lack of Xwin makes it better imo; Xwin may be smart, but it has repetition issues at long context. That's just my opinion.

I'd honestly scale it down; there's really no need to go 120b. From testing a while back, ~90-100b frankenmerges have the same effect.

[–] CardAnarchist@alien.top 1 points 11 months ago (1 children)

Goliath makes spelling errors?

I've only used a handful of Mistral 7Bs due to constraints, but I've never seen them make any spelling errors.

Is that a side effect of merging?

[–] noeda@alien.top 1 points 11 months ago

I have noticed too that Goliath makes spelling errors somewhat frequently, more often than other models.

It doesn't seem to affect the "smarts" part as much, though; it otherwise still produces high-quality text.

[–] noeda@alien.top 1 points 11 months ago (1 children)

I will set this to run overnight on HellaSwag 0-shot, like I did here for Goliath when it was new: https://old.reddit.com/r/LocalLLaMA/comments/17rsmox/goliath120b_quants_and_future_plans/k8mjanh/

Thanks for the model! I started investigating some approaches to combine models and see if the result can be better than its individual parts. Just today I finished code to use a genetic algorithm to pick out parts and frankenstein 7B models together (trying to prove that there is merit to this approach using smaller models... but we'll see).
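
Very roughly, the selection loop looks something like the sketch below. It's a simplified illustration with placeholder donor names and a toy fitness function, not my actual code; in practice the fitness would come from building the candidate merge and scoring it on a benchmark like 0-shot HellaSwag.

```python
import random

# Toy genetic algorithm over "frankenmerge" layer maps. Each individual is a
# list of (donor, start, end) slices; fitness here is a placeholder objective.

DONORS = ["model_a", "model_b"]   # hypothetical 7B donors
N_LAYERS = 32                     # transformer layers per 7B donor

def random_gene():
    start = random.randint(0, N_LAYERS - 4)
    end = random.randint(start + 2, N_LAYERS)
    return (random.choice(DONORS), start, end)

def random_individual(n_slices=4):
    return [random_gene() for _ in range(n_slices)]

def fitness(individual):
    # Placeholder: in reality, build the merge and evaluate it on a benchmark.
    return -abs(sum(end - start for _, start, end in individual) - 40)

def evolve(pop_size=20, generations=50):
    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randint(1, len(a) - 1)
            child = a[:cut] + b[cut:]                                 # crossover
            if random.random() < 0.3:
                child[random.randrange(len(child))] = random_gene()   # mutation
            children.append(child)
        population = survivors + children
    return max(population, key=fitness)

print(evolve())
```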

I'll report back on the Hellaswag results on this model.

[–] nsfw_throwitaway69@alien.top 1 points 11 months ago

Thanks! I'm eager to see the results :)

[–] xadiant@alien.top 1 points 11 months ago (3 children)

Any tips/attempts on frankensteining 2 yi-34b models together to make a ~51B model?

[–] llama_in_sunglasses@alien.top 1 points 11 months ago

Don't shuffle the layers; keep them in contiguous chunks.
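
Something like this, for example; Yi-34B has 60 layers, and the cut points below are only an illustration, not a tested recipe:

```python
# Illustrative chunked layer map: two copies of a 60-layer Yi-34B stacked into
# a ~90-layer / ~51B model. Long contiguous runs, no fine-grained shuffling.
slices = [
    ("yi-34b-copy-1", 0, 45),    # one long chunk from the first copy
    ("yi-34b-copy-2", 15, 60),   # one long chunk from the second copy
]

layer_map = [(model, i) for model, start, end in slices for i in range(start, end)]
print(len(layer_map), "layers")  # 90 layers, roughly 60 * (51/34)
```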

[–] a_beautiful_rhind@alien.top 1 points 11 months ago (1 children)

We need 2 or 3 Yi models stacked together, and then face them off vs 70b.

[–] xadiant@alien.top 1 points 11 months ago

Exactly what I was thinking. I just fail miserably each time I merge the layers.

[–] waxbolt@alien.top 1 points 11 months ago

Something close to that already happened (a straight merge, staying at 34b size). The result is good. https://huggingface.co/brucethemoose/Capybara-Tess-Yi-34B-200K-DARE-Ties

[–] xinranli@alien.top 1 points 11 months ago

Great work! Does anyone happen to have a guide, tutorial, or paper on how to combine or interleave models? I would also love to try frankensteining models myself.

[–] Ok_Library5522@alien.top 1 points 11 months ago (1 children)

Is this model better at writing stories? I want to compare it with Goliath, which I use on my local computer. Goliath can write stories, but it definitely lacks originality and creativity.

[–] nsfw_throwitaway69@alien.top 1 points 11 months ago (1 children)

Hard to say. Try it out and let me know!

[–] tenmileswide@alien.top 1 points 11 months ago (1 children)

One thing's for sure: it handles RoPE scaling much better than Goliath. Goliath starts falling apart at about 10-12k context for me, but Venus didn't start doing so until like 30k.

[–] r4ouldukke@alien.top 1 points 11 months ago

What hardware are you guys even using to run something this big?

[–] trollsalot1234@alien.top 1 points 11 months ago (1 children)

I... also love Goliath! I... I really hope yours is better. A random hallucination walks up and punches trollsalot right in the face. "WHY AREN'T WE HAVING SEX YET!" she screams.

[–] nsfw_throwitaway69@alien.top 1 points 11 months ago

Try it out and let me know! I included Nous-Hermes in the merge because I've found it to be one of the best roleplaying models that doesn't hallucinate too much. However, Nous-Hermes also tends to lack a bit in terms of the prose it writes, from my experience. I was hoping to get something that's coherent most of the time and creative.

[–] th3st0rmtr00p3r@alien.top 1 points 11 months ago (1 children)

I could not get any of the quants loaded; it looks like the config is looking for XX of 25 safetensors:

FileNotFoundError: No such file or directory: "models\Venus-120b-v1.0\model-00001-of-00025.safetensors"

with the exl2-3.0bpw branch having only XX of 06 safetensors.

[–] nsfw_throwitaway69@alien.top 1 points 11 months ago (2 children)

🤔 How are you trying to load it? I tested both quants in text-generation-webui and they worked fine for me. I used ExLlamav2_HF to load it.

[–] th3st0rmtr00p3r@alien.top 1 points 11 months ago

It defaulted to transformers; it loaded right away with ExLlamav2_HF. Thank you, I didn't know what I didn't know.

[–] panchovix@alien.top 1 points 11 months ago

Models in ooba without "exl" in the folder name will default to the transformers loader, so that's probably why it got picked by default.
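
Roughly this kind of check, I believe (illustrative only, not ooba's actual code):

```python
# Guess a loader from the model folder name, matching the behavior described
# above: "exl" in the name -> ExLlamav2 loader, otherwise transformers.
def guess_loader(folder_name: str) -> str:
    name = folder_name.lower()
    if "exl2" in name or "exl" in name:
        return "ExLlamav2_HF"
    return "Transformers"

print(guess_loader("Venus-120b-v1.0-exl2-3.0bpw"))  # -> ExLlamav2_HF
print(guess_loader("Venus-120b-v1.0"))              # -> Transformers
```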

[–] Aaaaaaaaaeeeee@alien.top 1 points 11 months ago

possibly even going even larger than 120b parameters

I didn't know that was possible. Have people made a 1T model yet?

[–] CheatCodesOfLife@alien.top 1 points 11 months ago (2 children)

haha damn, I should have taken the NSFW warning seriously before clicking the huggingface link in front of people lol.

Is this model any good for SFW stuff?

[–] nsfw_throwitaway69@alien.top 1 points 11 months ago

Yeah I wanted a picture to go with the model and that's what stable diffusion spat out :D

And I haven't tried it for SFW stuff but my guess is that it would work fine.

[–] uti24@alien.top 1 points 11 months ago (1 children)

Is this model any good for SFW stuff?

Every uncensored llm I tried worked fine with SFW stuff.

If you are talking about storytelling, they might be even better than SFW models. And I've also never seen NSFW/uncensored models write NSFW stuff unless explicitly asked to do so.

[–] CheatCodesOfLife@alien.top 1 points 11 months ago

Yeah okay, I'll give them another try. I only ever tried one, and it was completely insane: it always ended up with something sexual and after a while started randomly spamming words like 'sex toy'.

Looks like it was taken down / experimental: https://old.reddit.com/r/LocalLLaMA/comments/16qrdpa/plotbot_13b_finetuned_llama_2_model_for_writing/

[–] Distinct-Target7503@alien.top 1 points 11 months ago (1 children)

That's great work!

Just a question... Has anyone tried to fine-tune one of those "Frankenstein" models? Some time ago (when the first "Frankenstein" came out, it was a ~20B model) I read here on Reddit that lots of users agreed a fine-tune on those merged models would give "better" results, since it would help "smooth" and adapt the merged layers. I probably lack the technical knowledge needed to understand, so I'm asking...

[–] a_beautiful_rhind@alien.top 1 points 11 months ago (1 children)

Tess-XL-1.0... so far I haven't liked the results.

[–] Distinct-Target7503@alien.top 1 points 11 months ago

Is that a LoRA or a full fine-tune?

[–] a_beautiful_rhind@alien.top 1 points 11 months ago (1 children)

Hell yeah! No Xwin. I hate that model. I'm down for the 3-bit. I haven't liked Tess-XL so far, so hopefully you made a David here.

[–] ambient_temp_xeno@alien.top 1 points 11 months ago (1 children)

I still have this feeling in my gut that closedai have been doing this for a while. It seems like a free lunch.

[–] Charuru@alien.top 1 points 11 months ago

I don't think so; this is something you do when you're GPU-poor. closedai would just not undertrain their models in the first place.

[–] Human-Most-6115@alien.top 1 points 11 months ago

It seems like my dual RTX 4090 setup falls just short of the memory needed to load it, whereas Goliath loads fine as a 3.0 bpw model.

[–] FireWoIf@alien.top 1 points 11 months ago

Looks promising! I'll try loading it up on my 2x3090 setup at 3.0 bpw.

[–] uti24@alien.top 1 points 11 months ago

Oh, we definitely need a GGUF variant of this model. I love Goliath-120B (I even think it might be better than Falcon-180B) and would love to run this model.

[–] a_beautiful_rhind@alien.top 1 points 11 months ago (1 children)

Sadly it doesn't work on 48GB like the other 120b models. It can only fit sub-2048 context; otherwise it goes OOM.

[–] nsfw_throwitaway69@alien.top 1 points 11 months ago (1 children)

Crap, what's your setup? I tested it with a single 48GB card but if you're using 2x 24 then it might not work. I'll have to make a 2.8 bpw quant (or get someone else to do it) so that it'll work with card splitting.

[–] a_beautiful_rhind@alien.top 1 points 11 months ago (1 children)

I have 2x3090s for exl2. I have Tess and Goliath, and both fit with ~3400 context, so somehow your quant is slightly bigger.

[–] nsfw_throwitaway69@alien.top 1 points 11 months ago (1 children)

Venus-120b is actually a bit bigger than Goliath-120b. Venus has 140 layers and Goliath has 136 layers, so that would explain it.
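
A rough back-of-envelope shows why those four extra layers matter at 3.0 bpw. This treats every layer as a Llama-2-70B-style layer (~0.86B parameters, GQA KV dim 1024) and ignores activation and framework overhead, so all numbers are approximate:

```python
# Approximate VRAM footprint of a 3.0 bpw exl2 quant: weights plus fp16 KV cache.
def weight_gb(n_layers, params_per_layer=0.856e9, embed_params=0.52e9, bpw=3.0):
    params = n_layers * params_per_layer + embed_params
    return params * bpw / 8 / 1e9

def kv_cache_gb(n_layers, ctx, kv_dim=1024, bytes_per_val=2):
    return 2 * kv_dim * bytes_per_val * n_layers * ctx / 1e9  # K and V per layer

for name, layers in [("Goliath-120b", 136), ("Venus-120b", 140)]:
    total = weight_gb(layers) + kv_cache_gb(layers, ctx=3400)
    print(f"{name}: ~{total:.1f} GB for weights + KV cache at 3400 context")
```

Both land in the mid-to-high 40s of GB, and the extra layers push Venus that little bit closer to the 48GB ceiling.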

[–] a_beautiful_rhind@alien.top 1 points 11 months ago

Makes sense... it's doing pretty well, and I like the replies. I set the limit to 3400 in tabby; no OOM yet, but it's using 98%/98%. I assume this means I can bump the other models past 3400 too if I'm using tabby and autosplit.

[–] CryptoSpecialAgent@alien.top 1 points 11 months ago

We need a benchmark specifically for NSFW content generation, because I have a theory that I may try to prove: NSFW content, at least of a textual nature, can hold its own against human authors even with a 7b model...

RWKV, for example, is just a toy for most things. But give it a couple of lines of erotica and it will spit out high-quality smut until its context runs out.

My theory is that the internet is full of erotic text content, and that such content exhibits less variety between outputs than other kinds of written material. Together, this means that even an underpowered, half-assed LLM will likely be capable of creating half-decent porn text, assuming it is uncensored.

Would love to see some sample outputs from this monstrosity (I love that somebody made this but I feel guilty consuming so much electricity to create nsfw lmao)