this post was submitted on 28 Nov 2023
1 points (100.0% liked)

LocalLLaMA

11 readers
4 users here now

Community to discuss about Llama, the family of large language models created by Meta AI.

founded 2 years ago
MODERATORS
top 14 comments
sorted by: hot top controversial new old
[–] Dazzling_Ad1507@alien.top 1 points 2 years ago (1 children)

This model seems to be very broken, I attempted to also quantize it and I am getting divulges into nonsense or repeating words endlessly no matter the settings. :/

[–] candre23@alien.top 1 points 2 years ago (1 children)

All yi models are extremely picky when it comes to things like prompt format, end string, and rope parameters. You'll get gibberish from any of them unless you get everything set up just right, at which point they perform very well.

[–] BoshiAI@alien.top 1 points 2 years ago (2 children)

Thanks for confirming this. I've seen so much praise for these models, yet I've experienced no end of problems in trying to get decent, consistent output. A couple of Yi finetunes seem better than others, but there are still too many problems for me to prefer them over others (for RP/chat purposes.)

I'm still hopeful it's just a matter of time (and a fair amount of trial-and-error) before myself, app developers and model mixers, work out how to get fantastic, consistent out-of-the-box results.

[–] Desm0nt@alien.top 1 points 2 years ago

Hm. I just load gguf yi-34b-chat q4_k_m in oobabooga via llama.cpp with default params and 8k context and it's just work like a charm. Better (more lively language) than any 70b from openrouter (my local machine can't handle 70b)

[–] candre23@alien.top 1 points 2 years ago

It's a new foundational model, so some teething pains are to be expected. Yi is heavily based on (directly copied, for the most part) llama2, but there are just enough differences in the training parameters that default llama2 settings don't get good results. KCPP has already addressed the rope scaling, and I'm sure it's only a matter of time before the other issues are hashed out.

[–] a_beautiful_rhind@alien.top 1 points 2 years ago (1 children)
[–] Kou181@alien.top 1 points 2 years ago (1 children)
[–] BalorNG@alien.top 1 points 2 years ago

EXTRERMINATE!

[–] llama_in_sunglasses@alien.top 1 points 2 years ago (2 children)

I made one too, but 34B Yi output is probably better. This model is worse at 2.9bpw compared to regular Tess-M at 4.6bpw and all of the usual Yi issues like repetition are worse. I uploaded it but I find it personally lacking. Also, uploading 50B+ models to HF is seriously a pain in the ass.

https://huggingface.co/lodrick-the-lafted/Kaiju-A-57B

[–] LeanderGem@alien.top 1 points 2 years ago (1 children)

How do you make the Yi models work for you? I find them super sub par so far.

[–] llama_in_sunglasses@alien.top 1 points 2 years ago

I use dolphin-yi because it listens the best of the Yi finetunes, but I find myself screwing around with the settings for Yi more than most. I pick a different preset and tweak it if it starts looping itself.

[–] hugganao@alien.top 1 points 2 years ago

how does merging work with what layers to choose from what models in the merging process?

[–] Sabin_Stargem@alien.top 1 points 2 years ago (1 children)

From the looks of it, the difference from GS and SG is the system prompt format and the order of the model merges. Guess I will go for GS, since it claims that any prompt format can be used. That one is Tess-Nous, the other is the opposite.

[–] a_beautiful_rhind@alien.top 1 points 2 years ago

GS and SG merge different models.