this post was submitted on 17 Nov 2023

LocalLLaMA

I've been encountering a repetition issue with models like Goliath 120b and Xwin 70b on SillyTavern + OpenRouter. While I understand that changing models can have a significant impact, I'm puzzled by the repetition problem. Despite my efforts to find the correct settings online, my searches for Airoboros 70b, Xwin 70b, lzlv 70b, and others have been in vain.

I came across posts on this subreddit addressing similar concerns, but unfortunately, they lacked solutions. One suggestion was to "use the shortwave preset," but it seems to be nonexistent. Unsure of what I might be overlooking, I'm reaching out here for help. The 120b model should theoretically outperform the 7b/13b models, but I suspect there's a configuration issue.

If anyone could provide insights or share the correct settings for these models, it would greatly help not only me but also future users facing the same issue. Let's compile a comprehensive guide here so that anyone searching the internet for a solution can find this post and get the answers they need. Thank you in advance for your assistance!

PS: mythomax 13B seems to be the best model because it's the only one that actually works...

[–] vacationcelebration@alien.top 1 points 1 year ago

I run my models locally only, and my best experience has been using mirostat to combat repetition and samey regenerations. Before that I used contrastive search with some success.

I have to say though, I'm not sure if mirostat would be a good solution through OpenRouter. Doesn't it have its own little cache or something (referring to mirostat)? It definitely seems like it keeps some state about generated tokens and tries to avoid them in the future, or something like that.
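For reference, here's a rough Python sketch of one Mirostat v2 sampling step as I understand it from the paper and the llama.cpp implementation (an assumption about how a backend might do it, not something I've verified against OpenRouter): the only thing it carries between tokens is a single running surprise threshold, mu, rather than a literal cache of generated tokens. Whether a hosted API keeps that per-request state is exactly the part I'm unsure about.

```python
import numpy as np

def mirostat_v2_step(logits, mu, tau=2.5, eta=1.0, rng=None):
    """One Mirostat v2 step; `mu` is the only state carried between tokens."""
    rng = rng or np.random.default_rng()

    # Softmax the logits and compute each candidate's "surprise" in bits.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    surprise = -np.log2(probs)

    # Truncate: drop candidates whose surprise exceeds the running threshold mu.
    keep = surprise <= mu
    if not keep.any():
        keep = probs >= probs.max()  # fallback: keep the most likely token
    kept = np.where(keep, probs, 0.0)
    kept /= kept.sum()

    # Sample from the truncated, renormalized distribution.
    token = rng.choice(len(probs), p=kept)

    # Feedback: nudge mu so the observed surprise tracks the target tau.
    observed = -np.log2(kept[token])
    mu -= eta * (observed - tau)
    return token, mu

# mu starts at 2 * tau and is threaded through the whole generation,
# which is why the backend has to keep per-request sampler state.
mu = 2 * 2.5
```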

Anyways, for 70b Xwin & lzlv, my settings have been simple: everything on default values (1 or 0), mirostat mode = 2, tau = 2-3, eta = 1. This gets me great responses, zero repetition, high variety when regenerating, and not too many hallucinations. These settings seem pretty stable. I sometimes tweak tau or raise/lower the temp, but eventually always end up at those settings again.
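In case it helps with mapping those numbers onto a backend, this is roughly what the above looks like as a llama.cpp server /completion request (just a sketch: it assumes a local server on port 8080, the prompt is a placeholder, and SillyTavern/OpenRouter expose these fields under their own names, if at all):

```python
import requests

payload = {
    "prompt": "### Instruction:\nContinue the scene.\n\n### Response:\n",  # placeholder prompt
    "n_predict": 300,
    # "everything on default": neutral values that effectively disable these samplers
    "temperature": 1.0,
    "top_k": 0,
    "top_p": 1.0,
    "repeat_penalty": 1.0,
    # the part doing the actual work
    "mirostat": 2,        # Mirostat mode 2
    "mirostat_tau": 2.5,  # tau somewhere in the 2-3 range
    "mirostat_eta": 1.0,
}

response = requests.post("http://localhost:8080/completion", json=payload)
print(response.json()["content"])
```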

But for the new 34b Yi fine-tunes, for example, these settings don't work. It's like I'm back in the early days of Llama 2, with exactly the problems you mentioned: the models start to loop and repeat, and not just within the same response; they repeat previous responses verbatim as well, reuse the same phrases again and again, don't know when to stop, etc. For those, I haven't found good, stable settings so far no matter what I change (they seem to prefer low temp, though), which is so frustrating, as they are great otherwise. So mirostat is not a magic bullet, it seems.

Can't say anything about Goliath unfortunately (haven't used it).