[–] out_of_touch@alien.top 1 points 11 months ago

One question I have about this stuff: if we improve the way we randomize the next token, does that increase the likelihood of the "thesaurus" problem occurring? I.e. where the model keeps reaching for increasingly "flowery" words because it doesn't want to keep reusing the same ones. I find that becomes a problem with a long enough chat context when using some of the other settings designed to avoid repetition. Sometimes my characters start out talking normally and slowly progress into talking like college professors giving poetry lectures.
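For what it's worth, the mechanism I'm picturing is the usual CTRL-style repetition penalty: every token that's already been used gets its logit pushed down, so over a long chat the sampler drifts toward rarer and rarer synonyms. A rough sketch, purely illustrative and not any particular backend's implementation:

```python
def apply_repetition_penalty(logits, used_token_ids, penalty=1.2):
    # logits: list of floats indexed by token id (illustrative setup).
    # Every previously used token gets penalized, so likely repeats
    # become less likely and the sampler keeps reaching further afield.
    penalized = list(logits)
    for t in set(used_token_ids):
        if penalized[t] > 0:
            penalized[t] /= penalty   # shrink positive logits
        else:
            penalized[t] *= penalty   # push negative logits further down
    return penalized
```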

[–] out_of_touch@alien.top 1 points 11 months ago (4 children)

I'm curious what results you're seeing from the Yi models. I've been playing around with LoneStriker_Nous-Capybara-34B-5.0bpw-h6-exl2 and more recently LoneStriker_Capybara-Tess-Yi-34B-200K-DARE-Ties-5.0bpw-h6-exl2, and I'm finding them fairly good with the right settings. I found the Yi 34B models almost unusable due to repetition issues until I tried the settings recommended in this discussion:

https://www.reddit.com/r/LocalLLaMA/comments/182iuj4/yi34b_models_repetition_issues/

I've found it much better since.

I tried out one of the neural models and found it couldn't keep track of details at all. I wonder if my settings weren't very good or something. I would have been using an EXL2 or GPTQ version, though.

[–] out_of_touch@alien.top 1 points 11 months ago (2 children)

Interesting timing. I don't know if this exists yet, but I was just thinking about a feature that would use a range for the context size.

The idea is that you specify a min and a max context, say 6k and 8k. When the context breaches the 8k max, instead of just cutting it off there, the trimming goes further and cuts it back to 6k; the context then builds on that until it once again reaches 8k, and the process repeats. That way, instead of reprocessing the entire context on every message, you'd only need to do it when the max was exceeded. I'm a programmer by trade, so I'm tempted to look into building this, but I haven't looked into what it requires or whether the feature already exists out there somewhere.
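A minimal sketch of what I mean, assuming the chat history is just a list of (text, token_count) pairs (the names here are made up for illustration):

```python
def trim_context(messages, min_ctx=6144, max_ctx=8192):
    """Trim only when max_ctx is breached, and trim all the way down
    to min_ctx, so the prompt prefix stays stable between trims
    instead of shifting on every new message."""
    total = sum(tokens for _, tokens in messages)
    if total <= max_ctx:
        return messages             # under the max: cached context stays valid
    trimmed = list(messages)
    while trimmed and total > min_ctx:
        _, tokens = trimmed.pop(0)  # drop the oldest messages first
        total -= tokens
    return trimmed
```

The point being that the front of the prompt only changes once per trim, so a backend could reuse its cached context everywhere in between.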

[–] out_of_touch@alien.top 1 points 11 months ago

I encounter this a lot with the Yi 34B models, to the point where I've basically stopped using them for chat. I've tried a huge variety of settings, presets, quants, etc. I've used koboldcpp and text-generation-webui, and I've used EXL2, GGML, and GPTQ. The issue appears consistently once the context grows past a certain size: partial or entire messages will repeat. It will also get stuck, where regenerating always produces the same response unless I make drastic changes to the settings, and usually that just changes which message it's stuck on. Smaller settings changes just slightly reword the stuck message.