UPDATE: I forgot to mention that I used q8 of all models.
So I've had a lot of interest in the non-code 34b Finetunes, whether it's CodeLlama base or Yi base. From the old Samantha-34b and Synthia-34b to the new Dolphin-Yi and Nous-Capybara 34b models, I've been excited for each one because it fills a gap that needs filling.
My problem is that I can't seem to wrangle these fine-tunes into working right for me. I use Oobabooga (text-gen-ui), and always try to choose the correct instruction template either specified on the card or on TheBloke's page, but the models never seem to be happy with the result, and either get confused very easily or output odd gibberish from time to time.
For both Yi models, I am using the newest ggufs that TheBloke put out... yesterday? Give or take. But I've tried the past 2-3 different ggufs for the same model he's updated with when they came out.
The best luck I've had with the new Yi models was doing just plain chat mode with my AI Assistant's character prompt as the only thing being sent in, but even then both Yi fine-tunes that I tried eventually broke down after a few thousand context.
For example, after a bit of chattering with the models I tried a very simple little test on both: "Please write me two paragraphs. The content of the paragraphs is irrelevant, just please write two separate paragraphs about anything at all." I did that because previous versions of these two struggled to make a new line, so I just wanted to see what would happen. This absolutely confused the models, and the results were wild.
Has anyone had luck getting them to work? They appear to have so much potential, especially Nous Capybara which went toe to toe with GPT-4 in this benchmark, but I'm failing miserably at unlocking its full potential lol. If you have gotten it to work, could you please specify what settings/instructions you're using?
(i know this is not a roleplay question but anyway ^_^) Settings i use for Silly tavern and the Nous Capybara model. Works perfect so far, but you also need the character CFG globally enabled to 1.5 to make it stop looping. { "temp": 0.1, "temperature_last": true, "top_p": 1, "top_k": 25, "top_a": 0, "tfs": 1, "epsilon_cutoff": 0, "eta_cutoff": 0, "typical_p": 1, "min_p": 0.05, "rep_pen": 1, "rep_pen_range": 0, "no_repeat_ngram_size": 15, "penalty_alpha": 0, "num_beams": 1, "length_penalty": 1, "min_length": 0, "encoder_rep_pen": 1, "freq_pen": 0, "presence_pen": 0, "do_sample": true, "early_stopping": false, "add_bos_token": true, "truncation_length": 2048, "ban_eos_token": false, "skip_special_tokens": true, "streaming": true, "mirostat_mode": 2, "mirostat_tau": 2.55, "mirostat_eta": 0.1, "guidance_scale": 1, "negative_prompt": "", "grammar_string": "", "banned_tokens": "", "type": "ooba", "legacy_api": false, "rep_pen_size": 0, "genamt": 1024, "max_length": 16128 }
Might I ask what context and instruct template your using?