UPDATE: I forgot to mention that I used q8 of all models.
So I've had a lot of interest in the non-code 34b Finetunes, whether it's CodeLlama base or Yi base. From the old Samantha-34b and Synthia-34b to the new Dolphin-Yi and Nous-Capybara 34b models, I've been excited for each one because it fills a gap that needs filling.
My problem is that I can't seem to wrangle these fine-tunes into working right for me. I use Oobabooga (text-gen-ui), and always try to choose the correct instruction template either specified on the card or on TheBloke's page, but the models never seem to be happy with the result, and either get confused very easily or output odd gibberish from time to time.
For both Yi models, I am using the newest ggufs that TheBloke put out... yesterday? Give or take. But I've tried the past 2-3 different ggufs for the same model he's updated with when they came out.
The best luck I've had with the new Yi models was doing just plain chat mode with my AI Assistant's character prompt as the only thing being sent in, but even then both Yi fine-tunes that I tried eventually broke down after a few thousand context.
For example, after a bit of chattering with the models I tried a very simple little test on both: "Please write me two paragraphs. The content of the paragraphs is irrelevant, just please write two separate paragraphs about anything at all." I did that because previous versions of these two struggled to make a new line, so I just wanted to see what would happen. This absolutely confused the models, and the results were wild.
Has anyone had luck getting them to work? They appear to have so much potential, especially Nous Capybara which went toe to toe with GPT-4 in this benchmark, but I'm failing miserably at unlocking its full potential lol. If you have gotten it to work, could you please specify what settings/instructions you're using?
Try setting repetition penalty to 1.0, it helped a lot on base model before finetune. Are you limited to trying gguf only? There was an issue with llama.cpp that made it to So that BOS token was always inserted, but Yi works best without BOS token. Make sure that llama cpp version you have in oobabooga has this fixed or try running newest llama.cpp exe yourself. I get good results with my 2 private yi-34b qlora fine-tunes and with LoneStriker's spicyboros 3.1-2 (all exl2). I think it's better than llama 70b 2.4bpw... I didn't check dolphin or nous Capybara. To be honest I am not sure I was filling in the context to 4096 in any case, I think I kept it around 300-2000.