Testing it now, but it's worse than 7b models on logic questions for me. Huge disappointment compared to Dolphin and Nous-Capybara, which are both Yi finetunes and the best models I've tested so far. It just goes to show how much difference finetuning a base model can make.

[–] drifter_VR@alien.top 1 points 2 years ago (1 children)

Nice, did you manage to tell a difference between Dolphin and Nous-Capybara? Both are pretty close for me.

[–] YearZero@alien.top 1 points 2 years ago (1 children)

Nope, they're both really good and very close to each other in my tests: https://docs.google.com/spreadsheets/d/1NgHDxbVWJFolq8bLvLkuPWKC7i_R6I6W/edit?usp=sharing&ouid=102314596465921370523&rtpof=true&sd=true

[–] drifter_VR@alien.top 1 points 2 years ago (1 children)

Thanks, I remember your tests, it's great you are still on it. So according to your tests, 34b models compete with GPT3.5. I am not too surprised. And Mistral-7b is not so far behind, what a beast!
Will you benchmark 70b models too?

[–] YearZero@alien.top 1 points 2 years ago

Unfortunately I don't have enough RAM/GPU, and I'm too broke right now to pay for extra! But I hope to in the future.