Really nice! I had a dream that we'd find a way to iterate over base models so every finetune gets closer to SOTA :D
kpodkanowicz
Great work as always! Regarding EXL2: it's sensitive to the calibration dataset, and the one used here probably isn't related to your tests. For example, you can get higher HumanEval scores even at 3 bits than you would get with transformers in 8-bit. I hope this standard gets more popular and finetuners start producing their own measurement file/quants using their own dataset. I've never seen a Q2 GGUF do better than EXL2 unless I mixed up the RoPE config.
Edit: for anything higher than 4.25 bits I usually use an 8-bit head.
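For anyone who wants to try this, a minimal sketch of the "use your own dataset for calibration" step: exporting a finetune's training samples to a parquet file that can then be fed to the EXL2 conversion script as its calibration set. The column name and file layout here are assumptions for illustration, not ExLlamaV2's documented format.

```python
# Sketch: build a calibration parquet from the finetune's own training data.
# Column name ("text") and overall layout are assumptions; check the
# ExLlamaV2 conversion docs for the exact expected format.
import pandas as pd

samples = [
    "def quicksort(arr):\n    ...",            # include code-heavy rows if HumanEval matters to you
    "Explain the difference between a list and a tuple in Python.",
]

pd.DataFrame({"text": samples}).to_parquet("calibration.parquet")
# The resulting file would then be passed to the ExLlamaV2 convert script as
# the calibration dataset when producing the measurement file and quants.
```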
Amazing!!! I bet this approach (and optionally LoRA routers) will be our only shot to beat GPT-4 and beyond.
Hmm, one of the really interesting details here: a normal LoRA at rank 8 tested better than at rank 128. Genuine question: how is that possible? Mediocre data used for the LoRA? I have done a few finetunes recently and see a similar situation between rank 128 and 256.
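For context, the rank being compared is the `r` parameter of a PEFT `LoraConfig`; a minimal sketch of the two configurations (target modules shown are typical for Llama-style models and are an assumption here, not taken from the post):

```python
# Sketch: the only intended difference between the two runs is the rank `r`
# (with lora_alpha usually scaled alongside it).
from peft import LoraConfig

lora_r8 = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

lora_r128 = LoraConfig(
    r=128,
    lora_alpha=256,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
# Higher rank means more trainable parameters, so on a small or noisy dataset
# the rank-128 adapter has more room to overfit, which is one common
# explanation for rank 8 evaluating better than rank 128 on the same data.
```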