I've used KD and it has always performed better than training from scratch for me.
Could it be a data-related issue (scarcity, quality)? Or maybe you need to find good hyperparameters.
Which domain are you applying it to?
What are your losses/objectives? I'm inclined to say you need a huge amount of data for it as well.
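
For reference, this is roughly the objective I have in mind when I ask about losses: a minimal sketch of the common soft-target formulation (hard-label cross-entropy plus a temperature-scaled KL term against the teacher). The function names and the `temperature` and `alpha` defaults are placeholders I made up for illustration, not anything from your setup, and your actual objective may well differ.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; subtract the max for numerical stability.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    # Hypothetical helper: single-example distillation loss.
    # Hard-label term: cross-entropy of the student at temperature 1.
    p_student = softmax(student_logits)
    ce = -math.log(p_student[label])
    # Soft-target term: KL(teacher || student), both softened by the temperature.
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(t * (math.log(t) - math.log(s)) for t, s in zip(p_t, p_s) if t > 0)
    # The temperature^2 factor keeps the soft term on a scale comparable to the hard term.
    return alpha * ce + (1.0 - alpha) * temperature ** 2 * kl
```

If your loss looks very different from this (for example no hard-label term, or temperature fixed at 1), that would be worth mentioning, since the weighting and temperature are usually among the first things to tune.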