Could you please share a citation for the mentioned research papers?
Last I looked into this, the hypothesis was that increasing parameter count results in a predictable increase in capability, as long as training is correctly adapted.
https://arxiv.org/pdf/2206.07682.pdf
Very interested to see how these larger models that have plateaued are being trained!