hoppyJonas

joined 1 year ago
hoppyJonas@alien.top 1 points 11 months ago

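It's probably both. In the Chinchilla paper, they showed that for compute-optimal training, the model size and the training dataset size (in tokens) should be scaled in equal proportion: if you double the number of parameters, you should also double the number of training tokens.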
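To make the proportionality concrete, here is a rough sketch of how you could split a compute budget between parameters and tokens. It leans on two assumptions that are not stated above: the common approximation that training costs about 6 FLOPs per parameter per token (C ≈ 6·N·D), and the often-quoted rule of thumb of roughly 20 tokens per parameter, which matches Chinchilla itself (70B parameters, 1.4T tokens). The function name and constants are just illustrative, not anything from the paper's code.

```python
import math

# Assumption: training compute C ~= 6 * N * D FLOPs
# (N = parameters, D = training tokens).
FLOPS_PER_PARAM_TOKEN = 6.0

# Assumption: rule-of-thumb ratio of ~20 training tokens per parameter.
TOKENS_PER_PARAM = 20.0


def compute_optimal_split(compute_flops, tokens_per_param=TOKENS_PER_PARAM):
    """Return (n_params, n_tokens) for a given FLOP budget.

    Solves C = 6 * N * D together with D = tokens_per_param * N,
    which gives N = sqrt(C / (6 * tokens_per_param)).
    """
    if compute_flops <= 0:
        raise ValueError("compute_flops must be positive")
    n_params = math.sqrt(compute_flops / (FLOPS_PER_PARAM_TOKEN * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


if __name__ == "__main__":
    for budget in (5.88e23, 4 * 5.88e23):
        n, d = compute_optimal_split(budget)
        print(f"C={budget:.3g} FLOPs -> N={n:.3g} params, D={d:.3g} tokens")
```

With a budget of 5.88e23 FLOPs this gives about 70B parameters and 1.4T tokens. Quadrupling the budget doubles both numbers, which is the "proportional" part: under these assumptions, each grows with the square root of compute.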