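It's probably both. In the Chinchilla paper, they showed that for compute-optimal training, model size and training dataset size should be scaled in equal proportion: every doubling of model size should be matched by a doubling of the number of training tokens.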
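Here is a rough sketch of what "proportional" means in practice. It rests on two assumptions that are not stated above. The first is the common approximation that training costs about C ≈ 6·N·D FLOPs for N parameters and D tokens. The second is the widely quoted rule of thumb of roughly 20 training tokens per parameter. Chinchilla itself was 70B parameters trained on 1.4T tokens. The function name `chinchilla_optimal` is hypothetical.

```python
import math


def chinchilla_optimal(compute_flops: float, tokens_per_param: float = 20.0) -> tuple[float, float]:
    """Split a training FLOP budget into (parameters, tokens).

    Assumes (rule of thumb, not an exact law):
      C ~= 6 * N * D       training FLOPs for N params and D tokens
      D ~= k * N           k = tokens_per_param, about 20 for Chinchilla
    """
    # Substitute D = k*N into C = 6*N*D  ->  C = 6*k*N^2  ->  N = sqrt(C / (6k))
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    # Data grows in lockstep with model size
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


# Budget of the actual Chinchilla run: 6 * 70e9 params * 1.4e12 tokens
n, d = chinchilla_optimal(6 * 70e9 * 1.4e12)
print(f"{n / 1e9:.0f}B params, {d / 1e12:.2f}T tokens")  # 70B params, 1.40T tokens

# Quadrupling compute doubles both model size and data, keeping D/N fixed
n4, d4 = chinchilla_optimal(4 * 6 * 70e9 * 1.4e12)
print(f"{n4 / 1e9:.0f}B params, {d4 / 1e12:.2f}T tokens")
```

Because N and D both grow as the square root of C under these assumptions, a bigger budget should be spent on a bigger model *and* more data, not on one alone.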