According to the scaling laws, the loss/error is approximated as:

    w0 + w1 * pow(num_params, -w2) + w3 * pow(num_tokens, -w4)

where num_params is the model's parameter count and num_tokens is the number of training tokens.
Bill wrote before that he'd been meeting with the OpenAI team since 2016, so he's probably pretty knowledgeable about these things. He might be referring to the fact that, after a while, you see sharply diminishing returns from increasing num_params alone. In the limit, the corresponding term disappears, but the others do not.
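To make that concrete, here's a minimal sketch that evaluates the formula with num_tokens held fixed; the coefficient values are made up for illustration, not fitted from any real model.

    # Sketch of the scaling-law formula above. The w0..w4 values are
    # invented for illustration, not real fits.
    w0, w1, w2, w3, w4 = 1.8, 500.0, 0.35, 450.0, 0.30

    def loss(num_params, num_tokens):
        """Approximate loss as a function of model size and data size."""
        return w0 + w1 * pow(num_params, -w2) + w3 * pow(num_tokens, -w4)

    # Hold num_tokens fixed and scale num_params: the params term shrinks
    # toward zero, but the irreducible w0 and the data term remain.
    num_tokens = 1e12
    for num_params in (1e8, 1e9, 1e10, 1e11, 1e12):
        print(f"{num_params:.0e} params -> loss ~ {loss(num_params, num_tokens):.3f}")

With these toy numbers the loss flattens out near w0 + w3 * pow(num_tokens, -w4) no matter how large num_params gets, which is exactly the diminishing-returns behavior described above.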