Your teacher's argument rests on the fact that pseudo-random generators are deterministic, a fact that is entirely irrelevant to ML theory.
If you want to make the point that the "can only be" part is extremely far-reaching, just bring quantum physicists into the discussion.
Am I correct in saying that "grokking" is apparently an effect of regularization, i.e. the model reaches good generalization because its weights are pushed to be as small as possible, until its effective capacity drops below what is needed to memorize the dataset?
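To make the regularization mechanism in that question concrete, here is a minimal sketch in pure Python. This is not actual grokking (which involves a harder task, e.g. modular arithmetic, and much longer training); it only illustrates the underlying effect the question refers to: with weight decay, gradient descent prefers the smallest weights that still (approximately) fit the data. The toy problem (one data point, two weights) is my own construction for illustration.

```python
import random

def train(weight_decay, steps=2000, lr=0.05, seed=0):
    """Gradient descent on a tiny overparameterized least-squares problem."""
    rng = random.Random(seed)
    w = [rng.uniform(-1, 1) for _ in range(2)]
    # One training point (x, y) = ([1, 1], 2): infinitely many exact fits
    # exist (any w1 + w2 = 2); weight decay selects a small-norm one.
    x, y = [1.0, 1.0], 2.0
    for _ in range(steps):
        pred = sum(wi * xi for wi, xi in zip(w, x))
        err = pred - y
        # gradient of 0.5*err**2 + 0.5*weight_decay*||w||**2
        w = [wi - lr * (err * xi + weight_decay * wi)
             for wi, xi in zip(w, x)]
    return w

def norm(w):
    return sum(wi * wi for wi in w) ** 0.5

w_plain = train(weight_decay=0.0)
w_decay = train(weight_decay=0.05)
# Both solutions fit the data closely, but the decayed one has smaller norm.
print(norm(w_plain), norm(w_decay))
```

In the grokking papers the same pressure, applied over many more steps, is what eventually collapses a memorizing solution into a smaller, generalizing one.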