No, the old model does not have the training data. It only has "model weights". You can think of those as the abstract rules the old model learned while reading the training data. By design, the weights are not supposed to memorize the training data itself.
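To make the weights-vs-data distinction concrete, here's a toy sketch (my own illustration, not anything from the thread): fitting a line to 1,000 points compresses all of them into just two learned numbers, and the original points cannot be recovered from those two numbers.

```python
# Toy example: the "training data" is 1,000 (x, y) pairs,
# but the learned "weights" are just two numbers.
data = [(x, 2.0 * x + 1.0) for x in range(1000)]  # 1,000 training examples

# Closed-form least-squares fit for slope and intercept.
n = len(data)
sx = sum(x for x, _ in data)
sy = sum(y for _, y in data)
sxx = sum(x * x for x, _ in data)
sxy = sum(x * y for x, y in data)
slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
intercept = (sy - slope * sx) / n

# Two weights now stand in for 1,000 examples; the examples
# themselves are gone once `data` is deleted.
print(slope, intercept)
```

The same idea scales up: a neural network's weights are a (much larger) set of such learned numbers, not a copy of the corpus.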
I expressed myself poorly, this is what I meant - it has the "essence" of the training data, but of course not the verbatim training data.
To outperform the old model, the new model needs more than what the old model learned. It needs the primary sources, i.e. the training data itself, which is what's going to be deleted.
I wonder how valuable the old training data is to the process, relative to just the new training data. I can't answer that, but it would be interesting to know.