By having a separate translation module, you're deciding on the model's behalf which parameters should be used for translation and which for learning about the world.
With an extremely small model (one that doesn't have the capacity to even fully learn English), this would probably be reasonable. With any other size of model (100–200 million parameters and up, maybe?), it would be far, far more effective to let the model pick and choose how it allocates its parameters.
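To make the contrast concrete, here's a minimal PyTorch-style sketch of the two designs. Everything here is a hypothetical stand-in (class names, sizes, the two-layers-each split), not anything from a real system: the first model bakes the translation/knowledge split in by construction, while the second spends the same depth however gradient descent sees fit.

```python
import torch.nn as nn

# (a) Hand-partitioned: we decide up front which parameters do translation
#     and which do world knowledge. All names/sizes are illustrative.
class PartitionedModel(nn.Module):
    def __init__(self, vocab_size=32000, d_model=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Parameters reserved for translation only, by fiat.
        self.translator = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=2,
        )
        # Parameters reserved for knowledge only, by fiat.
        self.world_model = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=2,
        )
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        x = self.embed(tokens)
        x = self.translator(x)   # translation is forced to happen here
        x = self.world_model(x)  # knowledge is forced to happen here
        return self.head(x)

# (b) Unified: same total depth, but training decides how the parameters
#     divide themselves between translation and world knowledge.
class UnifiedModel(nn.Module):
    def __init__(self, vocab_size=32000, d_model=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.body = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=4,  # no imposed split
        )
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        return self.head(self.body(self.embed(tokens)))
```

Both models have roughly the same parameter budget; the only difference is who gets to draw the boundary.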
Often, this leads to such a complete meld of translation and world knowledge that we don't currently know how to tell whether a given neuron, or set of neurons, does one task or the other. The current most likely theory (in my opinion) is that most neurons are multitask.
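For a sense of what "telling whether a neuron does one task or the other" would even look like operationally, here's a toy activation probe: compare a neuron's mean activation on translation prompts versus knowledge prompts. The activations below are random stand-ins (in practice you'd record real hidden states from a trained model); the point is the method, not the numbers.

```python
import torch

def mean_activation(acts: torch.Tensor, neuron: int) -> float:
    # acts: (num_prompts, num_neurons) hidden activations for one task
    return acts[:, neuron].mean().item()

torch.manual_seed(0)
num_neurons = 512
# Stand-ins for activations recorded on two prompt sets.
translation_acts = torch.relu(torch.randn(100, num_neurons))
knowledge_acts = torch.relu(torch.randn(100, num_neurons))

for j in range(3):
    t = mean_activation(translation_acts, j)
    k = mean_activation(knowledge_acts, j)
    # A cleanly task-specific neuron would show t >> k or k >> t;
    # the multitask story predicts most neurons land in the middle.
    print(f"neuron {j}: translation={t:.2f} knowledge={k:.2f}")
```

If most neurons come out ambiguous under probes like this, that's consistent with the multitask picture rather than a clean internal division of labor.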