They are all known to be stable, because they have a ground-truth simulator to test with. Stable doesn't necessarily mean useful, but that wasn't the point.
The benefit here is that training a neural network on simulator data lets you generate candidates instead of searching for them. The simulator is very computationally expensive (even compared to a forward pass of a deep neural network), and the search space is large and high-dimensional.
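A minimal sketch of that "generate instead of search" idea: spend the simulator budget once to build a dataset, fit a cheap inverse model from outcomes back to designs, then propose a design for a target outcome in a single forward pass. Everything here is hypothetical (the toy simulator, the linear inverse model standing in for a deep network); it just illustrates the amortization.

```python
import numpy as np

# Hypothetical stand-in for an expensive ground-truth simulator:
# maps a 2-D design vector to a scalar outcome.
def expensive_simulator(x):
    return 2.0 * x[0] - x[1] + 1.0

rng = np.random.default_rng(0)

# 1. Spend the simulator budget once, offline, to build a dataset.
designs = rng.uniform(-1, 1, size=(500, 2))
outcomes = np.array([expensive_simulator(x) for x in designs])

# 2. Fit a cheap inverse model outcome -> design (linear least
#    squares here; a deep network in practice).
A = np.stack([outcomes, np.ones_like(outcomes)], axis=1)
W, *_ = np.linalg.lstsq(A, designs, rcond=None)

# 3. Generate instead of search: one forward pass proposes a design
#    for a target outcome, with no per-query simulator sweep.
target = 0.5
proposal = np.array([target, 1.0]) @ W
print(proposal, expensive_simulator(proposal))
```

The expensive simulator is only needed up front (and maybe to verify the final proposals), which is the whole win when each simulator call dwarfs a network forward pass.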
Model-based RL is looking a little more stable in the last year. DreamerV3 and TD-MPC2 claim to train on hundreds of tasks with no per-task hyperparameter tuning, and report smooth loss curves that scale predictably.
Have to wait and see if it pans out though.