Did you have to use RL? RL is pretty much just another word for gradient free optimization, which is obviously hard, but I guess that isn’t going to help you.
Did you have to use RL? RL is pretty much just another word for gradient free optimization, which is obviously hard, but I guess that isn’t going to help you.