Sounds like you want a mixture of experts for the first model, with the categorical (gating) distribution over experts being a function of the second model's output. You can put this together straightforwardly in TF/PyTorch/whatever, but it will be lower-level to implement (i.e. if you're hoping there's a ready-made Keras layer or something, that's unlikely).
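To make the idea concrete, here is a rough sketch of the forward pass in plain NumPy rather than any particular framework. Everything in it is an assumption for illustration: the experts are simple linear maps, the gate is a single linear layer followed by a softmax, and the names (`mixture_of_experts`, `gate_input`, `expert_weights`, `gate_weights`) are made up. Your actual first and second models would replace the linear pieces.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def mixture_of_experts(x, gate_input, expert_weights, gate_weights):
    """Forward pass of a soft mixture of linear experts (illustrative only).

    x:              (batch, d_in)   input to the experts (the "first model")
    gate_input:     (batch, d_gate) output of the "second model"
    expert_weights: (n_experts, d_in, d_out), one linear expert per slice
    gate_weights:   (d_gate, n_experts), maps gate_input to expert logits
    """
    # Each expert's prediction: (batch, n_experts, d_out)
    expert_out = np.einsum("bi,kio->bko", x, expert_weights)
    # Categorical distribution over experts: (batch, n_experts)
    probs = softmax(gate_input @ gate_weights)
    # Probability-weighted average of expert predictions: (batch, d_out)
    mixed = np.einsum("bk,bko->bo", probs, expert_out)
    return mixed, probs

# Toy shapes and random data, purely hypothetical.
rng = np.random.default_rng(0)
batch, d_in, d_gate, d_out, n_experts = 4, 3, 5, 2, 3
x = rng.normal(size=(batch, d_in))
gate_input = rng.normal(size=(batch, d_gate))
expert_weights = rng.normal(size=(n_experts, d_in, d_out))
gate_weights = rng.normal(size=(d_gate, n_experts))

y, probs = mixture_of_experts(x, gate_input, expert_weights, gate_weights)
print(y.shape, probs.sum(axis=1))
```

In TF or PyTorch the same structure applies: swap the NumPy arrays for trainable parameters (or full sub-networks) and let autograd handle the gradients through both the experts and the gate.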