You might want to look at TVM (https://tvm.apache.org/). It's used in projects like MLC (https://github.com/mlc-ai/mlc-llm) to run model inference in C++, and it works great.
I'm not at a master's-degree level when it comes to coding, but I have used solutions derived from that project for some experimental Unity plugins, which aren't really feasible with Python.
What are you using for voices? Those sound great.