Yi's scores on standard benchmarks can't be trusted: those benchmarks are easy to game by including them in the training data, and the LKF gang who built this are under a lot of pressure to justify their $1 billion valuation and keep milking investors.
The only way to really evaluate it is on a hidden benchmark that has never been seen before, and/or through rigorous qualitative experiments.
Until then, I’m not holding my breath.