While the benchmarks tend to be cheated, especially by small models, I honestly think something is wrong with how you run it.
Yi-34B trades blows with Llama 2 70B in my personal tests, where I make it do novel tasks I invented myself, not the gamed benchmarks.
Comparing ANY 7B model to 34B and 70B is like putting a 7-year-old up against a renowned professor.