With the abundance of models, most developers and users have to select a small subset of available models for their own evaluation, and that selection has to be based on already published data about the models' performance. At that stage, picking the models with, for example, the highest MMLU score is one way to go about it.
shibe5
I noticed this problem in llama.cpp too. I suspect that something required for Mistral models, e.g. sliding window attention, may not be implemented. To confirm that, one could compare outputs from PyTorch with the other software. I tried to do this, but the PyTorch model runs out of system RAM with a ~15k-token prompt.
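For anyone unfamiliar with what sliding window attention changes: each token attends only to the last `window` positions instead of the whole prefix, so an implementation without it effectively uses a different (full causal) mask. A minimal sketch of the mask, assuming the common convention that the window includes the current token (window size and indexing here are illustrative, not Mistral's exact code):

```python
def sliding_window_mask(n: int, window: int) -> list[list[int]]:
    """Causal sliding-window attention mask for a sequence of length n.

    mask[i][j] == 1 means query position i may attend to key position j.
    A token attends only to itself and the (window - 1) tokens before it.
    """
    return [
        [1 if 0 <= i - j < window else 0 for j in range(n)]
        for i in range(n)
    ]

# With n=5, window=2, position 4 sees only positions 3 and 4,
# whereas full causal attention would let it see 0..4.
mask = sliding_window_mask(5, 2)
```

If a backend silently falls back to full causal attention, short prompts often look fine and the divergence only shows up once the prompt exceeds the window, which would match problems appearing at ~15k tokens.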
That would include models trained on a calculator.
My own web UI for experimenting.