shibe5

joined 10 months ago
[–] shibe5@alien.top 1 points 10 months ago

My own web UI for experimenting.

[–] shibe5@alien.top 1 points 10 months ago

With the abundance of models, most developers and users have to select a small subset of available models for their own evaluation, and that selection has to be based on already available data about the models' performance. At that stage, picking the models with, for example, the highest MMLU score is one way to go about it.

[–] shibe5@alien.top 1 points 10 months ago

I noticed this problem in llama.cpp too. I suspect that something required by Mistral models, e.g. sliding window attention, is not implemented. To confirm that, one could compare outputs from PyTorch with those from other software. I tried to do that, but the PyTorch model runs out of system RAM with a ~15k-token prompt.
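For reference, sliding window attention restricts each token to attend only to the most recent tokens within a fixed window, rather than to the entire preceding context. A minimal sketch of such a causal sliding-window mask (the window size of 3 here is for illustration only; Mistral's released models use a much larger window):

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean mask: position i may attend to position j iff
    j <= i (causal) and j > i - window (within the sliding window)."""
    i = np.arange(seq_len)[:, None]  # query positions, column vector
    j = np.arange(seq_len)[None, :]  # key positions, row vector
    return (j <= i) & (j > i - window)

mask = sliding_window_mask(seq_len=6, window=3)
# Token 5 can attend to tokens 3, 4, 5 but not to token 2,
# which has fallen outside the window.
```

If an implementation silently falls back to full causal attention (or mishandles the window once the prompt exceeds it), outputs would diverge from the reference only on long prompts, which matches the ~15k-token symptom.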

[–] shibe5@alien.top 1 points 10 months ago

10^26

10^23

10^20

That would include models trained on a calculator.