LocalLLaMA

in the context of evaluating LLMs, what do these scores technically mean?

submitted 2 years ago by Life_Ask2806@alien.top to c/localllama@poweruser.forum

When we benchmark different LLMs on different datasets (MMLU, TriviaQA, MATH, HellaSwag, etc.), what do these scores actually signify? Is it accuracy? Some other metric? And how can I find out which metric each dataset (MMLU, etc.) uses?

https://preview.redd.it/5glmddnwsb3c1.png?width=2158&format=png&auto=webp&s=fcaf6e55d62445f3007380f06649455b29f8b2ec

ThisGonBHard@alien.top 1 point 2 years ago

Nothing, sadly.

Models get trained on the benchmark questions themselves to boost their scores, which makes the tests moot.
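
For anyone who wants to poke at this themselves: one rough way to look for that kind of leakage is to check how many word n-grams of a benchmark question show up verbatim in a chunk of training text. This is only a minimal sketch, not how any particular lab does it. The function names, the whitespace tokenization, and the sample strings are all made up for illustration, and a real check would need proper tokenization and a much bigger corpus.

```python
def ngrams(text, n):
    """Return the set of lowercase word n-grams in text (whitespace tokenization)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_fraction(question, corpus_text, n=8):
    """Fraction of the question's n-grams that also appear in corpus_text.

    Returns 0.0 when the question is shorter than n words.
    """
    question_grams = ngrams(question, n)
    if not question_grams:
        return 0.0
    corpus_grams = ngrams(corpus_text, n)
    return len(question_grams & corpus_grams) / len(question_grams)


# Hypothetical strings, only to show the call pattern.
question = "Which planet is known as the red planet"
leaked_text = "trivia dump: which planet is known as the red planet answer mars"
clean_text = "mars is the fourth planet from the sun"

leaked_score = overlap_fraction(question, leaked_text, n=4)
clean_score = overlap_fraction(question, clean_text, n=4)
```

A fraction near 1 suggests the question text sits in the corpus more or less verbatim. A low fraction does not prove the model never saw the question, since paraphrased or translated copies would slip past a check like this.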