this post was submitted on 29 Nov 2023

Machine Learning


When we benchmark different LLMs on different datasets (MMLU, TriviaQA, MATH, HellaSwag, etc.), what do the reported scores actually mean? Are they accuracy, or some other metric? And how can I find out which metric each dataset (MMLU, etc.) uses?

https://preview.redd.it/ri4trwbwsa3c1.png?width=2158&format=png&auto=webp&s=44b2569de2a3e56e5e66ae340921a69c820f03b2
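
To make the question concrete: if a score is plain accuracy, I would expect it to be computed roughly like the sketch below. This is only my assumption about what the number means, not any benchmark's official scoring script. The function name `accuracy` and the sample answers are made up for illustration.

```python
def accuracy(predictions, references):
    """Fraction of items where the predicted label equals the reference label."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot score an empty dataset")
    # Count exact matches between each prediction and its reference answer.
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


# Hypothetical answers to four multiple-choice questions (made-up data,
# not taken from any real benchmark).
preds = ["A", "C", "B", "D"]
golds = ["A", "C", "D", "D"]

score = accuracy(preds, golds)
print(f"{score:.1%}")  # 3 of 4 answers match
```

If that reading is right, a table entry such as 75.0 would mean the model answered 75% of the questions correctly. What I can't tell is whether every dataset in the table is scored this way.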
