this post was submitted on 29 Nov 2023

Machine Learning


When we benchmark different LLMs on different datasets (MMLU, TriviaQA, MATH, HellaSwag, etc.), what do the reported scores actually mean? Are they accuracy, or some other metric? And how can I find out which metric each dataset (MMLU, etc.) uses?

https://preview.redd.it/ri4trwbwsa3c1.png?width=2158&format=png&auto=webp&s=44b2569de2a3e56e5e66ae340921a69c820f03b2
