Life_Ask2806

joined 11 months ago
 

When we benchmark different LLMs on different datasets (MMLU, TriviaQA, MATH, HellaSwag, etc.), what do these scores mean? Are they accuracy, or some other metric? How can I find out which metric each dataset (MMLU, etc.) uses?

https://preview.redd.it/5glmddnwsb3c1.png?width=2158&format=png&auto=webp&s=fcaf6e55d62445f3007380f06649455b29f8b2ec
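For context, most of these leaderboard numbers are plain accuracy: MMLU and HellaSwag are multiple-choice tasks where the model's prediction is whichever answer choice it assigns the highest log-likelihood, TriviaQA is typically scored as exact match, and MATH checks the final answer. The per-task configs in EleutherAI's lm-evaluation-harness (which the Open LLM Leaderboard runs) are where the exact metric for each dataset is defined. Below is a minimal sketch of the multiple-choice scoring scheme; `gpt2` and the toy item are stand-ins for illustration, not the leaderboard's actual harness:

```python
# Minimal sketch of multiple-choice benchmark scoring (MMLU/HellaSwag style):
# score each candidate answer's log-likelihood given the question, pick the
# best-scoring choice, and report plain accuracy over the dataset.
# Assumptions: "gpt2" is a stand-in model; the toy item below is hypothetical.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def choice_logprob(question: str, choice: str) -> float:
    """Sum of token log-probs of `choice` conditioned on `question`.
    Assumes the question's tokens are a strict prefix of the full sequence's
    tokens, which is why the choice string starts with a leading space."""
    q_ids = tokenizer(question, return_tensors="pt").input_ids
    full_ids = tokenizer(question + choice, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Log-probs each position assigns to the *next* token.
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = full_ids[0, 1:]
    start = q_ids.shape[1] - 1  # first position that predicts a choice token
    return logprobs[start:].gather(1, targets[start:, None]).sum().item()

# Toy row standing in for an MMLU-style item (hypothetical example).
items = [
    {"question": "Q: What is 2 + 2?\nA:", "choices": [" 3", " 4", " 5"], "answer": 1},
]
correct = 0
for item in items:
    scores = [choice_logprob(item["question"], c) for c in item["choices"]]
    correct += int(scores.index(max(scores)) == item["answer"])
print(f"accuracy = {correct / len(items):.2f}")  # this is the reported score
```

Real harness runs also prepend few-shot examples to the prompt and sometimes use length-normalized accuracy (e.g. `acc_norm` for HellaSwag), but the core loop is the same.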

 


Hey, I am currently working on a research project and I am wondering how to define my methodology. What approach should I follow to make LLMs more capable than generalist ones in the domain of education, specifically to assist learners in particular contexts (courses, lessons, or concepts) by providing personalized hints and guidance that help them improve their skills without directly giving them the answers?
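A common baseline for this kind of "hints, not answers" tutor is prompt-based specialization before any fine-tuning: wrap a generalist model in a system prompt that constrains it to Socratic guidance over your course material. A minimal sketch, assuming the `openai` Python client; the model name, prompt wording, and lesson text are illustrative placeholders, not a prescribed methodology:

```python
# Minimal sketch of a prompt-constrained tutoring assistant, a common baseline
# before adding retrieval or fine-tuning. Assumes the `openai` package and an
# OPENAI_API_KEY in the environment; model name and prompts are placeholders.
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "You are a tutor for an introductory programming course. "
    "Never state the final answer. Instead, diagnose the learner's "
    "misconception, give one targeted hint, and end with a guiding question."
)

def tutor_reply(lesson_context: str, learner_message: str) -> str:
    """Return a hint-only reply grounded in the current lesson."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any chat model works here
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user",
             "content": f"Lesson: {lesson_context}\n\n{learner_message}"},
        ],
    )
    return response.choices[0].message.content

print(tutor_reply("for-loops in Python",
                  "Why does `for i in range(3)` stop at 2?"))
```

From there, the methodology question becomes an ablation: compare this zero-shot baseline against retrieval over the actual course materials and against a fine-tuned variant, scoring hint quality and answer leakage with a rubric or learner study.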

 

On the Hugging Face leaderboard, I was a bit surprised by the performance of Falcon 180B.
Does anyone have an explanation for how it scores the way it does?
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard

https://preview.redd.it/ofzw8xr6h51c1.png?width=1535&format=png&auto=webp&s=4835a3fb20dc6e725d5b0f9001f3a4e605f49b6d