this post was submitted on 18 Nov 2023
1 points (100.0% liked)

Machine Learning


On the Hugging Face leaderboard, I was a bit surprised by the performance of Falcon-180B.
Does anyone have an explanation for this?
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard

https://preview.redd.it/ofzw8xr6h51c1.png?width=1535&format=png&auto=webp&s=4835a3fb20dc6e725d5b0f9001f3a4e605f49b6d

top 5 comments
[–] koolaidman123@alien.top 1 points 11 months ago

Public leaderboards mean nothing because 99% of the fine-tuned models are overfitted to hell; it's like nobody here has ever done a Kaggle comp before.
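
To make the overfitting point concrete, here is a minimal sketch (illustrative only, not from the thread): when enough submissions are selected by their score on one fixed public test set, even pure-noise models appear to beat chance.

```python
# Adaptive-overfitting sketch: many "models" evaluated against one fixed
# public test set, with the best public score reported as the result.
import numpy as np

rng = np.random.default_rng(0)
n_test = 1000    # size of the fixed "public leaderboard" test set
n_models = 500   # number of submissions tuned against it

# Every "model" here is pure noise: its predictions are coin flips,
# so its true accuracy is exactly 50%.
labels = rng.integers(0, 2, n_test)
preds = rng.integers(0, 2, (n_models, n_test))
public_scores = (preds == labels).mean(axis=1)

best = public_scores.argmax()
print(f"best public score:    {public_scores[best]:.3f}")  # well above 0.5

# Scoring the selected "winner" against fresh labels reveals its true
# accuracy (its coin-flip predictions don't depend on the inputs).
fresh_labels = rng.integers(0, 2, n_test)
print(f"winner on fresh data: {(preds[best] == fresh_labels).mean():.3f}")
```

The gap between the two printed numbers is pure selection bias, which is the same dynamic a private Kaggle leaderboard exists to catch.
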

[–] blackkettle@alien.top 1 points 11 months ago

I think a big obstacle is that the model is so big that hardly anyone is trying to fine-tune it.
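
For a sense of scale, a minimal sketch of the memory arithmetic (the ~16 bytes per parameter figure is the usual mixed-precision Adam rule of thumb, not a Falcon-specific number):

```python
# Rough memory arithmetic for fine-tuning a 180B-parameter model.
# Assumes standard mixed-precision Adam at ~16 bytes/param: fp16 weights
# and gradients plus fp32 master weights and two optimizer moments.
PARAMS = 180e9

weights_bf16 = PARAMS * 2    # bytes just to load the model for inference
full_finetune = PARAMS * 16  # bytes of training state for a full fine-tune

print(f"bf16 weights alone:  {weights_bf16 / 1e12:.2f} TB")
print(f"full Adam fine-tune: {full_finetune / 1e12:.2f} TB "
      f"(~{full_finetune / 80e9:.0f} x 80 GB GPUs, before activations)")
```

That is roughly 0.36 TB just to load the weights and close to 3 TB of accelerator memory for a naive full fine-tune, which is why most community fine-tunes stop at the 7B-70B range.
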

[–] vatsadev@alien.top 1 points 11 months ago

Well, the model is trained on RefinedWeb, which is 3.5T tokens, so a little below Chinchilla-optimal for 180B. Also, the models in the Falcon series seem to feel more and more undertrained as they scale (rough numbers sketched after this list):

  • The 1B model was good, and is still good after several newer generations
  • The 7B was capable pre-Llama 2
  • The 40B and 180B were never as good
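
A quick back-of-the-envelope check of that claim (a sketch, not from the thread; it assumes the common ~20 tokens-per-parameter Chinchilla heuristic and the approximate, publicly reported training-token counts for each Falcon release):

```python
# Chinchilla back-of-the-envelope: training tokens vs. the ~20
# tokens-per-parameter compute-optimal heuristic (Hoffmann et al., 2022).
TOKENS_PER_PARAM = 20

falcon = {
    # name: (parameters, approximate reported training tokens)
    "falcon-rw-1b": (1e9,   0.35e12),
    "falcon-7b":    (7e9,   1.5e12),
    "falcon-40b":   (40e9,  1.0e12),
    "falcon-180b":  (180e9, 3.5e12),
}

for name, (params, trained) in falcon.items():
    optimal = params * TOKENS_PER_PARAM
    print(f"{name}: {trained / optimal:.2f}x Chinchilla-optimal "
          f"({trained / 1e12:.2f}T trained vs ~{optimal / 1e12:.2f}T optimal)")
```

By this rough measure the 1B and 7B were trained far past the compute-optimal point (~17x and ~11x), the 40B only slightly past it, and the 180B just short of it, which matches the "more and more undertrained" impression.
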
[–] detached-admin@alien.top 1 points 11 months ago

These leaderboards are dick-measuring contests for small dicks. Imagine the dynamics of that.

[–] Unlucky-Attitude8832@alien.top 1 points 11 months ago

Falcon-180B is not good