this post was submitted on 18 Nov 2023
1 points (100.0% liked)

Machine Learning


On the Hugging Face leaderboard, I was a bit surprised by the performance of Falcon-180B.
Does anyone have an explanation for this?
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard

https://preview.redd.it/ofzw8xr6h51c1.png?width=1535&format=png&auto=webp&s=4835a3fb20dc6e725d5b0f9001f3a4e605f49b6d

top 5 comments
[–] koolaidman123@alien.top 1 points 11 months ago

Public leaderboards mean nothing because 99% of the fine-tuned models are overfitted to hell; it's like nobody here has ever done a Kaggle comp before.
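
To make the overfitting point concrete, here is a minimal sketch (illustrative only, not from the thread): when enough submissions are selected by their score on one fixed public test set, even pure-noise models appear to beat chance.

```python
# Adaptive-overfitting sketch: many "models" evaluated against one fixed
# public test set, with the best public score reported as the result.
import numpy as np

rng = np.random.default_rng(0)
n_test = 1000    # size of the fixed "public leaderboard" test set
n_models = 500   # number of submissions tuned against it

# Every "model" here is pure noise: its predictions are coin flips,
# so its true accuracy is exactly 50%.
labels = rng.integers(0, 2, n_test)
preds = rng.integers(0, 2, (n_models, n_test))
public_scores = (preds == labels).mean(axis=1)

best = public_scores.argmax()
print(f"best public score:    {public_scores[best]:.3f}")  # well above 0.5

# Scoring the selected "winner" against fresh labels reveals its true
# accuracy (its coin-flip predictions don't depend on the inputs).
fresh_labels = rng.integers(0, 2, n_test)
print(f"winner on fresh data: {(preds[best] == fresh_labels).mean():.3f}")
```

The gap between the two printed numbers is pure selection bias, which is the same dynamic a private Kaggle leaderboard exists to catch.
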

[–] blackkettle@alien.top 1 points 11 months ago

I think a big obstacle is that the model is so big that hardly anyone is trying to fine-tune it.
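
For a sense of scale, a minimal sketch of the memory arithmetic (the ~16 bytes per parameter figure is the usual mixed-precision Adam rule of thumb, not a Falcon-specific number):

```python
# Rough memory arithmetic for fine-tuning a 180B-parameter model.
# Assumes standard mixed-precision Adam at ~16 bytes/param: fp16 weights
# and gradients plus fp32 master weights and two optimizer moments.
PARAMS = 180e9

weights_bf16 = PARAMS * 2    # bytes just to load the model for inference
full_finetune = PARAMS * 16  # bytes of training state for a full fine-tune

print(f"bf16 weights alone:  {weights_bf16 / 1e12:.2f} TB")
print(f"full Adam fine-tune: {full_finetune / 1e12:.2f} TB "
      f"(~{full_finetune / 80e9:.0f} x 80 GB GPUs, before activations)")
```

That is roughly 0.36 TB just to load the weights and close to 3 TB of accelerator memory for a naive full fine-tune, which is why most community fine-tunes stop at the 7B-70B range.
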

[–] vatsadev@alien.top 1 points 11 months ago

Well, the model is trained on RefinedWeb, which is 3.5T tokens, so a little below Chinchilla-optimal for 180B. Also, the models in the Falcon series seem to feel more and more undertrained as they scale (rough numbers sketched after this list):

  • The 1B model was good, and is still good after several newer generations
  • The 7B was capable pre-Llama 2
  • The 40B and 180B were never as good
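
A quick back-of-the-envelope check of that claim (a sketch, not from the thread; it assumes the common ~20 tokens-per-parameter Chinchilla heuristic and the approximate, publicly reported training-token counts for each Falcon release):

```python
# Chinchilla back-of-the-envelope: training tokens vs. the ~20
# tokens-per-parameter compute-optimal heuristic (Hoffmann et al., 2022).
TOKENS_PER_PARAM = 20

falcon = {
    # name: (parameters, approximate reported training tokens)
    "falcon-rw-1b": (1e9,   0.35e12),
    "falcon-7b":    (7e9,   1.5e12),
    "falcon-40b":   (40e9,  1.0e12),
    "falcon-180b":  (180e9, 3.5e12),
}

for name, (params, trained) in falcon.items():
    optimal = params * TOKENS_PER_PARAM
    print(f"{name}: {trained / optimal:.2f}x Chinchilla-optimal "
          f"({trained / 1e12:.2f}T trained vs ~{optimal / 1e12:.2f}T optimal)")
```

By this rough measure the 1B and 7B were trained far past the compute-optimal point (~17x and ~11x), the 40B only slightly past it, and the 180B just short of it, which matches the "more and more undertrained" impression.
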
[–] detached-admin@alien.top 1 points 11 months ago

These leaderboards are dick-measuring contests for small dicks. Imagine the dynamics of that.

[–] Unlucky-Attitude8832@alien.top 1 points 11 months ago

Falcon-180B is not good