LocalLLaMA

Why not test all models for training on the test data with Min-K% Prob? (alien.top)

submitted 2 years ago by vatsadev@alien.top to c/localllama@poweruser.forum

So there's Detect Pretrain Data (https://swj0419.github.io/detect-pretrain.github.io/), which lets you test whether a model was pretrained on a given text. Why don't we just run it on every model submitted to the leaderboard and reject the ones flagged as having seen the test data? It would end the "train on test" issue.
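For anyone who hasn't looked at the method: as I understand it, Min-K% Prob scores a text by taking the k% of its tokens that the model finds least likely and averaging their log-probabilities. A higher average suggests the model may have seen the text. Below is a rough sketch of just the scoring step, not the authors' code. It assumes you already have per-token log-probabilities from the model under test, and the function names, the `k=0.2` default, and the threshold are all hypothetical and would need calibrating per model and benchmark.

```python
def min_k_prob(token_logprobs, k=0.2):
    """Mean log-probability of the k fraction of least likely tokens.

    token_logprobs: log p(token | preceding tokens) for each token of the
    text, obtained from the model being tested (not shown here).
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    # Keep at least one token so short texts still get a score.
    n = max(1, int(len(token_logprobs) * k))
    lowest = sorted(token_logprobs)[:n]
    return sum(lowest) / n


def looks_contaminated(token_logprobs, threshold, k=0.2):
    """Flag a text when its score exceeds a threshold.

    The threshold is an assumption for illustration; it is not a value
    taken from the linked page.
    """
    return min_k_prob(token_logprobs, k) > threshold


# Toy inputs: a text with a couple of very surprising tokens,
# and one where every token is highly predictable.
unseen_like = [-0.1, -0.2, -5.0, -0.3, -7.0, -0.1, -0.2, -0.4, -0.1, -0.3]
memorized_like = [-0.1] * 10

print(min_k_prob(unseen_like))
print(looks_contaminated(memorized_like, threshold=-1.0))
```

A leaderboard check would run something like this over each benchmark example and look at how many get flagged, rather than trusting any single score.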

[–] ambient_temp_xeno@alien.top 1 points 2 years ago (1 children)

It's all a rabbit hole of time wasting, imho. People judge x or y model on how well it works for their use cases.

[–] ninjasaid13@alien.top 1 points 2 years ago

> It's all a rabbit hole of time wasting, imho. People judge x or y model on how well it works for their use cases.

Well, people don't want to be misled about the capabilities of a model. If it's only good for certain use cases, then just say so.

[–] mcmoose1900@alien.top 1 points 2 years ago

Ask on the huggingface leaderboard page!

The HF staff do seem to look at it, and have an interest in weeding out "contaminated" models (as they have already marked a few).

[–] wind_dude@alien.top 1 points 2 years ago (1 children)

I will use this now for some tests.

[–] vatsadev@alien.top 1 points 2 years ago

Noice man