this post was submitted on 10 Nov 2023

So there's Detect Pretrain Data, https://swj0419.github.io/detect-pretrain.github.io/ , which lets you test whether a model has been pretrained on a given text. Why don't we just run it on every model submitted to the leaderboard and reject the ones flagged as having seen the benchmark data during pretraining? It would end the "train on test" issue.
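
For anyone wondering what such a check could look like: as far as I understand it, the linked project scores a text with something called Min-K% Prob, i.e. the average log-probability of the least likely k% of tokens under the model. Below is a rough sketch of that idea only, not the project's actual code. The function names, the default k, and the threshold are all my own assumptions, and it expects you to have already obtained per-token log-probabilities from the model being tested.

```python
def min_k_prob_score(token_logprobs, k=0.2):
    """Average log-probability of the lowest-scoring fraction k of tokens.

    token_logprobs: per-token log-probabilities the model assigned to the
    text (assumed to be computed elsewhere). k is a hypothetical default.
    """
    if not token_logprobs:
        raise ValueError("token_logprobs must not be empty")
    # Keep at least one token so short texts still get a score.
    n = max(1, int(len(token_logprobs) * k))
    lowest = sorted(token_logprobs)[:n]
    return sum(lowest) / n


def looks_seen(score, threshold):
    """Flag a text as likely seen in pretraining when even its least
    likely tokens score above a threshold. The threshold is an assumption
    and would need calibrating on texts known to be seen and unseen."""
    return score > threshold
```

The intuition is that a text the model never saw tends to contain a few very surprising tokens, which pulls the score down, while a memorized benchmark question does not. A leaderboard check would run this over the benchmark's test items and look at how many get flagged.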

[–] mcmoose1900@alien.top 1 points 1 year ago

Ask on the Hugging Face leaderboard page!

The HF staff do seem to read it, and they have an interest in weeding out "contaminated" models (they have already flagged a few).