its_just_andy

joined 10 months ago

Using simple tree-search techniques for LLM token sampling can give better results (andys.page)

submitted 9 months ago by its_just_andy@alien.top to c/localllama@poweruser.forum

1 comment

Training on the rephrased test set is all you need: 13B models can reach GPT-4 performance in benchmarks with no contamination detectable by traditional methods in c/localllama@poweruser.forum

its_just_andy@alien.top 1 point 10 months ago (2 children)

If you're interested in running your own models for any reason, you really should build your own evaluation dataset for the scenarios you care about.

At this point, the public benchmarks are such a mess. Do you really care whether the model you select has the highest MMLU score? Or do you care only that it's the best-performing model for the scenarios you actually need?
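For anyone wondering what a personal eval set could look like in practice, here's a rough sketch. Everything in it is made up for illustration: the two `EVAL_CASES`, the `score_model` and `rank_models` helpers, and the stub models are all hypothetical, and the substring check is just the simplest possible scoring rule, not a recommendation. In real use, `generate` would wrap whatever inference backend you run locally.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical cases: replace these with prompts from your own scenarios.
EVAL_CASES: List[Dict[str, str]] = [
    {"prompt": "What is 2 + 2? Answer with a number only.", "expected": "4"},
    {"prompt": "What is the capital of France? Answer in one word.", "expected": "Paris"},
]


def score_model(generate: Callable[[str], str], cases: List[Dict[str, str]]) -> float:
    """Return the fraction of cases whose output contains the expected answer."""
    if not cases:
        return 0.0
    hits = 0
    for case in cases:
        output = generate(case["prompt"])
        # Case-insensitive substring match: crude, but easy to reason about.
        if case["expected"].lower() in output.lower():
            hits += 1
    return hits / len(cases)


def rank_models(
    models: Dict[str, Callable[[str], str]], cases: List[Dict[str, str]]
) -> List[Tuple[str, float]]:
    """Score every model on the same cases and sort best-first."""
    scores = {name: score_model(fn, cases) for name, fn in models.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Stand-in "models" so the sketch runs without any inference backend.
def stub_model_a(prompt: str) -> str:
    return "4" if "2 + 2" in prompt else "Paris"


def stub_model_b(prompt: str) -> str:
    return "I am not sure."


if __name__ == "__main__":
    ranking = rank_models({"model_a": stub_model_a, "model_b": stub_model_b}, EVAL_CASES)
    for name, score in ranking:
        print(f"{name}: {score:.2f}")
```

The point is that the ranking comes from your own prompts rather than a public leaderboard, so a model that tops MMLU can still land last here if it handles your scenarios badly.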
