this post was submitted on 14 Nov 2023

LocalLLaMA

its_just_andy@alien.top (10 months ago)

If you're interested in running your own models for any reason, you really should build your own evaluation dataset for the scenarios you care about.

At this point, the public benchmarks are all such a mess. Do you really care whether the model you pick has the highest MMLU score, or only that it performs best on the scenarios you actually need?
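A personal evaluation set can be as simple as a list of prompts from your own use cases, each paired with a pass/fail check, scored per scenario rather than rolled into one headline number. Below is a minimal sketch under that assumption. `EvalCase`, `run_eval`, `model_a` and `model_b` are hypothetical names, and the two "models" are stubs standing in for calls to whatever local inference setup you actually use.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EvalCase:
    scenario: str                 # which of *your* use cases this case belongs to
    prompt: str                   # input sent to the model
    check: Callable[[str], bool]  # returns True if the response is acceptable


def run_eval(model: Callable[[str], str], cases: List[EvalCase]) -> Dict[str, float]:
    """Return the pass rate for each scenario for a given model."""
    totals: Dict[str, int] = {}
    passed: Dict[str, int] = {}
    for case in cases:
        response = model(case.prompt)
        totals[case.scenario] = totals.get(case.scenario, 0) + 1
        if case.check(response):
            passed[case.scenario] = passed.get(case.scenario, 0) + 1
    # Per-scenario pass rate instead of a single aggregate score
    return {s: passed.get(s, 0) / n for s, n in totals.items()}


# Hypothetical cases; replace with prompts from your real workloads.
cases = [
    EvalCase("summarize", "Summarize: The cat sat on the mat.",
             lambda r: "cat" in r.lower()),
    EvalCase("json", 'Return {"ok": true} as JSON.',
             lambda r: r.strip().startswith("{")),
]


def model_a(prompt: str) -> str:
    # Hypothetical stub standing in for a local model call
    return '{"ok": true}' if "JSON" in prompt else "A cat sat on a mat."


def model_b(prompt: str) -> str:
    # Hypothetical stub for a second model that answers everything vaguely
    return "I'm not sure."


print(run_eval(model_a, cases))  # {'summarize': 1.0, 'json': 1.0}
print(run_eval(model_b, cases))  # {'summarize': 0.0, 'json': 0.0}
```

Substring and prefix checks are crude. For open-ended tasks you might swap in a rubric or grade responses by hand, but the structure stays the same: your scenarios and your pass criteria, reported per scenario.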

Exios-@alien.top (10 months ago)

This seems to me, at least, like the most logical conclusion. I’m currently developing a set of moral/ethical dilemma scenarios to interpret different perspectives and response strategies. For my personal use cases (discussing topics, breaking them down into manageable levels, and then exploring the nuances), it is very effective. That seems far too broad a “use case” to capture with one set of benchmarks, unless the set is incredibly comprehensive and refined over and over as trends develop.
