this post was submitted on 20 Nov 2023
1 points (100.0% liked)
LocalLLaMA
3 readers
1 users here now
Community to discuss about Llama, the family of large language models created by Meta AI.
founded 1 year ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
view the rest of the comments
When I look at the leaderboard, I mostly pay attention to TruthfulQA, as it seems most predictive of models which are good for my use-case. YMMV of course.
Once I've downloaded a model, I'll fiddle around with different llama.cpp parameters and prompt templates, figuring out what works best for it, and then send it through my test framework, which has it infer five times on each of several prompts.
Evaluation of test results are fairly subjective, but there are some obvious problems which recur, like not inferring an answer, or inferring itself a new user prompt to answer.
I just finished a compare-and-contrast of Marx-3B vs Marx-3B-v3 using that test framework, which you can see (along with raw test results) here: https://old.reddit.com/r/LocalLLaMA/comments/17xsliz/marx_3b_v3_and_akins_3b_gguf_quantizations/ka2fd19/
I've been meaning to add some simple assessment logic to my test framework, which tries to guess at the quality of inferred replies, but haven't made it a priority.
What are the top 3 best open source LLMs in your opinion?