LocalLLaMA

Community to discuss Llama, the family of large language models created by Meta AI.

founded 2 years ago

MODERATORS

communick@poweruser.forum

Any Easy and Local Way to Run Benchmarks? (alien.top)

submitted 2 years ago by xadiant@alien.top to c/localllama@poweruser.forum

I want to see if some presets and custom modifications work well in benchmarks, but running HellaSwag or MMLU looks too complicated for me, and it takes 10+ hours to upload 20 GB of data.

I assume there isn't a convenient webui for chumps to run benchmarks with (apart from ooba's perplexity evaluation, which I assume isn't the same thing?). Any advice?
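Perplexity and benchmark scores are related but not the same metric. Perplexity measures how well a model predicts a given text (the exponential of the mean negative log-likelihood per token), while HellaSwag and MMLU are usually reported as multiple-choice accuracy. In tools such as EleutherAI's lm-evaluation-harness, the model's "answer" is typically the choice it assigns the highest log-likelihood. The minimal Python sketch below contrasts the two, assuming you already have log-probabilities from your model. The function names and toy numbers are hypothetical and do not come from any particular tool.

```python
import math


def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood) over the scored tokens.

    token_logprobs: natural-log probabilities the model gave each token of a text.
    """
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


def multiple_choice_accuracy(items):
    """Fraction of questions where the highest-likelihood choice is the correct one.

    items: list of (choice_logprobs, label) pairs, where choice_logprobs holds the
    summed log-probability the model assigns to each candidate answer/ending and
    label is the index of the correct choice. (Hypothetical data layout.)
    """
    correct = 0
    for choice_logprobs, label in items:
        # The model's prediction is the choice it finds most likely.
        predicted = max(range(len(choice_logprobs)), key=lambda i: choice_logprobs[i])
        if predicted == label:
            correct += 1
    return correct / len(items)


if __name__ == "__main__":
    # Toy numbers standing in for real model log-probabilities.
    print(perplexity([math.log(0.5)] * 4))  # roughly 2.0
    items = [
        ([-1.2, -0.4, -2.0, -3.1], 1),  # most likely choice is 1, which is correct
        ([-0.9, -1.5, -0.3, -2.2], 0),  # most likely choice is 2, but the label is 0
    ]
    print(multiple_choice_accuracy(items))  # 0.5
```

A low perplexity on some text does not guarantee high multiple-choice accuracy, so the two numbers answer different questions about a model.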

mattapperson@alien.top 1 point 2 years ago

It’s just a side project for now, in my free time. I started building it for my own sanity. But it’s not really in a state where someone could just jump in and help. So unless you’re a VC willing to throw money at me to make it my full-time job lol… probably a couple of weeks?

My goal is to make it not just a tool to run evals, but a holistic build/test/use toolkit that does everything from:

Cleaning datasets
Generating synthetic training data from existing data and files
Creating LoRAs and full fine-tunes
Prompt evaluation and automated iterations
Running evaluations/benchmarks

I’m trying to do all that in a way that is approachable, easy to use, and easy to understand for your average software engineer, not just AI scientists. This stuff shouldn’t require setting up 20 libraries, writing all the glue code, or knowing Python.