LocalLLaMA

11 readers

4 users here now

Community to discuss about Llama, the family of large language models created by Meta AI.

founded 2 years ago

MODERATORS

communick@poweruser.forum

Tool to quickly iterate when fine-tuning open-source LLMs (alien.top)

submitted 2 years ago by torque-mcclyde@alien.top to c/localllama@poweruser.forum

13 comments fedilink hide all child comments

Hey all! A friend and I have been building with open-source LLMs for a while now (originally for other project ideas) and found that quickly iterating with different fine-tuning datasets is super hard. Training a model, setting up some inference code to try out the model and then going back and forth took 90% of our time.

That’s why we built Haven, a service to quickly try out different fine-tuning datasets and base-models. Going from uploading a dataset to chatting with the resulting model now takes less than 5 minutes (using a reasonably sized dataset).

We fine-tune the models using low-rank adapters, which not only means that the changes made to the model are very small (only 30mb for a 7b parameter LLM), it also allows us to host many fine-tuned models very efficiently by hot swapping adapters on demand. This helped us reduce cold-start times to below one second. Research has shown that low-rank fine-tuning performance stays almost on-par with full fine-tuning.

We charge $0.004/1k training tokens. New accounts start with $5 in free credits so you can get started for free. You can export all the models to Huggingface.

Right now we support Llama-2 and Zephyr (which is itself a fine-tune of Mistral) as base-models. We’re gonna add some more soon. We hope you find this useful and we would love your feedback!

This is where to find it:
https://haven.run/

you are viewing a single comment's thread
view the rest of the comments

[–] kivathewolf@alien.top 1 points 2 years ago (1 children)

This is really cool! Good choice on starting with the chat model and not the base model. They are much more friendly to alignment with a small dataset. In your post you mention you do QLorA in few mins. I am assuming that’s for a small dataset like <1000 samples? What’s your backend running on? I would love to learn how you are deploying and scaling this for multiple customers. Best of luck!

[–] torque-mcclyde@alien.top 1 points 2 years ago

Yes, our datasets usually have a few hundred examples. We do support arbitrarily large datasets though, the fine-tuning just takes a little longer.

For deploying and scaling we're using Modal, it's a "serverless" GPU provider that we found to be very user-friendly.