this post was submitted on 27 Nov 2023

LocalLLaMA


Hey all! A friend and I have been building with open-source LLMs for a while now (originally for other project ideas) and found that iterating quickly on different fine-tuning datasets is super hard. Training a model, setting up inference code to try it out, and then going back and forth took 90% of our time.

That’s why we built Haven, a service for quickly trying out different fine-tuning datasets and base models. Going from uploading a dataset to chatting with the resulting model now takes less than 5 minutes (with a reasonably sized dataset).

We fine-tune the models using low-rank adapters. Not only are the changes made to the model very small (only 30 MB for a 7B-parameter LLM), but we can also host many fine-tuned models very efficiently by hot-swapping adapters on demand. This helped us reduce cold-start times to below one second. Research has shown that low-rank fine-tuning performance stays almost on par with full fine-tuning.
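To make the adapter size and the hot-swapping idea concrete, here is a rough sketch. It is not Haven's code: the rank, the choice of adapted projections, and the storage precision are assumptions (the post does not state them), and the names `AdapterSizeAssumptions`, `forward_with_adapter`, `BASE_WEIGHT`, and `ADAPTERS` are hypothetical. The first part estimates how big a LoRA adapter is for a Llama-2-7B-shaped model; the second part is a toy showing one shared base weight combined with a different small adapter per fine-tuned model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdapterSizeAssumptions:
    """Illustrative values only; the post does not give Haven's settings."""
    hidden_size: int = 4096      # Llama-2 7B hidden dimension
    num_layers: int = 32         # Llama-2 7B transformer blocks
    adapted_per_layer: int = 2   # assumption: two square projections per block
    rank: int = 16               # assumption: LoRA rank r
    bytes_per_param: int = 4     # assumption: adapter stored in fp32


def lora_param_count(a: AdapterSizeAssumptions) -> int:
    # Each adapted d x d matrix gets A (r x d) and B (d x r): 2 * d * r parameters.
    per_matrix = 2 * a.hidden_size * a.rank
    return per_matrix * a.adapted_per_layer * a.num_layers


def lora_size_mb(a: AdapterSizeAssumptions) -> float:
    return lora_param_count(a) * a.bytes_per_param / 1e6


def matvec(matrix, vector):
    """Plain-Python matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def forward_with_adapter(base_weight, adapter, x):
    """y = W x + B (A x): shared base weight plus a per-model low-rank update.

    The usual LoRA scaling factor (alpha / r) is left out to keep the toy small.
    """
    a_matrix, b_matrix = adapter
    base_out = matvec(base_weight, x)
    delta = matvec(b_matrix, matvec(a_matrix, x))
    return [b + d for b, d in zip(base_out, delta)]


# One base weight stays loaded; only the tiny (A, B) pair changes per request.
BASE_WEIGHT = [[1, 0], [0, 1]]
ADAPTERS = {
    "model_a": ([[1, 1]], [[1], [0]]),   # rank-1 adapter for one fine-tune
    "model_b": ([[1, -1]], [[0], [2]]),  # rank-1 adapter for another
}

assumptions = AdapterSizeAssumptions()
print(lora_param_count(assumptions))        # 8388608
print(round(lora_size_mb(assumptions), 1))  # 33.6
print(forward_with_adapter(BASE_WEIGHT, ADAPTERS["model_a"], [1, 2]))  # [4, 2]
print(forward_with_adapter(BASE_WEIGHT, ADAPTERS["model_b"], [1, 2]))  # [1, 0]
```

Under these assumptions the adapter comes out at roughly 8.4 million parameters, or about 34 MB, which is in the same ballpark as the 30 MB figure above. Other rank and precision combinations would give different sizes.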

We charge $0.004 per 1k training tokens. New accounts start with $5 in free credits, so you can get started for free. You can export all models to Hugging Face.
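For a rough sense of what that pricing means, here is a small back-of-the-envelope calculation. The price and credit amount come from the post; the dataset size in the example (300 examples of about 500 tokens each) is made up, and how tokens are counted across epochs is not stated, so the function simply takes the number of billed tokens as input.

```python
from fractions import Fraction

PRICE_PER_1K_TOKENS_USD = Fraction("0.004")  # from the post
FREE_CREDITS_USD = Fraction(5)               # from the post


def training_cost_usd(billed_tokens: int) -> float:
    """Cost in USD for a given number of billed training tokens."""
    return float(Fraction(billed_tokens, 1000) * PRICE_PER_1K_TOKENS_USD)


def tokens_covered_by(credits_usd: Fraction) -> int:
    """How many training tokens a given credit balance pays for."""
    return int(credits_usd / PRICE_PER_1K_TOKENS_USD * 1000)


# Hypothetical dataset: 300 examples, ~500 tokens each.
example_tokens = 300 * 500
print(training_cost_usd(example_tokens))    # 0.6
print(tokens_covered_by(FREE_CREDITS_USD))  # 1250000
```

So the free credits cover about 1.25 million training tokens, and a dataset of a few hundred short examples costs well under a dollar per run at the listed price.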

Right now we support Llama-2 and Zephyr (which is itself a fine-tune of Mistral) as base models. We’re gonna add more soon. We hope you find this useful, and we would love your feedback!

This is where to find it:
https://haven.run/

kivathewolf@alien.top 11 months ago

This is really cool! Good choice on starting with the chat model and not the base model. Chat models are much friendlier to alignment with a small dataset. In your post you mention you do QLoRA in a few minutes. I am assuming that’s for a small dataset, like <1000 samples? What’s your backend running on? I would love to learn how you are deploying and scaling this for multiple customers. Best of luck!

torque-mcclyde@alien.top 11 months ago

Yes, our datasets usually have a few hundred examples. We do support arbitrarily large datasets, though; the fine-tuning just takes a little longer.

For deploying and scaling we're using Modal, a "serverless" GPU provider that we found to be very user-friendly.