All weekend I've been wishing for a more streamlined fine-tuning experience.
LocalLLaMA
Community to discuss Llama, the family of large language models created by Meta AI.
Glad to hear that we're not the only ones!
Could you give some directions on how to fine-tune the model for coding? I have a UI framework in Python, and I would like to feed it the docs and the code from some GitHub repos.
What would the dataset look like for that? Should I formulate different use cases for the framework, phrased as if a user were asking?
Also, do I need to provide standard Python code, or do those base models already have code in them?
Interesting service, I'm definitely going to try it. I'd like to fine-tune a 7B model for function calling and, if possible, mimic OpenAI's function description template so I can share the descriptions between model calls. I've experimented with injecting the function descriptions as a preamble to a user's prompt, and it works OK (with Mistral 7B Instruct), but there are many edge cases. I suspect I need to fine-tune to get it to improve. How should I structure the user prompts in the training dataset? Would something like this work?
{"messages": [{"role": "system", "content": "You are a helpful navigation assistant that calls the appropriate function based on a user's input."}, {"role": "user", "content": "Go to Paris, France"}, {"role": "assistant", "content": "{\"lat\": 48.856667, \"lng\": 2.352222}"}]}
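Because the assistant's reply is itself JSON, its quotes have to be escaped inside the outer record, or the line won't parse. A minimal sketch, assuming the chat-style "messages" layout from the example above, that lets Python's `json` module handle the escaping. The helper name `make_example` is hypothetical, and the `lat`/`lng` keys come from the question, not from any particular service's schema.

```python
import json


def make_example(system_prompt: str, user_text: str, call: dict) -> str:
    """Build one JSONL line in the chat-style "messages" format (hypothetical helper).

    The assistant's function-call JSON is serialized first and stored as a
    string; json.dumps on the outer record then escapes its quotes.
    """
    record = {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_text},
            # Inner JSON becomes an escaped string value, not a nested object.
            {"role": "assistant", "content": json.dumps(call)},
        ]
    }
    return json.dumps(record)


SYSTEM = (
    "You are a helpful navigation assistant that calls the appropriate "
    "function based on a user's input."
)

line = make_example(SYSTEM, "Go to Paris, France", {"lat": 48.856667, "lng": 2.352222})
print(line)

# Round-trip check: the outer line parses, and so does the assistant content.
parsed = json.loads(line)
print(json.loads(parsed["messages"][2]["content"]))
```

Each generated line is one training record; writing one per line gives a JSONL file, which is the usual shape for this kind of chat dataset.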
Why not just use grammar sampling with llama.cpp?
Is it possible to do this in a way that allows the model to choose whether to write normal text or to call one or more functions?
Well, you don't ever have to have it write "normal" text. You can just have an object with a "text" property that the model is instructed to use only when it is not calling a function. Otherwise, it can output a different function-calling JSON object.
A grammar means it's forced to output a structure, in this case JSON. You can write instructions to output different JSON for different scenarios and use code to check which key is present. If the object has the key "text", it's a text response. If it doesn't, it's a function response.
That's basically how the function-calling API works anyway, just less consistently than with a grammar.
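The grammar only guarantees the model emits valid JSON; the "check which key is present" step happens in your own code. A minimal sketch of that routing step, assuming the model is instructed to emit either `{"text": ...}` or a hypothetical `{"function": <name>, "arguments": {...}}` object. The function name `navigate` and the `function`/`arguments` keys are illustrative, not a fixed schema.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Union


@dataclass
class TextReply:
    text: str


@dataclass
class FunctionCall:
    name: str
    arguments: Dict[str, Any]


def route(raw: str) -> Union[TextReply, FunctionCall]:
    """Classify grammar-constrained model output by which key is present."""
    obj = json.loads(raw)
    if "text" in obj:
        # The model chose to answer in plain text.
        return TextReply(obj["text"])
    # Otherwise treat it as a function call (hypothetical schema).
    return FunctionCall(obj["function"], obj.get("arguments", {}))


for raw in [
    '{"text": "Paris is the capital of France."}',
    '{"function": "navigate", "arguments": {"lat": 48.856667, "lng": 2.352222}}',
]:
    print(route(raw))
```

Keeping the two shapes mutually exclusive (the "text" key never appears in a function call) is what makes a single key check enough to dispatch.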
This is really cool! Good choice starting with the chat model rather than the base model; chat models are much more amenable to alignment with a small dataset. In your post you mention you do QLoRA in a few minutes. I assume that's for a small dataset, say fewer than 1,000 samples? What's your backend running on? I would love to learn how you are deploying and scaling this for multiple customers. Best of luck!
Yes, our datasets usually have a few hundred examples. We do support arbitrarily large datasets though, the fine-tuning just takes a little longer.
For deploying and scaling we're using Modal, it's a "serverless" GPU provider that we found to be very user-friendly.
This is an amazing sub with amazingly talented individuals. I love it here. This is great.
This means a lot! Thank you.
Hey, can I do the fine-tuning on my own computer, or only in your cloud?
Fine-tuning is online. You can download the weights and run them wherever (including your own computer).