this post was submitted on 29 Nov 2023
LocalLLaMA
Community to discuss about Llama, the family of large language models created by Meta AI.
If you're interested in knowledge graphs, I did a lot of research and work on fine-tuning Inkbot to generate them. The output is valid YAML, and I got much better results with my fine-tune than with GPT-4.
https://huggingface.co/Tostino/Inkbot-13B-8k-0.2
Here is an example knowledge graph generated from an article about the Ukraine conflict: https://gist.github.com/Tostino/f6f19e88e39176452c1a765cb7c2caff
Great work! Would you mind sharing the datasets you used and/or how you augmented the data for training?
I'll give you some better examples, just didn't have time right then. Give me a few.
It was trained on many differently worded prompts for each task, so it doesn't depend on the exact wording of any single training prompt. Set the task in the meta section to "kg", and the model will respond with a knowledge graph if you ask for one (and sometimes even if you don't).
Here are a few of them:
I haven't noticed a huge difference in the output at inference time depending on the prompt used, but sprinkling in some more detailed instructions helped lower the loss during training.
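For anyone who wants to try the "kg" task mentioned above, here is a rough sketch of how a prompt with a meta section could be assembled. The section markers (`<#meta#>`, `<#system#>`, and so on), the `Date` field, the default system prompt, and the `build_prompt` helper are all assumptions for illustration, not something confirmed in this thread, so check the model card for the actual template before relying on it.

```python
from datetime import date

# ASSUMPTION: these section markers are illustrative placeholders.
# Verify the real prompt template on the Inkbot model card.
META, SYSTEM, CHAT, USER, BOT = (
    "<#meta#>", "<#system#>", "<#chat#>", "<#user#>", "<#bot#>"
)


def build_prompt(user_message, task="kg",
                 system_prompt="You are a helpful assistant.",
                 today=None):
    """Assemble a single-turn prompt whose meta section names the task.

    build_prompt is a hypothetical helper, not part of any library.
    """
    today = today or date.today()
    lines = [
        META,
        f"- Date: {today.isoformat()}",
        f"- Task: {task}",  # "kg" asks for a knowledge graph
        SYSTEM,
        system_prompt,
        CHAT,
        USER,
        user_message,
        BOT,  # the model's reply is generated after this marker
    ]
    return "\n".join(lines) + "\n"


prompt = build_prompt(
    "Create a knowledge graph from the following article:\n<article text here>",
    today=date(2023, 11, 29),
)
print(prompt)
```

Since the wording of the request doesn't seem to matter much at inference time, the main thing this sketch pins down is the task field; the user message can be phrased however you like.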
As far as the dataset goes, I used a little bit of the Dolphin dataset so the model wouldn't lose its usual conversational ability, plus a little bit of the SponsorBlock dataset as a seed, which I then improved. The rest is custom; I spent ~$1k or so on API calls creating it. I plan on releasing it at some point, but I want to improve some aspects of it first.
The total dataset size I used for training is ~85 MB.