this post was submitted on 27 Oct 2023

Machine Learning

My boss is a semi-famous author in a niche academic field. I have thousands of pages of text from books, transcripts, and more.

Is there a straightforward path to creating a corpus to augment BERT, Llama, or another LLM? The end goal is to be able to chat with an AI that has been trained on his life's work.

Is there anything specific to understand about preparing the corpus? Do I need key-value pairs, where I write a ton of example questions and responses?

[–] Mammoth-Doughnut-160@alien.top 1 points 10 months ago

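Agree that you should look at RAG (retrieval-augmented generation). LLMs are not search engines, so you need to connect the knowledge corpus to the LLM: relevant passages are retrieved from the corpus and handed to the model along with the question.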
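As a rough illustration of what connecting a corpus to an LLM can look like, here is a minimal, stdlib-only sketch: split the text into chunks, rank the chunks against a question, and paste the best ones into a prompt. Everything in it is an assumption for illustration. The function names are made up, the sample texts are placeholders, and plain TF-IDF stands in for the embedding model and vector database a real setup would use. It is not how any particular library does it.

```python
import math
import re
from collections import Counter


def chunk_text(text, max_words=120):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def tokenize(text):
    """Lowercase the text and keep alphanumeric runs as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(chunks):
    """Return one TF-IDF vector (term -> weight) per chunk, plus the IDF table."""
    token_lists = [tokenize(c) for c in chunks]
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    n = len(chunks)
    idf = {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(tokens).items()}
               for tokens in token_lists]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, vectors, idf, top_k=2):
    """Return the top_k chunks most similar to the question."""
    q_vec = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(question)).items()}
    ranked = sorted(range(len(chunks)),
                    key=lambda i: cosine(q_vec, vectors[i]), reverse=True)
    return [chunks[i] for i in ranked[:top_k]]


def build_prompt(question, passages):
    """Assemble the text that would be sent to whichever LLM you choose."""
    context = "\n\n".join(passages)
    return ("Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")


if __name__ == "__main__":
    # Placeholder passages; in practice these come from chunk_text() over the books.
    corpus = [
        "Bees communicate the location of food through a waggle dance.",
        "Medieval guilds regulated apprenticeships and prices.",
        "Volcanic ash enriches soil with minerals.",
    ]
    vectors, idf = build_index(corpus)
    question = "How do bees communicate?"
    print(build_prompt(question, retrieve(question, corpus, vectors, idf, top_k=1)))
```

Note there are no hand-written question/answer pairs anywhere: the corpus only needs to be clean text cut into passages, which is the main practical difference from fine-tuning.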

Try LLMWare's RAG implementation. It is easy to use and straightforward, and it automates the MongoDB and Milvus setup, so it is a great fit for what you are trying to achieve. LLMWare also has free models on Hugging Face that you can experiment with for your use case.

https://github.com/llmware-ai/llmware

https://huggingface.co/llmware