/r/localllama
Machine Learning
Community Rules:
- Be nice. No offensive behavior, insults or attacks: we encourage a diverse community in which members feel safe and have a voice.
- Make your post clear and comprehensive: posts that lack insight or effort will be removed. (e.g., questions that can be easily googled)
- Beginner or career related questions go elsewhere. This community is focused on discussion of research and new projects that advance the state-of-the-art.
- Limit self-promotion. Comments and posts should be first and foremost about topics of interest to ML observers and practitioners. Limited self-promotion is tolerated, but the sub is not here as merely a source for free advertisement. Such posts will be removed at the discretion of the mods.
r/learnmachinelearning
r/LanguageTechnology
This is much easier than you think. Instead of retraining, look at Retrieval-Augmented Generation (RAG). This creates a database of your documents that can be queried for relevant passages. Any request, plus the relevant passages retrieved from your documents, is then sent to the LLM to formulate a response. You can use your own data, it provides source references, and you can add new documents as needed with zero retraining.
Using LlamaIndex or LangChain, this requires fewer than 50 lines of code, and switching to a different LLM provider is a one-line change. Alternatively, OpenAI has launched GPTs, which do this completely code-free.
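To make the flow concrete, here is a minimal, dependency-free sketch of the RAG pipeline described above: index documents, retrieve the most relevant passages for a query, then assemble a prompt to send to an LLM. The keyword-overlap scoring stands in for the embedding similarity a real system would use, and the document names, scoring, and prompt template are illustrative assumptions, not LlamaIndex or LangChain APIs.

```python
import re

def tokenize(text):
    # Lowercase and strip punctuation; a real system would embed instead.
    return set(re.findall(r"[a-z]+", text.lower()))

def build_index(documents):
    # documents: {source_name: text}; precompute a token set per passage.
    return {name: tokenize(text) for name, text in documents.items()}

def retrieve(index, query, k=2):
    # Rank passages by keyword overlap with the query, keep the top k.
    q = tokenize(query)
    ranked = sorted(index, key=lambda name: len(index[name] & q), reverse=True)
    return ranked[:k]

def build_prompt(documents, sources, query):
    # Bundle the retrieved passages, labelled by source, with the question.
    context = "\n\n".join(f"[{s}]\n{documents[s]}" for s in sources)
    return (f"Answer using only the sources below, citing them.\n\n"
            f"{context}\n\nQuestion: {query}")

# Hypothetical documents for illustration.
documents = {
    "handbook.txt": "Employees accrue vacation days monthly.",
    "policy.txt": "Remote work requires manager approval.",
}
index = build_index(documents)
query = "How do I get approval for remote work?"
sources = retrieve(index, query, k=1)
prompt = build_prompt(documents, sources, query)
# `prompt` would now go to the LLM; adding a new document is just
# another entry in `documents` -- no retraining involved.
```

Note that the retrieved source names come back alongside the answer, which is what gives you the source references mentioned above.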
I think you're going to have a hard time completing this assignment, given that you couldn't read the sub's rules. I doubt you'll read the README of whatever we link you.
This is a kind of RAG project, or a Llama project with embeddings. Not really complicated. Have a look at LangChain too; it may not be very efficient, but it's a good first approach.