This is much easier than you think. Instead of retraining, look at Retrieval-Augmented Generation (RAG). This builds a database of your documents that can be queried for relevant passages. Each request, plus the relevant passages retrieved from your documents, is then sent to the LLM to formulate a response. You can use your own data, it provides source references, and you can add new documents as required with zero retraining.
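To make the loop concrete, here's a minimal stdlib-only sketch of the RAG pattern described above. The retrieval step is a toy keyword-overlap score standing in for embedding similarity, and the final LLM call is left as a stub; the document strings and function names are illustrative, not from any library.

```python
import re

def tokens(text):
    # Lowercase word set, punctuation stripped (toy stand-in for embeddings).
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, docs, k=2):
    # Return the k passages sharing the most words with the query.
    # A real system would rank by vector similarity instead.
    overlap = lambda d: len(tokens(query) & tokens(d))
    return sorted(docs, key=overlap, reverse=True)[:k]

def build_prompt(query, passages):
    # The request plus the retrieved sources, ready to send to an LLM.
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"

docs = [
    "The warranty covers parts and labour for two years.",
    "Returns are accepted within 30 days of purchase.",
    "Shipping is free on orders over 50 euros.",
]
query = "How long is the warranty?"
passages = retrieve(query, docs)
prompt = build_prompt(query, passages)
# In a real pipeline, `prompt` would now go to your LLM provider of choice.
print(prompt)
```

Because the passages are included in the prompt, the model can cite which source it used, and adding a new document is just appending to `docs` — no retraining anywhere.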
Using LlamaIndex or LangChain, this takes under 50 lines of code, and switching to a different LLM provider is a one-line change. Alternatively, OpenAI has launched GPTs, which do this completely code-free.