Very interesting topic. I have thought about this too. One idea that came to mind was to split your raw text into chunks, then ask an LLM to generate questions whose answers are those chunks, and in that way create an artificial dataset of Q&A pairs. Of course, the quality of the dataset depends on how well you structure the prompts that generate the questions.
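A rough sketch of that idea in Python, under a few assumptions: chunking is naive fixed-size word splitting (you may prefer splitting on paragraphs or sentences), `PROMPT_TEMPLATE` is just an example prompt I made up, and `ask_llm` is a hypothetical placeholder for whatever LLM client you use (any function that takes a prompt string and returns the model's reply as a string).

```python
from typing import Callable


def chunk_text(text: str, max_words: int = 200) -> list[str]:
    """Split raw text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


# Hypothetical prompt; this is the part whose wording drives dataset quality.
PROMPT_TEMPLATE = (
    "Write one question that is fully answered by the following passage. "
    "Return only the question.\n\nPassage:\n{chunk}"
)


def build_qa_pairs(
    text: str,
    ask_llm: Callable[[str], str],  # hypothetical: prompt in, completion out
    max_words: int = 200,
) -> list[dict[str, str]]:
    """Create synthetic Q&A pairs where each answer is a chunk of the source text."""
    pairs = []
    for chunk in chunk_text(text, max_words):
        question = ask_llm(PROMPT_TEMPLATE.format(chunk=chunk)).strip()
        pairs.append({"question": question, "answer": chunk})
    return pairs


if __name__ == "__main__":
    # Stub in place of a real model call, just to show the shape of the output.
    def fake_llm(prompt: str) -> str:
        return "What does this passage describe?"

    for pair in build_qa_pairs("one two three four five", fake_llm, max_words=2):
        print(pair)
```

Since the answer is always the original chunk verbatim, only the question is model-generated, so it is probably worth spot-checking a sample of the generated questions before training on them.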