apssg96

joined 11 months ago

Point me towards some basic dataset preparation tips for LLM's? in c/localllama@poweruser.forum

apssg96@alien.top 1 points 11 months ago

Very interesting topic. I have thought about this too. One idea that came to mind is to split your raw text into chunks, then ask an LLM to generate questions whose answers are those chunks. That way you create an artificial dataset of Q&A pairs. Of course, the quality of the dataset depends on how well you structure the prompts used to generate the questions.
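
A minimal sketch of that idea in Python, under a few assumptions: chunks are fixed-size word windows (just one possible splitting strategy), the prompt wording is only a placeholder, and `ask_llm` is a hypothetical callable that you would wire up to whatever local model or API you use. None of these names come from a specific library.

```python
from typing import Callable, Dict, List

# Placeholder prompt; the wording here is an assumption and is the part
# you would iterate on, since it drives the quality of the questions.
PROMPT_TEMPLATE = (
    "Write one question that the following passage answers.\n\n"
    "Passage:\n{chunk}\n\nQuestion:"
)


def chunk_text(text: str, max_words: int = 200) -> List[str]:
    """Split raw text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def build_qa_pairs(
    text: str,
    ask_llm: Callable[[str], str],  # hypothetical: takes a prompt, returns text
    max_words: int = 200,
) -> List[Dict[str, str]]:
    """Ask the model for a question per chunk; the chunk itself is the answer."""
    pairs = []
    for chunk in chunk_text(text, max_words):
        question = ask_llm(PROMPT_TEMPLATE.format(chunk=chunk)).strip()
        if question:  # skip chunks where the model returned nothing usable
            pairs.append({"question": question, "answer": chunk})
    return pairs
```

Splitting on sentence or paragraph boundaries instead of a fixed word count would probably give cleaner answers, and you would likely want to review or filter the generated questions before training on them.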
