this post was submitted on 25 Nov 2023

Machine Learning

[–] visarga@alien.top 1 points 11 months ago

For now we might be able to 10x our language data, but the top-quality content has already been used.

Beyond that, I think synthetic data will rule. It needs to be validated or filtered somehow, and I think we need to use agents and RL to make it high quality.