this post was submitted on 19 Nov 2023

Machine Learning

I have a database behind what is essentially a survey tool: admins define a survey and distribute it to a number of users, who then answer the survey questions.

The surveys are all independent of each other and fairly random...there's no real theme to them outside of the industry the survey tool is used by.

There are a fairly large number of questions/answers in the DB (in the millions). What would be an interesting ML exercise to run on the data for a complete ML novice (but competent coder)?

top 6 comments
[–] progressgang@alien.top 1 points 11 months ago

Make sure your ToS cover this first

[–] Operation_Ivy@alien.top 1 points 11 months ago (1 children)

I would take it as an opportunity to practice your EDA first. I bet a ton of this data is straight garbage. Maybe experiment with ML-powered ways of cleaning it up if you want to make the work more exciting.

Until you know the quality, GIGO.
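For a sense of what a first EDA pass might look like, here is a minimal sketch using only the Python standard library. The row layout `(survey_id, question_text, answer_text)` and the sample values are assumptions made up for illustration, since the actual schema isn't described in the thread; `profile` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical rows: (survey_id, question_text, answer_text).
# The real schema is not given in the thread; adapt to your tables.
rows = [
    (1, "How satisfied are you?", "Very"),
    (1, "How satisfied are you?", ""),
    (1, "Any comments?", "asdf"),
    (2, "Preferred contact method?", "Email"),
    (2, "Preferred contact method?", "email "),
    (2, "Preferred contact method?", None),
]

def profile(rows):
    """Return a few basic quality indicators for question/answer rows."""
    total = len(rows)
    # Count answers that are missing or whitespace-only.
    blank = sum(1 for _, _, a in rows if a is None or not a.strip())
    # Count non-blank answers per question after trimming and lowercasing,
    # so "Email" and "email " are treated as the same answer.
    normalized = Counter(
        (q, a.strip().lower()) for _, q, a in rows if a and a.strip()
    )
    # Number of answer rows per survey.
    per_survey = Counter(s for s, _, _ in rows)
    return {
        "total": total,
        "blank_rate": blank / total if total else 0.0,
        "top_answers": normalized.most_common(3),
        "answers_per_survey": dict(per_survey),
    }

report = profile(rows)
print(report)
```

Blank rates and near-duplicate answers are only a starting point, but they give a rough read on quality before any modelling.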

[–] bl4h101bl4h@alien.top 1 points 11 months ago

Hi...I wouldn't call it garbage, but it is only relevant to the context for which it was created.

And as we are data processors, it needs no cleaning to speak of. The answers provided are appropriate to the needs of the survey creators.

[–] HPLaserJetM140we@alien.top 1 points 11 months ago (1 children)

I'd act within the constraints of my privacy policy, and open source it if able

[–] bl4h101bl4h@alien.top 1 points 11 months ago
[–] mphix@alien.top 1 points 11 months ago

Turn them into instructions and tune an LLM
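
A rough sketch of the "turn them into instructions" step, assuming each row is `(question, answer, survey_title)`. The `instruction`/`input`/`output` field names follow a common instruction-tuning convention and are an assumption here, not something a particular training tool requires; `to_instruction_record` is a hypothetical helper.

```python
import json

def to_instruction_record(question, answer, survey_title=""):
    """Map one question/answer pair to an instruction-style record.

    Returns None for rows with a missing question or a blank answer.
    """
    if not question or not answer or not answer.strip():
        return None
    return {
        "instruction": question.strip(),
        "input": survey_title.strip(),  # optional context, may be empty
        "output": answer.strip(),
    }

# Hypothetical sample rows: (question, answer, survey_title).
rows = [
    ("How satisfied are you?", "Very", "Customer feedback"),
    ("Any comments?", "   ", "Customer feedback"),
    ("Preferred contact method?", "Email", ""),
]

records = [
    r for r in (to_instruction_record(q, a, t) for q, a, t in rows) if r
]

# One JSON object per line (JSONL), a format many fine-tuning scripts accept.
jsonl = "\n".join(json.dumps(r, ensure_ascii=False) for r in records)
print(jsonl)
```

Whether the result is worth tuning on still depends on the ToS and privacy points raised above, since the answers come from end users.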