overview for OctopusParrot

[P] Unsupervised clustering of time series data in c/machinelearning@academy.garden

[–] OctopusParrot@alien.top 1 points 2 years ago

It might be. It could end up in that unfortunate area of "statistically significant but not clinically meaningful," meaning it might do what it's supposed to but not well enough to merit actually using it in the real world.

[P] Unsupervised clustering of time series data in c/machinelearning@academy.garden

[–] OctopusParrot@alien.top 1 points 2 years ago (2 children)

Thanks! I was kind of shooting in the dark on the size of the windows. I can try shortening but the sample rate is only one per minute, so I wanted to make sure there was enough signal for the model to extract useful data.

As for your second question, right now it's not really possible for a human to predict that far out. There are ECG instabilities that start to show up, but usually only a few minutes before the event at most. My hypothesis was that subtle signals that might not be obvious to a human could be machine detectable, but it's entirely possible that isn't true.


[P] Unsupervised clustering of time series data (alien.top)

submitted 2 years ago by OctopusParrot@alien.top to c/machinelearning@academy.garden


Hi everyone,

I have a project I've been working on for some time. I'm trying to see if it's possible to use ECG data (and just ECG data, no other inputs) to predict atrial fibrillation events with the help of machine learning. Ideally it would be able to predict them at least 30-60 minutes ahead of time. Medical AI is tricky because both precision and recall need to be pretty high to actually be useful in a clinical setting. If the recall isn't high enough, no one will trust the model to replace skilled nurse check-ins, and if the precision isn't high enough, it will lead to too many unnecessary alarms and overburden staff.
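To make the precision/recall trade-off concrete, here is a minimal sketch of how the two numbers come out of raw counts. The counts are made up purely for illustration (they are not from the dataset described here), and `precision_recall` is a hypothetical helper name.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts, for illustration only: out of 100 real events the
# model catches 80 (tp) and misses 20 (fn), while raising 320 false alarms (fp).
precision, recall = precision_recall(tp=80, fp=320, fn=20)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this made-up scenario the model catches most events, but only one alarm in five is real, which is the "overburden staff" failure mode.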

I have a fairly robust dataset: a few hundred thousand hours of ECG data and around 3000 distinct Afib events. My intent is to take a time series segment (probably around 30-60 minutes), feed it to some kind of classifier, and then output a prediction of the likelihood that an event will occur. I've tried a number of different approaches and so far have had minimal success, with my top-performing classifier only giving about 60% balanced class accuracy. Obviously this is far below the threshold of utility.
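Roughly, the windowing setup could look like the sketch below. It assumes one sample per minute and event onsets given as minute indices; `make_windows`, its parameters, and the rule of skipping windows that contain an onset are illustrative assumptions, not a description of the actual pipeline.

```python
def make_windows(signal, event_starts, window=60, horizon=60, step=30):
    """Cut `signal` (one sample per minute) into labelled windows.

    A window covering samples [start, start + window) is labelled 1 if an
    event starts within the next `horizon` minutes, i.e. in
    [start + window, start + window + horizon), and 0 otherwise.
    Windows that contain an event onset are skipped, and windows whose
    horizon would run past the end of the recording are never generated.
    """
    events = sorted(event_starts)
    windows, labels = [], []
    for start in range(0, len(signal) - window - horizon + 1, step):
        end = start + window
        if any(start <= e < end for e in events):
            continue  # an event onset falls inside the window itself
        label = int(any(end <= e < end + horizon for e in events))
        windows.append(signal[start:end])
        labels.append(label)
    return windows, labels


# Toy recording: 300 minutes of placeholder values, one event at minute 200.
signal = [0.0] * 300
windows, labels = make_windows(signal, event_starts=[200])
print(len(windows), labels)
```

Overlapping windows (a `step` smaller than `window`) give more training examples, but neighbouring windows are then strongly correlated, so splitting train and test sets by patient or by recording is probably safer than splitting by window.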

I'm starting to wonder if I've been approaching the problem too simply - a number of different issues can cause Afib and lumping them all together as the "positive" class may dilute the signals I'm trying to detect. So I'm thinking perhaps I should see if the events cluster in ways that reflect the underlying physiological differences and then use a multiclass approach that predicts one of the causes instead.

I've been reading a bit and it seems like using KNN with a dynamic time warping metric might be a good way to do this, but I have no experience using this type of unsupervised clustering approach. I'm also unclear how to deal with the fact that I don't actually know how many clusters there will be in the data. Everything I've read so far suggests that you need to tell KNN first how many clusters there will be.
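One note on terminology: the method that needs the number of clusters up front is usually k-means, while KNN (k-nearest neighbours) is a supervised classifier. One way to sidestep choosing the number of clusters is agglomerative clustering with a distance cutoff. The sketch below is a plain-Python illustration under that assumption, with a textbook dynamic time warping distance; `dtw_distance`, `cluster_by_threshold`, the toy series, and the threshold value are all hypothetical, and at real dataset sizes an optimised library would be needed.

```python
def dtw_distance(a, b):
    """Textbook O(len(a) * len(b)) dynamic time warping distance."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def cluster_by_threshold(series, threshold):
    """Average-linkage agglomerative clustering that stops merging once the
    closest pair of clusters is farther apart than `threshold`."""
    n = len(series)
    dist = [[dtw_distance(series[i], series[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]

    def linkage(c1, c2):
        # Mean pairwise DTW distance between the members of two clusters.
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, left, right = min(
            (linkage(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        if best > threshold:
            break  # remaining clusters are too far apart to merge
        clusters[left] = clusters[left] + clusters[right]
        del clusters[right]
    return clusters


# Toy data: two shifted copies of a bump, and two nearly flat series.
series = [
    [0, 0, 1, 2, 1, 0, 0],
    [0, 1, 2, 1, 0, 0, 0],
    [5, 5, 5, 5, 5, 5, 5],
    [5, 5, 6, 5, 5, 5, 5],
]
clusters = cluster_by_threshold(series, threshold=2.0)
print(clusters)
```

The number of clusters then falls out of the threshold instead of being specified in advance, although the threshold itself still has to be chosen, for example by inspecting the merge distances.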

Any help would be appreciated!