Machine Learning



[P] Unsupervised clustering of time series data (alien.top)

submitted 2 years ago by OctopusParrot@alien.top to c/machinelearning@academy.garden


Hi everyone,

I have a project I've been working on for some time. I'm trying to see if it's possible to use ECG data (and just ECG data, no other inputs) to predict atrial fibrillation events with the help of machine learning. Ideally, the model would predict them at least 30-60 minutes ahead of time. Medical AI is tricky because both precision and recall need to be pretty high to actually be useful in a clinical setting. If the recall isn't high enough, no one will trust the model to replace skilled nurse check-ins, and if the precision isn't high enough, it will lead to too many unnecessary alarms and overburden staff.

I have a fairly robust dataset: a few hundred thousand hours of ECG data and around 3000 distinct AFib events. My intent is to take a time series segment (probably around 30-60 minutes), feed it to some kind of classifier, and then output a prediction of the likelihood that an event will occur. I've tried a number of different approaches and so far have had minimal success, with my top-performing classifier only giving about 60% balanced class accuracy. Obviously this is far below the threshold of utility.
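To make the gap between 60% balanced accuracy and clinical usefulness concrete, here is a minimal sketch of how the metrics relate when events are rare. The counts are invented for illustration (1000 windows, 10 of them preceding an event) and are not taken from the dataset described above. The function name `binary_metrics` is hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Compute recall, specificity, precision and balanced accuracy
    for binary labels (1 = event follows the window, 0 = no event)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return {
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        # Balanced accuracy is the mean of recall and specificity.
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Illustrative, made-up labels: 10 positive windows and 990 negative ones.
y_true = [1] * 10 + [0] * 990
# A classifier that catches 6 of 10 events and raises 396 false alarms.
y_pred = [1] * 6 + [0] * 4 + [1] * 396 + [0] * 594

metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these toy counts, recall and specificity are both 0.6, so balanced accuracy is 0.6, yet precision is only about 1.5% (6 true alarms out of 402). Under heavy class imbalance, balanced accuracy says little about the alarm burden, so it may be worth tracking precision and recall directly.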

I'm starting to wonder if I've been approaching the problem too simply. A number of different issues can cause AFib, and lumping them all together as the "positive" class may dilute the signals I'm trying to detect. So I'm thinking perhaps I should see if the events cluster in ways that reflect the underlying physiological differences and then use a multiclass approach that predicts one of the causes instead.

I've been reading a bit and it seems like using KNN with a dynamic time warping metric might be a good way to do this, but I have no experience using this type of unsupervised clustering approach. I'm also unclear how to deal with the fact that I don't actually know how many clusters there will be in the data. Everything I've read so far suggests that you need to tell KNN first how many clusters there will be.
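A note on terminology, followed by a sketch. KNN (k-nearest neighbours) is usually a supervised classifier. The method that needs the number of clusters up front is k-means. One alternative that avoids fixing the count is agglomerative (hierarchical) clustering over pairwise dynamic time warping (DTW) distances, cut at a distance threshold. The code below is a minimal pure-Python sketch of that idea, not a recommendation for this dataset. The toy series, the helper names `dtw_distance` and `agglomerative`, and the cutoff of 15.0 are all assumptions made for illustration.

```python
import math


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with absolute-difference cost."""
    inf = float("inf")
    prev = [inf] * (len(b) + 1)
    prev[0] = 0.0  # empty prefix aligned with empty prefix
    for i in range(1, len(a) + 1):
        curr = [inf] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, match.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[len(b)]


def agglomerative(dist, threshold):
    """Average-linkage clustering: keep merging the two closest clusters
    until the closest pair is farther apart than `threshold`."""
    clusters = [[i] for i in range(len(dist))]

    def avg_link(c1, c2):
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = avg_link(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        if best[0] > threshold:
            break  # remaining clusters are all far apart
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Toy data (invented): three phase-shifted sine waves and three series
# that are flat except for a single spike at different positions.
n = 30
sines = [[math.sin(2 * math.pi * (t + s) / n) for t in range(n)] for s in (0, 1, 2)]
spikes = [[5.0 if t == pos else 0.0 for t in range(n)] for pos in (10, 12, 14)]
series = sines + spikes

dist = [[dtw_distance(a, b) for b in series] for a in series]
clusters = agglomerative(dist, threshold=15.0)  # cutoff chosen by hand here
print(len(clusters), clusters)
```

The number of clusters falls out of the distance cutoff rather than being supplied in advance, though the cutoff itself still has to be chosen, for example by inspecting the merge distances. Plain DTW scales quadratically with series length, so on 30-60 minute ECG segments it would likely need downsampling, a windowed DTW variant, or a derived series such as beat-to-beat intervals. In practice a library implementation (for example SciPy's hierarchical clustering routines) would replace the hand-rolled loop.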

Any help would be appreciated!


[–] mcflyanddie@alien.top 1 points 2 years ago

As both a medical doctor and someone completing a PhD in ML applied to healthcare, I'm a little sceptical about the desired outcome. For paroxysmal atrial fibrillation (pAF), I really doubt you would get a clear signal from the available data predicting onset 30-60 minutes before the event itself. There are far too many complex, stochastic factors at play, and onset is unlikely to have such a long lead time. At best, you might be able to identify periods with a higher risk of devolving into AF.

But even if you succeeded perfectly, I'm not convinced that predicting onset of AF half an hour in advance is of much use to a clinician. I'm struggling to think of how I might use this information to alter management of my patient (as an emergency physician), just because of the nuances of current AF management. I only mention this because the task as described is already a very tricky one, and I worry the resulting value added might be disproportionately small relative to the effort expended... I could be wrong though, just my two cents!