Machine Learning

[P] Obscure speech recording while keeping data useful for machine learning (alien.top)

submitted 2 years ago by vladisser@alien.top to c/machinelearning@academy.garden

Is there a way to obscure a speech recording so that it cannot be played back as anything intelligible, while still keeping it useful for machine learning? For my project I have to collect data in an uncontrolled environment, and I would like to do it without accidentally storing sensitive information.

It seems to be an uncommon problem, and I haven't found much. I am currently using spectrograms to extract features. From what I have found, a spectrogram is computed from the waveform with the STFT and keeps only the magnitude, not the phase, so there is not enough information to perform the inverse transformation. Do I understand this correctly? What other ways are there to do this?
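On the phase question: a magnitude spectrogram does discard phase, but that alone does not make it irreversible. The Griffin-Lim algorithm estimates a plausible phase iteratively, and with typical STFT settings the result is usually intelligible. Below is a minimal NumPy sketch, assuming a Hann window; the function names and the defaults `n_fft=512` and `hop=128` are illustrative choices, not something from the post.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Short-time Fourier transform with a Hann window; returns (frames, bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, n_fft=512, hop=128):
    """Weighted overlap-add inverse of stft()."""
    window = np.hanning(n_fft)
    n_frames = spec.shape[0]
    out = np.zeros(n_fft + hop * (n_frames - 1))
    norm = np.zeros_like(out)
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    for i in range(n_frames):
        out[i * hop:i * hop + n_fft] += frames[i] * window
        norm[i * hop:i * hop + n_fft] += window ** 2
    # The floor avoids dividing by ~0 at the edges, where only one tapered frame contributes.
    return out / np.maximum(norm, 1e-3)

def griffin_lim(magnitude, n_iter=50, n_fft=512, hop=128, seed=0):
    """Estimate a waveform from a magnitude-only spectrogram (assumed settings)."""
    rng = np.random.default_rng(seed)
    # Start from random phase, then alternate between time and frequency domains,
    # keeping the known magnitude and updating only the phase estimate.
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))
    for _ in range(n_iter):
        signal = istft(magnitude * phase, n_fft, hop)
        phase = np.exp(1j * np.angle(stft(signal, n_fft, hop)))
    return istft(magnitude * phase, n_fft, hop)

def spectral_error(magnitude, signal, n_fft=512, hop=128):
    """Relative distance between a target magnitude and the signal's own spectrogram."""
    rebuilt = np.abs(stft(signal, n_fft, hop))
    return np.linalg.norm(rebuilt - magnitude) / np.linalg.norm(magnitude)
```

Calling `griffin_lim(np.abs(stft(x)))` on a recording `x` returns a waveform whose spectrogram approximates the stored one. Coarser features (fewer frequency bands, longer hops, heavy averaging over time) tend to be harder to invert, at the cost of discarding information a model might need.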

[–] RedwoodsCool@alien.top 1 points 2 years ago (1 children)

> Is there a way to obscure speech recording so there is no way to play it and get something intelligible, but still keep it useful for machine learning?

Depends how you define "obscure", "useful", etc.

Maybe feed it directly to your machine then trash it? The info will still be vulnerable when in transit or in memory though.

[–] vladisser@alien.top 1 points 2 years ago (1 children)

I would like to store the data for experiments. Unfortunately, using it once and immediately discarding it defeats the purpose of collecting it.

[–] RedwoodsCool@alien.top 1 points 2 years ago

The only data transformation which wouldn't be trivial to reverse is encryption. But you still need to trust yourself, the machine, the network, and everything in between to not leak the key or the data.

If you "obscure" the data enough, then it won't be useful. There's no solution to your problem as far as I can tell.

[–] ginger_turmeric@alien.top 1 points 2 years ago

Maybe define some audio noising function, apply it to your training data, and then train your network to output the denoised version?
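One possible reading of a "noising function" is additive white Gaussian noise at a chosen signal-to-noise ratio. The sketch below assumes that reading; the name `add_noise` and the parameter `snr_db` are hypothetical. Note that noise at a moderate SNR does not by itself make speech unintelligible, and a network trained to denoise would undo it by design.

```python
import numpy as np

def add_noise(signal, snr_db, rng=None):
    """Add white Gaussian noise so the result has roughly the requested SNR in dB."""
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=float)
    signal_power = np.mean(signal ** 2)
    # SNR(dB) = 10 * log10(signal_power / noise_power), solved for noise_power.
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise
```

Lower `snr_db` values bury the speech more deeply; where the audio stops being intelligible, and whether the features stay useful at that point, would have to be checked empirically.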

[–] Bubbly-Experience513@alien.top 1 points 2 years ago

Are you trying to avoid getting deepfaked, or what is the reason, explicitly? And what do you want the ML to produce? Interesting question, btw.