this post was submitted on 26 Nov 2023

Machine Learning


I have a collection of audio files from comedy skits, and I’m looking to train a neural network to autonomously decide when to trigger a “laughing” sound effect. The catch? I want to avoid manually setting cue points for laughter. Instead, I’m aiming for the neural network to determine the right moments to insert laughter, based on the content of the skit.

farmingvillein@alien.top 1 points 10 months ago

This sounds stupid and reductionist, but I'd start with speech-to-text and then run a small number of examples through GPT-3.5-turbo and GPT-4, asking each to annotate where a laugh track should be added.

Good chance that it'll do a pretty decent job, with some careful prompting.
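
To make that concrete, here's a rough sketch of the prompting step, not a definitive implementation. It assumes the speech-to-text step gives you timestamped segments as `(start_sec, end_sec, text)` tuples, and that you ask the model to reply with a JSON list of line numbers. The function names `build_prompt` and `parse_cues`, the prompt wording, and the reply format are all made up for illustration; the actual call to the model is left out.

```python
import json
import re

# Assumed transcript format: (start_sec, end_sec, text) per segment,
# as a speech-to-text tool with timestamps might produce.
Segment = tuple[float, float, str]


def build_prompt(segments: list[Segment]) -> str:
    """Number each transcript segment and ask for laugh positions."""
    lines = [f"{i}: {text}" for i, (_, _, text) in enumerate(segments)]
    return (
        "Below is a comedy skit transcript, one numbered line per segment.\n"
        "Reply with a JSON list of the line numbers after which a laugh "
        "track should play.\n\n" + "\n".join(lines)
    )


def parse_cues(reply: str, segments: list[Segment]) -> list[float]:
    """Turn the model's reply into cue times (end time of each chosen line)."""
    # Grab the first bracketed list of integers, ignoring any chatter around it.
    match = re.search(r"\[[\d,\s]*\]", reply)
    if match is None:
        return []
    try:
        indices = json.loads(match.group(0))
    except json.JSONDecodeError:
        return []
    # Drop line numbers that do not exist in the transcript.
    return [segments[i][1] for i in indices if 0 <= i < len(segments)]
```

You'd send `build_prompt(segments)` to whichever model you're testing and feed its reply to `parse_cues`, which gives you the timestamps where the laugh sound effect should start.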

Then, depending on your cost requirements, you can collect some labels and fine-tune a model like Mistral (which you could also just try upfront).
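
For the label-collection part, a minimal sketch of turning annotated skits into JSONL training records could look like the following. The `prompt`/`completion` field names and the `to_jsonl` helper are assumptions for illustration; fine-tuning toolkits differ in the schema they expect, so check what yours wants.

```python
import json


def to_jsonl(examples: list[tuple[list[str], list[int]]]) -> str:
    """Convert (transcript_lines, laugh_after) pairs to JSONL text.

    transcript_lines: the skit split into lines of text.
    laugh_after: indices of lines after which a laugh should play.
    """
    records = []
    for lines, laugh_after in examples:
        # Number the lines so the target can refer to them by index.
        numbered = "\n".join(f"{i}: {text}" for i, text in enumerate(lines))
        record = {
            "prompt": numbered,
            "completion": json.dumps(sorted(laugh_after)),
        }
        records.append(json.dumps(record))
    return "\n".join(records)
```

Each output line is one training example, with the numbered transcript as input and the list of laugh positions as the target.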