Maybe for audio data that has both sound and words? For example, if you want to summarize a concert or something.