[–] RobotToaster@mander.xyz 5 points 2 weeks ago (3 children)

How can it be that bad?

I've used Zoom's AI transcriptions for far less mission-critical stuff, and it's generally fine (I still wouldn't trust it for medical purposes).

[–] huginn@feddit.it 27 points 2 weeks ago

Zoom's AI transcriptions also make things up.

That's the point: they're hallucination engines. They pattern-match and fill holes by design. If the match isn't perfect, they'll just patch it over with nonsense instead.

[–] Grimy@lemmy.world 16 points 2 weeks ago* (last edited 2 weeks ago) (2 children)

Whisper has been known to hallucinate during long moments of silence. Most of their examples, though, are probably due to bad audio quality.

I use Whisper quite a bit, and it will fumble a word here or there, but never to the extent shown in the article.
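
For anyone curious, here's a rough sketch (mine, not from the article) of how Whisper usually gets run with the open-source openai-whisper package. The options shown are the knobs people typically tweak to cut down on hallucinated text during silence; the file name and threshold values are just placeholders.

```python
# Rough sketch, assuming the open-source openai-whisper package.
import whisper

model = whisper.load_model("base")        # model size is a speed/quality trade-off

result = model.transcribe(
    "meeting.wav",                        # placeholder input file
    condition_on_previous_text=False,     # don't let earlier output seed the next guess
    no_speech_threshold=0.6,              # treat low-speech-probability chunks as silence
    logprob_threshold=-1.0,               # drop segments the decoder is unsure about
)

for segment in result["segments"]:
    print(f"[{segment['start']:.1f}s-{segment['end']:.1f}s] {segment['text']}")
```

Turning off condition_on_previous_text in particular stops one hallucinated segment from seeding the next.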

[–] QuadratureSurfer@lemmy.world 7 points 2 weeks ago

Same, I'd say it's way better than most other transcription tools I've used, but it does need to be monitored to catch when it starts going off the rails.
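
If it helps, here's a hedged sketch (again mine, not anything from the article) of what that monitoring could look like in code: openai-whisper attaches avg_logprob, no_speech_prob and compression_ratio to every segment it returns, and outliers on those are a decent signal that the output has gone off the rails. The thresholds below are illustrative, not canonical.

```python
# Rough sketch: flag openai-whisper segments whose per-segment stats look suspect.
def flag_suspect_segments(segments):
    """Yield segments whose stats suggest the model started making things up."""
    for seg in segments:
        if (
            seg["compression_ratio"] > 2.4   # very repetitive text, a classic hallucination tell
            or seg["avg_logprob"] < -1.0     # decoder had low confidence in this chunk
            or seg["no_speech_prob"] > 0.6   # probably silence that got "filled in"
        ):
            yield seg

# Usage (given a `result` from model.transcribe):
#   for seg in flag_suspect_segments(result["segments"]):
#       print(f"review [{seg['start']:.1f}s]: {seg['text']!r}")
```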

[–] brbposting@sh.itjust.works 2 points 2 weeks ago

Thanks for watching!

[–] ElPussyKangaroo@lemmy.world 9 points 2 weeks ago

It's not the transcripts that are the issue here. It's that the transcripts are being interpreted by the model to give information.