
Common Voice - Donate your voice to teach machines how people speak | Mozilla (commonvoice.mozilla.org)

submitted 2 years ago* (last edited 2 years ago) by otter@lemmy.ca to c/technology@lemmy.world


It's frustrating when you're not understood — especially when you're trying to speak to Siri, Alexa, or another internet-connected device.

Voice datasets that power voice recognition services are owned by a handful of major companies, and they can wildly underrepresent the voices of people with non-dominant accents, Black, Indigenous, and other people of color, disabled people, and gender-marginalised people. In fact, for people speaking other global languages, there may be no datasets at all.

That’s why Mozilla launched Common Voice — the world's largest public voice database, powered by the voices of volunteer contributors. Our goal is to teach machines how real people speak.

Today, we’re asking you to contribute to Common Voice, but we want you to choose how you’ll do it. Will you donate your voice to one of our Common Voice language datasets? Or will you make a $34 donation to Mozilla to support projects like this to reclaim the internet? (Or both!)

I'd be curious about the privacy concerns, but this might help a lot with underrepresented voice data. It might come down to whether someone wants more datasets for their particular voice/language badly enough to outweigh the other concerns.

If your language/accent is already well documented, it might not help as much?


[–] yo_scottie_oh@lemmy.ml 38 points 2 years ago* (last edited 2 years ago) (5 children)

The data set is available under the Mozilla Public License v2 through the Common Voice GitHub page. I’m not sure if I’m reading the terms of the license correctly, but I believe it allows commercial use.

[–] otter@lemmy.ca 19 points 2 years ago* (last edited 2 years ago)

I think that might be part of the focus: to push companies into including these underrepresented languages/accents so that the products work for everyone instead of a smaller subset.

Worth considering before contributing

[–] gedaliyah@lemmy.world 6 points 2 years ago

I've been using this CV Project app for months whenever I have a few minutes.

[–] Nonononoki@lemmy.world 4 points 2 years ago

Every free and open-source license allows commercial use.

[–] FooBarrington@lemmy.world 3 points 2 years ago

I used the data set for my bachelor's thesis. Thank you, Mozilla!