this post was submitted on 26 Oct 2023
101 points (99.0% liked)

TTS voices that sound nice? (thorsten-voice.de)
submitted 11 months ago* (last edited 11 months ago) by Pantherina@feddit.de to c/linux@lemmy.ml
 

There are multiple screen readers, but their voices simply suck. KDE's eSpeak has way worse voices than Android's eSpeak for some reason.

There is Thorsten Voice, a great project that sounds really nice.

Not sure if he has an eSpeak dataset too.

Do you know any others?

It's essential for blind people. I don't think any blind person uses Linux if they have to hear these robot voices every day.

top 24 comments
[–] woelkchen@lemmy.world 13 points 11 months ago (2 children)

In addition to the very insightful reply by Ferk: Sadly, most TTS development seems to be happening as online services these days. Google Neural TTS and Microsoft Azure TTS sound really great but require an online connection, an account, and possibly even paying (usage is free up to a threshold, then it costs almost nothing, but almost nothing isn't free).

Btw, I don't know about the blind people you know, but the ones I know use such insanely fast TTS output that the "sounds nice" aspect isn't really there in the first place. At least not to me.

[–] Starfighter@discuss.tchncs.de 9 points 11 months ago (1 children)

The development of Piper is being driven by the Home Assistant Project. That probably makes it one of the larger OSS TTS projects. Hope may not be lost yet ;)

[–] woelkchen@lemmy.world 1 points 11 months ago

Hope may not be lost yet ;)

And then we'll live in a TikTok TTS hellscape. 🤣

[–] Pantherina@feddit.de 1 points 11 months ago

Okay that is really interesting, so a TTS engine should be optimized to run very fast.

[–] interdimensionalmeme@lemmy.ml 7 points 11 months ago
[–] fhein@lemmy.world 6 points 11 months ago

A few days ago I wrote down a couple of links to interesting TTS projects that I was going to look into whenever I have time, along with some brief notes.

https://github.com/coqui-ai/TTS TTS + XTTS, GPU inference? 3GB model.

https://github.com/rhasspy/piper Low resource, CPU inference. 50MB model.

https://github.com/p0p4k/vits2_pytorch GPU inference? 500MB model. https://github.com/p0p4k/vits2_pytorch/discussions/27 Someone's models for vits2
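
For anyone who wants to try Piper from a script, here is a rough Python sketch. It assumes a `piper` executable is on your PATH and that a voice model has already been downloaded. The model and output filenames are placeholders, and the helper names are made up for illustration. The `--model` and `--output_file` flags follow the usage shown in the Piper README, so check `piper --help` for your version.

```python
import shutil
import subprocess
from typing import List


def build_piper_command(model_path: str, output_path: str) -> List[str]:
    """Build the argument list for a Piper run (hypothetical helper)."""
    return ["piper", "--model", model_path, "--output_file", output_path]


def synthesize(text: str, model_path: str, output_path: str) -> bool:
    """Pipe text to Piper on stdin; return False if piper is not installed."""
    if shutil.which("piper") is None:
        return False
    subprocess.run(
        build_piper_command(model_path, output_path),
        input=text.encode("utf-8"),  # Piper reads the text to speak from stdin
        check=True,
    )
    return True


if __name__ == "__main__":
    # Placeholder filenames; replace with a real downloaded voice model.
    print(build_piper_command("voice-model.onnx", "out.wav"))
```

Calling `synthesize("Hello there", "voice-model.onnx", "out.wav")` should write a WAV file if Piper and the model are present, and return `False` otherwise.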

[–] mesamunefire@lemmy.world 4 points 11 months ago (1 children)

I was recently made aware of the bark project, especially the bark-ui project. It takes a long time to run, but it does work. Sometimes it makes cursed stuff too: https://social.rootaccess.org/@michaelc/111277260439738652

[–] Pantherina@feddit.de 2 points 11 months ago (1 children)

Wow, that is crazy! 10 seconds sounds like an unnecessary flex though, wouldn't something like 30 min / all sounds be best?

[–] mesamunefire@lemmy.world 2 points 11 months ago (1 children)

I have a tiny laptop with the literal bare minimum to get this running haha. You're probably right, but the models explode your memory pretty quick.

I did get some really good audio out of this model after a while. I threw the first chapter of The Hobbit at it and it seemed to be doing OK. It's better than eSpeak, and you only need to do it once to get audiobooks out.

[–] Pantherina@feddit.de 2 points 11 months ago* (last edited 11 months ago) (1 children)

I have to check that! But wait is that Windows only?

[–] mesamunefire@lemmy.world 2 points 11 months ago

I got it working on Ubuntu/PopOS.

[–] iopq@lemmy.world 3 points 11 months ago (3 children)
[–] schnurrito@discuss.tchncs.de 2 points 11 months ago

That says it is speech to text, not text to speech

[–] Pantherina@feddit.de 2 points 11 months ago (1 children)

Omg, is that the data from the Common Voice project? Nice!

[–] iopq@lemmy.world 1 points 11 months ago (1 children)

So it is, that's why some languages don't have good support yet - not enough recordings

[–] Pantherina@feddit.de 2 points 11 months ago

Also the last release is very old. Mozilla is weird, their pinned projects are often dead or outdated...

[–] grumpyrico@lemmy.world 1 points 10 months ago

Isn't speech to text the opposite of tts?

[–] TunaCowboy@lemmy.world 2 points 11 months ago

Not sure if this is gonna be much better than the alternatives you've listed, but you can try adjusting pitch, rate, range, etc. with spd-say.
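
A minimal Python sketch of that kind of tweaking, assuming Speech Dispatcher is installed and `spd-say` is on your PATH. The `-r` (rate), `-p` (pitch) and `-i` (volume) options each take a value from -100 to 100; the helper names are made up for illustration, and pitch range is left out because not every version has that option.

```python
import shutil
import subprocess
from typing import List


def clamp(value: int, low: int = -100, high: int = 100) -> int:
    """Keep a setting inside the -100..100 range spd-say accepts."""
    return max(low, min(high, value))


def build_spd_say_command(text: str, rate: int = 0, pitch: int = 0,
                          volume: int = 0) -> List[str]:
    """Build the spd-say argument list (hypothetical helper)."""
    return [
        "spd-say",
        "-r", str(clamp(rate)),    # speaking rate
        "-p", str(clamp(pitch)),   # voice pitch
        "-i", str(clamp(volume)),  # volume
        text,
    ]


def speak(text: str, **settings: int) -> bool:
    """Speak the text; return False if spd-say is not installed."""
    if shutil.which("spd-say") is None:
        return False
    subprocess.run(build_spd_say_command(text, **settings), check=True)
    return True
```

Something like `speak("Hello from Linux", rate=40, pitch=-10)` is a starting point; screen reader users who prefer very fast output would push `rate` much higher.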

[–] draeath@lemmy.sdf.org -2 points 11 months ago (2 children)

If you don't mind doing some development work, needing online connectivity, and paying for usage, AWS's Polly has some very good sounding TTS voices: https://aws.amazon.com/polly/

[–] Pantherina@feddit.de 2 points 11 months ago

Hm, as a backup that could be okay, but it doesn't work offline or without being a huge privacy problem...

[–] frostycakes@beehaw.org 1 points 11 months ago

Just please don't use one of the kid voices for technical videos, a la babywogue and their GNOME development videos.

[–] hungover_pilot@lemmy.world -5 points 11 months ago (2 children)

Samsung's TTS engine for Android is the best I have found. I use it to listen to EPUB books.

[–] demesisx@infosec.pub 7 points 11 months ago
[–] Pantherina@feddit.de 1 points 10 months ago

Samsung sucks though... also software-wise. Like, I debloated many phones and Samsung is crazy.

https://github.com/trytomakeyouprivate/Android-Tipps/tree/main/debloat

So their TTS will probably work neither offline nor standalone.