Machine Learning

[D] Any text to speech AI model still being updated? (alien.top)

submitted 11 months ago by The--Nameless--One@alien.top to c/machinelearning@academy.garden

Weird question, I suppose. I've tried Bark (it works, but it hasn't been updated in a while), plus Tortoise and Tortoise-Fast (neither installs properly, and neither has been updated in a while either)...

Is there any AI text-to-speech model still being updated?

[–] GinjaTurtles@alien.top 1 points 11 months ago

I’ve messed around with a bunch of open source AI TTS models that I can self-host. Here’s my 2 cents:

mrq has a repo where you can fine-tune Tortoise on your own audio samples using a GUI: https://git.ecker.tech/mrq/ai-voice-cloning. There are some good YouTube videos by Jarrod’s Journey about this.
If you want some of the best-sounding local TTS, fine-tuned Tortoise plus a fine-tuned RVC model is going to give very nice quality.
Recently the Tortoise maintainer added HiFi-GAN for even faster inference, but I don’t think you can fine-tune this HiFi-GAN model yet, since it’s a custom implementation for Tortoise.
One of the models I’m going to look into next that sounds incredible is Google SoundStorm. I believe a few people have implemented open source PyTorch versions of SoundStorm on GitHub.
I’m not sure how good a fine-tuned version of SoundStorm would be, but this is what I’m going to try out next when I have time (work sucks).
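
For what it's worth, the Tortoise + RVC combo mentioned above is basically a two-stage chain: the TTS model turns text into audio, and the voice conversion model then reworks that audio toward the target voice. Here's a rough sketch of just that shape in Python. Everything in it (Audio, synthesize, dummy_tts, dummy_convert) is a made-up placeholder so it runs without any model installed. It is not the real Tortoise or RVC API, so you'd swap the dummy stages for whatever calls your setup actually exposes.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Audio:
    """Minimal stand-in for a waveform: raw samples plus a sample rate."""
    samples: List[float]
    sample_rate: int


# Hypothetical stage signatures, NOT the real Tortoise or RVC APIs.
TTSStage = Callable[[str], Audio]
VoiceConversionStage = Callable[[Audio], Audio]


def synthesize(text: str, tts: TTSStage, convert: VoiceConversionStage) -> Audio:
    """Run text through the TTS stage, then feed its audio to voice conversion."""
    draft = tts(text)
    return convert(draft)


def dummy_tts(text: str) -> Audio:
    # Placeholder: one silent sample per character, at an assumed 24 kHz rate.
    return Audio(samples=[0.0] * len(text), sample_rate=24000)


def dummy_convert(audio: Audio) -> Audio:
    # Placeholder "conversion": returns a copy with the same length and rate.
    return Audio(samples=list(audio.samples), sample_rate=audio.sample_rate)


if __name__ == "__main__":
    out = synthesize("hello", dummy_tts, dummy_convert)
    print(len(out.samples), out.sample_rate)
```

Keeping the two stages as separate callables means you can fine-tune or replace either one without touching the other.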