

Hey guys, just wondering if anyone has had success finetuning StyleTTS2 yet?

The only model I can find is the LJSpeech one, which sounds really good! But I'm wondering what other narrators / speakers would sound like, especially voices further outside the training dataset.

(It seems zero-shot prompting at runtime gives low quality, so we need real finetunes!)


Hey guys,

TL;DR: ElevenLabs / Play.ht are WAY too expensive for a realtime chat app, and we need an alternative. I'm guessing this is why Character is rolling their own voice model, and obviously most apps can't do that, so what are the alternatives here?

I've read that zero-shot prompting for TTS (inserting a voice sample at runtime) is part of the reason ElevenLabs / Play.ht are so expensive, whereas finetuning on individual voices, like Character / OAI did, and hosting those as their own models would be way faster and cheaper.

But Coqui seems really slow in our finetune testing, even on an H100, and on top of that it's not really... good. Does anyone know why, or are there alternatives that chat apps are using? Is anyone working on better open source TTS? This seems totally overlooked compared to text, where there's so much competition right now, but it's almost as important. Shocked more people aren't working on this! Thanks


Hey guys,

I heard Mistral is releasing a model with 2x the number of parameters of the open source one before EOY, but is this one going to be behind their own API rather than open source? Or were they talking about 'premium models', meaning even larger parameter counts? Really need this to be open source, thanks.