As a note, I tested processing 2 samples with Adobe's Podcast Enhance tool, and it was very effective at removing the slight metallic artifacts.
LocalLLaMA — a community to discuss Llama, the family of large language models created by Meta AI.
It would be the future if it supported European languages.
Goddammit, I just fine-tuned Tortoise with a custom voice. Can't wait for web UIs for StyleTTS. Hope it's easy to fine-tune.
Yep it is, takes around 4 hours on a 3090.
That's acceptable. Did you train from scratch or fine-tune, though? And how much data?
Fine-tune, with around an hour's worth of data.
How do you fine-tune or fully train? I wish there was a step-by-step guide; I've been trying for hours but can't figure out what I'm supposed to do. The README doesn't explain much.
Wow!
How fast is the generation? Can it be used real-time?
Very fast, RTF below 0.1, so generation takes less than a tenth of the spoken duration — over 10x faster than real-time.
On CPU, btw.
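To make the RTF claim above concrete, here is a minimal sketch of how a real-time factor is usually computed; the helper name is hypothetical, not from the StyleTTS codebase:

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent generating / duration of audio produced.

    RTF < 1.0 means faster than real-time; RTF 0.1 means 10x faster.
    """
    return processing_seconds / audio_seconds

# Example: 6 s of CPU time to synthesize a 60 s clip -> RTF 0.1
rtf = real_time_factor(6.0, 60.0)
print(f"RTF = {rtf:.2f}, i.e. {1 / rtf:.0f}x faster than real-time")
```

An RTF comfortably below 1.0 is the usual threshold for "usable in real-time", since audio is produced faster than it is played back.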