this post was submitted on 20 Nov 2023
LocalLLaMA
Community to discuss about Llama, the family of large language models created by Meta AI.
I was struggling at first and had that American twang coming through...
But I managed to get a very clear, short clip of an English actor from an interview. There was no background noise at all. I made sure to clip out any non-speech from the start and end of the audio, then saved it as a 22050 Hz, mono, 16-bit WAV.
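For anyone who wants to script that prep step, here is a rough sketch using only the Python standard library. It is not the tool I used, just one way to get the same result. Assumptions: the input is already a 16-bit PCM WAV, "non-speech" is approximated by a simple amplitude threshold (the value 500 is a guess you would need to tune), and the helper names (read_wav_mono, trim_silence, resample_linear, write_wav, prepare_clip) are made up for illustration. The resampling is plain linear interpolation with no low-pass filter, so a proper audio editor or resampler will likely sound cleaner.

```python
import array
import sys
import wave

TARGET_RATE = 22050  # Hz, the sample rate mentioned above


def read_wav_mono(path):
    """Read a 16-bit PCM WAV and return (mono_samples, sample_rate)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM input")
        channels = wf.getnchannels()
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    data = array.array("h")
    data.frombytes(raw)
    if sys.byteorder == "big":  # WAV data is little-endian
        data.byteswap()
    # Downmix by averaging the interleaved channels of each frame.
    mono = [
        sum(data[i:i + channels]) // channels
        for i in range(0, len(data) - channels + 1, channels)
    ]
    return mono, rate


def trim_silence(samples, threshold=500):
    """Drop leading and trailing samples quieter than threshold (assumed value)."""
    start, end = 0, len(samples)
    while start < end and abs(samples[start]) < threshold:
        start += 1
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]


def resample_linear(samples, src_rate, dst_rate):
    """Naive linear-interpolation resampler (no anti-aliasing filter)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(int(round(samples[left] * (1 - frac) + samples[right] * frac)))
    return out


def write_wav(path, samples, rate=TARGET_RATE):
    """Write mono 16-bit PCM, clamping values to the valid range."""
    data = array.array("h", (max(-32768, min(32767, s)) for s in samples))
    if sys.byteorder == "big":
        data.byteswap()
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(data.tobytes())


def prepare_clip(src_path, dst_path, threshold=500):
    """Mono downmix, trim quiet edges, resample to 22050 Hz, save as 16-bit WAV."""
    samples, rate = read_wav_mono(src_path)
    samples = trim_silence(samples, threshold)
    samples = resample_linear(samples, rate, TARGET_RATE)
    write_wav(dst_path, samples)
```

Usage would be something like prepare_clip("interview_raw.wav", "voice_sample.wav"), where both file names are placeholders. A threshold trim only removes quiet edges; breaths or other non-speech noises louder than the threshold would still need cutting by hand.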
That seems to have done it! I get a pretty good representation of the voice, and it stays in character about 99% of the time, with the occasional slight slip.
I also occasionally get a little gibberish, which seems to happen when my model tries to say something like " ' " (a stray quote mark that occasionally slips through when it's generating text; I can see it when I look at the backend at what's being sent for audio processing). I'm guessing it's possible to filter this out with a regex or something.
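Something like this is what I have in mind for the regex idea. Treat it as an untested-in-production sketch: clean_for_tts is a made-up helper name, and the set of quote characters is my guess at what slips through, so adjust it to whatever you actually see in the backend. It only strips quote marks that stand alone between whitespace, so apostrophes inside words like "don't" and normal quoted speech are left alone.

```python
import re

# Quote-like characters that form a whole whitespace-delimited token,
# e.g. the lone ' in "Well ' I suppose". The character list is an assumption.
_STRAY_QUOTES = re.compile(r"(?<!\S)['\"`\u2018\u2019\u201c\u201d]+(?!\S)")
_MULTI_SPACE = re.compile(r"\s+")


def clean_for_tts(text):
    """Remove standalone quote marks, then collapse leftover whitespace."""
    text = _STRAY_QUOTES.sub(" ", text)
    return _MULTI_SPACE.sub(" ", text).strip()
```

You would call clean_for_tts on the generated text right before it gets handed to the audio step. If other symbols turn out to cause gibberish too, they can be added to the character class.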