How it works: Record audio from your microphone or upload a reference audio file (any format or sample rate). Use ASR to auto-fill the reference text, or type it manually, then enter the text you want to generate in that voice. Audio is automatically preprocessed to 24 kHz mono.
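The preprocessing step mentioned above (downmix to mono, resample to 24 kHz) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the tool's actual implementation: the function names `to_mono`, `resample_linear`, and `preprocess` are hypothetical, audio is assumed to be already decoded into per-frame lists of float samples, and plain linear interpolation stands in for whatever resampler the tool really uses (a production resampler would also apply an anti-aliasing filter).

```python
from typing import List, Sequence

TARGET_RATE = 24_000  # Hz, the target rate stated in the text above


def to_mono(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average the channels of each frame into a single mono sample."""
    return [sum(frame) / len(frame) for frame in frames]


def resample_linear(
    samples: Sequence[float], src_rate: int, dst_rate: int = TARGET_RATE
) -> List[float]:
    """Resample by linear interpolation (illustration only, no anti-aliasing)."""
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # source samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = min(i * step, last)  # clamp so we never read past the end
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


def preprocess(frames: Sequence[Sequence[float]], src_rate: int) -> List[float]:
    """Hypothetical pipeline: downmix to mono, then resample to 24 kHz."""
    return resample_linear(to_mono(frames), src_rate)
```

For example, `preprocess(frames, 48_000)` on a stereo recording would return a mono list with half as many samples. In practice an audio library would normally handle decoding and resampling; the sketch only shows the shape of the transformation.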
Processing audio and cloning voice... This may take a few minutes.
Voice Cloning Complete!
Generated Text: