Speaker-conditioned text-to-speech with emotion and energy control. Fine-tuned MOSS-TTS checkpoint featuring character voices.
Model: ZDisket/MOSS-TTS-PNY
Maximum audio output length: 30 seconds
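
The 30-second cap can be pictured as a limit on the number of samples returned. The sketch below is only an illustration of that arithmetic, not code from MOSS-TTS or this checkpoint: the helper names `clamp_to_max_duration` and `duration_seconds` are hypothetical, and the 24000 Hz sample rate in the usage lines is an assumed value, not a documented property of the model.

```python
# Hypothetical helpers; not part of MOSS-TTS or the ZDisket/MOSS-TTS-PNY code.
MAX_SECONDS = 30  # output limit stated in the description


def clamp_to_max_duration(samples, sample_rate, max_seconds=MAX_SECONDS):
    """Return at most `max_seconds` worth of samples from a mono sequence."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    max_samples = int(max_seconds * sample_rate)
    # Slicing leaves shorter inputs unchanged and truncates longer ones.
    return samples[:max_samples]


def duration_seconds(samples, sample_rate):
    """Length of a mono sample sequence in seconds."""
    return len(samples) / sample_rate


# Usage with an assumed sample rate of 24000 Hz (illustrative only).
assumed_rate = 24000
audio = [0.0] * (assumed_rate * 45)   # 45 s of silence
clipped = clamp_to_max_duration(audio, assumed_rate)
print(duration_seconds(clipped, assumed_rate))
```

In practice, longer text would need to be split into chunks that each synthesize to under 30 seconds; how the hosted model itself enforces the limit is not stated here.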