MRSAudio: A Large-Scale Multimodal Recorded Spatial Audio Dataset with Refined Annotations Paper • 2510.10396 • Published Oct 12, 2025
Conan: A Chunkwise Online Network for Zero-Shot Adaptive Voice Conversion Paper • 2507.14534 • Published Jul 19, 2025
Leveraging Pretrained Diffusion Models for Zero-Shot Part Assembly Paper • 2505.00426 • Published May 1, 2025
TCSinger 2: Customizable Multilingual Zero-shot Singing Voice Synthesis Paper • 2505.14910 • Published May 20, 2025 • 1
STARS: A Unified Framework for Singing Transcription, Alignment, and Refined Style Annotation Paper • 2507.06670 • Published Jul 9, 2025
TechSinger: Technique Controllable Multilingual Singing Voice Synthesis via Flow Matching Paper • 2502.12572 • Published Feb 18, 2025 • 2
MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis Paper • 2502.18924 • Published Feb 26, 2025 • 16
Versatile Framework for Song Generation with Prompt-based Control Paper • 2504.19062 • Published Apr 27, 2025 • 6