This repository hosts the official WavTTS checkpoint for zero-shot text-to-speech generation at 16 kHz. Please refer to the GitHub repository and paper for more details.
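Because the checkpoint generates 16 kHz audio, it can help to inspect a reference clip before running inference. The card does not say whether the reference audio must already be 16 kHz or whether the codebase resamples it. The following is therefore only a minimal standard-library sketch for checking a clip. The helper names `describe_wav` and `matches_model_rate` are hypothetical, and the sketch reads uncompressed PCM WAV files only.

```python
import wave

# Per this model card, WavTTS generates audio at 16 kHz. Whether reference
# clips are resampled by the codebase is NOT assumed here; this only reports.
TARGET_SAMPLE_RATE = 16_000


def describe_wav(path: str) -> dict:
    """Return basic properties of a PCM WAV file (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": frames / rate,
        }


def matches_model_rate(path: str, target: int = TARGET_SAMPLE_RATE) -> bool:
    """True if the file is already at the model's output sample rate."""
    return describe_wav(path)["sample_rate"] == target


if __name__ == "__main__":
    import sys

    # Usage: python check_ref.py path/to/reference.wav [more.wav ...]
    for p in sys.argv[1:]:
        info = describe_wav(p)
        status = "16 kHz" if info["sample_rate"] == TARGET_SAMPLE_RATE else "not 16 kHz"
        print(p, info, status)
```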
Files in this repository:
- model_1200000.pt: official WavTTS checkpoint.
- vocab.txt: matching vocabulary file for the released checkpoint.

Please use this checkpoint with the WavTTS codebase:
```bash
git clone https://github.com/cwx-worst-one/WavTTS
cd WavTTS
pip install -e .
```
Run inference with the default checkpoint:
```bash
wavtts_infer-cli \
--model WavTTS \
--ref_audio "path/to/reference.wav" \
--ref_text "The transcription of the reference audio." \
--gen_text "The text you want to synthesize."
```
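When inference is scripted, the same CLI call can be issued from Python. This is a minimal sketch that uses only the flags shown in the command above. The function names `build_infer_command` and `run_inference` are hypothetical wrappers and are not part of the WavTTS codebase. The sketch also assumes `wavtts_infer-cli` is on `PATH` after `pip install -e .`.

```python
import shlex
import subprocess


def build_infer_command(ref_audio: str, ref_text: str, gen_text: str,
                        model: str = "WavTTS") -> list:
    """Assemble the wavtts_infer-cli invocation as an argument list.

    Only the flags documented on this model card are used.
    """
    return [
        "wavtts_infer-cli",
        "--model", model,
        "--ref_audio", ref_audio,
        "--ref_text", ref_text,
        "--gen_text", gen_text,
    ]


def run_inference(ref_audio: str, ref_text: str, gen_text: str,
                  model: str = "WavTTS") -> subprocess.CompletedProcess:
    """Run the CLI (hypothetical wrapper); raises if it exits non-zero."""
    cmd = build_infer_command(ref_audio, ref_text, gen_text, model)
    print("Running:", shlex.join(cmd))  # shell-quoted form, for logging only
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_inference(
        "path/to/reference.wav",
        "The transcription of the reference audio.",
        "The text you want to synthesize.",
    )
```

The arguments are passed as a list rather than a single shell string, so texts that contain quotes or apostrophes reach the CLI unchanged, with no extra escaping needed.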
The default WavTTS configuration downloads model_1200000.pt from this repository automatically. To specify the checkpoint and vocabulary files explicitly instead, set:
```
ckpt_file = "hf://worstchan/WavTTS/model_1200000.pt"
vocab_file = "infer/examples/vocab.txt"
```
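To fetch the checkpoint manually, for example to pre-populate a cache on an offline machine, the `hf://` value can be split into a Hub repository ID and a file name. This sketch assumes the URI follows the layout `hf://<owner>/<repo>/<path-in-repo>`. The card does not say how WavTTS resolves this URI internally, and `parse_hf_uri` and `fetch_checkpoint` are hypothetical helpers. The download step uses `hf_hub_download` from the `huggingface_hub` package, which returns the local path of the cached file.

```python
def parse_hf_uri(uri: str) -> tuple:
    """Split 'hf://<owner>/<repo>/<path/in/repo>' into (repo_id, filename).

    The URI layout is an assumption based on the ckpt_file value above.
    """
    prefix = "hf://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an hf:// URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    if len(parts) < 3 or not all(parts):
        raise ValueError(f"expected hf://<owner>/<repo>/<file>, got {uri!r}")
    repo_id = "/".join(parts[:2])     # e.g. "worstchan/WavTTS"
    filename = "/".join(parts[2:])    # e.g. "model_1200000.pt"
    return repo_id, filename


def fetch_checkpoint(uri: str) -> str:
    """Download the file (or reuse the local cache) and return its path."""
    # Imported lazily so parse_hf_uri works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    repo_id, filename = parse_hf_uri(uri)
    return hf_hub_download(repo_id=repo_id, filename=filename)


if __name__ == "__main__":
    print(fetch_checkpoint("hf://worstchan/WavTTS/model_1200000.pt"))
```

The returned local path could then be used as `ckpt_file`, although whether the codebase accepts a plain local path there is an assumption worth checking in the WavTTS repository.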
The released model weights are licensed under CC BY-NC 4.0 due to the license restrictions of the Emilia training dataset. The WavTTS codebase is released under the MIT License.
If you find WavTTS useful, please cite the paper:
```bibtex
@article{chen2026wavtts,
  title={WavTTS: Towards High-Quality Zero-Shot TTS via Direct Raw Waveform Modeling},
  author={TODO},
  journal={TODO},
  year={2026}
}
```