Ivan PRO

aufklarer

AI & ML interests

GenAI

Recent Activity

posted an update about 4 hours ago
Voice cloning models measured across five languages: OmniVoice, Chatterbox, VoxCPM2, Fish Audio

I published a new Soniqo benchmark post for local voice cloning models across five languages: https://www.soniqo.audio/blog/voice-cloning-benchmarks

Models:
- OmniVoice int8
- Chatterbox Multilingual fp16
- VoxCPM2 bf16
- Fish Audio S2 Pro fp16

Languages:
- English
- German
- Modern Standard Arabic
- Spanish
- Mandarin Chinese

The benchmark uses Google FLEURS test clips as dataset references. Each row includes the reference audio, generated audio, speaker similarity, WER/CER, generated audio length, and RTF.

Main result in this run: OmniVoice was the strongest all-around row set, with 0.707 mean speaker cosine across all five languages, 0.0% ASR error, and mean RTF 0.45. VoxCPM2 bf16 was especially strong on Arabic speaker match. Fish Audio S2 Pro showed strong German/Arabic similarity but slower RTF. Chatterbox Multilingual was competitive on Arabic and Spanish.

This is an engineering benchmark, not a human MOS study. The speaker-similarity values should be compared within this table because every row uses the same local speaker-embedding pipeline.

Try the stack locally with Speech Studio:
https://www.soniqo.audio/speech-studio
https://github.com/soniqo/speech-studio

Underlying Swift library/CLI:
https://github.com/soniqo/speech-swift

Soniqo models and exports:
https://huggingface.co/soniqo
https://huggingface.co/aufklarer

What model or language should I add next?
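For readers unfamiliar with the per-row metrics named in the post (speaker cosine, WER/CER, RTF), here is a minimal Python sketch of their conventional definitions. It is not the Soniqo benchmark code: the function names are hypothetical, the embeddings are assumed to be plain lists of floats from some speaker-embedding model, the transcripts are assumed to come from an ASR pass over the generated audio, and RTF is assumed to mean synthesis time divided by generated audio duration (so values below 1.0 are faster than real time). The post does not state which of these conventions it uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two speaker-embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate, ignoring spaces (an assumption of this sketch)."""
    ref_chars = list(reference.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    hyp_chars = list(hypothesis.replace(" ", ""))
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


def real_time_factor(synthesis_seconds, audio_seconds):
    """RTF as synthesis time over generated audio duration (assumed convention)."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return synthesis_seconds / audio_seconds
```

Under these assumptions, a clip of 10 seconds synthesized in 4.5 seconds has an RTF of 0.45, and a transcript that matches the reference text exactly has a WER of 0.0. Character-level scoring is commonly preferred for languages written without spaces between words, such as Mandarin Chinese.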
updated a dataset about 9 hours ago
aufklarer/central-bank-communications
updated a collection 1 day ago
MLX Speech Models

Organizations

Soniqo Audio