Ivan PRO

aufklarer

https://blog.ivan.digital

AI & ML interests

GenAI

Recent Activity

posted an update about 4 hours ago

Voice cloning models measured across five languages: OmniVoice, Chatterbox, VoxCPM2, Fish Audio

I published a new Soniqo benchmark post for local voice cloning models across five languages:
https://www.soniqo.audio/blog/voice-cloning-benchmarks

Models:
- OmniVoice int8
- Chatterbox Multilingual fp16
- VoxCPM2 bf16
- Fish Audio S2 Pro fp16

Languages:
- English
- German
- Modern Standard Arabic
- Spanish
- Mandarin Chinese

The benchmark uses Google FLEURS test clips as dataset references. Each row includes the reference audio, generated audio, speaker similarity, WER/CER, generated audio length, and RTF.

Main result in this run: OmniVoice was the strongest all-around row set, with 0.707 mean speaker cosine across all five languages, 0.0% ASR error, and mean RTF 0.45. VoxCPM2 bf16 was especially strong on Arabic speaker match. Fish Audio S2 Pro showed strong German/Arabic similarity but slower RTF. Chatterbox Multilingual was competitive on Arabic and Spanish.

This is an engineering benchmark, not a human MOS study. The speaker-similarity values should be compared within this table because every row uses the same local speaker-embedding pipeline.

Try the stack locally with Speech Studio:
https://www.soniqo.audio/speech-studio
https://github.com/soniqo/speech-studio

Underlying Swift library/CLI:
https://github.com/soniqo/speech-swift

Soniqo models and exports:
https://huggingface.co/soniqo
https://huggingface.co/aufklarer

What model or language should I add next?
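The post names its metrics (speaker cosine, WER/CER, RTF) without showing how they are computed. As a rough illustration only, here is a minimal Python sketch of the conventional definitions. It is not the Soniqo pipeline: the function names are hypothetical, the embeddings are assumed to be plain float vectors from some speaker-embedding model, and RTF is assumed to mean synthesis time divided by generated audio length (so values below 1 are faster than real time).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two speaker-embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER: word-level edit distance divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def char_error_rate(reference, hypothesis):
    """CER: character-level edit distance, ignoring spaces.

    Commonly used instead of WER for languages without word spacing,
    such as Mandarin Chinese.
    """
    ref_chars = list(reference.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference transcript is empty")
    hyp_chars = list(hypothesis.replace(" ", ""))
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


def real_time_factor(synthesis_seconds, audio_seconds):
    """RTF under the assumed definition: compute time / audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return synthesis_seconds / audio_seconds
```

Under these assumptions, a clip that takes 4.5 seconds to synthesize and plays for 10 seconds has an RTF of 0.45. Real evaluations usually normalize transcripts (case, punctuation, numerals) before scoring, which this sketch leaves out.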

updated a dataset about 9 hours ago

aufklarer/central-bank-communications

updated a collection 1 day ago

MLX Speech Models

aufklarer's models (112)

aufklarer/Whisper-Large-v3-Turbo-CoreML

Automatic Speech Recognition • Updated about 19 hours ago

aufklarer/Fish-Audio-S2-Pro-MLX-fp16

Text-to-Speech • 5B • Updated 3 days ago • 33

aufklarer/Silero-VAD-v6.2.1-MLX

Audio Classification • 309k • Updated 4 days ago • 108

aufklarer/WavLM-Base-Plus-MLX-fp16

Feature Extraction • 94.4M • Updated 5 days ago • 40

aufklarer/CosyVoice3-0.5B-MLX-8bit

Text-to-Speech • Updated 5 days ago • 140

aufklarer/Silero-VAD-v6.2.1-CoreML

Voice Activity Detection • Updated 5 days ago • 78

aufklarer/Indic-Mio-MLX-fp16

Text-to-Speech • 0.6B • Updated 6 days ago • 70

aufklarer/Chatterbox-Multilingual-hi-MLX-fp16

Text-to-Speech • 0.6B • Updated 7 days ago • 45

aufklarer/gemma-4-E2B-it-MLX-2bit

Text Generation • 0.4B • Updated 7 days ago • 28

aufklarer/gemma-4-E2B-it-MLX-4bit

Text Generation • 0.7B • Updated 7 days ago • 52

aufklarer/gemma-4-E4B-it-MLX-2bit

Text Generation • 0.7B • Updated 7 days ago • 31

aufklarer/gemma-4-E4B-it-MLX-4bit

Text Generation • 1B • Updated 7 days ago • 24

aufklarer/OmniVoice-MLX-int8

Text-to-Speech • 0.2B • Updated 7 days ago • 71

aufklarer/Supertonic-3-CoreML

Text-to-Speech • Updated 8 days ago • 1.12k

aufklarer/Supertonic-3-CoreML-FP16

Text-to-Speech • Updated 8 days ago • 547

aufklarer/Qwen3-4B-Instruct-2507-MLX-5bit

Text Generation • 0.8B • Updated 8 days ago • 41

aufklarer/Qwen3-4B-Instruct-2507-MLX-4bit

Text Generation • 0.6B • Updated 8 days ago • 435

aufklarer/OmniVoice-MLX-fp16

Text-to-Speech • 0.6B • Updated 8 days ago • 29

aufklarer/Chatterbox-Multilingual-MLX-fp16

Text-to-Speech • 0.6B • Updated 9 days ago • 125 • 1

aufklarer/Qwen3-ForcedAligner-0.6B-CoreML-INT8

Audio Classification • Updated 9 days ago • 273 • 2

aufklarer/Qwen3-ForcedAligner-0.6B-CoreML-FP16

Audio Classification • Updated 9 days ago • 207

aufklarer/Qwen3-ASR-1.7B-MLX-5bit

1.0B • Updated 11 days ago • 23

aufklarer/Qwen3-ForcedAligner-0.6B-5bit

0.4B • Updated 11 days ago • 14

aufklarer/Qwen3-TTS-12Hz-0.6B-CustomVoice-MLX-4bit

0.4B • Updated 11 days ago • 246

aufklarer/CosyVoice3-0.5B-MLX-bf16

Text-to-Speech • Updated 11 days ago • 229

aufklarer/VoxCPM2-MLX-int8

Text-to-Speech • 0.8B • Updated 11 days ago • 434

aufklarer/VoxCPM2-MLX-bf16

Text-to-Speech • 2B • Updated 11 days ago • 177

aufklarer/Qwen3-TTS-12Hz-0.6B-Base-MLX-8bit

0.5B • Updated 11 days ago • 124

aufklarer/Qwen3-TTS-12Hz-1.7B-Base-MLX-8bit

0.8B • Updated 11 days ago • 135

aufklarer/Qwen3-TTS-12Hz-1.7B-Base-MLX-bf16

Text-to-Speech • 2B • Updated 11 days ago • 103