Model support via CrispASR: pure C++ inference with GGUF (no Python/NeMo needed)
#25
by cstr - opened
We've built a complete C++ runtime for Canary-1B-v2 in CrispASR, a multi-backend ASR tool based on ggml. One binary, one GGUF file — no Python, no NeMo, no pip install.
What works:
- Full pipeline: mel → FastConformer encoder → Transformer decoder
- Native word-level timestamps
- 25 European languages with explicit source/target language control
- Speech translation (X→en, en→X via --translate -sl de -tl en)
- Streaming from mic/stdin (--stream, --mic, --live)
- Speaker diarisation, language ID, SRT/VTT/JSON output
- GPU acceleration via CUDA / Metal / Vulkan (ggml backends)
- Punctuation toggle
Quick start:
git clone https://github.com/CrispStrobe/CrispASR && cd CrispASR
cmake -S . -B build && cmake --build build -j8
# Auto-download and transcribe
./build/bin/crispasr --backend canary -m auto -f audio.wav
# German speech → English translation with SRT output
./build/bin/crispasr -m canary-1b-v2.gguf -f german.wav -sl de -tl en --translate -osrt
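To run the translation command above over many files, a small wrapper is one option. This is only a sketch: it reuses the flags from the quick start as-is, and the function name `translate_dir` and the `CRISPASR` and `MODEL` variables are made up for illustration and are not part of CrispASR. It makes no assumption about where the SRT files are written.

```shell
#!/usr/bin/env bash
# Sketch: translate every WAV file in a directory with the quick-start flags.
# CRISPASR and MODEL are hypothetical variables, defaulting to the quick-start paths.
CRISPASR="${CRISPASR:-./build/bin/crispasr}"
MODEL="${MODEL:-canary-1b-v2.gguf}"

# translate_dir DIR SRC_LANG TGT_LANG (hypothetical helper, not a CrispASR command)
translate_dir() {
  local dir="$1" src="$2" tgt="$3"
  local f
  for f in "$dir"/*.wav; do
    [ -e "$f" ] || continue   # skip when the glob matches no files
    "$CRISPASR" -m "$MODEL" -f "$f" -sl "$src" -tl "$tgt" --translate -osrt
  done
}
```

For example, `translate_dir ./recordings de en` would process each WAV in a (hypothetical) `recordings` directory as German speech translated to English.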
Pre-built GGUF: cstr/canary-1b-v2-GGUF
CrispASR supports 11 ASR backends in the same binary — Canary is the go-to for multilingual transcription and translation with explicit language control.