Model support via CrispASR: pure C++ inference with GGUF (no Python/NeMo needed)

#25
by cstr - opened

We've built a complete C++ runtime for Canary-1B-v2 in CrispASR, a multi-backend ASR tool based on ggml. One binary, one GGUF file — no Python, no NeMo, no pip install.

What works:

  • Full pipeline: mel → FastConformer encoder → Transformer decoder
  • Native word-level timestamps
  • 25 European languages with explicit source/target language control
  • Speech translation (X→en and en→X), e.g. --translate -sl de -tl en
  • Streaming from mic/stdin (--stream, --mic, --live)
  • Speaker diarisation, language ID, SRT/VTT/JSON output
  • GPU acceleration via CUDA / Metal / Vulkan (ggml backends)
  • Punctuation toggle

Quick start:

git clone https://github.com/CrispStrobe/CrispASR && cd CrispASR
cmake -S . -B build && cmake --build build -j8

# Auto-download and transcribe
./build/bin/crispasr --backend canary -m auto -f audio.wav

# German speech → English translation with SRT output
./build/bin/crispasr -m canary-1b-v2.gguf -f german.wav -sl de -tl en --translate -osrt
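
To run the translation command above over a whole folder, a small wrapper along these lines may help. This is only a sketch: the function name translate_dir and the CRISPASR and MODEL variables are hypothetical, and the crispasr flags are exactly those from the quick start. Where the subtitle files end up depends on the tool, so check its output before relying on this.

```shell
#!/bin/sh
# Hypothetical helper, not part of CrispASR: translate every .wav file in a
# directory to English with SRT output, reusing the quick-start flags.
# CRISPASR and MODEL are assumed paths; override them to match your setup.
CRISPASR="${CRISPASR:-./build/bin/crispasr}"
MODEL="${MODEL:-canary-1b-v2.gguf}"

translate_dir() {
    dir="$1"   # directory containing .wav files
    src="$2"   # source language code, e.g. de
    for wav in "$dir"/*.wav; do
        # Skip the literal pattern when the glob matches nothing.
        [ -e "$wav" ] || continue
        "$CRISPASR" -m "$MODEL" -f "$wav" -sl "$src" -tl en --translate -osrt \
            || echo "failed: $wav" >&2
    done
}
```

For example, translate_dir ./recordings de would process each German recording in ./recordings in turn and report any file that fails on stderr without stopping the loop.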

Pre-built GGUF: cstr/canary-1b-v2-GGUF

CrispASR supports 11 ASR backends in the same binary — Canary is the go-to for multilingual transcription and translation with explicit language control.
