Model support in CrispASR: pure C++ inference with GGUF (no Python/NeMo needed)

#25

by cstr - opened 2 days ago

We've built a complete C++ runtime for Canary-1B-v2 in CrispASR, a multi-backend ASR tool based on ggml. One binary, one GGUF file — no Python, no NeMo, no pip install.

What works:

Full pipeline: mel → FastConformer encoder → Transformer decoder
Native word-level timestamps
25 European languages with explicit source/target language control
Speech translation (X→en and en→X; e.g. --translate -sl de -tl en for German→English)
Streaming from mic/stdin (--stream, --mic, --live)
Speaker diarisation, language ID, SRT/VTT/JSON output
GPU acceleration via CUDA / Metal / Vulkan (ggml backends)
Punctuation toggle

Quick start:

git clone https://github.com/CrispStrobe/CrispASR && cd CrispASR
cmake -S . -B build && cmake --build build -j8

# Auto-download and transcribe
./build/bin/crispasr --backend canary -m auto -f audio.wav

# German speech → English translation with SRT output
./build/bin/crispasr -m canary-1b-v2.gguf -f german.wav -sl de -tl en --translate -osrt
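
For a whole folder of recordings, the translation command above can be wrapped in a small loop. This is only a sketch: it uses the same flags as the example above and nothing else, and the CRISPASR and MODEL variables and the translate_dir function name are hypothetical helpers made up for this illustration, not part of CrispASR.

```shell
#!/usr/bin/env bash
# Sketch: batch-translate WAV files to SRT subtitles with the flags shown above.
# CRISPASR, MODEL and translate_dir are hypothetical names for this example;
# adjust the paths to match your build directory and model file.
set -euo pipefail

CRISPASR="${CRISPASR:-./build/bin/crispasr}"
MODEL="${MODEL:-canary-1b-v2.gguf}"

# translate_dir DIR [SRC_LANG] [TGT_LANG]
# Runs one translation per *.wav file in DIR (defaults: de -> en).
translate_dir() {
    local dir="$1" src="${2:-de}" tgt="${3:-en}"
    local wav
    for wav in "$dir"/*.wav; do
        # Skip the unexpanded pattern when DIR contains no .wav files.
        [ -e "$wav" ] || continue
        "$CRISPASR" -m "$MODEL" -f "$wav" -sl "$src" -tl "$tgt" --translate -osrt
    done
}
```

After sourcing the script, `translate_dir ./recordings de en` would run one translation per file. Setting `CRISPASR=echo` first prints the commands instead of executing them, which is a cheap way to check the arguments before a long run.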

Pre-built GGUF: cstr/canary-1b-v2-GGUF

CrispASR supports 11 ASR backends in the same binary — Canary is the go-to for multilingual transcription and translation with explicit language control.
