ReDimNet2-B6 Core ML Speaker Embeddings
This directory contains a Core ML conversion of the ReDimNet2-B6 speaker embedding model from PalabraAI/redimnet2.
The software uses the model to assign deterministic speaker labels within each audio file and to prefix transcriptions with markers such as:
{SPEAKER_1} Добрий день.
{SPEAKER_2} Вітаю.
Files
ReDimNet2-B6.mlpackage/
Model Details
- Source model: ReDimNet2-B6
- Upstream repository: PalabraAI/redimnet2
- Checkpoint: b6-vb2+vox2_v0-lm.pt
- Task: speaker embedding extraction
- Input: mono 16 kHz waveform
- Output: L2-normalized speaker embedding
- Core ML input name: audio
- Core ML output name: embedding
The converted package expects a fixed-length waveform input of 160320 samples (about 10.02 s at 16 kHz). The software pads shorter chunks and center-crops longer chunks before inference.
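The pad-or-crop step can be sketched as follows. This is a minimal, dependency-free illustration, not the software's actual code: the function name `fit_to_length` is hypothetical, and zero-padding at the end is an assumption, since the padding mode is not specified here.

```python
TARGET_SAMPLES = 160320  # fixed input length of the converted package


def fit_to_length(waveform, target=TARGET_SAMPLES):
    """Return a list of exactly `target` float samples from a mono waveform."""
    samples = [float(x) for x in waveform]
    n = len(samples)
    if n > target:
        # Center-crop: drop an equal amount (within one sample) from each side.
        start = (n - target) // 2
        return samples[start:start + target]
    # Assumption: shorter chunks are zero-padded at the end.
    return samples + [0.0] * (target - n)


# 3 seconds of silence at 16 kHz becomes a full-length input.
chunk = fit_to_length([0.0] * 48000)
print(len(chunk), round(len(chunk) / 16000, 2))
```

The result would then be passed to the model under the input name audio; the exact array shape the package expects is not stated here and should be checked against the model's input description.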
Convert
From the repository root:
uv run --with torch --with torchaudio --with scipy --with coremltools \
scripts/convert_redimnet2_coreml.py \
--output Models/speaker/ReDimNet2-B6.mlpackage
Notes
The model produces embeddings, not speaker IDs. The software performs per-file online cosine clustering over chunk embeddings. Speaker labels are deterministic within a source audio file, but they are not comparable across files: SPEAKER_1 in one file does not necessarily refer to the same person as SPEAKER_1 in another file.
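To illustrate what per-file online cosine clustering might look like, here is a minimal greedy sketch. It is not the software's implementation: the function names, the running-sum centroid update, and the 0.5 similarity threshold are all assumptions chosen for the example.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def assign_speakers(embeddings, threshold=0.5):
    """Greedy online clustering: one label per chunk embedding, in input order.

    `threshold` is a hypothetical cosine-similarity cutoff, not a documented value.
    """
    centroids = []  # running sum of member embeddings for each speaker
    labels = []
    for emb in embeddings:
        emb = l2_normalize(emb)
        best_idx, best_sim = -1, -1.0
        for i, centroid in enumerate(centroids):
            # For unit vectors, cosine similarity is just the dot product.
            sim = sum(a * b for a, b in zip(emb, l2_normalize(centroid)))
            if sim > best_sim:
                best_idx, best_sim = i, sim
        if best_idx >= 0 and best_sim >= threshold:
            # Close enough to an existing speaker: fold the chunk into its centroid.
            centroids[best_idx] = [a + b for a, b in zip(centroids[best_idx], emb)]
        else:
            # No match: open a new speaker, numbered in order of first appearance.
            centroids.append(emb)
            best_idx = len(centroids) - 1
        labels.append(f"SPEAKER_{best_idx + 1}")
    return labels


print(assign_speakers([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.0, 0.05]]))
```

Because chunks are processed in order and speakers are numbered by first appearance, the same file always yields the same labels, while the numbering carries no meaning across files.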