ReDimNet2-B6 Core ML Speaker Embeddings

This directory contains a Core ML conversion of the ReDimNet2-B6 speaker embedding model from PalabraAI/redimnet2.

The software uses the model to assign deterministic speaker labels within each audio file and to prefix transcriptions with markers such as:

{SPEAKER_1} Добрий день.
{SPEAKER_2} Вітаю.

Files

ReDimNet2-B6.mlpackage/

Model Details

Source model: ReDimNet2-B6
Upstream repository: PalabraAI/redimnet2
Checkpoint: b6-vb2+vox2_v0-lm.pt
Task: speaker embedding extraction
Input: mono 16 kHz waveform
Output: L2-normalized speaker embedding
Core ML input name: audio
Core ML output name: embedding

The converted package expects a fixed-length waveform input of 160320 samples, which is 10.02 s at 16 kHz. Before inference, the software pads shorter chunks to this length and center-crops longer ones.
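The length handling described above can be sketched as follows. This is a minimal illustration, not the software's actual code: the function name fit_to_length is hypothetical, and zero padding appended at the end is an assumption, since the padding value and position are not specified here.

```python
TARGET_SAMPLES = 160320  # fixed model input length: 160320 / 16000 Hz = 10.02 s


def fit_to_length(samples, target=TARGET_SAMPLES):
    """Return a list of exactly `target` samples (hypothetical helper)."""
    n = len(samples)
    if n >= target:
        # Center-crop: drop an equal number of samples from both ends
        # (one extra from the end when the surplus is odd).
        start = (n - target) // 2
        return list(samples[start:start + target])
    # Assumption: pad with zeros at the end of the chunk.
    return list(samples) + [0.0] * (target - n)
```

A chunk prepared this way would then be passed to the Core ML input named audio.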

Convert

From the repository root:

uv run --with torch --with torchaudio --with scipy --with coremltools \
  scripts/convert_redimnet2_coreml.py \
  --output Models/speaker/ReDimNet2-B6.mlpackage

Notes

The model produces embeddings, not speaker IDs. The software performs online cosine clustering over the chunk embeddings of each file separately. Speaker labels are therefore deterministic within a source audio file but not comparable across files: SPEAKER_1 in one file is not necessarily the same person as SPEAKER_1 in another file.
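One simple form of per-file online cosine clustering is sketched below to show how embeddings could turn into labels. It is an illustration under stated assumptions, not the software's implementation: the function names, the 0.5 similarity threshold, and the running-sum centroid update are all hypothetical choices.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign_speakers(embeddings, threshold=0.5):
    """Label chunk embeddings of ONE file in order (hypothetical helper).

    Each embedding joins the most similar existing speaker if the cosine
    similarity reaches `threshold` (an assumed value); otherwise it starts
    a new speaker. Labels are numbered by first appearance.
    """
    centroids = []  # per-speaker running sum of member embeddings
    labels = []
    for emb in embeddings:
        best_idx, best_sim = -1, -1.0
        for i, centroid in enumerate(centroids):
            sim = cosine(emb, centroid)
            if sim > best_sim:
                best_idx, best_sim = i, sim
        if best_idx >= 0 and best_sim >= threshold:
            # Update the matched speaker's running sum.
            centroids[best_idx] = [c + e for c, e in zip(centroids[best_idx], emb)]
        else:
            centroids.append(list(emb))
            best_idx = len(centroids) - 1
        labels.append(f"SPEAKER_{best_idx + 1}")
    return labels
```

Because the state starts empty for each file and chunks are processed in a fixed order, the same file always yields the same labels, while the numbering carries no meaning across files.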
