MambaSSM
vanitas
spoken-dialogue
flow-matching

Vanitas SFT Model

Supervised fine-tuned model for real-time spoken dialogue, trained on kyutai/DailyTalkContiguous.

Architecture

  • Perception Stream: Mamba-2 SSM (4 layers, hidden size d=256)
  • Cognition Core: Sparse Attention (4 layers, hidden size d=256)
  • Production Stream: Mamba-2 + Flow Matching (4 layers, hidden size d=256)
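
The card does not say which flow-matching formulation the Production Stream uses. As an illustration only, the sketch below shows one common variant: a straight-line path from noise to data, whose regression target is a constant velocity, sampled with fixed-step Euler integration. The function names and the three-value feature vector are hypothetical and are not taken from the Vanitas code.

```python
import random


def interpolate(x0, x1, t):
    """Point on the straight-line path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Velocity regression target on the linear path: x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]


def euler_sample(velocity_fn, x0, num_steps=8):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


rng = random.Random(0)
x1 = [0.5, -1.0, 2.0]                    # hypothetical stand-in for one output frame
x0 = [rng.gauss(0.0, 1.0) for _ in x1]   # Gaussian noise sample

# A trained network would predict the velocity; here the exact target
# is used in its place so the sampler can be checked in isolation.
v_star = target_velocity(x0, x1)
generated = euler_sample(lambda x, t: v_star, x0)
```

With the exact target velocity, Euler integration recovers `x1` up to floating-point error. A trained model would replace the lambda with a network conditioned on the upstream streams.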

Training

  • Dataset: kyutai/DailyTalkContiguous (2,286 dialogues)
  • Epochs: 50
  • Batch Size: 16
  • Learning Rate: 2e-4
  • Hardware: NVIDIA A100 (Modal Cloud)
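
As a rough sanity check on these numbers, the sketch below estimates the optimizer step count. It assumes one dialogue per training sample, no held-out split, and a kept final partial batch. None of these is stated in the card, so the result is an estimate rather than the actual training schedule. `TrainConfig` is a hypothetical container, not the schema of config.json.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values copied from the Training list in this card.
    num_dialogues: int = 2286
    epochs: int = 50
    batch_size: int = 16
    learning_rate: float = 2e-4

    def steps_per_epoch(self) -> int:
        # Assumption: one dialogue per sample, no validation split,
        # and the last partial batch is kept.
        return math.ceil(self.num_dialogues / self.batch_size)

    def total_steps(self) -> int:
        return self.epochs * self.steps_per_epoch()


cfg = TrainConfig()
print(cfg.steps_per_epoch(), cfg.total_steps())  # prints "143 7150"
```

Since a validation loss is reported, some data was presumably held out, which would lower the per-epoch count. Chunking dialogues into shorter segments would raise it.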

Files

  • best_model.pt — Checkpoint with the lowest validation loss
  • final_model.pt — Checkpoint after completing all 50 epochs
  • config.json — Model configuration
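
A minimal sketch of fetching and opening these files, assuming the repository ID md13/vanitas-sft and that `huggingface_hub` and `torch` are installed. The internal layout of the checkpoint (a bare state dict or a dictionary with extra keys) is not documented here, so inspect the loaded object before building a model around it. The helper names are hypothetical.

```python
import json

REPO_ID = "md13/vanitas-sft"  # assumed repository ID for this model page


def download_files(repo_id=REPO_ID, filenames=("config.json", "best_model.pt")):
    """Download files from the Hugging Face Hub; returns {filename: local_path}."""
    from huggingface_hub import hf_hub_download  # imported lazily
    return {name: hf_hub_download(repo_id=repo_id, filename=name) for name in filenames}


def load_config(path):
    """Read config.json into a plain dict."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def load_checkpoint(path):
    """Load a .pt checkpoint onto the CPU.

    The contents are undocumented. Recent PyTorch versions default to
    weights_only=True, which can reject checkpoints holding non-tensor objects.
    """
    import torch  # imported lazily so config loading works without torch
    return torch.load(path, map_location="cpu")
```

Typical use would be `paths = download_files()`, then `load_config(paths["config.json"])` and `load_checkpoint(paths["best_model.pt"])`.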

Dataset used to train md13/vanitas-sft: kyutai/DailyTalkContiguous