Qwen3-4B-SFT

Qwen3-4B-SFT is a reasoning-focused model fine-tuned from Qwen3-4B-Base using the verl framework.

In open-source practice, fully reproducible "pre-RL SFT base" releases are rare. This model fills that gap by providing a practical intermediate checkpoint that is math-forward, reasoning-focused, and format-aligned. It is intended as a clean, warm-start base for Reinforcement Learning (RL) or for standalone reasoning tasks.

Configuration Notes

  • Model Info: Full-parameter SFT based on Qwen3-4B-Base, optimized for Chain-of-Thought (CoT) reasoning.
  • Template: Trained with the Qwen chat template; the model learns to end responses with <|im_end|> (token ID 151645).
  • Suggested Configuration:
    {
      "eos_token_id": 151645
    }
    

You may adjust settings according to your training or deployment needs.
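Since the model was trained with the Qwen chat template, inference prompts should follow that layout. Here is a minimal sketch of the standard Qwen `<|im_start|>`/`<|im_end|>` format; in practice you would call the tokenizer's built-in `apply_chat_template` from transformers rather than hand-roll it:

```python
# Minimal sketch of the Qwen chat template this model was trained with.
# Illustration only -- prefer tokenizer.apply_chat_template in real code.

IM_START = "<|im_start|>"
IM_END = "<|im_end|>"  # token ID 151645, used as eos_token_id

def build_prompt(messages):
    """Render a list of {role, content} dicts in Qwen chat format,
    ending with an open assistant turn for the model to complete."""
    parts = []
    for m in messages:
        parts.append(f"{IM_START}{m['role']}\n{m['content']}{IM_END}\n")
    parts.append(f"{IM_START}assistant\n")  # generation starts here
    return "".join(parts)

prompt = build_prompt([
    {"role": "user", "content": "What is 7 * 8?"},
])
print(prompt)
```

Generation should then stop when token 151645 (`<|im_end|>`) is emitted, matching the suggested `eos_token_id` above.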

Benchmark Snapshot

The following results compare the Base (4B) model against the Qwen3-4B-SFT (this model), highlighting the performance gains in reasoning and mathematics.

| Dataset      | Base (4B) | Qwen3-4B-SFT (this model) | Improvement (Absolute) |
|--------------|-----------|---------------------------|------------------------|
| AIME 2024    | 11.25%    | 20.8%                     | +9.55%                 |
| AIME 2025    | 6.46%     | 19.4%                     | +12.94%                |
| AMC 2023     | 31.09%    | 58.0%                     | +26.91%                |
| GPQA-Diamond | 7.77%     | 29.1%                     | +21.33%                |

Training Compute

  • Cluster: MeluXina Supercomputer (LuxProvide)
  • Node Config: 4× NVIDIA A100 GPUs per node
  • Final SFT Run: 12 node-hours (16× A100 for 3 hours, i.e. 4 nodes)
  • Total R&D Investment: ~700 node-hours (includes data ablation, hyperparameter sweeps, and extensive benchmark evaluation)

Limitations

  • Not optimized for factual correctness in all domains
  • May still produce hallucinations or unsafe outputs
  • Performance is sensitive to prompt style and decoding settings
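
Because performance is sensitive to decoding settings, it helps to pin them explicitly rather than rely on defaults. A hypothetical starting point for `model.generate` in transformers — the specific sampling values below are illustrative assumptions, not tuned recommendations from this release:

```python
# Hypothetical decoding settings for a reasoning model; the sampling
# values are illustrative assumptions, not tuned recommendations.
gen_kwargs = {
    "max_new_tokens": 4096,   # long CoT traces need headroom
    "do_sample": True,
    "temperature": 0.6,
    "top_p": 0.95,
    "eos_token_id": 151645,   # <|im_end|>, per the suggested configuration
}

# Usage with transformers (not executed here):
# outputs = model.generate(**inputs, **gen_kwargs)
```

Keeping `eos_token_id` set to 151645 ensures generation stops at the `<|im_end|>` token the model was trained to emit.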

Citation

If you use this model, please cite this checkpoint. BibTeX for this release:

@misc{qwen3-4b-sft-2026,
  title        = {{Qwen3-4B-SFT}: Supervised Fine-Tuned {Qwen3}-4B for Reasoning},
  author       = {Hongyang Li and Xiao Li and {Sea-Fill Community}},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/SeaFill2025/Qwen3-4B-SFT}},
  note         = {Checkpoint trained with verl; warm-start for pre-RL alignment research. Maintained by Sea-Fill Community.}
}

Also cite as appropriate:

  • The base model (Qwen3-4B-Base) — use the official Qwen3 / Alibaba citation from its Hugging Face model card.
  • The training code repository: https://github.com/96kevinli29/base-model-sft-verl/
  • The original source datasets listed in the dataset recipe.