Qwen3-4B-SFT
Qwen3-4B-SFT is a reasoning-focused model fine-tuned from Qwen3-4B-Base using the verl framework.
In open-source practice, fully reproducible "pre-RL SFT base" releases are rare. This model fills that gap by providing a practical intermediate checkpoint that is Math-forward, Reasoning-focused, and Format-aligned. It is ideal as a clean, warm-start base for Reinforcement Learning (RL) or standalone reasoning tasks.
Configuration Notes
- Model Info: Full-parameter SFT based on Qwen3-4B-Base, optimized for Chain-of-Thought (CoT) reasoning.
- Template: Trained with the Qwen chat template; the model learns to end responses with `<|im_end|>` (token id 151645).
- Suggested Configuration:

```json
{ "eos_token_id": 151645 }
```
You may adjust settings according to your training or deployment needs.
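The template behavior described above can be sketched as follows. This is a hypothetical minimal renderer of the Qwen chat format, for illustration only; in real deployments you would use `tokenizer.apply_chat_template` from the `transformers` library rather than building the prompt by hand.

```python
# Minimal sketch of the Qwen chat format (illustrative only; in practice
# use tokenizer.apply_chat_template from the transformers library).
IM_START, IM_END = "<|im_start|>", "<|im_end|>"
EOS_TOKEN_ID = 151645  # token id of <|im_end|>, used as eos_token_id above

def render_chat(messages):
    """Render a conversation and open an assistant turn for generation."""
    parts = [f"{IM_START}{m['role']}\n{m['content']}{IM_END}\n" for m in messages]
    parts.append(f"{IM_START}assistant\n")  # the model generates from here
    return "".join(parts)

prompt = render_chat([{"role": "user", "content": "What is 2 + 2?"}])
# Decoding should stop once the model emits <|im_end|> (id 151645),
# which is why the suggested configuration sets eos_token_id to 151645.
```

Because the SFT checkpoint was trained to close its turn with `<|im_end|>`, stopping on that token keeps generations from running past the answer.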
Benchmark Snapshot
The following results compare the Base (4B) model against the Qwen3-4B-SFT (this model), highlighting the performance gains in reasoning and mathematics.
| Dataset | Base (4B) | Qwen3-4B-SFT (this model) | Improvement (Absolute) |
|---|---|---|---|
| AIME 2024 | 11.25% | 20.8% | +9.55% |
| AIME 2025 | 6.46% | 19.4% | +12.94% |
| AMC 2023 | 31.09% | 58.0% | +26.91% |
| GPQA-Diamond | 7.77% | 29.1% | +21.33% |
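The "Improvement (Absolute)" column is just the difference between the two score columns; a quick sketch to recompute it, with the numbers copied from the table above:

```python
# Recompute the absolute improvements from the benchmark table.
# Scores are (base, sft) accuracy percentages as reported above.
results = {
    "AIME 2024":    (11.25, 20.8),
    "AIME 2025":    (6.46, 19.4),
    "AMC 2023":     (31.09, 58.0),
    "GPQA-Diamond": (7.77, 29.1),
}
improvements = {name: round(sft - base, 2) for name, (base, sft) in results.items()}
```

Note these are absolute percentage-point gains, not relative improvements.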
Training Compute
- Cluster: MeluXina Supercomputer (LuxProvide)
- Node Config: 4× NVIDIA A100 GPUs per node
- Final SFT Run: 12 node-hours (16× A100 for 3 hours)
- Total R&D Investment: ~700 node-hours (includes data ablations, hyperparameter sweeps, and extensive benchmark evaluation)
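As a sanity check on the compute figures, the node-hour accounting works out as follows (at 4 GPUs per node, 16 A100s means 4 nodes):

```python
# Node-hour arithmetic for the final SFT run, per the figures above.
gpus_per_node = 4
total_gpus = 16
hours = 3
nodes = total_gpus // gpus_per_node      # 16 A100s -> 4 nodes
node_hours = nodes * hours               # 4 nodes * 3 h = 12 node-hours
gpu_hours = node_hours * gpus_per_node   # equivalently 48 GPU-hours
```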
Project Links
- Model repository (this page): https://huggingface.co/96kevinli29/Qwen3-4B-SFT
- Dataset card used for SFT: https://huggingface.co/datasets/96kevinli29/SFT-Dataset
- Training code repository: https://github.com/96kevinli29/base-model-sft-verl
Limitations
- Not optimized for factual correctness in all domains
- May still produce hallucinations or unsafe outputs
- Performance is sensitive to prompt style and decoding settings
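Since performance is sensitive to decoding settings, it can help to pin them explicitly rather than rely on defaults. A sketch of a generation-settings dictionary; only `eos_token_id` comes from this card, and the sampling values are illustrative assumptions, not recommendations:

```python
# Illustrative decoding settings. Only eos_token_id is taken from this
# model card; the sampling values below are assumptions for illustration.
generation_kwargs = {
    "eos_token_id": 151645,   # stop at <|im_end|>, per the card
    "max_new_tokens": 4096,   # assumption: headroom for long CoT traces
    "do_sample": True,
    "temperature": 0.6,       # assumption
    "top_p": 0.95,            # assumption
}
```

In a `transformers`-based pipeline these would typically be passed to `model.generate(**inputs, **generation_kwargs)`.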
Citation
If you use this model, please cite this checkpoint. BibTeX for this release:

```bibtex
@misc{qwen3-4b-sft-2026,
  title        = {{Qwen3-4B-SFT}: Supervised Fine-Tuned {Qwen3}-4B for Reasoning},
  author       = {Hongyang Li and Xiao Li and {Sea-Fill Community}},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/SeaFill2025/Qwen3-4B-SFT}},
  note         = {Checkpoint trained with verl; warm-start for pre-RL alignment research. Maintained by Sea-Fill Community.}
}
```
Also cite as appropriate:
- The base model (Qwen3-4B-Base): use the official Qwen3 / Alibaba citation from its Hugging Face model card.
- The training code repository: https://github.com/96kevinli29/base-model-sft-verl/
- The original source datasets listed in the dataset recipe.