Algorithmic SFT vs Distillation

Collection: 10 LoRA adapters + 6 datasets. Algorithmic-template SFT vs QwQ distillation on Qwen2.5-1.5B-Instruct across 4 reasoning domains.
LoRA adapter for Qwen/Qwen2.5-1.5B-Instruct fine-tuned on long arithmetic via QwQ-32B Distillation.
Part of the Algorithmic SFT vs Distillation experiment studying whether deterministic algorithmic templates teach procedural reasoning more effectively than distillation from large reasoning models.

| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen2.5-1.5B-Instruct |
| Method | QwQ-32B Distillation |
| Framework | LLaMA-Factory (SFT stage) |
| LoRA rank | 64 |
| LoRA target | all linear layers |
| Learning rate | 1e-4 |
| Epochs | 3 |
| Batch size | 1 (grad accum 16) |
| Cutoff length | 32,768 tokens |
| Training data | 5,000 QwQ-32B reasoning traces (d4, filtered). Teacher solve rate: 43.8% |
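The batch-size and epoch settings above imply a fixed optimizer-step budget. A quick sanity-check sketch (the trainer's exact step accounting may differ, e.g. if it drops the last partial batch):

```python
import math

# Values taken from the hyperparameter table above.
examples = 5_000     # QwQ-32B reasoning traces
epochs = 3
micro_batch = 1
grad_accum = 16

effective_batch = micro_batch * grad_accum            # 16 sequences per optimizer step
steps_per_epoch = math.ceil(examples / effective_batch)
total_steps = steps_per_epoch * epochs

print(effective_batch, steps_per_epoch, total_steps)  # 16 313 939
```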

| Split | Accuracy |
|---|---|
| Test (in-distribution) | 90.6% |
| Harder variant | 8.4% |
| Structural OOD | 6.8% |

Nearly tied with algorithmic-template SFT in distribution (90.6% vs 92.6%), with a slight edge on structural OOD (6.8% vs 0%), but both methods effectively fail outside the training distribution.
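The card does not state the scoring rule behind these accuracies; for arithmetic tasks it is typically exact match on the final answer. A minimal sketch of that metric (the function name and whitespace-only normalization are our assumptions):

```python
def exact_match_accuracy(predictions, references):
    """Fraction of items whose predicted final answer equals the reference
    exactly after trimming whitespace. Real harnesses usually apply extra
    normalization (e.g. stripping commas or LaTeX formatting)."""
    assert len(predictions) == len(references)
    correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return correct / len(references)

print(exact_match_accuracy(["42", "7 ", "9"], ["42", "7", "8"]))  # 2/3 correct
```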
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model, then attach this LoRA adapter on top of it.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
model = PeftModel.from_pretrained(base, "reasoning-degeneration-dev/algo-sft-long-arithmetic-distill-qwq")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
```
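The snippet above only loads the adapter. For inference, prompts should follow Qwen2.5-Instruct's ChatML template, normally produced by `tokenizer.apply_chat_template`. For illustration, a manual sketch of that layout (the helper name is ours; prefer the tokenizer method in practice):

```python
def format_chat(user_msg: str, system_msg: str = "You are a helpful assistant.") -> str:
    # ChatML layout used by Qwen2.5-Instruct models.
    return (
        f"<|im_start|>system\n{system_msg}<|im_end|>\n"
        f"<|im_start|>user\n{user_msg}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

prompt = format_chat("Compute 123456789 + 987654321.")
```

The resulting string can be tokenized and passed to `model.generate` as usual.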