# Qwen3.5-9B-Sculpt-Throughput
12% FFN compression with live teacher distillation. Drop-in replacement — no custom kernels, no runtime changes.
Dystrio Sculpt structurally compresses transformer FFN layers, producing dense models that load with standard transformers.
This is the Throughput tier of Qwen3.5-9B.
Use case: Local/throughput — speed sweet spot (1.25x prefill)
## Benchmark Results (lm_eval)
| Model | MMLU | HellaSwag | ARC-C | TruthfulQA | Winogrande | GSM8K |
|---|---|---|---|---|---|---|
| Qwen3.5-9B (baseline) | 78.7 | 78.1 | 55.6 | 53.7 | 73.0 | 87.3 |
| Sculpt Default (kf=0.95) | 76.2 (↓2.5) | 75.8 (↓2.3) | 56.4 (↑0.8) | 52.6 (↓1.1) | 68.7 (↓4.3) | 81.5 (↓5.8) |
| Sculpt Production (kf=0.9) | 73.9 (↓4.8) | 75.1 (↓3.0) | 56.8 (↑1.2) | 47.3 (↓6.4) | 69.8 (↓3.2) | 74.5 (↓12.8) |
| Sculpt Throughput (kf=0.88) | 70.8 (↓7.9) | 74.0 (↓4.1) | 57.2 (↑1.6) | 52.0 (↓1.7) | 70.7 (↓2.3) | 69.6 (↓17.7) |
| Sculpt Experimental (kf=0.82) | 70.2 (↓8.5) | 70.7 (↓7.4) | 53.6 (↓2.0) | 47.6 (↓6.1) | 66.6 (↓6.4) | 54.7 (↓32.6) |
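These scores should be reproducible with the lm-evaluation-harness Python API along the lines below. The task names match the table; few-shot counts and harness version are assumptions, since the card does not record the exact evaluation flags.

```python
# Minimal lm-evaluation-harness sketch; few-shot settings are assumed, not from the card.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=dystrio/Qwen3.5-9B-Sculpt-Throughput,dtype=bfloat16",
    tasks=["mmlu", "hellaswag", "arc_challenge", "truthfulqa_mc2", "winogrande", "gsm8k"],
)
print(results["results"])
```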
### This Model vs Baseline
| Benchmark | Throughput | Baseline | Delta |
|---|---|---|---|
| arc_challenge | 57.2 | 55.6 | +1.6 |
| gsm8k | 69.6 | 87.3 | -17.7 |
| hellaswag | 74.0 | 78.1 | -4.1 |
| mmlu | 70.8 | 78.7 | -7.9 |
| mmlu_abstract_algebra | 47.0 | 66.0 | -19.0 |
| mmlu_anatomy | 68.9 | 77.8 | -8.9 |
| mmlu_astronomy | 82.9 | 92.8 | -9.9 |
| mmlu_business_ethics | 73.0 | 82.0 | -9.0 |
| mmlu_clinical_knowledge | 77.7 | 86.8 | -9.1 |
| mmlu_college_biology | 83.3 | 93.1 | -9.8 |
| mmlu_college_chemistry | 52.0 | 59.0 | -7.0 |
| mmlu_college_computer_science | 64.0 | 82.0 | -18.0 |
| mmlu_college_mathematics | 51.0 | 64.0 | -13.0 |
| mmlu_college_medicine | 68.2 | 81.5 | -13.3 |
| mmlu_college_physics | 59.8 | 64.7 | -4.9 |
| mmlu_computer_security | 76.0 | 83.0 | -7.0 |
| mmlu_conceptual_physics | 77.9 | 90.2 | -12.3 |
| mmlu_econometrics | 52.6 | 73.7 | -21.1 |
| mmlu_electrical_engineering | 64.8 | 82.1 | -17.3 |
| mmlu_elementary_mathematics | 61.9 | 80.7 | -18.8 |
| mmlu_formal_logic | 64.3 | 65.9 | -1.6 |
| mmlu_global_facts | 37.0 | 50.0 | -13.0 |
| mmlu_high_school_biology | 87.1 | 93.5 | -6.4 |
| mmlu_high_school_chemistry | 67.0 | 77.8 | -10.8 |
| mmlu_high_school_computer_science | 75.0 | 88.0 | -13.0 |
| mmlu_high_school_european_history | 84.8 | 87.3 | -2.5 |
| mmlu_high_school_geography | 82.3 | 92.4 | -10.1 |
| mmlu_high_school_government_and_politics | 89.1 | 96.9 | -7.8 |
| mmlu_high_school_macroeconomics | 75.4 | 85.9 | -10.5 |
| mmlu_high_school_mathematics | 44.1 | 53.3 | -9.2 |
| mmlu_high_school_microeconomics | 85.7 | 93.3 | -7.6 |
| mmlu_high_school_physics | 63.6 | 72.8 | -9.2 |
| mmlu_high_school_psychology | 89.7 | 93.2 | -3.5 |
| mmlu_high_school_statistics | 69.4 | 78.7 | -9.3 |
| mmlu_high_school_us_history | 81.9 | 90.2 | -8.3 |
| mmlu_high_school_world_history | 84.0 | 89.9 | -5.9 |
| mmlu_human_aging | 71.3 | 78.9 | -7.6 |
| mmlu_human_sexuality | 78.6 | 86.3 | -7.7 |
| mmlu_humanities | 65.5 | 70.5 | -5.0 |
| mmlu_international_law | 81.0 | 90.1 | -9.1 |
| mmlu_jurisprudence | 80.6 | 84.3 | -3.7 |
| mmlu_logical_fallacies | 74.2 | 84.7 | -10.5 |
| mmlu_machine_learning | 55.4 | 66.1 | -10.7 |
| mmlu_management | 86.4 | 86.4 | +0.0 |
| mmlu_marketing | 86.8 | 95.7 | -8.9 |
| mmlu_medical_genetics | 82.0 | 91.0 | -9.0 |
| mmlu_miscellaneous | 82.6 | 90.3 | -7.7 |
| mmlu_moral_disputes | 71.1 | 81.2 | -10.1 |
| mmlu_moral_scenarios | 57.4 | 53.3 | +4.1 |
| mmlu_nutrition | 76.1 | 86.3 | -10.2 |
| mmlu_other | 74.2 | 83.1 | -8.9 |
| mmlu_philosophy | 73.3 | 80.4 | -7.1 |
| mmlu_prehistory | 72.5 | 84.3 | -11.8 |
| mmlu_professional_accounting | 54.6 | 65.6 | -11.0 |
| mmlu_professional_law | 53.9 | 60.3 | -6.4 |
| mmlu_professional_medicine | 79.8 | 91.5 | -11.7 |
| mmlu_professional_psychology | 72.1 | 82.8 | -10.7 |
| mmlu_public_relations | 67.3 | 73.6 | -6.3 |
| mmlu_security_studies | 75.1 | 76.7 | -1.6 |
| mmlu_social_sciences | 79.5 | 87.0 | -7.5 |
| mmlu_sociology | 87.6 | 89.1 | -1.5 |
| mmlu_stem | 66.9 | 78.3 | -11.4 |
| mmlu_us_foreign_policy | 86.0 | 90.0 | -4.0 |
| mmlu_virology | 53.0 | 56.6 | -3.6 |
| mmlu_world_religions | 81.9 | 86.5 | -4.6 |
| truthfulqa_mc2 | 52.0 | 53.7 | -1.7 |
| winogrande | 70.7 | 73.0 | -2.3 |
## Performance
| Metric | Sculpt | Baseline | Change |
|---|---|---|---|
| Model size | 15.6 GB | 16.7 GB | -6.2% |
| Parameters | 8,400,155,136 | — | — |
| Prefill throughput | 5,726 tok/s | 4,566 tok/s | +25% |
| Decode throughput | 36 tok/s | 37 tok/s | -4% |
KV-cache footprint is unchanged — Sculpt only compresses FFN layers, not attention.
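The prefill number is straightforward to sanity-check: time one full forward pass over a long prompt batch and divide prompt tokens by wall time. A minimal sketch follows; the batch size, prompt length, and warmup policy here are assumptions, not the card's exact harness.

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "dystrio/Qwen3.5-9B-Sculpt-Throughput"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Assumed workload: 8 prompts of ~2,048 tokens each.
batch = tokenizer(
    ["lorem ipsum " * 1024] * 8,
    return_tensors="pt", truncation=True, max_length=2048,
).to(model.device)
n_tokens = batch.input_ids.numel()

with torch.inference_mode():
    model(**batch)                # warmup
    torch.cuda.synchronize()
    start = time.perf_counter()
    model(**batch)                # prefill = one forward pass over the full prompt
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start

print(f"prefill: {n_tokens / elapsed:,.0f} tok/s")
```

Running the same script against the baseline checkpoint gives the speedup ratio.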
## Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "dystrio/Qwen3.5-9B-Sculpt-Throughput",
    torch_dtype="bfloat16",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("dystrio/Qwen3.5-9B-Sculpt-Throughput")

inputs = tokenizer("The future of AI inference is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
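Because the compression is structural, it is visible in the checkpoint itself rather than requiring a special loader. For instance, assuming the pruned FFN width is stored uniformly in the config (which may not hold if widths vary per layer):

```python
# Inspect the dense checkpoint; no Sculpt-specific code involved.
print(model.config.intermediate_size)  # pruned FFN width
print(model.num_parameters())          # should match the Parameters row above
```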
## All Sculpt Tiers
| Tier | HuggingFace | Config | Use Case |
|---|---|---|---|
| Default | dystrio/Qwen3.5-9B-Sculpt-Default | kf=0.95 | Enterprise — maximum quality preservation |
| Production | dystrio/Qwen3.5-9B-Sculpt-Production | kf=0.9 | Enterprise — balanced quality and efficiency |
| Throughput | dystrio/Qwen3.5-9B-Sculpt-Throughput | kf=0.88 | Local/throughput — speed sweet spot (1.25x prefill) |
| Experimental | dystrio/Qwen3.5-9B-Sculpt-Experimental | kf=0.82 | Local — maximum compression (1.27x prefill) |
## Technical Details
- Method: Structural FFN pruning with importance-aware block selection + live teacher distillation (alpha=0.5); a toy sketch of both ideas follows this list
- Keep fraction: 0.88 (12% of FFN neurons removed)
- Repair: 8-stage cosine-LR fine-tuning with best-checkpoint restore
- Training data: general_v2 mixture (WikiText, OpenHermes 2.5, MMLU, HellaSwag, GSM8K, OpenOrca)
- Hardware: 1x NVIDIA H200 141GB
- Output: Standard dense transformer — loads with any HuggingFace-compatible framework
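As a rough illustration of the two ingredients named above, the sketch below prunes the lowest-importance FFN neurons at keep fraction 0.88 and mixes a distillation term with cross-entropy at alpha=0.5. It is a toy version of the general technique, not Dystrio's implementation: the magnitude-based importance score, the temperature, and the loss form are all assumptions.

```python
import torch
import torch.nn.functional as F

def prune_ffn(up_w: torch.Tensor, down_w: torch.Tensor, keep_fraction: float = 0.88):
    """Drop the lowest-importance FFN neurons.

    up_w: (intermediate, hidden) weight of up_proj; down_w: (hidden, intermediate)
    weight of down_proj. Importance is an assumed magnitude product; the actual
    "importance-aware block selection" metric is not disclosed. In a SwiGLU FFN,
    gate_proj rows would be pruned with the same indices.
    """
    importance = up_w.norm(dim=1) * down_w.norm(dim=0)   # one score per FFN neuron
    k = int(keep_fraction * up_w.shape[0])
    keep = importance.topk(k).indices.sort().values      # keep original neuron order
    return up_w[keep], down_w[:, keep]

def distill_loss(student_logits, teacher_logits, labels, alpha: float = 0.5, T: float = 1.0):
    """Assumed form: alpha * KL(teacher || student) + (1 - alpha) * cross-entropy."""
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.log_softmax(teacher_logits / T, dim=-1),
        log_target=True, reduction="batchmean",
    ) * (T * T)
    ce = F.cross_entropy(student_logits.flatten(0, -2), labels.flatten())
    return alpha * kd + (1 - alpha) * ce
```

"Live" teacher distillation presumably means `teacher_logits` come from a forward pass of the uncompressed Qwen3.5-9B run alongside the student on the same batch during repair.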
## Compatibility
- HuggingFace Transformers
- vLLM (example after this list)
- TGI (Text Generation Inference)
- llama.cpp / GGUF conversion
- AWQ / GPTQ quantization
- Any framework that loads standard safetensors
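Since the checkpoint is plain dense safetensors, none of the above need Sculpt-specific integration; serving with vLLM, for example, uses the stock API:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="dystrio/Qwen3.5-9B-Sculpt-Throughput", dtype="bfloat16")
out = llm.generate(["The future of AI inference is"], SamplingParams(max_tokens=100))
print(out[0].outputs[0].text)
```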
## Citation
```bibtex
@misc{dystrio_sculpt_2026,
  title={Dystrio Sculpt: Structural Compilation for Transformer LLMs},
  author={Dystrio},
  year={2026},
  url={https://huggingface.co/dystrio}
}
```