# Qwen3.5-9B-Sculpt-Throughput

12% FFN compression with live teacher distillation. Drop-in replacement — no custom kernels, no runtime changes.

Dystrio Sculpt structurally compresses transformer FFN layers, producing dense models that load with standard transformers.

This is the Throughput tier of Qwen3.5-9B.

Use case: Local/throughput — speed sweet spot (1.25x prefill)

## Benchmark Results (lm_eval)

| Model | MMLU | HellaSwag | ARC-C | TruthfulQA | Winogrande | GSM8K |
|---|---|---|---|---|---|---|
| Qwen3.5-9B (baseline) | 78.7 | 78.1 | 55.6 | 53.7 | 73.0 | 87.3 |
| Sculpt Default (kf=0.95) | 76.2 (↓2.5) | 75.8 (↓2.3) | 56.4 (↑0.8) | 52.6 (↓1.1) | 68.7 (↓4.3) | 81.5 (↓5.8) |
| Sculpt Production (kf=0.9) | 73.9 (↓4.8) | 75.1 (↓3.0) | 56.8 (↑1.2) | 47.3 (↓6.4) | 69.8 (↓3.2) | 74.5 (↓12.8) |
| Sculpt Throughput (kf=0.88) | 70.8 (↓7.9) | 74.0 (↓4.1) | 57.2 (↑1.6) | 52.0 (↓1.7) | 70.7 (↓2.3) | 69.6 (↓17.7) |
| Sculpt Experimental (kf=0.82) | 70.2 (↓8.5) | 70.7 (↓7.4) | 53.6 (↓2.0) | 47.6 (↓6.1) | 66.6 (↓6.4) | 54.7 (↓32.6) |
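
The numbers above come from lm_eval (EleutherAI's lm-evaluation-harness). The exact harness version, task variants, and few-shot settings aren't recorded on this card, so treat the snippet below as a starting point for reproduction rather than the exact recipe; task names are the harness's standard identifiers and are an assumption here.

```python
# Sketch: re-run the headline benchmarks with lm-evaluation-harness (pip install lm-eval).
# Few-shot counts and task versions are assumptions; align them with the
# original evaluation settings before comparing against the table above.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=dystrio/Qwen3.5-9B-Sculpt-Throughput,dtype=bfloat16",
    tasks=["mmlu", "hellaswag", "arc_challenge",
           "truthfulqa_mc2", "winogrande", "gsm8k"],
)
for task, metrics in results["results"].items():
    print(task, metrics)
```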

## This Model vs Baseline

| Benchmark | Throughput | Baseline | Delta |
|---|---|---|---|
| arc_challenge | 57.2 | 55.6 | +1.6 |
| gsm8k | 69.6 | 87.3 | -17.7 |
| hellaswag | 74.0 | 78.1 | -4.1 |
| mmlu | 70.8 | 78.7 | -7.9 |
| mmlu_abstract_algebra | 47.0 | 66.0 | -19.0 |
| mmlu_anatomy | 68.9 | 77.8 | -8.9 |
| mmlu_astronomy | 82.9 | 92.8 | -9.9 |
| mmlu_business_ethics | 73.0 | 82.0 | -9.0 |
| mmlu_clinical_knowledge | 77.7 | 86.8 | -9.1 |
| mmlu_college_biology | 83.3 | 93.1 | -9.8 |
| mmlu_college_chemistry | 52.0 | 59.0 | -7.0 |
| mmlu_college_computer_science | 64.0 | 82.0 | -18.0 |
| mmlu_college_mathematics | 51.0 | 64.0 | -13.0 |
| mmlu_college_medicine | 68.2 | 81.5 | -13.3 |
| mmlu_college_physics | 59.8 | 64.7 | -4.9 |
| mmlu_computer_security | 76.0 | 83.0 | -7.0 |
| mmlu_conceptual_physics | 77.9 | 90.2 | -12.3 |
| mmlu_econometrics | 52.6 | 73.7 | -21.1 |
| mmlu_electrical_engineering | 64.8 | 82.1 | -17.3 |
| mmlu_elementary_mathematics | 61.9 | 80.7 | -18.8 |
| mmlu_formal_logic | 64.3 | 65.9 | -1.6 |
| mmlu_global_facts | 37.0 | 50.0 | -13.0 |
| mmlu_high_school_biology | 87.1 | 93.5 | -6.4 |
| mmlu_high_school_chemistry | 67.0 | 77.8 | -10.8 |
| mmlu_high_school_computer_science | 75.0 | 88.0 | -13.0 |
| mmlu_high_school_european_history | 84.8 | 87.3 | -2.5 |
| mmlu_high_school_geography | 82.3 | 92.4 | -10.1 |
| mmlu_high_school_government_and_politics | 89.1 | 96.9 | -7.8 |
| mmlu_high_school_macroeconomics | 75.4 | 85.9 | -10.5 |
| mmlu_high_school_mathematics | 44.1 | 53.3 | -9.2 |
| mmlu_high_school_microeconomics | 85.7 | 93.3 | -7.6 |
| mmlu_high_school_physics | 63.6 | 72.8 | -9.2 |
| mmlu_high_school_psychology | 89.7 | 93.2 | -3.5 |
| mmlu_high_school_statistics | 69.4 | 78.7 | -9.3 |
| mmlu_high_school_us_history | 81.9 | 90.2 | -8.3 |
| mmlu_high_school_world_history | 84.0 | 89.9 | -5.9 |
| mmlu_human_aging | 71.3 | 78.9 | -7.6 |
| mmlu_human_sexuality | 78.6 | 86.3 | -7.7 |
| mmlu_humanities | 65.5 | 70.5 | -5.0 |
| mmlu_international_law | 81.0 | 90.1 | -9.1 |
| mmlu_jurisprudence | 80.6 | 84.3 | -3.7 |
| mmlu_logical_fallacies | 74.2 | 84.7 | -10.5 |
| mmlu_machine_learning | 55.4 | 66.1 | -10.7 |
| mmlu_management | 86.4 | 86.4 | +0.0 |
| mmlu_marketing | 86.8 | 95.7 | -8.9 |
| mmlu_medical_genetics | 82.0 | 91.0 | -9.0 |
| mmlu_miscellaneous | 82.6 | 90.3 | -7.7 |
| mmlu_moral_disputes | 71.1 | 81.2 | -10.1 |
| mmlu_moral_scenarios | 57.4 | 53.3 | +4.1 |
| mmlu_nutrition | 76.1 | 86.3 | -10.2 |
| mmlu_other | 74.2 | 83.1 | -8.9 |
| mmlu_philosophy | 73.3 | 80.4 | -7.1 |
| mmlu_prehistory | 72.5 | 84.3 | -11.8 |
| mmlu_professional_accounting | 54.6 | 65.6 | -11.0 |
| mmlu_professional_law | 53.9 | 60.3 | -6.4 |
| mmlu_professional_medicine | 79.8 | 91.5 | -11.7 |
| mmlu_professional_psychology | 72.1 | 82.8 | -10.7 |
| mmlu_public_relations | 67.3 | 73.6 | -6.3 |
| mmlu_security_studies | 75.1 | 76.7 | -1.6 |
| mmlu_social_sciences | 79.5 | 87.0 | -7.5 |
| mmlu_sociology | 87.6 | 89.1 | -1.5 |
| mmlu_stem | 66.9 | 78.3 | -11.4 |
| mmlu_us_foreign_policy | 86.0 | 90.0 | -4.0 |
| mmlu_virology | 53.0 | 56.6 | -3.6 |
| mmlu_world_religions | 81.9 | 86.5 | -4.6 |
| truthfulqa_mc2 | 52.0 | 53.7 | -1.7 |
| winogrande | 70.7 | 73.0 | -2.3 |

## Performance

| Metric | Sculpt | Baseline | Change |
|---|---|---|---|
| Model size | 15.6 GB | 16.7 GB | -6.2% |
| Parameters | 8,400,155,136 | | |
| Prefill throughput | 5,726 tok/s | 4,566 tok/s | +25% |
| Decode throughput | 36 tok/s | 37 tok/s | -4% |

KV-cache footprint is unchanged — Sculpt only compresses FFN layers, not attention.
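
Because only the FFN width changes, the compression should be visible directly in the model config while the attention geometry matches the baseline. A quick check (a sketch; the field names assume a Qwen-style config):

```python
# Sketch: confirm only the MLP width differs from the baseline config.
# Field names assume a Qwen-style architecture; per-layer pruning may not
# be reflected in a single intermediate_size value.
from transformers import AutoConfig

sculpt = AutoConfig.from_pretrained("dystrio/Qwen3.5-9B-Sculpt-Throughput")
base = AutoConfig.from_pretrained("Qwen/Qwen3.5-9B")

for field in ("hidden_size", "intermediate_size",
              "num_attention_heads", "num_key_value_heads"):
    print(f"{field}: sculpt={getattr(sculpt, field, None)} "
          f"baseline={getattr(base, field, None)}")
```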

## Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "dystrio/Qwen3.5-9B-Sculpt-Throughput",
    torch_dtype="bfloat16",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("dystrio/Qwen3.5-9B-Sculpt-Throughput")

inputs = tokenizer("The future of AI inference is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## All Sculpt Tiers

| Tier | HuggingFace | Config | Use Case |
|---|---|---|---|
| Default | dystrio/Qwen3.5-9B-Sculpt-Default | kf=0.95 | Enterprise: maximum quality preservation |
| Production | dystrio/Qwen3.5-9B-Sculpt-Production | kf=0.9 | Enterprise: balanced quality and efficiency |
| Throughput | dystrio/Qwen3.5-9B-Sculpt-Throughput | kf=0.88 | Local/throughput: speed sweet spot (1.25x prefill) |
| Experimental | dystrio/Qwen3.5-9B-Sculpt-Experimental | kf=0.82 | Local: maximum compression (1.27x prefill) |

## Technical Details

- Method: structural FFN pruning with importance-aware block selection plus live teacher distillation (alpha=0.5); a sketch of the distillation loss follows this list
- Keep fraction: 0.88 (12% of FFN neurons removed)
- Repair: 8-stage cosine-LR fine-tuning with best-checkpoint restore
- Training data: general_v2 mixture (WikiText, OpenHermes 2.5, MMLU, HellaSwag, GSM8K, OpenOrca)
- Hardware: 1x NVIDIA H200 141GB
- Output: standard dense transformer that loads with any HuggingFace-compatible framework
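
Dystrio hasn't published the repair objective itself; the sketch below shows a common way to combine hard-label cross-entropy with a KL term against a live (frozen) teacher at alpha=0.5, matching the setting listed above. Function and argument names here are illustrative, not Dystrio's code.

```python
# Illustrative distillation loss (not Dystrio's actual implementation).
# alpha=0.5 weights the KL-to-teacher term against hard-label cross-entropy.
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, labels, alpha=0.5, temperature=1.0):
    vocab = student_logits.size(-1)
    # Hard-label cross-entropy against the training targets.
    ce = F.cross_entropy(student_logits.view(-1, vocab), labels.view(-1),
                         ignore_index=-100)
    # Temperature-scaled KL divergence to the frozen teacher's distribution.
    kl = F.kl_div(
        F.log_softmax(student_logits.view(-1, vocab) / temperature, dim=-1),
        F.log_softmax(teacher_logits.view(-1, vocab) / temperature, dim=-1),
        log_target=True, reduction="batchmean",
    ) * temperature ** 2
    return alpha * kl + (1.0 - alpha) * ce
```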

## Compatibility

- HuggingFace Transformers
- vLLM (see the sketch after this list)
- TGI (Text Generation Inference)
- llama.cpp / GGUF conversion
- AWQ / GPTQ quantization
- Any framework that loads standard safetensors
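
Since the checkpoint is a standard dense safetensors model, serving stacks need no Sculpt-specific flags. A minimal vLLM offline-inference sketch:

```python
# Minimal vLLM offline-inference example; the model loads like any dense checkpoint.
from vllm import LLM, SamplingParams

llm = LLM(model="dystrio/Qwen3.5-9B-Sculpt-Throughput", dtype="bfloat16")
params = SamplingParams(max_tokens=100, temperature=0.7)
outputs = llm.generate(["The future of AI inference is"], params)
print(outputs[0].outputs[0].text)
```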

## Citation

```bibtex
@misc{dystrio_sculpt_2026,
  title={Dystrio Sculpt: Structural Compilation for Transformer LLMs},
  author={Dystrio},
  year={2026},
  url={https://huggingface.co/dystrio}
}
```