# Qwen3.5-27B Claude 4.6 Opus Reasoning Distilled v2 (GGUF)
Quantized by SolidRusT Networks
IQ4_XS quantization of Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2 using mradermacher's imatrix calibration data.
## What This Is
A 27B-parameter model reasoning-distilled from Claude 4.6 Opus, quantized to IQ4_XS with an importance matrix for a favorable quality/size tradeoff. The v2 training improves tool-calling accuracy by 31.6% over v1 on quantized models.
## Files
| File | Size | Description |
|---|---|---|
| Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2.IQ4_XS.gguf | 14.7 GB | IQ4_XS imatrix quantization |
## Performance
Tested on dual AMD Radeon RX 7900 XTX (2× 24 GB VRAM):
- ~30 tok/sec generation
- 131K context window
- Tool calling confirmed working
## Usage
### llama.cpp
```bash
llama-server \
  -m Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2.IQ4_XS.gguf \
  --host 0.0.0.0 --port 8080 \
  -c 131072 -ngl 99 \
  --think
```
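Once running, llama-server exposes an OpenAI-compatible API. A minimal smoke test, assuming the host/port above and a single-model server (the `model` field is illustrative and effectively ignored in that case):

```bash
# Minimal chat-completion request against the OpenAI-compatible endpoint.
# Host, port, and the "model" value are illustrative; adjust to your setup.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.5-27b-opus-distilled",
    "messages": [
      {"role": "user", "content": "Briefly explain what an importance matrix does in GGUF quantization."}
    ],
    "max_tokens": 256
  }'
```

Tool calls can be passed in the standard OpenAI `tools` format on the same endpoint; depending on your llama.cpp build, parsing them may require launching the server with a chat-template option such as `--jinja`. Treat that flag as an assumption and check `llama-server --help` for your version.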
### vLLM
Not recommended; use the FP8 variant for vLLM.
## Quantization Details
- Source: v2 BF16 weights
- Method: IQ4_XS with importance matrix (imatrix)
- Imatrix: mradermacher/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-i1-GGUF
- Tool: llama.cpp (general pipeline sketched below)
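For reference, an imatrix-based llama.cpp quantization generally follows the sketch below. This is an illustration of the technique, not the exact commands used for this release: file names are placeholders, and here the importance matrix was taken from mradermacher's published imatrix data rather than regenerated from a calibration corpus.

```bash
# Sketch of an imatrix-guided IQ4_XS quantization with llama.cpp tools.
# File names are placeholders; this release reused mradermacher's published imatrix.

# 1. (Optional) compute an importance matrix from a calibration text file.
llama-imatrix -m model-bf16.gguf -f calibration.txt -o imatrix.dat

# 2. Quantize the BF16 GGUF to IQ4_XS, guided by the importance matrix.
llama-quantize --imatrix imatrix.dat \
  model-bf16.gguf model.IQ4_XS.gguf IQ4_XS
```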
## Credits
- Original model: Jackrong
- Imatrix calibration: mradermacher / nicoboss
- Quantization: SolidRusT Networks
- Base model: Qwen/Qwen3.5-27B