Puzzle: Distillation-Based NAS for Inference-Optimized LLMs
Paper: arXiv:2411.19146
EXPERIMENTAL - REQUIRES CUSTOM BRANCH
These GGUF files will NOT work with mainline llama.cpp. You must use the branch linked below.
GGUF quantisation of nvidia/gpt-oss-puzzle-88B, an 88B parameter MoE model derived from gpt-oss-120B using NVIDIA's Puzzle NAS framework.
This model requires a custom llama.cpp branch with gpt-oss-puzzle architecture support:
https://github.com/smpurkis/llama.cpp/tree/gpt-oss-puzzle-support
Tracking issue: ggml-org/llama.cpp#21028
Upstreaming PR: ggml-org/llama.cpp#21032
This will not work on mainline llama.cpp until the architecture is merged upstream.
```bash
# Clone the required branch
git clone --branch gpt-oss-puzzle-support https://github.com/smpurkis/llama.cpp.git
cd llama.cpp

# Build (example with Vulkan)
cmake -B build -DGGML_VULKAN=1
cmake --build build --config Release -j$(nproc)

# Run
./build/bin/llama-cli -m gpt-oss-puzzle-88B.MXFP4_MOE.gguf -ngl 99 -fa 1 -p "Hello"
```
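The same build also produces llama-server, which exposes an OpenAI-compatible HTTP API. A typical launch might look like the following (flags mirror the llama-cli example above; the port choice is arbitrary):

```shell
# Serve the model over llama-server's OpenAI-compatible HTTP API
./build/bin/llama-server -m gpt-oss-puzzle-88B.MXFP4_MOE.gguf -ngl 99 -fa 1 --port 8080
```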
| File | Quant | Size | Description |
|---|---|---|---|
| gpt-oss-puzzle-88B.f16.gguf | F16 | 47.0 GiB | Full precision (for requantisation) |
| gpt-oss-puzzle-88B.MXFP4_MOE.gguf | MXFP4_MOE | 44.8 GiB | Native MXFP4 expert weights (matches original model precision) |
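As a rough sanity check on the MXFP4 file size: MXFP4 stores 4-bit values in blocks of 32 that share one 8-bit exponent scale, i.e. about 4.25 bits per weight. A back-of-envelope estimate (assuming, for simplicity, that all 88B parameters were quantised) lands near the listed file size:

```python
# Back-of-envelope size estimate for an MXFP4-quantised 88B model.
# MXFP4: blocks of 32 four-bit values sharing one 8-bit (E8M0) scale,
# so effective storage is 4 + 8/32 = 4.25 bits per weight.
N_PARAMS = 88e9
BITS_PER_WEIGHT = 4 + 8 / 32  # 4.25

size_bytes = N_PARAMS * BITS_PER_WEIGHT / 8
size_gib = size_bytes / 2**30
print(f"{size_gib:.1f} GiB")  # ~43.5 GiB
```

The actual file is slightly larger (44.8 GiB) because non-expert tensors such as attention, embeddings, and norms are kept at higher precision.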
The Puzzle-derived model differs from the standard gpt-oss architecture in ways that require dedicated support:
| Property | gpt-oss-120B | gpt-oss-puzzle-88B |
|---|---|---|
| Expert count | 128 per layer (uniform) | 128 or 64 per layer (heterogeneous) |
| Attention pattern | Interleaved global/SWA (single window) | Global + multiple SWA window sizes (128, 8192) |
| Total parameters | ~117B | ~88B |
Base model: nvidia/gpt-oss-puzzle-88B