Qwen3-VL-32B-Instruct-v2-MAX-MXFP4
Qwen3-VL-32B-Instruct-v2-MAX-MXFP4 is an optimized, compressed variant of Qwen/Qwen3-VL-32B-Instruct, designed for advanced multimodal understanding and high-detail image captioning. It uses BF16 · U8 tensor formats to significantly reduce memory footprint and improve inference efficiency while maintaining strong output quality. The model applies a further-optimized abliteration rate, combining refined refusal-direction analysis with enhanced training strategies to minimize internal refusal behaviors while preserving reasoning, instruction-following, and visual understanding. The result is a 32B-parameter vision-language model suited to highly detailed captions, deep scene understanding, and rich multimodal reasoning, with efficient deployment characteristics.
This model is intended for research and learning purposes only. Due to reduced internal refusal mechanisms, it may generate sensitive or unrestricted content. Users assume full responsibility for how the model is used. The authors and hosting platform disclaim any liability for generated outputs.
Key Highlights
- MXFP4 Compression (BF16 · U8) – Reduces VRAM usage and improves inference efficiency while maintaining strong multimodal performance.
- Optimized Abliteration Rate (v2) – Enhanced suppression of refusal behaviors with an improved balance between openness, coherence, and stability.
- Advanced Refusal Direction Analysis – Uses activation-level techniques to identify and mitigate refusal-related patterns within the model.
- Multimodal Instruction Tuning – Built on Qwen3-VL-32B-Instruct, enabling strong performance across vision-language tasks.
- High-Fidelity Caption Generation – Produces long-form, structured, and semantically rich captions with deep scene understanding.
- 32B Parameter Architecture – Delivers powerful reasoning, visual grounding, and contextual awareness for complex multimodal tasks.
- Efficient High-Capability Deployment – Designed for large-scale inference with reduced hardware requirements compared to full-precision models.
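As a rough illustration of the MXFP4 idea (a toy sketch only, not the exact packing or scale rule used for this checkpoint), block quantization with a shared power-of-two scale per group of 32 values can be written in NumPy:

```python
import numpy as np

# Representable magnitudes of the FP4 E2M1 element format
# (the sign bit is handled separately).
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def mxfp4_quantize_dequantize(x, block_size=32):
    """Toy MXFP4-style round trip: each block of `block_size` values
    shares one power-of-two scale, and each scaled element snaps to
    the nearest FP4 (E2M1) grid point."""
    flat = x.reshape(-1, block_size)
    amax = np.abs(flat).max(axis=1, keepdims=True)
    amax = np.where(amax == 0, 1.0, amax)  # avoid log2(0) for all-zero blocks
    # Shared scale chosen so the block maximum fits the largest FP4 value.
    scale = 2.0 ** np.ceil(np.log2(amax / FP4_GRID[-1]))
    scaled = flat / scale
    # Snap each element to the nearest representable magnitude, keep the sign.
    idx = np.abs(np.abs(scaled)[..., None] - FP4_GRID).argmin(axis=-1)
    quant = np.sign(scaled) * FP4_GRID[idx]
    return (quant * scale).reshape(x.shape)

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 32)).astype(np.float32)
w_q = mxfp4_quantize_dequantize(w)
print(np.abs(w - w_q).max())  # small per-element quantization error
```

The memory win comes from storing only 4 bits per element plus one shared scale per block, which is why the full model fits in far less VRAM than a BF16 checkpoint.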
Quick Start with Transformers
```shell
pip install transformers==5.4.0
# or install the latest from source
pip install git+https://github.com/huggingface/transformers.git
```
```python
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
from PIL import Image
import torch

model = Qwen3VLForConditionalGeneration.from_pretrained(
    "prithivMLmods/Qwen3-VL-32B-Instruct-v2-MAX-MXFP4",
    torch_dtype="auto",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(
    "prithivMLmods/Qwen3-VL-32B-Instruct-v2-MAX-MXFP4"
)

# The prompt asks about an image, so the message must include one.
image = Image.open("your_image.jpg")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image in extreme detail."},
        ],
    }
]

text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(
    text=[text],
    images=[image],
    padding=True,
    return_tensors="pt",
).to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=512)
# Strip the prompt tokens so only the newly generated text is decoded.
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False,
)
print(output_text)
```
Intended Use
- High-Detail Image Captioning – Generating rich, descriptive captions for images.
- Multimodal Research – Studying vision-language reasoning and instruction-following behavior.
- Dataset Generation – Creating large-scale caption datasets for training.
- Annotation Automation – Assisting in labeling and structured visual description tasks.
- Efficient Large-Scale Deployment – Running 32B multimodal models with reduced memory requirements.
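For the dataset-generation use case, the per-image captioning call can be wrapped in a simple JSONL record builder. In the sketch below, `caption_fn` is a hypothetical stand-in for the model call from the Quick Start section, stubbed here so the structure is clear:

```python
import json

def caption_records(image_paths, caption_fn):
    """Yield JSONL records pairing each image path with its caption.
    `caption_fn` stands in for the actual model generation call."""
    for path in image_paths:
        yield json.dumps({"image": path, "caption": caption_fn(path)})

# Stubbed example; in practice caption_fn would run the vision-language model.
lines = list(caption_records(["a.jpg", "b.jpg"], lambda p: f"caption for {p}"))
print(lines[0])
```

Writing one JSON object per line keeps the output streamable, so large caption datasets can be appended to incrementally during long inference runs.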
Limitations & Risks
Important Note: This model intentionally minimizes built-in safety refusals.
- Unfiltered Outputs – May generate explicit or sensitive captions depending on inputs.
- User Responsibility – Must be used in a safe, ethical, and lawful manner.
- Compression Trade-offs – MXFP4 (BF16 · U8) may introduce minor precision or consistency variations.
- Compute Considerations – While optimized, 32B models still benefit from high-performance GPU setups.
- Abliteration Trade-offs – Increased openness may affect safety alignment or response consistency.
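The refusal-direction analysis mentioned above is commonly implemented as a difference-of-means direction that is then projected out of the model's weights. A minimal NumPy sketch of that general technique (illustrative only, not the exact procedure used for this model):

```python
import numpy as np

def refusal_direction(h_refusal, h_compliant):
    """Difference-of-means estimate of the 'refusal direction':
    mean activation on refusal-triggering prompts minus mean activation
    on compliant prompts, normalized to unit length."""
    d = h_refusal.mean(axis=0) - h_compliant.mean(axis=0)
    return d / np.linalg.norm(d)

def ablate_direction(W, r):
    """Rank-1 orthogonal projection that removes the component of a
    weight matrix's output along r, so the layer can no longer write
    into the refusal direction."""
    return W - np.outer(r, r) @ W

rng = np.random.default_rng(0)
# Toy activations: (samples, hidden_dim), with a constant offset on the
# "refusal" side so the estimated direction is nonzero.
r = refusal_direction(rng.normal(size=(16, 8)) + 1.0, rng.normal(size=(16, 8)))
W = rng.normal(size=(8, 8))
W_ab = ablate_direction(W, r)
print(np.abs(r @ W_ab).max())  # output along r is numerically zero
```

Because the projection removes only a single direction, most of the model's behavior is preserved, which matches the stated trade-off between openness and response consistency.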
Base model: Qwen/Qwen3-VL-32B-Instruct