AI Image Detector (SigLIP2 + DINOv2 Ensemble)

A high-accuracy, quality-agnostic model for detecting AI-generated images, achieving 0.9997 AUC on validation and strong cross-dataset generalization.

Key Features

  • Quality-agnostic: Performs consistently on both pristine and degraded images (JPEG compression, blur, noise)
  • Dual-encoder architecture: Combines SigLIP2's semantic understanding with DINOv2's self-supervised features
  • Efficient fine-tuning: Uses LoRA adapters (~8M trainable params out of ~740M total)
  • Production-ready: Tested on 10+ external datasets

Performance

Validation Results (OpenFake, 5K images)

| Metric   | Clean Images | Degraded Images | Average |
|----------|--------------|-----------------|---------|
| AUC      | 0.9998       | 0.9995          | 0.9997  |
| Accuracy | 99.24%       | 98.96%          | 99.10%  |

Quality-agnostic verification: AUC gap between clean and degraded images is only 0.0003, confirming robust performance across image quality levels.

Cross-Dataset Generalization

Real Image Datasets (Target: Classify as Real)

| Dataset        | Samples | Accuracy | Mean P(AI) |
|----------------|---------|----------|------------|
| Food-101       | 300     | 100.00%  | 0.032      |
| COCO 2017      | 300     | 90.67%   | 0.135      |
| Cats vs Dogs   | 300     | 99.67%   | 0.036      |
| Stanford Cars  | 300     | 94.67%   | 0.110      |
| Oxford Flowers | 300     | 95.67%   | 0.115      |
| **Average**    | —       | 96.13%   | —          |

AI-Generated Image Datasets (Target: Classify as AI)

| Dataset       | Generator  | Samples | Accuracy | Mean P(AI) |
|---------------|------------|---------|----------|------------|
| DALL-E 3      | OpenAI     | 300     | 100.00%  | 0.993      |
| Midjourney V6 | Midjourney | 300     | 96.33%   | 0.936      |
| **Average**   | —          | —       | 98.17%   | —          |

Mixed Benchmark Datasets

| Dataset   | Samples | Accuracy | AUC    | F1     |
|-----------|---------|----------|--------|--------|
| AI-or-Not | 500     | 96.80%   | 0.9986 | 97.04% |

Overall cross-dataset accuracy: 97.15%

Supported AI Generators

Trained on the OpenFake dataset, which includes images from 25+ generators:

  • Diffusion models: Stable Diffusion (1.5, 2.1, XL, 3.5), Flux (1.0, 1.1 Pro), DALL-E 3, Midjourney (v5, v6), Imagen, Kandinsky
  • GANs: StyleGAN, ProGAN, BigGAN
  • Other: GPT-Image-1, Firefly, Ideogram, and more

Usage

Installation

```bash
pip install torch torchvision transformers timm peft pillow
```

Quick Start

```python
from huggingface_hub import hf_hub_download
from model import AIImageDetector

# Download model
model_path = hf_hub_download(
    repo_id="Bombek1/ai-image-detector-siglip-dinov2",
    filename="pytorch_model.pt"
)

# Initialize detector
detector = AIImageDetector(model_path)

# Predict single image
result = detector.predict("path/to/image.jpg")
print(f"Prediction: {result['prediction']}")
print(f"Confidence: {result['confidence']:.1%}")
print(f"P(AI): {result['probability']:.4f}")
```
Batch Processing

```python
from pathlib import Path

images = list(Path("./images").glob("*.jpg"))
for img_path in images:
    result = detector.predict(img_path)
    print(f"{img_path.name}: {result['prediction']} ({result['confidence']:.1%})")
```

Model Architecture

```
EnsembleAIDetector (~740M parameters, ~8M trainable)
├── SigLIP2-SO400M-patch14-384 (with LoRA r=32 on q_proj, v_proj)
│   └── Output: 1152-dim features
├── DINOv2-Large-patch14 (with LoRA r=32 on qkv)
│   └── Output: 1024-dim features
└── ClassificationHead
    ├── LayerNorm(2176)
    ├── Linear(2176 → 512) + GELU + Dropout(0.3)
    ├── Linear(512 → 256) + GELU + Dropout(0.3)
    └── Linear(256 → 1) → Sigmoid
```
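The classification head in the diagram can be sketched in PyTorch. This is an illustrative reconstruction from the tree above, not the shipped `model.py`; the exact dropout placement and class name are assumptions:

```python
import torch
import torch.nn as nn

class ClassificationHead(nn.Module):
    """Head over concatenated SigLIP2 (1152-d) + DINOv2 (1024-d) features."""
    def __init__(self, in_dim=1152 + 1024, dropout=0.3):
        super().__init__()
        self.net = nn.Sequential(
            nn.LayerNorm(in_dim),
            nn.Linear(in_dim, 512), nn.GELU(), nn.Dropout(dropout),
            nn.Linear(512, 256), nn.GELU(), nn.Dropout(dropout),
            nn.Linear(256, 1),
        )

    def forward(self, siglip_feats, dino_feats):
        # Concatenate the two encoders' pooled features: (B, 2176)
        x = torch.cat([siglip_feats, dino_feats], dim=-1)
        # Single sigmoid output interpreted as P(AI) in [0, 1]
        return torch.sigmoid(self.net(x)).squeeze(-1)
```

Only the head and the LoRA adapters are trained; both backbone encoders stay frozen, which is how the trainable-parameter count stays near 8M.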

Training Details

| Parameter     | Value                                         |
|---------------|-----------------------------------------------|
| Dataset       | OpenFake (~95K train, 5K val)                 |
| Image Size    | 392×392                                       |
| Epochs        | 5                                             |
| Batch Size    | 16 (effective 144 with gradient accumulation) |
| Learning Rate | 2e-4 (head), 5e-5 (LoRA)                      |
| Scheduler     | Cosine with warmup                            |
| LoRA Rank     | 32                                            |
| LoRA Alpha    | 64                                            |
| Loss          | Focal Loss (γ=2, α=0.25)                      |

Quality-Agnostic Augmentations

The model is trained with aggressive image degradation to ensure robustness:

  • JPEG compression (quality 30-95)
  • Gaussian blur (Οƒ up to 2.0)
  • Gaussian noise (Οƒ up to 0.05)
  • Resize artifacts (down to 50% then back up)
  • Color jitter, random crops, flips
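The degradations above can be sketched as a simple PIL/NumPy pipeline. The application probabilities and function name here are illustrative assumptions, not the actual training transforms:

```python
import io
import random

import numpy as np
from PIL import Image, ImageFilter

def degrade(img: Image.Image) -> Image.Image:
    """Randomly apply training-style degradations (sketch)."""
    if random.random() < 0.5:  # JPEG compression, quality 30-95
        buf = io.BytesIO()
        img.save(buf, format="JPEG", quality=random.randint(30, 95))
        buf.seek(0)
        img = Image.open(buf).convert("RGB")
    if random.random() < 0.3:  # Gaussian blur, sigma up to 2.0
        img = img.filter(ImageFilter.GaussianBlur(radius=random.uniform(0.0, 2.0)))
    if random.random() < 0.3:  # additive Gaussian noise, sigma up to 0.05
        arr = np.asarray(img).astype(np.float32) / 255.0
        arr = np.clip(arr + np.random.normal(0.0, 0.05, arr.shape), 0.0, 1.0)
        img = Image.fromarray((arr * 255).astype(np.uint8))
    if random.random() < 0.3:  # resize artifacts: downscale to 50%, back up
        w, h = img.size
        img = img.resize((max(1, w // 2), max(1, h // 2))).resize((w, h))
    return img
```

Applying the same degradations at training time is what keeps the clean-vs-degraded AUC gap small at evaluation time.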

Limitations

| Limitation | Details |
|---|---|
| Low-resolution images | Performance degrades on images <128×128 (e.g., ~50% accuracy on the 32×32 CIFAKE dataset) |
| COCO-style images | ~9% false-positive rate on casual/cluttered real photos |
| Artistic macro photography | Professional studio/macro shots may occasionally trigger false positives (~5%) |
| Non-photographic content | Designed for photographs; screenshots, graphics, and illustrations may not work well |

Files

  • pytorch_model.pt β€” Full checkpoint with LoRA weights
  • model.py β€” Inference code with AIImageDetector class
  • config.json β€” Model configuration

Citation

```bibtex
@misc{ai-image-detector-2025,
  author = {Bombek1},
  title = {AI Image Detector (SigLIP2 + DINOv2 Ensemble)},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/Bombek1/ai-image-detector-siglip-dinov2}
}
```

License

MIT License
