# AI Image Detector (SigLIP2 + DINOv2 Ensemble)
A high-accuracy, quality-agnostic model for detecting AI-generated images, achieving 0.9997 AUC on validation and strong cross-dataset generalization.
## Key Features
- Quality-agnostic: Performs consistently on both pristine and degraded images (JPEG compression, blur, noise)
- Dual-encoder architecture: Combines SigLIP2's semantic understanding with DINOv2's self-supervised features
- Efficient fine-tuning: Uses LoRA adapters (~8M trainable params out of ~740M total)
- Production-ready: Tested on 10+ external datasets
## Performance

### Validation Results (OpenFake, 5K images)
| Metric | Clean Images | Degraded Images | Average |
|---|---|---|---|
| AUC | 0.9998 | 0.9995 | 0.9997 |
| Accuracy | 99.24% | 98.96% | 99.10% |
Quality-agnostic verification: AUC gap between clean and degraded images is only 0.0003, confirming robust performance across image quality levels.
### Cross-Dataset Generalization

#### Real Image Datasets (Target: Classify as Real)
| Dataset | Samples | Accuracy | Mean P(AI) |
|---|---|---|---|
| Food-101 | 300 | 100.00% | 0.032 |
| COCO 2017 | 300 | 90.67% | 0.135 |
| Cats vs Dogs | 300 | 99.67% | 0.036 |
| Stanford Cars | 300 | 94.67% | 0.110 |
| Oxford Flowers | 300 | 95.67% | 0.115 |
| Average | – | 96.13% | – |
#### AI-Generated Image Datasets (Target: Classify as AI)
| Dataset | Generator | Samples | Accuracy | Mean P(AI) |
|---|---|---|---|---|
| DALL-E 3 | OpenAI | 300 | 100.00% | 0.993 |
| Midjourney V6 | Midjourney | 300 | 96.33% | 0.936 |
| Average | – | – | 98.17% | – |
#### Mixed Benchmark Datasets
| Dataset | Samples | Accuracy | AUC | F1 |
|---|---|---|---|---|
| AI-or-Not | 500 | 96.80% | 0.9986 | 97.04% |
Overall cross-dataset accuracy: 97.15%
## Supported AI Generators

Trained on the OpenFake dataset, which includes images from 25+ generators:
- Diffusion models: Stable Diffusion (1.5, 2.1, XL, 3.5), Flux (1.0, 1.1 Pro), DALL-E 3, Midjourney (v5, v6), Imagen, Kandinsky
- GANs: StyleGAN, ProGAN, BigGAN
- Other: GPT-Image-1, Firefly, Ideogram, and more
## Usage

### Installation

```bash
pip install torch torchvision transformers timm peft pillow
```
### Quick Start

```python
from huggingface_hub import hf_hub_download
from model import AIImageDetector  # model.py from this repo must be on the import path

# Download model
model_path = hf_hub_download(
    repo_id="Bombek1/ai-image-detector-siglip-dinov2",
    filename="pytorch_model.pt",
)

# Initialize detector
detector = AIImageDetector(model_path)

# Predict single image
result = detector.predict("path/to/image.jpg")
print(f"Prediction: {result['prediction']}")
print(f"Confidence: {result['confidence']:.1%}")
print(f"P(AI): {result['probability']:.4f}")
```
### Batch Processing

```python
from pathlib import Path

images = list(Path("./images").glob("*.jpg"))
for img_path in images:
    result = detector.predict(img_path)
    print(f"{img_path.name}: {result['prediction']} ({result['confidence']:.1%})")
```
## Model Architecture

```
EnsembleAIDetector (~740M parameters, ~8M trainable)
├── SigLIP2-SO400M-patch14-384 (with LoRA r=32 on q_proj, v_proj)
│   └── Output: 1152-dim features
├── DINOv2-Large-patch14 (with LoRA r=32 on qkv)
│   └── Output: 1024-dim features
└── ClassificationHead
    ├── LayerNorm(2176)
    ├── Linear(2176 → 512) + GELU + Dropout(0.3)
    ├── Linear(512 → 256) + GELU + Dropout(0.3)
    └── Linear(256 → 1) → Sigmoid
```
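The classification head operating on the concatenated encoder features (1152-dim from SigLIP2 + 1024-dim from DINOv2 = 2176-dim) can be sketched in PyTorch. This is a minimal reconstruction from the diagram, not the repo's exact implementation; class and variable names are illustrative:

```python
import torch
import torch.nn as nn

class ClassificationHead(nn.Module):
    """MLP head over concatenated SigLIP2 (1152-d) + DINOv2 (1024-d) features."""
    def __init__(self, in_dim: int = 1152 + 1024, dropout: float = 0.3):
        super().__init__()
        self.net = nn.Sequential(
            nn.LayerNorm(in_dim),                           # LayerNorm(2176)
            nn.Linear(in_dim, 512), nn.GELU(), nn.Dropout(dropout),
            nn.Linear(512, 256), nn.GELU(), nn.Dropout(dropout),
            nn.Linear(256, 1),                              # single logit
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Sigmoid over the logit gives P(AI) in [0, 1].
        return torch.sigmoid(self.net(x)).squeeze(-1)

head = ClassificationHead()
features = torch.randn(4, 2176)  # a batch of concatenated encoder features
probs = head(features)           # shape (4,), each value in [0, 1]
```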
## Training Details
| Parameter | Value |
|---|---|
| Dataset | OpenFake (~95K train, 5K val) |
| Image Size | 392×392 |
| Epochs | 5 |
| Batch Size | 16 (effective: 144 with grad accum) |
| Learning Rate | 2e-4 (head), 5e-5 (LoRA) |
| Scheduler | Cosine with warmup |
| LoRA Rank | 32 |
| LoRA Alpha | 64 |
| Loss | Focal Loss (γ=2, α=0.25) |
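The binary focal loss from the table (γ=2, α=0.25) down-weights easy examples so training focuses on hard ones. The standard formulation can be sketched as follows; this is a common reference implementation, not necessarily the exact one used in training:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits: torch.Tensor, targets: torch.Tensor,
               gamma: float = 2.0, alpha: float = 0.25) -> torch.Tensor:
    """Binary focal loss: BCE scaled by (1 - p_t)^gamma and class weight alpha_t."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)              # prob of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)  # per-class weight
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()

logits = torch.tensor([2.0, -1.5, 0.3])   # raw model outputs
targets = torch.tensor([1.0, 0.0, 1.0])   # 1 = AI-generated, 0 = real
loss = focal_loss(logits, targets)        # scalar loss
```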
### Quality-Agnostic Augmentations
The model is trained with aggressive image degradation to ensure robustness:
- JPEG compression (quality 30-95)
- Gaussian blur (σ up to 2.0)
- Gaussian noise (σ up to 0.05)
- Resize artifacts (down to 50% then back up)
- Color jitter, random crops, flips
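A minimal sketch of such a degradation pipeline using Pillow, with parameter ranges taken from the list above (noise and color jitter omitted for brevity; the actual training transforms may differ):

```python
import io
import random
from PIL import Image, ImageFilter

def degrade(img: Image.Image) -> Image.Image:
    """Randomly apply JPEG compression, Gaussian blur, and resize artifacts."""
    # JPEG compression (quality 30-95): encode to an in-memory buffer and reload
    if random.random() < 0.5:
        buf = io.BytesIO()
        img.save(buf, format="JPEG", quality=random.randint(30, 95))
        buf.seek(0)
        img = Image.open(buf).convert("RGB")
    # Gaussian blur (sigma up to 2.0)
    if random.random() < 0.5:
        img = img.filter(ImageFilter.GaussianBlur(radius=random.uniform(0, 2.0)))
    # Resize artifacts: downscale to 50%, then back to the original size
    if random.random() < 0.5:
        w, h = img.size
        img = img.resize((max(1, w // 2), max(1, h // 2))).resize((w, h))
    return img

img = Image.new("RGB", (392, 392), color=(128, 64, 32))
out = degrade(img)  # same size and mode, possibly degraded
```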
## Limitations
| Limitation | Details |
|---|---|
| Low-resolution images | Performance degrades on images <128×128 (e.g., CIFAKE 32×32 dataset shows ~50% accuracy) |
| COCO-style images | ~9% false positive rate on casual/cluttered real photos |
| Artistic macro photography | Professional studio/macro shots may occasionally trigger false positives (~5%) |
| Non-photographic content | Designed for photographs; screenshots, graphics, and illustrations may not work well |
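Given the low-resolution limitation, a caller may want to screen inputs before trusting a prediction. A minimal sketch; the helper name and the 128-pixel threshold (from the table above) are illustrative:

```python
from PIL import Image

def is_supported(img: Image.Image, min_side: int = 128) -> bool:
    """Return True if the image is large enough for reliable detection;
    below ~128px on the shorter side, accuracy drops toward chance."""
    return min(img.size) >= min_side

ok = is_supported(Image.new("RGB", (392, 392)))    # large enough
small = is_supported(Image.new("RGB", (32, 32)))   # e.g. CIFAKE-sized input
```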
## Files

- `pytorch_model.pt`: Full checkpoint with LoRA weights
- `model.py`: Inference code with `AIImageDetector` class
- `config.json`: Model configuration
## Citation

```bibtex
@misc{ai-image-detector-2025,
  author = {Bombek1},
  title = {AI Image Detector (SigLIP2 + DINOv2 Ensemble)},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/Bombek1/ai-image-detector-siglip-dinov2}
}
```
## License
MIT License