# AI Image Detector (SigLIP2 + DINOv2 Ensemble)
A high-accuracy, quality-agnostic model for detecting AI-generated images, achieving 0.9997 AUC on validation and strong cross-dataset generalization.
## Key Features
- Quality-agnostic: Performs consistently on both pristine and degraded images (JPEG compression, blur, noise)
- Dual-encoder architecture: Combines SigLIP2's semantic understanding with DINOv2's self-supervised features
- Efficient fine-tuning: Uses LoRA adapters (~8M trainable params out of ~740M total)
- Production-ready: Tested on 10+ external datasets
## Performance

### Validation Results (OpenFake, 5K images)
| Metric | Clean Images | Degraded Images | Average |
|---|---|---|---|
| AUC | 0.9998 | 0.9995 | 0.9997 |
| Accuracy | 99.24% | 98.96% | 99.10% |
Quality-agnostic verification: AUC gap between clean and degraded images is only 0.0003, confirming robust performance across image quality levels.
### Cross-Dataset Generalization

#### Real Image Datasets (Target: Classify as Real)
| Dataset | Samples | Accuracy | Mean P(AI) |
|---|---|---|---|
| Food-101 | 300 | 100.00% | 0.032 |
| COCO 2017 | 300 | 90.67% | 0.135 |
| Cats vs Dogs | 300 | 99.67% | 0.036 |
| Stanford Cars | 300 | 94.67% | 0.110 |
| Oxford Flowers | 300 | 95.67% | 0.115 |
| Average | – | 96.13% | – |
#### AI-Generated Image Datasets (Target: Classify as AI)
| Dataset | Generator | Samples | Accuracy | Mean P(AI) |
|---|---|---|---|---|
| DALL-E 3 | OpenAI | 300 | 100.00% | 0.993 |
| Midjourney V6 | Midjourney | 300 | 96.33% | 0.936 |
| Average | – | – | 98.17% | – |
#### Mixed Benchmark Datasets
| Dataset | Samples | Accuracy | AUC | F1 |
|---|---|---|---|---|
| AI-or-Not | 500 | 96.80% | 0.9986 | 97.04% |
Overall cross-dataset accuracy: 97.15%
## Supported AI Generators

Trained on the OpenFake dataset, which includes images from 25+ generators:
- Diffusion models: Stable Diffusion (1.5, 2.1, XL, 3.5), Flux (1.0, 1.1 Pro), DALL-E 3, Midjourney (v5, v6), Imagen, Kandinsky
- GANs: StyleGAN, ProGAN, BigGAN
- Other: GPT-Image-1, Firefly, Ideogram, and more
## Usage

### Installation

```bash
pip install torch torchvision transformers timm peft pillow
```
### Quick Start

```python
from huggingface_hub import hf_hub_download
from model import AIImageDetector  # model.py from this repo must be on the import path

# Download model
model_path = hf_hub_download(
    repo_id="Bombek1/ai-image-detector-siglip-dinov2",
    filename="pytorch_model.pt",
)

# Initialize detector
detector = AIImageDetector(model_path)

# Predict single image
result = detector.predict("path/to/image.jpg")
print(f"Prediction: {result['prediction']}")
print(f"Confidence: {result['confidence']:.1%}")
print(f"P(AI): {result['probability']:.4f}")
```
### Batch Processing

```python
from pathlib import Path

images = list(Path("./images").glob("*.jpg"))
for img_path in images:
    result = detector.predict(img_path)
    print(f"{img_path.name}: {result['prediction']} ({result['confidence']:.1%})")
```
## Model Architecture

```
EnsembleAIDetector (~740M parameters, ~8M trainable)
├── SigLIP2-SO400M-patch14-384 (with LoRA r=32 on q_proj, v_proj)
│   └── Output: 1152-dim features
├── DINOv2-Large-patch14 (with LoRA r=32 on qkv)
│   └── Output: 1024-dim features
└── ClassificationHead
    ├── LayerNorm(2176)
    ├── Linear(2176 → 512) + GELU + Dropout(0.3)
    ├── Linear(512 → 256) + GELU + Dropout(0.3)
    └── Linear(256 → 1) → Sigmoid
```
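The classification head operating on the concatenated encoder features (1152-dim from SigLIP2 + 1024-dim from DINOv2 = 2176-dim) can be sketched in PyTorch. This is a minimal reconstruction from the diagram, not the repo's exact implementation; class and variable names are illustrative:

```python
import torch
import torch.nn as nn

class ClassificationHead(nn.Module):
    """MLP head over concatenated SigLIP2 (1152-d) + DINOv2 (1024-d) features."""
    def __init__(self, in_dim: int = 1152 + 1024, dropout: float = 0.3):
        super().__init__()
        self.net = nn.Sequential(
            nn.LayerNorm(in_dim),                           # LayerNorm(2176)
            nn.Linear(in_dim, 512), nn.GELU(), nn.Dropout(dropout),
            nn.Linear(512, 256), nn.GELU(), nn.Dropout(dropout),
            nn.Linear(256, 1),                              # single logit
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Sigmoid over the logit gives P(AI) in [0, 1].
        return torch.sigmoid(self.net(x)).squeeze(-1)

head = ClassificationHead()
features = torch.randn(4, 2176)  # a batch of concatenated encoder features
probs = head(features)           # shape (4,), each value in [0, 1]
```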
## Training Details
| Parameter | Value |
|---|---|
| Dataset | OpenFake (~95K train, 5K val) |
| Image Size | 392×392 |
| Epochs | 5 |
| Batch Size | 16 (effective: 144 with grad accum) |
| Learning Rate | 2e-4 (head), 5e-5 (LoRA) |
| Scheduler | Cosine with warmup |
| LoRA Rank | 32 |
| LoRA Alpha | 64 |
| Loss | Focal Loss (γ=2, α=0.25) |
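The binary focal loss from the table (γ=2, α=0.25) down-weights easy examples so training focuses on hard ones. The standard formulation can be sketched as follows; this is a common reference implementation, not necessarily the exact one used in training:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits: torch.Tensor, targets: torch.Tensor,
               gamma: float = 2.0, alpha: float = 0.25) -> torch.Tensor:
    """Binary focal loss: BCE scaled by (1 - p_t)^gamma and class weight alpha_t."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)              # prob of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)  # per-class weight
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()

logits = torch.tensor([2.0, -1.5, 0.3])   # raw model outputs
targets = torch.tensor([1.0, 0.0, 1.0])   # 1 = AI-generated, 0 = real
loss = focal_loss(logits, targets)        # scalar loss
```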
### Quality-Agnostic Augmentations
The model is trained with aggressive image degradation to ensure robustness:
- JPEG compression (quality 30-95)
- Gaussian blur (σ up to 2.0)
- Gaussian noise (σ up to 0.05)
- Resize artifacts (down to 50% then back up)
- Color jitter, random crops, flips
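A minimal sketch of such a degradation pipeline using Pillow, with parameter ranges taken from the list above (noise and color jitter omitted for brevity; the actual training transforms may differ):

```python
import io
import random
from PIL import Image, ImageFilter

def degrade(img: Image.Image) -> Image.Image:
    """Randomly apply JPEG compression, Gaussian blur, and resize artifacts."""
    # JPEG compression (quality 30-95): encode to an in-memory buffer and reload
    if random.random() < 0.5:
        buf = io.BytesIO()
        img.save(buf, format="JPEG", quality=random.randint(30, 95))
        buf.seek(0)
        img = Image.open(buf).convert("RGB")
    # Gaussian blur (sigma up to 2.0)
    if random.random() < 0.5:
        img = img.filter(ImageFilter.GaussianBlur(radius=random.uniform(0, 2.0)))
    # Resize artifacts: downscale to 50%, then back to the original size
    if random.random() < 0.5:
        w, h = img.size
        img = img.resize((max(1, w // 2), max(1, h // 2))).resize((w, h))
    return img

img = Image.new("RGB", (392, 392), color=(128, 64, 32))
out = degrade(img)  # same size and mode, possibly degraded
```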
## Limitations
| Limitation | Details |
|---|---|
| Low-resolution images | Performance degrades on images <128×128 (e.g., CIFAKE 32×32 dataset shows ~50% accuracy) |
| COCO-style images | ~9% false positive rate on casual/cluttered real photos |
| Artistic macro photography | Professional studio/macro shots may occasionally trigger false positives (~5%) |
| Non-photographic content | Designed for photographs; screenshots, graphics, and illustrations may not work well |
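Given the low-resolution limitation, a caller may want to screen inputs before trusting a prediction. A minimal sketch; the helper name and the 128-pixel threshold (from the table above) are illustrative:

```python
from PIL import Image

def is_supported(img: Image.Image, min_side: int = 128) -> bool:
    """Return True if the image is large enough for reliable detection;
    below ~128px on the shorter side, accuracy drops toward chance."""
    return min(img.size) >= min_side

ok = is_supported(Image.new("RGB", (392, 392)))    # large enough
small = is_supported(Image.new("RGB", (32, 32)))   # e.g. CIFAKE-sized input
```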
## Files

- `pytorch_model.pt`: Full checkpoint with LoRA weights
- `model.py`: Inference code with `AIImageDetector` class
- `config.json`: Model configuration
## Citation

```bibtex
@misc{ai-image-detector-2025,
  author = {Bombek1},
  title = {AI Image Detector (SigLIP2 + DINOv2 Ensemble)},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/Bombek1/ai-image-detector-siglip-dinov2}
}
```
## License
MIT License