YOLOX-S — Core AI

YOLOX (Megvii, Apache-2.0) converted to Apple Core AI (.aimodel): a single-stage, anchor-free object detector that runs as one static graph on every Apple compute unit (CPU, Mac GPU, iPhone GPU, Neural Engine). Part of the Core AI model zoo.

The dense-detector counterpart to RF-DETR-CoreAI: where the DETR family needs no NMS, YOLOX follows the classic pipeline of score = obj · cls followed by per-class NMS.

Use it

▶️ Run it (source) — the DetectCamera runner (real-time object detection on the zero-copy camera path):

git clone https://github.com/john-rocky/coreai-kit
open coreai-kit/Examples/DetectCamera/DetectCamera.xcodeproj
# → Run, then pick "YOLOX" in the model picker

# agents / headless (macOS):
cd coreai-kit/Examples/DetectCamera
swift run detect-cli --model yolox-s --image Resources/gate_image.jpg

💻 Build with it: the snippet is complete, all glue is kit API, and it runs as pasted:

import Foundation
import CoreAIKitVision

let imageURL = URL(fileURLWithPath: "Resources/gate_image.jpg")  // e.g. the CLI's sample image (run from Examples/DetectCamera)
let detector = try await KitDetector(catalog: "yolox-s")
let image = try ImageFile.load(imageURL)  // any image file → CGImage + EXIF orientation
let detections = try await detector.detect(in: image.cgImage)
// detections: [Detection]: label, score, normalized box (top-left origin)

The take-home is Examples/DetectCamera/Sources/QuickStart.swift: this exact code as one typed function with no UI. The CLI is an argument shell over it, and the GUI runs the same detector on every camera frame via a zero-copy pixel-buffer fast path. Because YOLOX is a dense detector, KitDetector applies the obj · cls score threshold and per-class NMS on the host; the DETR family needs neither. The call is the same detect(in:) either way.
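The snippet's comment says boxes come back normalized with a top-left origin. The sketch below shows what that means in practice: scaling such a box to pixel coordinates, and flipping it for consumers such as Vision whose normalized rects use a bottom-left origin. It assumes the box is normalized to the image passed to detect(in:); the helper names are hypothetical, and the kit's Detection type is not used (the box is passed as a plain CGRect).

```swift
import Foundation

// Hypothetical helpers: they take the normalized, top-left-origin box as a plain
// CGRect rather than the kit's Detection type.

/// Scales a normalized (0...1) top-left-origin box to pixel coordinates of an
/// image of the given size (also top-left origin, as for CGImage cropping).
func pixelRect(fromNormalized box: CGRect, imageWidth: Int, imageHeight: Int) -> CGRect {
    let w = CGFloat(imageWidth), h = CGFloat(imageHeight)
    return CGRect(x: box.minX * w, y: box.minY * h,
                  width: box.width * w, height: box.height * h)
}

/// Flips a normalized top-left-origin box into the bottom-left-origin
/// convention used by Vision's normalized rects.
func bottomLeftNormalized(fromTopLeft box: CGRect) -> CGRect {
    CGRect(x: box.minX, y: 1 - box.maxY, width: box.width, height: box.height)
}

let box = CGRect(x: 0.25, y: 0.1, width: 0.5, height: 0.4)
print(pixelRect(fromNormalized: box, imageWidth: 640, imageHeight: 480))
print(bottomLeftNormalized(fromTopLeft: box))
```

Only the y axis changes under the flip; width and height are origin-independent.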

Integration checklist

SPM: https://github.com/john-rocky/coreai-kit → product CoreAIKitVision
Info.plist: NSCameraUsageDescription — only for the live camera; the snippet needs none
Entitlements: none needed
First run downloads the model (about 36 MB; the same bundle on Mac and iPhone), then loads it from the local cache in Application Support; track progress with the downloadProgress callback
Measure in Release: Debug builds are ~3× slower on host-side work (for YOLOX, the threshold + NMS post-process)

Bundle

yolox-s_float32.aimodel: YOLOX-S, 640² input, 8.97M params, fp32. fp32 is the ship dtype: detection has no bandwidth-bound decode loop, so fp16 is no faster on the GPU and only adds rounding noise that can flip near-tied scores. 36 MB (8.97M params × 4 bytes ≈ 35.9 MB). Same bundle on macOS and iOS.

Graph contract

input  "image"  [1,3,640,640] f32   BGR, 0-255, letterboxed (pad 114, top-left) — YOLOX-native (no /255, no mean/std)
output "preds"  [1,8400,85]   f32   [cx,cy,w,h, obj, cls_0..cls_79]; box DECODED to 640-px, obj/cls SIGMOID-ed (in-graph)
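If you drive the .aimodel without the kit, the input contract above implies host-side preprocessing along these lines. This is a minimal sketch, assuming an interleaved RGB8 source buffer; the names letterbox and packInput are hypothetical, and how KitDetector does this internally is not shown here. Nearest-neighbour resizing is swapped in for brevity; YOLOX's own preprocessing resizes bilinearly, so expect small numeric differences.

```swift
import Foundation

/// Letterbox geometry in the YOLOX-native style: one uniform scale, image
/// pasted at the top-left, the rest filled with 114.
struct Letterbox {
    let scale: Double      // r = min(side / w, side / h)
    let resizedWidth: Int  // Int(w * r), truncated
    let resizedHeight: Int // Int(h * r), truncated
}

func letterbox(width: Int, height: Int, side: Int = 640) -> Letterbox {
    let r = min(Double(side) / Double(width), Double(side) / Double(height))
    return Letterbox(scale: r,
                     resizedWidth: Int(Double(width) * r),
                     resizedHeight: Int(Double(height) * r))
}

/// Packs interleaved RGB8 pixels into a [1,3,side,side] BGR float tensor,
/// channel-planar (CHW), values 0-255 (no /255, no mean/std).
func packInput(rgb: [UInt8], width: Int, height: Int, side: Int = 640) -> [Float] {
    precondition(rgb.count == width * height * 3)
    let lb = letterbox(width: width, height: height, side: side)
    let plane = side * side
    var out = [Float](repeating: 114, count: 3 * plane)  // pad value everywhere
    for y in 0..<lb.resizedHeight {
        let sy = min(height - 1, Int(Double(y) / lb.scale))  // nearest source row
        for x in 0..<lb.resizedWidth {
            let sx = min(width - 1, Int(Double(x) / lb.scale))  // nearest source column
            let src = (sy * width + sx) * 3
            let dst = y * side + x
            out[dst] = Float(rgb[src + 2])              // B plane
            out[plane + dst] = Float(rgb[src + 1])      // G plane
            out[2 * plane + dst] = Float(rgb[src])      // R plane
        }
    }
    return out
}

let box720p = letterbox(width: 1280, height: 720)
print(box720p.scale, box720p.resizedWidth, box720p.resizedHeight)
```

For a 1280×720 frame the scale is 0.5, the image occupies the top 640×360 region, and the bottom 280 rows stay at 114.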

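Host post-process: score = obj · max_class, threshold, per-class NMS (IoU 0.45), then un-letterbox the survivors (with the image pasted at the top-left, this is a division by the letterbox scale). Anchors: A = 80² + 40² + 20² = 8400 (strides 8/16/32 on a 640-px input give 80, 40, and 20 cells per side).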
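The post-process can be sketched in plain Swift over the flat "preds" buffer. This assumes the output has been read into a row-major [Float] of 8400 × 85 values; Box, Det, iou and postprocess are hypothetical names, and the 0.3 default threshold is borrowed from the YOLOXDetector example later in this card. KitDetector's actual implementation may differ.

```swift
import Foundation

struct Box { var x1, y1, x2, y2: Float }
struct Det { let cls: Int; let score: Float; var box: Box }

func iou(_ a: Box, _ b: Box) -> Float {
    let iw = max(0, min(a.x2, b.x2) - max(a.x1, b.x1))
    let ih = max(0, min(a.y2, b.y2) - max(a.y1, b.y1))
    let inter = iw * ih
    let union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return union > 0 ? inter / union : 0
}

/// preds: rows of [cx, cy, w, h, obj, cls_0 ... cls_{numClasses-1}], already
/// decoded to 640-px and sigmoid-ed in-graph. Returns boxes in original-image pixels.
func postprocess(preds: [Float], numClasses: Int = 80, scoreThreshold: Float = 0.3,
                 iouThreshold: Float = 0.45, imageWidth: Int, imageHeight: Int,
                 side: Int = 640) -> [Det] {
    let rowLen = 5 + numClasses
    precondition(preds.count % rowLen == 0)
    var candidates: [Det] = []
    for a in 0..<(preds.count / rowLen) {
        let row = a * rowLen
        // Class argmax, then score = obj * max_class.
        var best = 0
        for c in 1..<numClasses {
            if preds[row + 5 + c] > preds[row + 5 + best] { best = c }
        }
        let score = preds[row + 4] * preds[row + 5 + best]
        guard score >= scoreThreshold else { continue }
        let (cx, cy, w, h) = (preds[row], preds[row + 1], preds[row + 2], preds[row + 3])
        candidates.append(Det(cls: best, score: score,
                              box: Box(x1: cx - w / 2, y1: cy - h / 2, x2: cx + w / 2, y2: cy + h / 2)))
    }
    // Greedy per-class NMS: highest score first; drop a box if it overlaps an
    // already-kept box of the same class by more than iouThreshold.
    candidates.sort { $0.score > $1.score }
    var kept: [Det] = []
    for d in candidates {
        if !kept.contains(where: { $0.cls == d.cls && iou($0.box, d.box) > iouThreshold }) {
            kept.append(d)
        }
    }
    // Un-letterbox: the image sits at the top-left, so divide by the scale, then clip.
    let r = min(Float(side) / Float(imageWidth), Float(side) / Float(imageHeight))
    let (W, H) = (Float(imageWidth), Float(imageHeight))
    return kept.map { d in
        var out = d
        out.box = Box(x1: min(max(d.box.x1 / r, 0), W), y1: min(max(d.box.y1 / r, 0), H),
                      x2: min(max(d.box.x2 / r, 0), W), y2: min(max(d.box.y2 / r, 0), H))
        return out
    }
}
```

Keeping NMS per class means a cat and a couch that overlap heavily both survive; only same-class duplicates are merged.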

Parity & speed (measured)

vs torch fp32: head-output cosine similarity 1.000000 and end-to-end detection IoU 1.000, on both CPU and GPU.
M4 Max GPU: 4.80 ms / 208 FPS (median). M4 Max CPU 57 ms.
iPhone 17 Pro (Release, GPU, live camera): ~22 ms / 35–40 FPS end-to-end; the first load spends ~2.6 s on on-device specialization (no AOT). The on-device gate reproduces all 6 detections of the Mac fp32 oracle (cat 0.96/0.96, remote 0.86/0.86, bed 0.71, couch 0.54).
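The two kinds of number above, cosine parity against a reference and a median latency, can be reproduced with small helpers like these. This is a sketch under stated assumptions: cosineSimilarity, median and medianMilliseconds are hypothetical names, the reference tensor (e.g. a torch fp32 dump of "preds") must be loaded separately, and the timer is synchronous, so timing the kit's async detect(in:) needs an async variant of the same loop with await.

```swift
import Foundation
import Dispatch

/// Cosine similarity of two flattened tensors; 1.0 means identical direction.
/// Accumulates in Double to avoid fp32 rounding over 8400 × 85 elements.
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Double {
    precondition(a.count == b.count && !a.isEmpty)
    var dot = 0.0, na = 0.0, nb = 0.0
    for i in a.indices {
        let x = Double(a[i]), y = Double(b[i])
        dot += x * y; na += x * x; nb += y * y
    }
    return dot / (na.squareRoot() * nb.squareRoot())
}

/// Median of a non-empty sample (mean of the two middle values for even counts).
func median(_ xs: [Double]) -> Double {
    precondition(!xs.isEmpty)
    let s = xs.sorted()
    return s.count % 2 == 1 ? s[s.count / 2] : (s[s.count / 2 - 1] + s[s.count / 2]) / 2
}

/// Runs `body` `warmup` times untimed (first runs include load/specialization),
/// then `iterations` timed runs; returns the median in milliseconds.
func medianMilliseconds(warmup: Int = 10, iterations: Int = 100,
                        _ body: () throws -> Void) rethrows -> Double {
    for _ in 0..<warmup { try body() }
    var samples: [Double] = []
    samples.reserveCapacity(iterations)
    for _ in 0..<iterations {
        let t0 = DispatchTime.now().uptimeNanoseconds
        try body()
        samples.append(Double(DispatchTime.now().uptimeNanoseconds - t0) / 1e6)
    }
    return median(samples)
}

let ms = medianMilliseconds { _ = (0..<10_000).reduce(0, +) }
print(String(format: "%.2f ms / %.0f FPS", ms, 1000 / ms))
```

Reporting the median rather than the mean keeps one-off stalls (thermal, scheduling) from skewing the figure, and FPS is simply 1000 / ms.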

Use (CoreAIKit)

import CoreAIKitVision
let detector = try await YOLOXDetector(model: .yoloxS)            // downloads this repo
let detections = try await detector.detect(in: pixelBuffer, scoreThreshold: 0.3)  // pixelBuffer: e.g. a camera frame

Live-camera + video reference app: DetectCamera in coreai-kit.

Convert it yourself

conversion/export_yolox.py: pass --variant s --yolox-repo <YOLOX checkout> --weights yolox_s.pth; add --verify-image <img> --unit {cpu,gpu} to gate the export end-to-end. The script also handles the nano/tiny/m/l/x variants.

License

Apache-2.0 — upstream YOLOX code and COCO-pretrained weights are Apache-2.0.
