YOLOX-S - Core AI
YOLOX (Megvii, Apache-2.0) converted to Apple Core AI (.aimodel): a single-stage,
anchor-free object detector running as one static graph on every Apple compute unit
(Mac GPU / iPhone GPU / Neural Engine). Part of the Core AI model zoo (model card).
The dense-detector counterpart to RF-DETR-CoreAI: where the DETR family
needs no NMS, YOLOX is the classic score = obj · cls + per-class NMS pipeline.
Use it
▶️ Run it (source): the DetectCamera runner (real-time object detection on the zero-copy camera path):
git clone https://github.com/john-rocky/coreai-kit
open coreai-kit/Examples/DetectCamera/DetectCamera.xcodeproj
# -> Run, then pick "YOLOX" in the model picker
# agents / headless (macOS):
cd coreai-kit/Examples/DetectCamera
swift run detect-cli --model yolox-s --image Resources/gate_image.jpg
💻 Build with it: a complete snippet; the glue is all kit API, so it runs as pasted:
import CoreAIKitVision
let detector = try await KitDetector(catalog: "yolox-s")
let image = try ImageFile.load(imageURL) // any image file -> CGImage + EXIF orientation
let detections = try await detector.detect(in: image.cgImage)
// detections: [Detection], each with label, score, normalized box (top-left origin)
The take-home is Examples/DetectCamera/Sources/QuickStart.swift:
this exact code as one typed function, no UI. The CLI is an argument shell over it, and
the GUI runs the same detector per camera frame on a zero-copy pixel-buffer fast path.
YOLOX is a dense detector, so KitDetector runs the obj · cls threshold + per-class NMS
host-side; the DETR family needs none. Same detect(in:) either way.
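To draw or crop a detection, the normalized box has to be scaled to the pixel size of the image it came from. The sketch below shows that arithmetic under an assumption: `NormalizedBox` and `pixelRect` are hypothetical names used for illustration, and the field layout (x, y, width, height as fractions with a top-left origin) is assumed rather than taken from the kit's actual `Detection` type.

```swift
// Hypothetical stand-in for a normalized box; the kit's real Detection type may differ.
struct NormalizedBox {
    var x, y, width, height: Double   // fractions of the image, origin at top-left
}

// Scale a normalized, top-left-origin box to pixel coordinates.
func pixelRect(_ box: NormalizedBox, imageWidth: Int, imageHeight: Int)
    -> (x: Double, y: Double, width: Double, height: Double) {
    let w = Double(imageWidth), h = Double(imageHeight)
    return (box.x * w, box.y * h, box.width * w, box.height * h)
}
```

Because the origin is already top-left, no vertical flip is needed for UIKit-style or CGImage pixel coordinates.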
Integration checklist
- SPM: https://github.com/john-rocky/coreai-kit, product CoreAIKitVision
- Info.plist: NSCameraUsageDescription, only for the live camera; the snippet needs none
- Entitlements: none needed
- First run downloads the model (about 36 MB, the same bundle on Mac and iPhone), then it loads from the
local cache (Application Support; progress via the downloadProgress callback)
- Measure in Release: Debug is ~3× slower on host-side work
Bundle
yolox-s_float32.aimodel: YOLOX-S, 640² input, 8.97M params, fp32 (the ship dtype; detection has no bandwidth-bound decode loop, so fp16 is no faster on the GPU and only adds near-tie noise). 36 MB. Same bundle on macOS and iOS.
Graph contract
input "image" [1,3,640,640] f32 BGR, 0-255, letterboxed (pad 114, top-left); YOLOX-native (no /255, no mean/std)
output "preds" [1,8400,85] f32 [cx,cy,w,h, obj, cls_0..cls_79]; box DECODED to 640-px, obj/cls SIGMOID-ed (in-graph)
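The top-left letterbox in the input contract reduces to a single scale factor, which is also all that is needed to map boxes back to the source image. A minimal sketch of that geometry follows; `Letterbox` and its members are hypothetical names, not coreai-kit API, and truncating the resized size to an integer is an assumption modeled on upstream YOLOX preprocessing.

```swift
// Sketch only: `Letterbox` and its members are hypothetical names, not coreai-kit API.
struct Letterbox {
    let scale: Double        // resize factor applied to the source image
    let resizedWidth: Int    // size of the resized image inside the square canvas
    let resizedHeight: Int

    // Top-left letterbox: the resized image sits at (0, 0) and the rest of the
    // canvas is filled with the pad value (114 per the contract).
    init(sourceWidth: Int, sourceHeight: Int, side: Int = 640) {
        let s = min(Double(side) / Double(sourceWidth), Double(side) / Double(sourceHeight))
        scale = s
        // Truncation is an assumption, not a documented kit behavior.
        resizedWidth = Int(Double(sourceWidth) * s)
        resizedHeight = Int(Double(sourceHeight) * s)
    }

    // Map a point from 640-px model space back to source-image pixels.
    // With top-left padding there is no offset to subtract, only the scale to undo.
    func toSource(x: Double, y: Double) -> (x: Double, y: Double) {
        (x / scale, y / scale)
    }
}
```

For a 1280×720 frame this gives a scale of 0.5 and a 640×360 image in the top of the canvas, with the bottom 280 rows padded.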
Host post-process: score = obj · max_class, threshold, per-class NMS (IoU 0.45), then
un-letterbox the survivors. Anchors A = 80² + 40² + 20² = 8400 (strides 8/16/32).
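The scoring and suppression steps can be sketched in plain Swift as below. This is an illustration of the technique, not the kit's implementation: `Candidate`, `iou`, `postProcess`, and `anchorCount` are hypothetical names, the output is assumed to arrive as a flat row-major `[Float]`, and the 0.3 score threshold is borrowed from the usage snippet rather than from any documented default. Boxes stay in 640-px model space; un-letterboxing is left to the caller.

```swift
// Sketch only: all type and function names here are hypothetical, not coreai-kit API.
struct Candidate {
    var x1, y1, x2, y2: Float   // corner box in 640-px model space
    var score: Float            // obj * best class probability
    var classIndex: Int
}

// Anchor count for a square input: one prediction per grid cell at each stride.
func anchorCount(side: Int = 640, strides: [Int] = [8, 16, 32]) -> Int {
    strides.reduce(0) { $0 + (side / $1) * (side / $1) }
}

// Intersection over union of two corner boxes.
func iou(_ a: Candidate, _ b: Candidate) -> Float {
    let w = max(0, min(a.x2, b.x2) - max(a.x1, b.x1))
    let h = max(0, min(a.y2, b.y2) - max(a.y1, b.y1))
    let inter = w * h
    let union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return union > 0 ? inter / union : 0
}

// `preds` is the flattened output: per row cx, cy, w, h, obj, then the class scores.
func postProcess(_ preds: [Float], rowStride: Int = 85,
                 scoreThreshold: Float = 0.3, iouThreshold: Float = 0.45) -> [Candidate] {
    var candidates: [Candidate] = []
    for row in stride(from: 0, to: preds.count - rowStride + 1, by: rowStride) {
        // Find the best class for this anchor.
        var best = 0
        for c in 1..<(rowStride - 5) where preds[row + 5 + c] > preds[row + 5 + best] {
            best = c
        }
        let score = preds[row + 4] * preds[row + 5 + best]
        guard score >= scoreThreshold else { continue }
        // Convert center/size to corners.
        let (cx, cy, w, h) = (preds[row], preds[row + 1], preds[row + 2], preds[row + 3])
        candidates.append(Candidate(x1: cx - w / 2, y1: cy - h / 2,
                                    x2: cx + w / 2, y2: cy + h / 2,
                                    score: score, classIndex: best))
    }
    // Greedy NMS, highest score first; a box only suppresses others of the same class.
    candidates.sort { $0.score > $1.score }
    var kept: [Candidate] = []
    for c in candidates {
        let suppressed = kept.contains { $0.classIndex == c.classIndex && iou($0, c) > iouThreshold }
        if !suppressed { kept.append(c) }
    }
    return kept
}
```

Keeping suppression per class means two overlapping boxes with different labels both survive, which is the behavior the pipeline description implies.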
Parity & speed (measured)
- vs torch fp32: head cosine 1.000000, end-to-end detections IoU 1.000 on CPU and GPU.
- M4 Max GPU: 4.80 ms / 208 FPS (median). M4 Max CPU 57 ms.
- iPhone 17 Pro (Release, GPU, live camera): ~22 ms / 35-40 FPS end-to-end; first-load on-device specialization ~2.6 s (no AOT). The on-device gate reproduces the Mac fp32 oracle 6/6 (cat 0.96/0.96, remote 0.86/0.86, bed 0.71, couch 0.54).
Use (CoreAIKit)
import CoreAIKitVision
let detector = try await YOLOXDetector(model: .yoloxS) // downloads this repo
let detections = try await detector.detect(in: pixelBuffer, scoreThreshold: 0.3)
Live-camera + video reference app: DetectCamera in coreai-kit.
Convert it yourself
conversion/export_yolox.py
takes --variant s --yolox-repo <YOLOX checkout> --weights yolox_s.pth and is gated end-to-end with
--verify-image <img> --unit {cpu,gpu}. The script also maps nano/tiny/m/l/x.
License
Apache-2.0: upstream YOLOX code and COCO-pretrained weights are Apache-2.0.