YOLOX-S: Core AI

YOLOX (Megvii, Apache-2.0) converted to Apple Core AI (.aimodel): a single-stage, anchor-free object detector that runs as one static graph on every Apple compute unit (Mac GPU / iPhone GPU / Neural Engine). Part of the Core AI model zoo (model card).

The dense-detector counterpart to RF-DETR-CoreAI: where the DETR family needs no NMS, YOLOX is the classic score = obj · cls + per-class NMS pipeline.

Use it

▶️ Run it (source) with the DetectCamera runner (real-time object detection on the zero-copy camera path):

git clone https://github.com/john-rocky/coreai-kit
open coreai-kit/Examples/DetectCamera/DetectCamera.xcodeproj
# → Run, then pick "YOLOX" in the model picker

# agents / headless (macOS):
cd coreai-kit/Examples/DetectCamera
swift run detect-cli --model yolox-s --image Resources/gate_image.jpg

💻 Build with it (complete; the glue is kit API, so copy-paste runs):

import CoreAIKitVision

let detector = try await KitDetector(catalog: "yolox-s")
let image = try ImageFile.load(imageURL)  // any image file → CGImage + EXIF orientation
let detections = try await detector.detect(in: image.cgImage)
// detections: [Detection], each with a label, score, and normalized box (top-left origin)

The take-home is Examples/DetectCamera/Sources/QuickStart.swift: this exact code as one typed function, with no UI. The CLI is an argument shell over it, and the GUI runs the same detector per camera frame on a zero-copy pixel-buffer fast path. YOLOX is a dense detector, so KitDetector runs the obj·cls threshold and per-class NMS host-side; the DETR family needs neither. The detect(in:) call is the same either way.

Integration checklist

  • SPM: https://github.com/john-rocky/coreai-kit → product CoreAIKitVision
  • Info.plist: NSCameraUsageDescription, needed only for the live camera; the snippet needs none
  • Entitlements: none needed
  • First run downloads the model (about 36 MB, the same bundle on Mac and iPhone), then loads it from the local cache (Application Support; progress via the downloadProgress callback)
  • Measure in Release: Debug is ~3× slower on per-token host work

Bundle

  • yolox-s_float32.aimodel: YOLOX-S, 640² input, 8.97M params, fp32 (the ship dtype; detection has no bandwidth-bound decode loop, so fp16 is no faster on the GPU and only adds near-tie noise). 36 MB. Same bundle on macOS and iOS.

Graph contract

input  "image"  [1,3,640,640] f32   BGR, 0-255, letterboxed (pad 114, top-left); YOLOX-native (no /255, no mean/std)
output "preds"  [1,8400,85]   f32   [cx,cy,w,h, obj, cls_0..cls_79]; box DECODED to 640-px, obj/cls SIGMOID-ed (in-graph)
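
The input side of this contract can be sketched as follows. This is an illustration under stated assumptions, not the kit's implementation: the names `Letterboxed` and `letterbox` are hypothetical, the source is assumed to be an interleaved 8-bit RGB buffer, and nearest-neighbour sampling stands in for whatever resampling filter the real pipeline uses. It shows the three points the contract fixes: a uniform scale that fits the image inside 640×640, padding with 114 after placing the image at the top-left, and planar B, G, R channel order with raw 0-255 values.

```swift
import Foundation

/// Output of the sketch: the planar tensor plus the scale needed to undo the resize.
struct Letterboxed {
    var tensor: [Float]   // layout [3, side, side], planes in B, G, R order, values 0-255
    var scale: Double     // letterboxed size = original size * scale
}

/// Letterboxes an interleaved RGB image into a square BGR tensor.
/// Hypothetical helper; nearest-neighbour sampling is a simplification.
func letterbox(rgb: [UInt8], width: Int, height: Int, side: Int = 640) -> Letterboxed {
    precondition(width > 0 && height > 0 && rgb.count == width * height * 3,
                 "expected an interleaved RGB buffer")
    // One uniform scale so the whole image fits inside side x side.
    let scale = min(Double(side) / Double(width), Double(side) / Double(height))
    let newW = min(side, Int(Double(width) * scale))
    let newH = min(side, Int(Double(height) * scale))
    let plane = side * side
    // Fill with the pad value; the resized image overwrites the top-left region.
    var tensor = [Float](repeating: 114, count: 3 * plane)
    for y in 0..<newH {
        let srcY = min(height - 1, Int(Double(y) / scale))
        for x in 0..<newW {
            let srcX = min(width - 1, Int(Double(x) / scale))
            let src = (srcY * width + srcX) * 3
            let dst = y * side + x
            tensor[dst] = Float(rgb[src + 2])              // B plane
            tensor[plane + dst] = Float(rgb[src + 1])      // G plane
            tensor[2 * plane + dst] = Float(rgb[src])      // R plane (no /255, no mean/std)
        }
    }
    return Letterboxed(tensor: tensor, scale: scale)
}
```

Because the image is anchored at the top-left, the padding only ever sits to the right of or below it, so mapping a predicted box back to the source image needs the scale alone and no offset.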

Host post-process: score = obj · max_class, threshold, per-class NMS (IoU 0.45), then un-letterbox the survivors. Anchor count: A = 80² + 40² + 20² = 8400 (grids of 640/8, 640/16 and 640/32 cells per side, i.e. strides 8/16/32).
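
A minimal sketch of that host post-process, written against a flat copy of the "preds" buffer. The types and function names (`Box`, `Hit`, `iou`, `postProcess`) are hypothetical and are not the CoreAIKit API. The 0.3 score threshold is borrowed from the usage snippet further down and is not confirmed as the kit's default, and the returned boxes are in source-image pixels rather than the normalized boxes the kit reports. NMS here is the plain greedy variant: sort by score, then keep a box unless an already kept box of the same class overlaps it by more than the IoU threshold.

```swift
import Foundation

struct Box { var x0, y0, x1, y1: Float }            // corner form, in pixels
struct Hit { var box: Box; var classIndex: Int; var score: Float }

/// Intersection over union of two corner-form boxes.
func iou(_ a: Box, _ b: Box) -> Float {
    let w = max(0, min(a.x1, b.x1) - max(a.x0, b.x0))
    let h = max(0, min(a.y1, b.y1) - max(a.y0, b.y0))
    let inter = w * h
    let union = (a.x1 - a.x0) * (a.y1 - a.y0) + (b.x1 - b.x0) * (b.y1 - b.y0) - inter
    return union > 0 ? inter / union : 0
}

/// preds: flat buffer of rows [cx, cy, w, h, obj, cls_0 ... cls_(classCount-1)].
/// scale: the letterbox resize factor (letterboxed = original * scale).
func postProcess(preds: [Float], scale: Float,
                 scoreThreshold: Float = 0.3, iouThreshold: Float = 0.45,
                 classCount: Int = 80) -> [Hit] {
    precondition(classCount >= 1 && scale > 0)
    let rowLength = 5 + classCount
    precondition(preds.count % rowLength == 0, "buffer is not a whole number of rows")

    // 1. score = obj * max_class, then threshold.
    var candidates: [Hit] = []
    for row in 0..<(preds.count / rowLength) {
        let base = row * rowLength
        var best = 0
        for c in 1..<classCount where preds[base + 5 + c] > preds[base + 5 + best] { best = c }
        let score = preds[base + 4] * preds[base + 5 + best]
        guard score >= scoreThreshold else { continue }
        let cx = preds[base], cy = preds[base + 1]
        let halfW = preds[base + 2] / 2, halfH = preds[base + 3] / 2
        let box = Box(x0: cx - halfW, y0: cy - halfH, x1: cx + halfW, y1: cy + halfH)
        candidates.append(Hit(box: box, classIndex: best, score: score))
    }

    // 2. Greedy per-class NMS in 640-px space, highest score first.
    candidates.sort { $0.score > $1.score }
    var kept: [Hit] = []
    for cand in candidates {
        let suppressed = kept.contains {
            $0.classIndex == cand.classIndex && iou($0.box, cand.box) > iouThreshold
        }
        if !suppressed { kept.append(cand) }
    }

    // 3. Un-letterbox the survivors: top-left placement means a pure divide by scale.
    return kept.map { hit in
        Hit(box: Box(x0: hit.box.x0 / scale, y0: hit.box.y0 / scale,
                     x1: hit.box.x1 / scale, y1: hit.box.y1 / scale),
            classIndex: hit.classIndex, score: hit.score)
    }
}
```

The quadratic `kept.contains` scan is fine for a sketch, since only rows above the score threshold reach it; a production path would more likely bucket candidates by class first.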

Parity & speed (measured)

  • vs torch fp32: head cosine 1.000000, end-to-end detections IoU 1.000 on CPU and GPU.
  • M4 Max GPU: 4.80 ms / 208 FPS (median). M4 Max CPU 57 ms.
  • iPhone 17 Pro (Release, GPU, live camera): ~22 ms / 35–40 FPS end-to-end; first-load on-device specialization ~2.6 s (no AOT). The on-device gate reproduces the Mac fp32 oracle 6/6 (cat 0.96/0.96, remote 0.86/0.86, bed 0.71, couch 0.54).
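
For readers reproducing the parity check, the sketch below shows one way to compute a cosine score between two output buffers. It assumes that "head cosine" means cosine similarity over the flattened head output of the converted model and the torch reference; the source does not spell this out, and `cosineSimilarity` is a hypothetical helper, not part of the conversion script.

```swift
import Foundation

/// Cosine similarity of two equally sized buffers, accumulated in Double
/// so that long fp32 buffers do not lose precision in the sums.
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Double {
    precondition(a.count == b.count && !a.isEmpty, "buffers must match and be non-empty")
    var dot = 0.0, normA = 0.0, normB = 0.0
    for i in a.indices {
        let x = Double(a[i]), y = Double(b[i])
        dot += x * y
        normA += x * x
        normB += y * y
    }
    let denominator = (normA * normB).squareRoot()
    return denominator > 0 ? dot / denominator : 0
}
```

Cosine similarity ignores overall scale, so a value of 1.0 alone does not prove the outputs match in magnitude; that is why the end-to-end detection IoU is reported alongside it.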

Use (CoreAIKit)

import CoreAIKitVision
let detector = try await YOLOXDetector(model: .yoloxS)            // downloads this repo
let detections = try await detector.detect(in: pixelBuffer, scoreThreshold: 0.3)

Live-camera + video reference app: DetectCamera in coreai-kit.

Convert it yourself

conversion/export_yolox.py: run with --variant s --yolox-repo <YOLOX checkout> --weights yolox_s.pth, gated end-to-end with --verify-image <img> --unit {cpu,gpu}. The script also maps nano/tiny/m/l/x.

License

Apache-2.0: the upstream YOLOX code and COCO-pretrained weights are both Apache-2.0.
