CodeRankEmbed-f16

An f16 (half-precision) cast of nomic-ai/CodeRankEmbed, the 137M NomicBert bi-encoder for code retrieval, in safetensors format, for GPU inference (e.g. candle on Apple-Silicon Metal) at roughly half the memory of the f32 base.

This repo contains weights only, with an identical architecture: every tensor is the base model's tensor cast f32 → f16, with tensor names and shapes unchanged. Use it exactly like the base model (same config.json, tokenizer.json, CLS pooling, and the required query instruction prefix).

Why

The base repo ships f32 safetensors (~547 MB). On the Metal GPU the f16 weights halve the working set and matmul bandwidth with no measurable change to retrieval quality in the parity check reported under Validation, so it is the form used by embedding-search on Apple Silicon.
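The halving follows directly from bytes per parameter. A minimal sketch of the arithmetic, assuming a round 137,000,000 parameters (the card only says "137M", so the exact count is an assumption) and ignoring the safetensors header:

```python
# Approximate parameter count taken from the "137M" figure; the exact count is assumed.
PARAMS = 137_000_000

# Storage width of each dtype in bytes.
BYTES_PER_DTYPE = {"f32": 4, "f16": 2}


def weight_bytes(n_params: int, dtype: str) -> int:
    """Raw tensor payload in bytes (no file header, no runtime overhead)."""
    return n_params * BYTES_PER_DTYPE[dtype]


def to_mb(n_bytes: int) -> float:
    """Decimal megabytes (1 MB = 10**6 bytes)."""
    return n_bytes / 1e6


f32_mb = to_mb(weight_bytes(PARAMS, "f32"))
f16_mb = to_mb(weight_bytes(PARAMS, "f16"))
print(f"f32: {f32_mb:.0f} MB, f16: {f16_mb:.0f} MB, ratio: {f16_mb / f32_mb:.2f}")
# prints "f32: 548 MB, f16: 274 MB, ratio: 0.50"
```

The f32 estimate lands close to the ~547 MB file size quoted above. Peak RSS is higher than the weight payload in both cases because it also includes activations and runtime allocations.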

Validation (f16 vs f32, CodeSearchNet Python, N=300)

Same code/corpus, dtype the only difference:

dtype        peak RSS   MRR@10   Recall@1
f32 (base)   1116 MB    0.9573   0.9367
f16 (this)    570 MB    0.9573   0.9367
  • cosine(f16, f32) per-document: mean 0.999998, min 0.999996
  • top-1 retrieval agreement f16 vs f32: 1.0000
  • MRR@10 / Recall@1 deltas: 0.0000

f16 is numerically a no-op for retrieval at about half the RAM. (The absolute MRR is high because the eval uses a small 300-doc distractor pool; it is an f16-vs-f32 parity check, not a full-CodeSearchNet reproduction of the base model's published score.)
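For readers who want to run a similar parity check, the metrics reported above can be sketched as follows. This is an illustrative stdlib-only version, not the script that produced the numbers; the toy vectors, the perturbed "f16" stand-ins, and all function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank_docs(query, docs):
    """Document indices sorted by descending cosine similarity to the query."""
    return sorted(range(len(docs)), key=lambda i: cosine(query, docs[i]), reverse=True)


def mrr_at_k(rankings, gold, k=10):
    """Mean reciprocal rank of the gold document, counting only the top k."""
    total = 0.0
    for ranked, g in zip(rankings, gold):
        top = ranked[:k]
        if g in top:
            total += 1.0 / (top.index(g) + 1)
    return total / len(rankings)


def recall_at_1(rankings, gold):
    """Fraction of queries whose top-ranked document is the gold document."""
    return sum(r[0] == g for r, g in zip(rankings, gold)) / len(rankings)


def top1_agreement(rank_a, rank_b):
    """Fraction of queries where both runs pick the same top-1 document."""
    return sum(a[0] == b[0] for a, b in zip(rank_a, rank_b)) / len(rank_a)


# Toy data: three queries, three documents, gold[i] is the right doc for query i.
queries = [[0.9, 0.1], [0.1, 0.9], [0.2, 1.0]]
gold = [0, 1, 2]
docs_f32 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Slightly perturbed copies standing in for f16 embeddings (made-up values).
docs_f16 = [[1.0, 0.001], [0.001, 1.0], [1.0, 0.999]]

ranks_f32 = [rank_docs(q, docs_f32) for q in queries]
ranks_f16 = [rank_docs(q, docs_f16) for q in queries]

print("MRR@10 f32:", mrr_at_k(ranks_f32, gold))
print("Recall@1 f32:", recall_at_1(ranks_f32, gold))
print("top-1 agreement:", top1_agreement(ranks_f32, ranks_f16))
print("min per-doc cosine:", min(cosine(a, b) for a, b in zip(docs_f32, docs_f16)))
```

In a real check the two embedding sets would come from running the same inputs through the f32 and f16 weights, with everything else held fixed.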

Usage

The query must use the task instruction prefix (same as the base model); code/documents get no prefix:

Represent this query for searching relevant code: <your query>

CLS-pool the last hidden state and L2-normalize; cosine similarity for ranking.
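The post-processing steps above can be sketched without any model dependency. The snippet below is a minimal illustration that assumes the encoder has already produced a last hidden state as a list of token vectors, and that the CLS token sits at position 0 (the usual BERT-style layout, assumed here). The hidden-state values and function names are hypothetical.

```python
import math

# Task instruction prefix for queries, as given above; documents get no prefix.
QUERY_PREFIX = "Represent this query for searching relevant code: "


def build_query(text: str) -> str:
    """Prepend the required instruction prefix to a query string."""
    return QUERY_PREFIX + text


def cls_pool(last_hidden_state):
    """Take the first token's vector (assumed to be CLS) from [seq_len][hidden]."""
    return last_hidden_state[0]


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def embed(last_hidden_state):
    """CLS-pool, then L2-normalize."""
    return l2_normalize(cls_pool(last_hidden_state))


def rank(query_emb, doc_embs):
    """Rank documents by cosine similarity (a dot product, since inputs are unit-length)."""
    scores = [sum(q * d for q, d in zip(query_emb, e)) for e in doc_embs]
    return sorted(range(len(doc_embs)), key=lambda i: scores[i], reverse=True)


# Made-up hidden states: one query and three documents, two tokens each, hidden size 2.
query_text = build_query("parse a JSON file")
query_emb = embed([[3.0, 4.0], [9.0, 9.0]])
doc_embs = [
    embed([[4.0, 3.0], [1.0, 1.0]]),
    embed([[3.0, 4.0], [0.0, 2.0]]),
    embed([[-3.0, -4.0], [5.0, 5.0]]),
]
print(query_text)
print(rank(query_emb, doc_embs))
```

Only the query string passes through build_query; code snippets and documents are tokenized as-is.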

Provenance & license

Produced by a pure dtype cast (CPU, candle) of nomic-ai/CodeRankEmbed model.safetensors; config.json and tokenizer.json copied unchanged. Inherits the base model's MIT license. Credit and citation belong to the original authors; see the base model card and the CoRNStack paper (arXiv:2412.01007).
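To illustrate what a pure dtype cast does to the values, here is a small stdlib-only sketch. It is not the candle code used to produce this repo: it rounds Python floats to IEEE 754 half precision with the struct module's "e" format and keeps tensor names and shapes untouched. The tensor name and values are made up.

```python
import struct


def to_f16(x: float) -> float:
    """Round a float to the nearest IEEE 754 half-precision value.

    struct.pack raises OverflowError for magnitudes beyond the f16 range (about 65504).
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]


def cast_state_dict(tensors):
    """Cast every tensor's values to f16 precision; names and shapes are unchanged.

    `tensors` maps name -> (shape, flat list of values).
    """
    return {
        name: (shape, [to_f16(v) for v in values])
        for name, (shape, values) in tensors.items()
    }


# Hypothetical single-tensor "checkpoint" for illustration only.
f32_tensors = {"hypothetical.layer.weight": ((2, 2), [0.1, -0.25, 1.0, 3.14159])}
f16_tensors = cast_state_dict(f32_tensors)

for name, (shape, values) in f16_tensors.items():
    original = f32_tensors[name][1]
    max_err = max(abs(a - b) for a, b in zip(original, values))
    print(name, shape, values, f"max abs error: {max_err:.2e}")
```

Values exactly representable in half precision (such as -0.25 and 1.0) survive unchanged; the rest move by a small rounding error, which is why per-document cosine similarity stays very close to 1.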
