Instructions for using WithinUsAI/Infinite.Code.III with the Transformers library and with local serving apps (vLLM, SGLang, Docker Model Runner).

Libraries

How to use WithinUsAI/Infinite.Code.III with Transformers:

```
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="WithinUsAI/Infinite.Code.III", trust_remote_code=True)

# Load model directly
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("WithinUsAI/Infinite.Code.III", trust_remote_code=True, dtype="auto")
```

Local Apps

vLLM

How to use WithinUsAI/Infinite.Code.III with vLLM:

Install from pip and serve model

```
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "WithinUsAI/Infinite.Code.III"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "WithinUsAI/Infinite.Code.III",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```
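
The same completion call can also be issued from Python. The sketch below mirrors the curl request using only the standard library; it assumes the server is reachable at the host and port shown above, and the helper names `build_completion_request` and `complete` are illustrative, not part of vLLM.

```python
import json
import urllib.request

# Assumed defaults copied from the curl example; adjust host/port as needed.
BASE_URL = "http://localhost:8000/v1/completions"
MODEL_ID = "WithinUsAI/Infinite.Code.III"


def build_completion_request(prompt, max_tokens=512, temperature=0.5, url=BASE_URL):
    """Build (but do not send) a POST request equivalent to the curl call."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first completion."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With a server running, `complete("Once upon a time,")` should return the generated continuation. For the SGLang server, pass `url="http://localhost:30000/v1/completions"` instead.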

Use Docker

```
docker model run hf.co/WithinUsAI/Infinite.Code.III
```

SGLang

How to use WithinUsAI/Infinite.Code.III with SGLang:

Install from pip and serve model

```
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "WithinUsAI/Infinite.Code.III" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "WithinUsAI/Infinite.Code.III",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```

Use Docker images

```
# Start the SGLang server inside the official Docker image:
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "WithinUsAI/Infinite.Code.III" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "WithinUsAI/Infinite.Code.III",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```

Docker Model Runner
How to use WithinUsAI/Infinite.Code.III with Docker Model Runner:
```
docker model run hf.co/WithinUsAI/Infinite.Code.III
```

metadata

language:
  - en
  - code
license: apache-2.0
tags:
  - recursive-language-model
  - causal-lm
  - multimodal
  - long-context
  - mixture-of-experts
  - continual-learning
  - meta-learning
  - self-automated
  - safetensors
  - pytorch
model_name: Infinite.Code.III
pipeline_tag: text-generation
library_name: transformers

Infinite.Code.III — Recursive Language Model

"Not a Large Language Model. A Recursive Mind."

Overview

Infinite.Code.III is a 1.210B-parameter Recursive Language Model (RLM) built from scratch as a unified Hybrid Mind architecture. Unlike standard LLMs, which apply a fixed forward-pass transformer, Infinite.Code.III integrates Self-Automated (S.A.) learning systems as architectural primitives: they are not pipeline steps but components built into the decoder layers themselves.

Property	Value
Parameters	1.210B
Context Window	1,000,000 tokens
Architecture	Recursive Language Model (RLM)
Attention	Grouped-Query Attention (GQA), 10 query heads / 5 key-value heads
Positional Encoding	RoPE (θ = 500,000, long-ctx scaled)
FFN	Alternating Dense / Mixture-of-Experts (8 experts, top-2)
Vocabulary	65,536 BPE tokens
Layers	20
Hidden Size	1280
Weight Format	safetensors (bfloat16 trained, float32 saved)
Modalities	Text · Image · Audio · Video
License	Apache 2.0
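
To give a feel for what the table implies at full context length, here is a rough sizing sketch for a conventional key-value cache. It is an estimate under stated assumptions, not a measurement: it reads "10/5 heads" as 10 query heads and 5 key-value heads, takes the head dimension as hidden size divided by query heads, caches every layer, and ignores the HybridMemory buffers and activations.

```python
# Values copied from the property table.
HIDDEN_SIZE = 1280
NUM_LAYERS = 20
NUM_QUERY_HEADS = 10
NUM_KV_HEADS = 5
CONTEXT_TOKENS = 1_000_000

# Assumption: head_dim = hidden_size / query heads, as in most GQA models.
HEAD_DIM = HIDDEN_SIZE // NUM_QUERY_HEADS


def kv_cache_bytes(tokens, bytes_per_value=2):
    """Bytes needed to cache keys and values for `tokens` positions.

    bytes_per_value=2 corresponds to bfloat16 or float16 storage.
    """
    per_token_per_layer = 2 * NUM_KV_HEADS * HEAD_DIM  # keys + values
    return tokens * NUM_LAYERS * per_token_per_layer * bytes_per_value
```

Under these assumptions the head dimension is 128, the cache costs 51,200 bytes per token, and a single 1,000,000-token sequence needs about 51.2 GB in bfloat16. Halving the key-value heads relative to the query heads is what keeps this figure at half of the full multi-head equivalent.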

S.A. System Architecture

S.A. Meta Learning

Each layer has a learnable adaptive_alpha scalar (sigmoid-gated) that blends the transformed output with the layer's top-of-layer residual. This is the meta-learning channel — it learns how much each transformation contributes per layer.
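
A minimal sketch of such a gate is shown below. The convex-combination form is an assumption, since the card only says the scalar is sigmoid-gated and blends the two signals; `adaptive_blend` and `alpha_logit` are illustrative names.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def adaptive_blend(residual, transformed, alpha_logit):
    """Blend two equally sized vectors with a sigmoid-gated scalar.

    Assumed form: out = a * transformed + (1 - a) * residual,
    where a = sigmoid(alpha_logit) and alpha_logit is the learnable scalar.
    """
    a = sigmoid(alpha_logit)
    return [a * t + (1.0 - a) * r for r, t in zip(residual, transformed)]
```

With `alpha_logit = 0` the gate is 0.5 and the output is the midpoint of the two inputs; a strongly negative value leaves the residual almost untouched, which is how a layer could learn to contribute very little.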

S.A. Reinforcement Learning

RewardHead (D → 512 → 1 scalar) attaches to the final hidden states. During RL fine-tuning (RLHF / GRPO), this head provides the value signal. Pass output_reward=True during rollout collection.

S.A. Continual Learning

HybridMemory LTM uses exponential moving average write-back (0.95 × old + 0.05 × new). Each write shifts a slot by only 5% toward the new value, so earlier content decays gradually across forward passes instead of being overwritten, which helps resist catastrophic forgetting.
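
The update rule can be written in a few lines. This sketch operates on plain Python lists and uses the 0.95 / 0.05 weights quoted above; the function names are illustrative, and how slots are selected for writing is not covered here.

```python
DECAY = 0.95  # weight on the existing slot content, from the text above


def ema_write(old_slot, new_value, decay=DECAY):
    """Return the slot after one exponential-moving-average write."""
    return [decay * o + (1.0 - decay) * n for o, n in zip(old_slot, new_value)]


def retained_fraction(num_writes, decay=DECAY):
    """Share of the original slot content left after `num_writes` updates."""
    return decay ** num_writes
```

Because the retained share is 0.95 raised to the number of writes, a slot still holds just over half of its original content after 13 writes and just under half after 14.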

S.A. Adaptive Learning

The per-layer adaptive_alpha gate is trained end-to-end, self-calibrating each layer's write strength to the residual stream.

S.A. Rewriting Learning

Every 3rd layer runs RewriteAttention — a 4-head causal self-attention pass that lets the model revise its own intermediate token representations within a single forward pass.

S.A. NLP + S.A. Problem Solving

At the decoder output, MetaOutputMixer applies a 3-way soft gate (language / code / math-logic) via NLPGate. The final representation is a content-adaptive weighted mixture of three parallel projections.
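
One way to picture the 3-way gate is as a softmax over three logits whose weights mix the three projections. The sketch below assumes a softmax gate (the card only says "soft gate") and takes the three projected vectors as given; `mix_outputs` and its arguments are illustrative names.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix_outputs(gate_logits, language, code, math_logic):
    """Weighted sum of three equally sized projections.

    gate_logits: three scores, one per branch, assumed to come from the gate.
    Returns the gate weights and the mixed vector.
    """
    weights = softmax(gate_logits)
    branches = (language, code, math_logic)
    mixed = [
        sum(w * branch[i] for w, branch in zip(weights, branches))
        for i in range(len(language))
    ]
    return weights, mixed
```

The weights always sum to 1, so the mixture stays on the same scale as the individual projections; a token that looks like code would push most of the weight onto the code branch.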

S.A. Innovation Learning

Odd-numbered layers use MoELayer: 8 experts with top-2 routing, where each expert is a SwiGLU FFN with a 2048-dim intermediate size.
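
Top-2 routing can be sketched for a single token as follows. This is a generic illustration rather than the model's MoELayer: it assumes the two selected logits are renormalised with a softmax, and `router_logits` and `experts` are hypothetical inputs (any callables stand in for the SwiGLU experts).

```python
import math


def top2_route(router_logits):
    """Pick the two highest-scoring experts for one token.

    Returns [(expert_index, weight), (expert_index, weight)].
    Assumption: weights are a softmax over the two selected logits only.
    """
    ranked = sorted(
        range(len(router_logits)), key=lambda i: router_logits[i], reverse=True
    )
    top = ranked[:2]
    m = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(token, router_logits, experts):
    """Combine the outputs of the two selected experts for one token vector."""
    out = [0.0] * len(token)
    for index, weight in top2_route(router_logits):
        expert_out = experts[index](token)
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out
```

Only two of the eight experts run per token, so the compute per token stays close to that of a dense FFN while the layer holds eight FFNs' worth of parameters.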

S.A. DeBugging

DebugHookManager is a gradient hook registry. Set debug_mode: true in the config to log the mean absolute gradient of the embedding and of any registered tensor. It adds no cost when disabled.

S.A. Advanced Long/Short-Term Memory

HybridMemory (every 4th layer):

STM: 512-slot soft-attention read buffer (refreshed each pass)
LTM: 2048-slot persistent EMA key-value store (continual write-back)
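
A soft-attention read over memory slots can be illustrated with scaled dot-product attention. The sketch below is a generic version under that assumption; the card does not specify the scoring function, and `soft_read` is an illustrative name.

```python
import math


def soft_read(query, keys, values):
    """Read from memory as a softmax-weighted sum of value slots.

    Scores are dot(query, key) / sqrt(d), where d is the query width.
    """
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys
    ]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    width = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(width)]
```

In the model the STM would supply 512 such slots and the LTM 2048; the read is differentiable because every slot contributes in proportion to its weight rather than being selected outright.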

S.A. Recursive Seed Learning

RecursiveSeedGate runs on every layer and performs depth-4 intra-layer recursion. Each step seeds a 256-dim vector, projects it to the full hidden size D, gates it with a sigmoid, and re-seeds from the updated h, creating feedback loops within a single layer.
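
The per-layer placement rules in this section can be collected into a single layer plan. The sketch below is only a reading of the text: it assumes layers are numbered from 1 and that "every k-th layer" means layer numbers divisible by k, and the actual implementation may index differently. The dictionary keys are illustrative names.

```python
NUM_LAYERS = 20  # from the property table


def layer_plan(num_layers=NUM_LAYERS):
    """Map each 1-based layer number to the optional blocks it carries.

    Assumptions: 1-based numbering; "every k-th layer" means layer % k == 0.
    """
    plan = {}
    for layer in range(1, num_layers + 1):
        plan[layer] = {
            "ffn": "moe" if layer % 2 == 1 else "dense",  # odd layers use MoE
            "rewrite_attention": layer % 3 == 0,          # every 3rd layer
            "hybrid_memory": layer % 4 == 0,              # every 4th layer
            "recursive_seed_gate": True,                  # every layer
        }
    return plan
```

Under these assumptions the 20-layer stack has 10 MoE layers, 6 layers with RewriteAttention, and 5 layers with HybridMemory; layer 12 is the only one that combines RewriteAttention and HybridMemory.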

Multimodal Inputs

Modality	Projector	Input Shape
Image	`ImageProjector` Linear(1024→2560→1280)	`(B, N_patches, 1024)`
Audio	`AudioProjector` GRU(80→512) + Linear	`(B, T_frames, 80)`
Video	`VideoProjector` Linear + TransformerEncoderLayer	`(B, F_frames, 1024)`

Fine-Tuning

SFT Recommended Hyperparameters

Setting	Value
Learning Rate	2e-5
LR Schedule	cosine + 100-step warmup
Batch Size	1–4 per GPU + grad accumulation ×8
Max Seq Length	start at 8192, scale to 1M
Precision	bfloat16
Optimizer	AdamW (β₁=0.9, β₂=0.95, ε=1e-8, wd=0.1)
Grad Clip	1.0
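
The learning-rate schedule in the table can be sketched as a small function. The linear warmup shape and the decay to zero are assumptions, since the table only names "cosine + 100-step warmup"; `lr_at` is an illustrative helper, not part of any training library.

```python
import math

PEAK_LR = 2e-5      # from the table
WARMUP_STEPS = 100  # from the table


def lr_at(step, total_steps, peak_lr=PEAK_LR, warmup_steps=WARMUP_STEPS, min_lr=0.0):
    """Learning rate at a 0-based optimizer step.

    Linear warmup to peak_lr over warmup_steps (assumed shape),
    then cosine decay to min_lr at total_steps.
    """
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

With 1 to 4 sequences per GPU and 8 accumulation steps, each optimizer step covers 8 to 32 sequences per GPU, so `total_steps` should be counted in optimizer steps rather than micro-batches.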

RLHF / GRPO

The reward_head is the built-in value model. Pass output_reward=True during rollout to obtain its scalar output. The scalar is differentiable and can be wrapped as a reward function for TRL's GRPOTrainer (passed through its reward_funcs argument).
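
For context on how GRPO consumes scalar rewards, the sketch below shows the group-relative normalisation step in isolation. It is a generic illustration, not TRL's code: it uses the population standard deviation and a small epsilon, both of which are assumptions, and implementations differ on these details.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Normalise one group's rewards to zero mean and roughly unit scale.

    rewards: scalar rewards for several completions of the same prompt,
    for example values produced by the reward head during rollout.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

Completions that score above their group's mean receive positive advantages and those below receive negative ones; if every completion in a group gets the same reward, all advantages are zero and that group contributes no learning signal.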

Citation

@misc{infinite_code_iii_2025,
  title   = {Infinite.Code.III: A Recursive Language Model with Self-Automated Learning},
  author  = {GODsStrongestSoldier},
  year    = {2025},
  url     = {https://huggingface.co/GODsStrongestSoldier/Infinite.Code.III},
  note    = {1.210B Recursive Language Model, 1M context window}
}