---
language:
  - en
  - code
license: apache-2.0
tags:
  - recursive-language-model
  - causal-lm
  - multimodal
  - long-context
  - mixture-of-experts
  - continual-learning
  - meta-learning
  - self-automated
  - safetensors
  - pytorch
model_name: Infinite.Code.III
pipeline_tag: text-generation
library_name: transformers
---

Infinite.Code.III — Recursive Language Model

"Not a Large Language Model. A Recursive Mind."

Overview

Infinite.Code.III is a 1.210B-parameter Recursive Language Model (RLM) built from scratch as a unified Hybrid Mind architecture. Unlike standard LLMs, which apply a fixed stack of transformer layers in a single forward pass, Infinite.Code.III integrates Self-Automated (S.A.) learning systems as architectural primitives: they are not pipeline steps but components built into the decoder layers themselves.

| Property | Value |
|---|---|
| Parameters | 1.210B |
| Context Window | 1,000,000 tokens |
| Architecture | Recursive Language Model (RLM) |
| Attention | Grouped-Query Attention (GQA), 10/5 heads |
| Positional Encoding | RoPE (θ = 500,000, long-context scaled) |
| FFN | Alternating Dense / Mixture-of-Experts (8 experts, top-2) |
| Vocabulary | 65,536 BPE tokens |
| Layers | 20 |
| Hidden Size | 1280 |
| Weight Format | safetensors (bfloat16 trained, float32 saved) |
| Modalities | Text · Image · Audio · Video |
| License | Apache 2.0 |
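
The figures in the table allow a rough estimate of the attention geometry and of KV-cache memory at long context. The sketch below is back-of-the-envelope arithmetic, not a measurement. It assumes that "10/5 heads" means 10 query heads sharing 5 key-value heads, that the head dimension is the hidden size divided by the number of query heads, and that the cache is kept in bfloat16 for all 20 layers; the card does not confirm any of these.

```python
# Back-of-the-envelope KV-cache sizing from the property table.
# Assumptions (not stated in the card): 10 query heads, 5 KV heads,
# head_dim = hidden_size // query_heads, bfloat16 cache (2 bytes per value).
HIDDEN_SIZE = 1280
NUM_QUERY_HEADS = 10
NUM_KV_HEADS = 5
NUM_LAYERS = 20
BYTES_PER_VALUE = 2  # bfloat16

head_dim = HIDDEN_SIZE // NUM_QUERY_HEADS
queries_per_kv_head = NUM_QUERY_HEADS // NUM_KV_HEADS


def kv_cache_bytes(num_tokens: int) -> int:
    """Bytes needed to cache keys and values for one sequence."""
    # Factor 2 covers the separate key and value tensors.
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * head_dim * BYTES_PER_VALUE
    return per_token * num_tokens


print("head_dim:", head_dim, "queries per KV head:", queries_per_kv_head)
print(kv_cache_bytes(8_192) / 1e9, "GB at 8,192 tokens")
print(kv_cache_bytes(1_000_000) / 1e9, "GB at 1,000,000 tokens")
```

Under these assumptions one full 1,000,000-token sequence needs about 51.2 GB for the KV cache alone, compared with about 0.42 GB at 8,192 tokens. The estimate ignores activations and the HybridMemory buffers.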

S.A. System Architecture

S.A. Meta Learning

Each layer has a learnable adaptive_alpha scalar (sigmoid-gated) that blends the transformed output with the layer's top-of-layer residual. This is the meta-learning channel — it learns how much each transformation contributes per layer.
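
A minimal, framework-free sketch of such a sigmoid-gated blend follows. The exact formula is an assumption: the card does not say whether the gate multiplies the transformed output, the residual, or both, so this version uses a convex combination of the two. The function names are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def blend(residual, transformed, adaptive_alpha):
    """Mix a layer's transformed output with its incoming residual.

    Assumed form: gate * transformed + (1 - gate) * residual,
    where gate = sigmoid(adaptive_alpha) is a single scalar per layer.
    """
    gate = sigmoid(adaptive_alpha)
    return [gate * t + (1.0 - gate) * r for r, t in zip(residual, transformed)]


residual = [1.0, 2.0]
transformed = [3.0, 4.0]
print(blend(residual, transformed, adaptive_alpha=0.0))   # equal mix
print(blend(residual, transformed, adaptive_alpha=-4.0))  # mostly residual
```

With adaptive_alpha at 0 the gate is 0.5 and both inputs contribute equally; a strongly negative value lets the residual pass through almost unchanged.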

S.A. Reinforcement Learning

RewardHead (D → 512 → 1 scalar) attaches to the final hidden states. During RL fine-tuning (RLHF / GRPO), this head provides the scalar value signal. Pass output_reward=True during rollout collection to return it.

S.A. Continual Learning

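HybridMemory LTM uses exponential moving average write-back (0.95 × old + 0.05 × new). Each write blends new content into a slot instead of replacing it, so knowledge accumulates across forward passes and older content fades gradually rather than being overwritten at once. This is the mechanism intended to resist catastrophic forgetting.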
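
The write-back rule can be illustrated on a single memory slot. This is a sketch of the stated formula only; the function name, the slot layout, and the toy vectors are assumptions for illustration.

```python
def ema_write(old_slot, new_value, decay=0.95):
    """Exponential moving average write-back: decay * old + (1 - decay) * new."""
    return [decay * o + (1.0 - decay) * n for o, n in zip(old_slot, new_value)]


# Toy slot holding "old" knowledge in dimension 0.
slot = [1.0, 0.0]
# Write the same "new" content (dimension 1) ten times.
for _ in range(10):
    slot = ema_write(slot, [0.0, 1.0])

print(slot)  # weight left on the old content is 0.95 ** 10
```

After n writes the original content keeps a weight of 0.95^n, so it halves roughly every 13 to 14 writes (ln 0.5 / ln 0.95 ≈ 13.5).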

S.A. Adaptive Learning

The per-layer adaptive_alpha gate is trained end-to-end, self-calibrating each layer's write strength to the residual stream.

S.A. Rewriting Learning

Every 3rd layer runs RewriteAttention — a 4-head causal self-attention pass that lets the model revise its own intermediate token representations within a single forward pass.

S.A. NLP + S.A. Problem Solving

MetaOutputMixer at decoder output applies a 3-way soft gate (language / code / math-logic) via NLPGate. The final representation is a content-adaptive weighted mixture of three parallel projections.
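
A 3-way soft gate of this kind can be sketched as a softmax over three gate logits followed by a weighted sum. The use of softmax is an assumption (the card only says "soft gate"), and the three input vectors stand in for the outputs of the language, code, and math-logic projections.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a short list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix_outputs(gate_logits, language_out, code_out, math_logic_out):
    """Content-adaptive mixture of three parallel projections."""
    w_lang, w_code, w_math = softmax(gate_logits)
    return [
        w_lang * a + w_code * b + w_math * c
        for a, b, c in zip(language_out, code_out, math_logic_out)
    ]


# Hypothetical gate logits that favor the code branch.
weights = softmax([0.1, 2.0, -1.0])
mixed = mix_outputs([0.1, 2.0, -1.0], [1.0, 0.0], [0.0, 1.0], [0.5, 0.5])
print(weights, mixed)
```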

S.A. Innovation Learning

Odd-numbered layers use MoELayer — 8 experts, top-2 routing, each a SwiGLU FFN with 2048-dim intermediate.
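
Top-2 routing over 8 experts can be sketched for a single token as follows. Two details are assumptions not stated in the card: the router weights are renormalized with a softmax over the two selected experts only, and no load-balancing term is shown. For brevity the sketch takes precomputed outputs for all experts; a real layer would run only the selected ones.

```python
import math

NUM_EXPERTS = 8
TOP_K = 2


def route_token(router_logits):
    """Return [(expert_index, weight), ...] for the top-k experts of one token."""
    assert len(router_logits) == NUM_EXPERTS
    ranked = sorted(range(NUM_EXPERTS), key=lambda i: router_logits[i], reverse=True)
    top = ranked[:TOP_K]
    # Softmax restricted to the selected experts (assumed renormalization).
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_output(router_logits, expert_outputs):
    """Weighted sum of the selected experts' outputs for one token."""
    dim = len(expert_outputs[0])
    out = [0.0] * dim
    for index, weight in route_token(router_logits):
        for d in range(dim):
            out[d] += weight * expert_outputs[index][d]
    return out


# Hypothetical router logits: experts 1 and 3 score highest.
logits = [0.0, 2.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0]
toy_outputs = [[float(i)] for i in range(NUM_EXPERTS)]
print(route_token(logits), moe_output(logits, toy_outputs))
```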

S.A. Debugging

DebugHookManager is a gradient hook registry. Set debug_mode: true in the config to log the mean absolute gradient of the embedding and of any registered tensor. The hooks add no cost when disabled.

S.A. Advanced Long/Short-Term Memory

HybridMemory (every 4th layer):

  • STM: 512-slot soft-attention read buffer (refreshed each pass)
  • LTM: 2048-slot persistent EMA key-value store (continual write-back)
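
The placement rules stated so far (MoE on odd-numbered layers, RewriteAttention every 3rd layer, HybridMemory every 4th layer) can be combined into a per-layer plan for the 20 layers. The sketch assumes layers are numbered from 1 and that "every Nth" means layer numbers divisible by N; with 0-based numbering the sets would shift.

```python
NUM_LAYERS = 20


def layer_plan(layer_number: int) -> dict:
    """Components active in one decoder layer (assumes 1-based numbering)."""
    return {
        "ffn": "moe" if layer_number % 2 == 1 else "dense",
        "rewrite_attention": layer_number % 3 == 0,
        "hybrid_memory": layer_number % 4 == 0,
    }


plan = {n: layer_plan(n) for n in range(1, NUM_LAYERS + 1)}
moe_layers = [n for n, p in plan.items() if p["ffn"] == "moe"]
rewrite_layers = [n for n, p in plan.items() if p["rewrite_attention"]]
memory_layers = [n for n, p in plan.items() if p["hybrid_memory"]]

print("MoE:", moe_layers)
print("RewriteAttention:", rewrite_layers)
print("HybridMemory:", memory_layers)
```

Under this numbering, 10 layers use MoE, 6 run RewriteAttention, and 5 carry HybridMemory; layer 12 is the only layer with both RewriteAttention and HybridMemory.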

S.A. Recursive Seed Learning

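Every layer includes a RecursiveSeedGate that performs depth-4 intra-layer recursion. Each step seeds a 256-dim vector from the hidden state h, projects it back to the full dimension D, gates the result with a sigmoid, and then re-seeds from the updated h. This creates feedback loops within a single layer.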
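
The recursion can be sketched on a toy hidden state. Everything beyond the four stated steps is an assumption: the update rule (adding a sigmoid-gated proposal to h), the absence of biases and normalization, the weight names, and the toy sizes (8 and 4 in place of the card's D = 1280 and 256).

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def random_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def recursive_seed_gate(h, w_seed, w_proj, depth=4):
    """Apply `depth` seed -> project -> gate -> update steps to h."""
    for _ in range(depth):
        seed = matvec(w_seed, h)         # D -> seed_dim
        proposal = matvec(w_proj, seed)  # seed_dim -> D
        # Assumed update: sigmoid gate scales the proposal before it is added.
        h = [x + sigmoid(p) * p for x, p in zip(h, proposal)]
    return h


rng = random.Random(0)
D, SEED_DIM = 8, 4  # toy sizes for illustration only
w_seed = random_matrix(SEED_DIM, D, rng)
w_proj = random_matrix(D, SEED_DIM, rng)
h0 = [rng.uniform(-1.0, 1.0) for _ in range(D)]
h4 = recursive_seed_gate(h0, w_seed, w_proj)
print(h4)
```

Because each step reads the h produced by the previous one, the same weights are reused four times, which is what makes the loop recursive rather than four independent sublayers.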


Multimodal Inputs

| Modality | Projector | Input Shape |
|---|---|---|
| Image | ImageProjector Linear(1024→2560→1280) | (B, N_patches, 1024) |
| Audio | AudioProjector GRU(80→512) + Linear | (B, T_frames, 80) |
| Video | VideoProjector Linear + TransformerEncoderLayer | (B, F_frames, 1024) |

Fine-Tuning

SFT Recommended Hyperparameters

| Setting | Value |
|---|---|
| Learning Rate | 2e-5 |
| LR Schedule | cosine + 100-step warmup |
| Batch Size | 1–4 per GPU + grad accumulation ×8 |
| Max Seq Length | start at 8192, scale to 1M |
| Precision | bfloat16 |
| Optimizer | AdamW (β₁=0.9, β₂=0.95, ε=1e-8, wd=0.1) |
| Grad Clip | 1.0 |
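
The learning-rate schedule and effective batch size implied by these settings can be sketched as below. The warmup shape (linear), the decay floor (zero), the total step count, and the GPU count are assumptions chosen for illustration; the table specifies only the peak rate, the cosine shape, the 100 warmup steps, and the accumulation factor.

```python
import math

PEAK_LR = 2e-5
WARMUP_STEPS = 100


def learning_rate(step: int, total_steps: int, min_lr: float = 0.0) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to min_lr."""
    if step < WARMUP_STEPS:
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    progress = min(1.0, progress)
    return min_lr + 0.5 * (PEAK_LR - min_lr) * (1.0 + math.cos(math.pi * progress))


def effective_batch_size(per_gpu_batch: int, num_gpus: int, grad_accum_steps: int = 8) -> int:
    """Sequences contributing to each optimizer step."""
    return per_gpu_batch * num_gpus * grad_accum_steps


TOTAL_STEPS = 1_000  # hypothetical run length
for step in (0, 99, 100, 550, 1_000):
    print(step, learning_rate(step, TOTAL_STEPS))
print(effective_batch_size(per_gpu_batch=4, num_gpus=8))  # hypothetical 8-GPU node
```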

RLHF / GRPO

The reward_head is the built-in value model. Pass output_reward=True during rollout to obtain its scalar output. The scalar is differentiable and is intended to plug directly into TRL's GRPOTrainer.
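
For context, GRPO scores each completion relative to the other completions sampled for the same prompt. The sketch below shows that group-relative normalization on hypothetical scalar rewards standing in for reward_head outputs. It illustrates the idea only: trainers such as TRL's perform this step internally, and the epsilon value and the use of the population standard deviation are assumptions.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one prompt's group of sampled completions."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for four completions of the same prompt.
rewards = [0.2, 0.8, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that score above the group mean receive positive advantages and those below receive negative ones, so no separate baseline needs to be subtracted.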


Citation

@misc{infinite_code_iii_2025,
  title   = {Infinite.Code.III: A Recursive Language Model with Self-Automated Learning},
  author  = {GODsStrongestSoldier},
  year    = {2025},
  url     = {https://huggingface.co/GODsStrongestSoldier/Infinite.Code.III},
  note    = {1.210B Recursive Language Model, 1M context window}
}