Eagle 2: Building Post-Training Data Strategies from Scratch for Frontier Vision-Language Models Paper • 2501.14818 • Published Jan 20, 2025
QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation Paper • 2502.05178 • Published Feb 7, 2025
GR00T N1: An Open Foundation Model for Generalist Humanoid Robots Paper • 2503.14734 • Published Mar 18, 2025
Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models Paper • 2504.03624 • Published Apr 4, 2025
OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation Paper • 2601.15369 • Published Jan 21, 2026
Fully Attentional Networks with Self-emerging Token Labeling Paper • 2401.03844 • Published Jan 8, 2024
Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought Paper • 2505.23766 • Published May 29, 2025
Fast-ThinkAct: Efficient Vision-Language-Action Reasoning via Verbalizable Latent Planning Paper • 2601.09708 • Published Jan 14, 2026
Hydra-MDP: End-to-end Multimodal Planning with Multi-target Hydra-Distillation Paper • 2406.06978 • Published Jun 11, 2024
Centaur: Robust End-to-End Autonomous Driving with Test-Time Training Paper • 2503.11650 • Published Mar 14, 2025
Llama Nemoretriever Colembed: Top-Performing Text-Image Retrieval Model Paper • 2507.05513 • Published Jul 7, 2025
LocateAnything3D: Vision-Language 3D Detection with Chain-of-Sight Paper • 2511.20648 • Published Nov 25, 2025
OmniDrive: A Holistic Vision-Language Dataset for Autonomous Driving with Counterfactual Reasoning Paper • 2405.01533 • Published May 2, 2024
VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding Paper • 2507.13353 • Published Jul 17, 2025
Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training Paper • 2507.12507 • Published Jul 16, 2025
Hydra-NeXt: Robust Closed-Loop Driving with Open-Loop Training Paper • 2503.12030 • Published Mar 15, 2025