view article Article DABStep: Data Agent Benchmark for Multi-step Reasoning +5 eggie5, martinigoyanes, frisokingma, andreumora, lvwerra, thomwolf, m-ric • Feb 4, 2025 • 131
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing Paper • 2509.22186 • Published Sep 26, 2025 • 164
Running Agents 232 AI2 WildBench Leaderboard (V2) 🦁 232 Display LLM performance leaderboards with customizable views
Group-in-Group Policy Optimization for LLM Agent Training Paper • 2505.10978 • Published May 16, 2025 • 23