Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories Paper • 2606.02060 • Published 6 days ago • 50
VideoKR: Towards Knowledge- and Reasoning-Intensive Video Understanding Paper • 2606.05259 • Published 4 days ago • 34
Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems Paper • 2605.04018 • Published May 5 • 40
From Skills to Talent: Organising Heterogeneous Agents as a Real-World Company Paper • 2604.22446 • Published Apr 24 • 121
World-R1: Reinforcing 3D Constraints for Text-to-Video Generation Paper • 2604.24764 • Published Apr 27 • 118
TexOCR: Advancing Document OCR Models for Compilable Page-to-LaTeX Reconstruction Paper • 2604.22880 • Published Apr 24 • 10
AbGen: Evaluating Large Language Models in Ablation Study Design and Evaluation for Scientific Research Paper • 2507.13300 • Published Jul 17, 2025 • 20
Can LLMs Identify Critical Limitations within Scientific Research? A Systematic Evaluation on AI Research Papers Paper • 2507.02694 • Published Jul 3, 2025 • 19
SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks Paper • 2507.01001 • Published Jul 1, 2025 • 46
Can LLMs Generate High-Quality Test Cases for Algorithm Problems? TestCase-Eval: A Systematic Evaluation of Fault Coverage and Exposure Paper • 2506.12278 • Published Jun 13, 2025 • 16
The Climb Carves Wisdom Deeper Than the Summit: On the Noisy Rewards in Learning to Reason Paper • 2505.22653 • Published May 28, 2025 • 43
Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence Paper • 2505.23747 • Published May 29, 2025 • 69
Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective Paper • 2505.15045 • Published May 21, 2025 • 56
Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation Paper • 2503.24379 • Published Mar 31, 2025 • 76
Hi3DGen: High-fidelity 3D Geometry Generation from Images via Normal Bridging Paper • 2503.22236 • Published Mar 28, 2025 • 11