TVIR: Building Deep Research Agents Towards Text-Visual Interleaved Report Generation Paper • 2606.02320 • Published 5 days ago • 13
Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories Paper • 2606.02060 • Published 5 days ago • 49
MMG2Skill: Can Agents Distill In-the-Wild Guides into Self-Evolving Skills? Paper • 2606.01993 • Published 5 days ago • 13
OProver: A Unified Framework for Agentic Formal Theorem Proving Paper • 2605.17283 • Published 20 days ago • 31
Solvita: Enhancing Large Language Models for Competitive Programming via Agentic Evolution Paper • 2605.15301 • Published 23 days ago • 22
SkillOS: Learning Skill Curation for Self-Evolving Agents Paper • 2605.06614 • Published about 1 month ago • 46
DV-World: Benchmarking Data Visualization Agents in Real-World Scenarios Paper • 2604.25914 • Published Apr 28 • 41
WebCompass: Towards Multimodal Web Coding Evaluation for Code Language Models Paper • 2604.18224 • Published Apr 20 • 22
DR³-Eval: Towards Realistic and Reproducible Deep Research Evaluation Paper • 2604.14683 • Published Apr 16 • 36
CMI-RewardBench: Evaluating Music Reward Models with Compositional Multimodal Instruction Paper • 2603.00610 • Published Feb 28 • 35
AutoFigure: Generating and Refining Publication-Ready Scientific Illustrations Paper • 2602.03828 • Published Feb 3 • 20