CoffeeBench: Benchmarking Long-Horizon LLM Agents in Heterogeneous Multi-Agent Economies Paper • 2606.16613 • Published 12 days ago
Confidence-Aware Tool Orchestration for Robust Video Understanding Paper • 2606.26904 • Published 2 days ago
JetSpec: Breaking the Scaling Ceiling of Speculative Decoding with Parallel Tree Drafting Paper • 2606.18394 • Published 2 days ago
Lite Any Stereo V2: Faster and Stronger Efficient Zero-Shot Stereo Matching Paper • 2606.24457 • Published 4 days ago
Look Light, Think Heavy: What Multimodal Chain-of-Thought Reasoning Can and Cannot Do Paper • 2606.22565 • Published 6 days ago
The Hitchhiker's Guide to Agentic AI: From Foundations to Systems Paper • 2606.24937 • Published 5 days ago
QG-MIL: A Gated Transformer Aggregator for Domain-Agnostic Multiple Instance Learning in Medical Imaging Paper • 2606.20027 • Published 9 days ago
LingxiDiagBench: A Multi-Agent Framework for Benchmarking LLMs in Chinese Psychiatric Consultation and Diagnosis Paper • 2602.09379 • Published 16 days ago
Qwen-AgentWorld: Language World Models for General Agents Paper • 2606.24597 • Published 4 days ago
Deep Research in Physical Sciences: A Multi-Agent Framework and Comprehensive Benchmark Paper • 2606.18648 • Published 10 days ago
PlanBench-XL: Evaluating Long-Horizon Planning of LLM Tool-Use Agents in Large-Scale Tool Ecosystems Paper • 2606.22388 • Published 6 days ago
PerceptionDLM: Parallel Region Perception with Multimodal Diffusion Language Models Paper • 2606.19534 • Published 10 days ago