ARM: An AutoRegressive Large Multimodal Model with Unified Discrete Representations Paper • 2606.11188 • Published 2 days ago • 21
WorldBench: A Challenging and Visually Diverse Multimodal Reasoning Benchmark Paper • 2606.06538 • Published 7 days ago • 1
SoCRATES: Towards Reliable Automated Evaluation of Proactive LLM Mediation across Domains and Socio-cognitive Variations Paper • 2606.05563 • Published 7 days ago • 47
TIDE: Proactive Multi-Problem Discovery via Template-Guided Iteration Paper • 2606.04743 • Published 8 days ago • 43
view article Article The Open Source Community is backing OpenEnv for Agentic RL +15 burtenshaw, spisakjo, lysandre, darktex, willcb, qjoy, pawalt, cwing-nv, danielhanchen, andrewzhou, shimmyshimmer, Hamid-Nazeri, Sanyam, zkwentz, emre0, lewtun, sergiopaniego • 3 days ago • 69
📝 Research & Long-Form Blog Posts Collection In-depth technical articles and research pieces published by Hugging Face • 18 items • Updated 13 days ago • 31
Socratic-SWE: Self-Evolving Coding Agents via Trace-Derived Agent Skills Paper • 2606.07412 • Published 6 days ago • 12
When Tools Fail: Benchmarking Dynamic Replanning and Anomaly Recovery in LLM Agents Paper • 2606.05806 • Published 7 days ago • 21
AdaPlanBench: Evaluating Adaptive Planning in Large Language Model Agents under World and User Constraints Paper • 2606.05622 • Published 7 days ago • 40
Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses Paper • 2606.02373 • Published 10 days ago • 51
X-Stream: Exploring MLLMs as Multiplexers for Multi-Stream Understanding Paper • 2606.02482 • Published 10 days ago • 35
Rethinking Continual Experience Internalization for Self-Evolving LLM Agents Paper • 2606.04703 • Published 8 days ago • 22