Self-Rewarding Sequential Monte Carlo for Masked Diffusion Language Models Paper • 2602.01849 • Published Feb 2 • 5
MiroEval: Benchmarking Multimodal Deep Research Agents in Process and Outcome Paper • 2603.28407 • Published 10 days ago • 68
MARS: Enabling Autoregressive Models Multi-Token Generation Paper • 2604.07023 • Published 2 days ago • 22
First Try Matters: Revisiting the Role of Reflection in Reasoning Models Paper • 2510.08308 • Published Oct 9, 2025 • 24
MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources Paper • 2509.21268 • Published Sep 25, 2025 • 104
WildLong: Synthesizing Realistic Long-Context Instruction Data at Scale Paper • 2502.16684 • Published Feb 23, 2025 • 1
Through the Valley: Path to Effective Long CoT Training for Small Language Models Paper • 2506.07712 • Published Jun 9, 2025 • 18