Prompt-Level Distillation: A Non-Parametric Alternative to Model Fine-Tuning for Efficient Reasoning Paper • 2602.21103 • Published 19 days ago • 6
Anticipate and Learn: Unleashing Idle-Time Compute in Proactive Agents Paper • 2605.25971 • Published 27 days ago • 16
DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards Paper • 2605.21467 • Published May 20 • 207
Auditing Multimodal LLM Raters: Central Tendency Bias in Clinical Ordinal Scoring Paper • 2605.16386 • Published May 11 • 3
Leveraging Verifier-Based Reinforcement Learning in Image Editing Paper • 2604.27505 • Published Apr 30 • 58