Let LLMs Break Free from Overthinking via Self-Braking Tuning Paper • 2505.14604 • Published May 20, 2025 • 23
AGENTIF: Benchmarking Instruction Following of Large Language Models in Agentic Scenarios Paper • 2505.16944 • Published May 22, 2025 • 8
Training Step-Level Reasoning Verifiers with Formal Verification Tools Paper • 2505.15960 • Published May 21, 2025 • 7
The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning Paper • 2505.15134 • Published May 21, 2025 • 7
General-Reasoner: Advancing LLM Reasoning Across All Domains Paper • 2505.14652 • Published May 20, 2025 • 24
Fine-tuning Quantized Neural Networks with Zeroth-order Optimization Paper • 2505.13430 • Published May 19, 2025 • 10
Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training Paper • 2505.14681 • Published May 20, 2025 • 10
TabSTAR: A Foundation Tabular Model With Semantically Target-Aware Representations Paper • 2505.18125 • Published May 23, 2025 • 112
QwenLong-CPRS: Towards ∞-LLMs with Dynamic Context Optimization Paper • 2505.18092 • Published May 23, 2025 • 43
NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement Learning Paper • 2505.16022 • Published May 21, 2025 • 4
Interleaved Reasoning for Large Language Models via Reinforcement Learning Paper • 2505.19640 • Published May 26, 2025 • 15
Rethinking the Sampling Criteria in Reinforcement Learning for LLM Reasoning: A Competence-Difficulty Alignment Perspective Paper • 2505.17652 • Published May 23, 2025 • 6
UFT: Unifying Supervised and Reinforcement Fine-Tuning Paper • 2505.16984 • Published May 22, 2025 • 3
Universal Reasoner: A Single, Composable Plug-and-Play Reasoner for Frozen LLMs Paper • 2505.19075 • Published May 25, 2025 • 22
Reinforcing Multi-Turn Reasoning in LLM Agents via Turn-Level Credit Assignment Paper • 2505.11821 • Published May 17, 2025 • 14
Text2Grad: Reinforcement Learning from Natural Language Feedback Paper • 2505.22338 • Published May 28, 2025 • 8
The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models Paper • 2505.22617 • Published May 28, 2025 • 132
Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem Paper • 2506.03295 • Published Jun 3, 2025 • 17
ExpertLongBench: Benchmarking Language Models on Expert-Level Long-Form Generation Tasks with Structured Checklists Paper • 2506.01241 • Published Jun 2, 2025 • 9
Prefix Grouper: Efficient GRPO Training through Shared-Prefix Forward Paper • 2506.05433 • Published Jun 5, 2025 • 4
RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling Paper • 2506.08672 • Published Jun 10, 2025 • 30
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning Paper • 2505.24726 • Published May 30, 2025 • 282
ShorterBetter: Guiding Reasoning Models to Find Optimal Inference Length for Efficient Reasoning Paper • 2504.21370 • Published Apr 30, 2025 • 2
Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy Paper • 2507.01352 • Published Jul 2, 2025 • 61
Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR Paper • 2507.15778 • Published Jul 21, 2025 • 21
Sculptor: Empowering LLMs with Cognitive Agency via Active Context Management Paper • 2508.04664 • Published Aug 6, 2025 • 13
Can LLM-Generated Textual Explanations Enhance Model Classification Performance? An Empirical Study Paper • 2508.09776 • Published Aug 13, 2025 • 3
Aryabhata: An exam-focused language model for JEE Math Paper • 2508.08665 • Published Aug 12, 2025 • 15
Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models Paper • 2508.10751 • Published Aug 14, 2025 • 29
A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems Paper • 2508.07407 • Published Aug 10, 2025 • 99
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning Paper • 2508.08221 • Published Aug 11, 2025 • 50
If We May De-Presuppose: Robustly Verifying Claims through Presupposition-Free Question Decomposition Paper • 2508.16838 • Published Aug 22, 2025 • 1
Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning Paper • 2508.16949 • Published Aug 23, 2025 • 24
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use Paper • 2509.01055 • Published Sep 1, 2025 • 81
DynaGuard: A Dynamic Guardrail Model With User-Defined Policies Paper • 2509.02563 • Published Sep 2, 2025 • 21
zELO: ELO-inspired Training Method for Rerankers and Embedding Models Paper • 2509.12541 • Published Sep 16, 2025 • 11
Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation Paper • 2509.15194 • Published Sep 18, 2025 • 33
EasySteer: A Unified Framework for High-Performance and Extensible LLM Steering Paper • 2509.25175 • Published Sep 29, 2025 • 31
From Harm to Help: Turning Reasoning In-Context Demos into Assets for Reasoning LMs Paper • 2509.23196 • Published Sep 27, 2025 • 9
LaSeR: Reinforcement Learning with Last-Token Self-Rewarding Paper • 2510.14943 • Published Oct 16, 2025 • 40
MemMamba: Rethinking Memory Patterns in State Space Model Paper • 2510.03279 • Published Sep 28, 2025 • 74
E^2Rank: Your Text Embedding can Also be an Effective and Efficient Listwise Reranker Paper • 2510.22733 • Published Oct 26, 2025 • 32
NeuroAda: Activating Each Neuron's Potential for Parameter-Efficient Fine-Tuning Paper • 2510.18940 • Published Oct 21, 2025 • 9
Automating Database-Native Function Code Synthesis with LLMs Paper • 2604.06231 • Published Apr 2, 2026 • 17
Source or It Didn't Happen: A Multi-Agent Framework for Citation Hallucination Detection Paper • 2605.08583 • Published May 9, 2026 • 2