FlowRL: Matching Reward Distributions for LLM Reasoning
Paper • 2509.15207 • Published • 118
Kwaipilot/KAT-Dev-72B-Exp
Text Generation • 73B • Updated • 30 • 157
Agentic Entropy-Balanced Policy Optimization
Paper • 2510.14545 • Published • 107
Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO
Paper • 2511.13288 • Published • 19
Malkesh Dalia
malkesh2911
AI & ML interests
None yet
Recent Activity
Upvoted an article about 7 hours ago: Welcome Gemma 4: Frontier multimodal intelligence on device
Upvoted a paper 2 days ago: Emergent Social Intelligence Risks in Generative Multi-Agent Systems
Upvoted a paper 3 days ago: Gen-Searcher: Reinforcing Agentic Search for Image Generation
Organizations
None yet