The benchmark dataset (V1 and V2), the live leaderboard Space, and the full V1 execution traces: everything you need to run, regrade, or compare results on ClawBench.
Papers
RewardHarness: Self-Evolving Agentic Post-Training
ClawBench: Can AI Agents Complete Everyday Online Tasks?