Smol AI WorldCup: A 5-Axis Benchmark That Reveals What Small Language Models Can Really Do • 25 days ago • 38
Structural Problems in AI Benchmarking and the Case for a Unified Evaluation Framework • 27 days ago • 12
FINAL Bench: World's First Functional Metacognition Benchmark. "Not how much AI knows, but whether it knows what it doesn't know, and can fix it."
FINAL-Bench/Metacognitive • Updated Feb 27
Leaderboard - FINAL Bench 'Metacognitive'