{"id":229698,"date":"2026-03-31T08:01:00","date_gmt":"2026-03-31T12:01:00","guid":{"rendered":"https:\/\/news-you-need.com\/index.php\/2026\/03\/31\/ai-benchmarks-are-broken-heres-what-we-need-instead\/"},"modified":"2026-03-31T08:01:00","modified_gmt":"2026-03-31T12:01:00","slug":"ai-benchmarks-are-broken-heres-what-we-need-instead","status":"publish","type":"post","link":"https:\/\/news-you-need.com\/index.php\/2026\/03\/31\/ai-benchmarks-are-broken-heres-what-we-need-instead\/","title":{"rendered":"AI benchmarks are broken. Here\u2019s what we need instead."},"content":{"rendered":"<p><a href=\"https:\/\/www.technologyreview.com\/2026\/03\/31\/1134833\/ai-benchmarks-are-broken-heres-what-we-need-instead\/amp\/\">AI benchmarks are broken. Here\u2019s what we need instead.<\/a><\/p>\n<p><a href=\"https:\/\/www.technologyreview.com\/2026\/03\/31\/1134833\/ai-benchmarks-are-broken-heres-what-we-need-instead\/amp\/\">https:\/\/www.technologyreview.com\/2026\/03\/31\/1134833\/ai-benchmarks-are-broken-heres-what-we-need-instead\/amp\/<\/a><\/p>\n<p>Publish Date: 2026-03-31 08:01:00<\/p>\n<p>Source Domain: <a href=\"https:\/\/www.technologyreview.com\">www.technologyreview.com<\/a><\/p>\n<p>For decades, artificial intelligence has been evaluated through the question of whether machines outperform humans. From chess to advanced math, from coding to essay writing, the performance of AI models and applications is tested against that of individual humans completing tasks.<\/p>\n<p>This framing is seductive: An AI vs. human comparison on isolated problems with clear right or wrong answers is easy to standardize, compare, and optimize. It generates rankings and headlines.<\/p>\n<p>But there\u2019s a problem: AI is almost never used in the way it is benchmarked. 
Although researchers and industry have started to improve benchmarking by moving beyond static tests to more dynamic evaluation methods, these innovations resolve only part of the issue. That\u2019s because they still evaluate AI\u2019s performance outside the human teams and organizational workflows where its real-world performance ultimately unfolds.<\/p>\n<p>While AI is evaluated at the task level in a vacuum, it is used in messy, complex environments where it usually interacts with more than one person. Its performance (or lack thereof) emerges only over extended periods of use. This misalignment leaves us misunderstanding AI\u2019s capabilities, overlooking systemic risks, and misjudging its economic and social consequences.<\/p>\n<p>To mitigate this, it\u2019s time to shift from narrow methods to benchmarks that assess how AI systems perform over longer time horizons within human teams, workflows, and organizations. I have studied real-world AI deployment since 2022 in small businesses and health, humanitarian, nonprofit, and higher-education organizations in the UK, the United States, and Asia, as well as within leading AI design ecosystems in London and Silicon Valley&#8230;.<\/p>\n<p><a href=\"https:\/\/www.technologyreview.com\/2026\/03\/31\/1134833\/ai-benchmarks-are-broken-heres-what-we-need-instead\/amp\/\">Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>AI benchmarks are broken. Here\u2019s what we need instead. 
https:\/\/www.technologyreview.com\/2026\/03\/31\/1134833\/ai-benchmarks-are-broken-heres-what-we-need-instead\/amp\/ Publish Date: 2026-03-31 08:01:00 Source&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"fifu_image_url":"","fifu_image_alt":"","footnotes":""},"categories":[14],"tags":[20],"class_list":["post-229698","post","type-post","status-publish","format-standard","hentry","category-artificial-intelligence","tag-artificial-intelligence"],"_links":{"self":[{"href":"https:\/\/news-you-need.com\/index.php\/wp-json\/wp\/v2\/posts\/229698"}],"collection":[{"href":"https:\/\/news-you-need.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/news-you-need.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/news-you-need.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/news-you-need.com\/index.php\/wp-json\/wp\/v2\/comments?post=229698"}],"version-history":[{"count":0,"href":"https:\/\/news-you-need.com\/index.php\/wp-json\/wp\/v2\/posts\/229698\/revisions"}],"wp:attachment":[{"href":"https:\/\/news-you-need.com\/index.php\/wp-json\/wp\/v2\/media?parent=229698"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/news-you-need.com\/index.php\/wp-json\/wp\/v2\/categories?post=229698"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/news-you-need.com\/index.php\/wp-json\/wp\/v2\/tags?post=229698"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}