{"id":207949,"date":"2026-01-29T18:54:00","date_gmt":"2026-01-29T23:54:00","guid":{"rendered":"https:\/\/news-you-need.com\/index.php\/2026\/01\/29\/ai-is-failing-humanitys-last-exam-so-what-does-that-mean-for-machine-intelligence\/"},"modified":"2026-01-29T19:00:08","modified_gmt":"2026-01-30T00:00:08","slug":"ai-is-failing-humanitys-last-exam-so-what-does-that-mean-for-machine-intelligence","status":"publish","type":"post","link":"https:\/\/news-you-need.com\/index.php\/2026\/01\/29\/ai-is-failing-humanitys-last-exam-so-what-does-that-mean-for-machine-intelligence\/","title":{"rendered":"AI is failing \u2018Humanity\u2019s Last Exam\u2019. So what does that mean for machine intelligence?"},"content":{"rendered":"<p><a href=\"https:\/\/theconversation.com\/ai-is-failing-humanitys-last-exam-so-what-does-that-mean-for-machine-intelligence-274620\">AI is failing \u2018Humanity\u2019s Last Exam\u2019. So what does that mean for machine intelligence?<\/a><\/p>\n<p><a href=\"https:\/\/theconversation.com\/ai-is-failing-humanitys-last-exam-so-what-does-that-mean-for-machine-intelligence-274620\">https:\/\/theconversation.com\/ai-is-failing-humanitys-last-exam-so-what-does-that-mean-for-machine-intelligence-274620<\/a><\/p>\n<p>Publish Date: <a href=\"publish_date]\">2026-01-29 18:54:00<\/a><\/p>\n<p>Source Domain: <a href=\"theconversation.com\">theconversation.com<\/a><\/p>\n<p>How do you translate ancient Palmyrene script from a Roman tombstone? How many paired tendons are supported by a specific sesamoid bone in a hummingbird? Can you identify closed syllables in Biblical Hebrew based on the latest scholarship on Tiberian pronunciation traditions?<\/p>\n<p>These are some of the questions in \u201cHumanity\u2019s Last Exam\u201d, a new benchmark introduced in a study published this week in Nature. 
The collection of 2,500 questions is specifically designed to probe the outer limits of what today\u2019s artificial intelligence (AI) systems cannot do.<\/p>\n<p>The benchmark represents a global collaboration of nearly 1,000 international experts across a range of academic fields. These academics and researchers contributed questions at the frontier of human knowledge. The problems required graduate-level expertise in mathematics, physics, chemistry, biology, computer science and the humanities. Importantly, every question was tested against leading AI models before inclusion. If an AI could answer it correctly at the time the test was designed, the question was rejected.<\/p>\n<p>This process explains why the initial results looked so different from other benchmarks. While AI chatbots score above 90% on popular tests, when Humanity\u2019s Last Exam was first released in early 2025, leading models struggled badly. GPT-4o managed just 2.7% accuracy. Claude 3.5 Sonnet scored 4.1%. Even OpenAI\u2019s most powerful model, o1, achieved only 8%.<\/p>\n<p>The low scores were the point. The benchmark was constructed to measure what remained beyond AI\u2019s grasp. And while some commentators have suggested that benchmarks like Humanity\u2019s Last Exam chart a path toward artificial general intelligence, or even superintelligence \u2013 that is, AI systems capable of performing any task at human or superhuman levels \u2013 we believe this is wrong for three reasons. <\/p>\n<h2>Benchmarks measure task performance, not intelligence<\/h2>\n<p>When a student scores well on the bar exam, we can reasonably predict&#8230;<\/p>\n<p><a href=\"https:\/\/theconversation.com\/ai-is-failing-humanitys-last-exam-so-what-does-that-mean-for-machine-intelligence-274620\">Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>AI is failing \u2018Humanity\u2019s Last Exam\u2019. So what does that mean for machine intelligence? 
https:\/\/theconversation.com\/ai-is-failing-humanitys-last-exam-so-what-does-that-mean-for-machine-intelligence-274620&#8230;<\/p>\n","protected":false},"author":1,"featured_media":207950,"comment_status":"closed","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"fifu_image_url":"https:\/\/images.theconversation.com\/files\/715368\/original\/file-20260129-56-1ytprm.jpg?ixlib=rb-4.1.0&rect=8%2C148%2C3461%2C1730&q=45&auto=format&w=1356&h=668&fit=crop","fifu_image_alt":"","footnotes":""},"categories":[14],"tags":[22,20],"class_list":["post-207949","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-artificial-intelligence","tag-artificial-general-intelligence","tag-artificial-intelligence"],"_links":{"self":[{"href":"https:\/\/news-you-need.com\/index.php\/wp-json\/wp\/v2\/posts\/207949"}],"collection":[{"href":"https:\/\/news-you-need.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/news-you-need.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/news-you-need.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/news-you-need.com\/index.php\/wp-json\/wp\/v2\/comments?post=207949"}],"version-history":[{"count":1,"href":"https:\/\/news-you-need.com\/index.php\/wp-json\/wp\/v2\/posts\/207949\/revisions"}],"predecessor-version":[{"id":207951,"href":"https:\/\/news-you-need.com\/index.php\/wp-json\/wp\/v2\/posts\/207949\/revisions\/207951"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/news-you-need.com\/index.php\/wp-json\/wp\/v2\/media\/207950"}],"wp:attachment":[{"href":"https:\/\/news-you-need.com\/index.php\/wp-json\/wp\/v2\/media?parent=207949"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/news-you-need.com\/index.php\/wp-json\/wp\/v2\/categories?post=207949"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/news-you-need.com\/index.php\/wp-json\/wp\
/v2\/tags?post=207949"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}