Close Menu
geekfence.comgeekfence.com
    What's Hot

    World ID expands its ‘proof of human’ vision for the AI era – Computerworld

    April 19, 2026

    Francis Bacon and the Scientific Method

    April 19, 2026

    War in the Middle East, Damaged Data Centers, and Cloud Disruptions

    April 19, 2026
    Facebook X (Twitter) Instagram
    • About Us
    • Contact Us
    Facebook Instagram
    geekfence.comgeekfence.com
    • Home
    • UK Tech News
    • AI
    • Big Data
    • Cyber Security
      • Cloud Computing
      • iOS Development
    • IoT
    • Mobile
    • Software
      • Software Development
      • Software Engineering
    • Technology
      • Green Technology
      • Nanotechnology
    • Telecom
    geekfence.comgeekfence.com
    Home»Artificial Intelligence»This is the most misunderstood graph in AI
    Artificial Intelligence

    This is the most misunderstood graph in AI

    AdminBy AdminFebruary 5, 2026No Comments3 Mins Read5 Views
    Facebook Twitter Pinterest LinkedIn Telegram Tumblr Email
    This is the most misunderstood graph in AI
    Share
    Facebook Twitter LinkedIn Pinterest Email


    That was certainly the case for Claude Opus 4.5, the latest version of Anthropic’s most powerful model, which was released in late November. In December, METR announced that Opus 4.5 appeared to be capable of independently completing a task that would have taken a human about five hours—a vast improvement over what even the exponential trend would have predicted. One Anthropic safety researcher tweeted that he would change the direction of his research in light of those results; another employee at the company simply wrote, “mom come pick me up i’m scared.”

    Credit: METR.ORG

    But the truth is more complicated than those dramatic responses would suggest. For one thing, METR’s estimates of the abilities of specific models come with substantial error bars. As METR explicitly stated on X, Opus 4.5 might be able to regularly complete only tasks that take humans about two hours, or it might succeed on tasks that take humans as long as 20 hours. Given the uncertainties intrinsic to the method, it was impossible to know for sure. 

    “There are a bunch of ways that people are reading too much into the graph,” says Sydney Von Arx, a member of METR’s technical staff.

    More fundamentally, the METR plot does not measure AI abilities writ large, nor does it claim to. In order to build the graph, METR tests the models primarily on coding tasks, evaluating the difficulty of each by measuring or estimating how long it takes humans to complete it—a metric that not everyone accepts. Claude Opus 4.5 might be able to complete certain tasks that take humans five hours, but that doesn’t mean it’s anywhere close to replacing a human worker.

    METR was founded to assess the risks posed by frontier AI systems. Though it is best known for the exponential trend plot, it has also worked with AI companies to evaluate their systems in greater detail and published several other independent research projects, including a widely covered July 2025 study suggesting that AI coding assistants might actually be slowing software engineers down. 

    But the exponential plot has made METR’s reputation, and the organization appears to have a complicated relationship with that graph’s often breathless reception. In January, Thomas Kwa, one of the lead authors on the paper that introduced it, wrote a blog post responding to some criticisms and making clear its limitations, and METR is currently working on a more extensive FAQ document. But Kwa isn’t optimistic that these efforts will meaningfully shift the discourse. “I think the hype machine will basically, whatever we do, just strip out all the caveats,” he says.

    Nevertheless, the METR team does think that the plot has something meaningful to say about the trajectory of AI progress. “You should absolutely not tie your life to this graph,” says Von Arx. “But also,” she adds, “I bet that this trend is gonna hold.”



    Source link

    Share. Facebook Twitter Pinterest LinkedIn Tumblr Email

    Related Posts

    How Much Coding Is Required To Work in AI and LLM-related Jobs?

    April 19, 2026

    Posit AI Blog: Implementing rotation equivariance: Group-equivariant CNN from scratch

    April 18, 2026

    The Download: bad news for inner Neanderthals, and AI warfare’s human illusion

    April 17, 2026

    AI Is Writing Our Code Faster Than We Can Verify It – O’Reilly

    April 16, 2026

    Measuring and bridging the realism gap in user simulators

    April 15, 2026

    Tune in on Thursday for Xbox First Look: Metro 2039

    April 14, 2026
    Top Posts

    Understanding U-Net Architecture in Deep Learning

    November 25, 202530 Views

    Redefining AI efficiency with extreme compression

    March 25, 202625 Views

    Hard-braking events as indicators of road segment crash risk

    January 14, 202625 Views
    Don't Miss

    World ID expands its ‘proof of human’ vision for the AI era – Computerworld

    April 19, 2026

    How ‘proof of human’ works Billed as the infrastructure for the age of AI, World…

    Francis Bacon and the Scientific Method

    April 19, 2026

    War in the Middle East, Damaged Data Centers, and Cloud Disruptions

    April 19, 2026

    How Much Coding Is Required To Work in AI and LLM-related Jobs?

    April 19, 2026
    Stay In Touch
    • Facebook
    • Instagram
    About Us

    At GeekFence, we are a team of tech-enthusiasts, industry watchers and content creators who believe that technology isn’t just about gadgets—it’s about how innovation transforms our lives, work and society. We’ve come together to build a place where readers, thinkers and industry insiders can converge to explore what’s next in tech.

    Our Picks

    World ID expands its ‘proof of human’ vision for the AI era – Computerworld

    April 19, 2026

    Francis Bacon and the Scientific Method

    April 19, 2026

    Subscribe to Updates

    Please enable JavaScript in your browser to complete this form.
    Loading
    • About Us
    • Contact Us
    • Disclaimer
    • Privacy Policy
    • Terms and Conditions
    © 2026 Geekfence.All Rigt Reserved.

    Type above and press Enter to search. Press Esc to cancel.