Z.ai unveils GLM-5.1, enabling AI coding agents to run autonomously for hours

Chinese AI company Z.ai has launched GLM-5.1, an open-source coding model it says is built for agentic software engineering. The release comes as AI vendors move beyond autocomplete-style coding tools toward systems that can handle software tasks over longer periods with less human input.

Z.ai said GLM-5.1 can sustain performance over hundreds of iterations, an ability it argues sets it apart from models that lose effectiveness in longer sessions.

As one example, the company said GLM-5.1 improved a vector database optimization task over more than 600 iterations and 6,000 tool calls, reaching 21,500 queries per second, about six times the best result achieved in a single 50-turn session.
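The reported figures imply a rough single-session baseline, which the short calculation below works out. It is only a back-of-the-envelope sketch: it uses the rounded numbers Z.ai reported ("about six times", "more than 600 iterations"), so the results are approximations, and the variable names are illustrative.

```python
# Figures as reported by Z.ai; "about six times" and "more than 600" are rounded,
# so everything derived from them is approximate.
long_run_qps = 21_500      # queries per second after the long run
reported_multiple = 6      # long run vs. best single 50-turn session
iterations = 600           # lower bound reported
tool_calls = 6_000         # reported tool calls

# Implied best result of a single 50-turn session.
implied_baseline_qps = long_run_qps / reported_multiple

# Average tool calls per iteration over the long run.
calls_per_iteration = tool_calls / iterations

print(f"Implied single-session best: ~{implied_baseline_qps:,.0f} QPS")
print(f"Average tool calls per iteration: ~{calls_per_iteration:.0f}")
```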

In a research note, Z.ai said GLM-5.1 outperformed its predecessor, GLM-5, on several software engineering benchmarks and showed particular strength in repo generation, terminal-based problem solving, and repeated code optimization. The company said the model scored 58.4 on SWE-Bench Pro, compared with 55.1 for GLM-5, and above the scores it listed for OpenAI’s GPT-5.4, Anthropic’s Opus 4.6, and Google’s Gemini 3.1 Pro on that benchmark.

GLM-5.1 has been released under the MIT License and is available through Z.ai’s developer platforms, with model weights also published for local deployment, the company said. That may appeal to enterprises looking for more control over how such tools are deployed.

Longer-running coding agents

Z.ai presents sustained performance over long sessions as GLM-5.1’s main differentiator.

Analysts say that matters because many current models still plateau or drift after a relatively small number of turns, limiting their usefulness on extended, multi-step software tasks.

Pareekh Jain, CEO of Pareekh Consulting, said the industry is now moving beyond tools that can answer prompts toward systems that can carry out longer assignments with less supervision.

The question, Jain said, is no longer, “What can I ask this AI?” but, “What can I assign to it for the next eight hours?”

For enterprises, that raises the prospect of assigning an agent a ticket in the morning and receiving an optimized solution by day’s end, after it has run hundreds of experiments and profiled the code.

“This capability aligns with real needs such as large refactors, migration programs, and continuous incident resolution,” said Charlie Dai, VP and principal analyst at Forrester. “It suggests that long‑running autonomous agents are becoming more practical, provided enterprises layer in governance, monitoring, and escalation mechanisms to manage risk.”
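As an illustration of the kind of guardrails Dai describes, the sketch below wraps a long-running agent loop in an iteration budget, a time budget, and a stall check that triggers escalation to a human. It is a minimal sketch under stated assumptions, not Z.ai’s or any vendor’s implementation: `RunLimits`, `RunReport`, `supervise`, and the `step` callback are all hypothetical names, and `step` stands in for one agent iteration that returns a score where higher is better.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RunLimits:
    """Hypothetical governance limits for one agent assignment."""
    max_iterations: int = 600          # hard cap on agent iterations
    max_seconds: float = 8 * 3600      # wall-clock budget (eight hours)
    patience: int = 50                 # stalled iterations before escalating


@dataclass
class RunReport:
    """Monitoring record handed back to the team when the run stops."""
    best_score: float
    iterations: int
    stop_reason: str
    history: List[float] = field(default_factory=list)


def supervise(step: Callable[[int], float], limits: RunLimits,
              clock: Callable[[], float] = time.monotonic) -> RunReport:
    """Run `step` repeatedly until a budget is hit or progress stalls."""
    start = clock()
    best = float("-inf")
    stale = 0
    history: List[float] = []
    reason = "iteration budget reached"

    for i in range(limits.max_iterations):
        if clock() - start > limits.max_seconds:
            reason = "time budget reached"
            break
        score = step(i)                # one agent iteration (assumed callback)
        history.append(score)          # monitoring: keep every score
        if score > best:
            best, stale = score, 0
        else:
            stale += 1
            if stale >= limits.patience:
                reason = "escalate: no improvement"
                break

    return RunReport(best, len(history), reason, history)
```

In practice the `stop_reason` would feed whatever alerting or ticketing system a team already uses; the stall check is one simple way to detect the plateau or drift that analysts describe.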

Open-source appeal grows

GLM-5.1’s release under the MIT License could be significant, especially for companies in regulated or security-sensitive sectors.

“This matters in four key ways,” Jain said. “First, cost. Pricing is much lower than for premium models, and self-hosting lets companies control expenses instead of paying per use. Second, data governance. Sensitive code and data do not have to be sent to external APIs, which is critical in sectors such as finance, healthcare, and defense. Third, customization. Companies can adapt the model to their own codebases and internal tools without restrictions.”

The fourth factor, according to Jain, is geopolitical risk. Although the model is open source, its links to Chinese infrastructure and entities could still raise compliance concerns for some US companies.

Dai said the MIT license makes it easier for companies to run the model on their own systems while adapting it to internal requirements and governance policies. “For many buyers, this makes GLM‑5.1 a viable strategic option alongside commercial models, especially where regulatory constraints, IP sensitivity, or long‑term platform control matter most,” Dai said.

Benchmark credibility

Z.ai cited three benchmarks: SWE-Bench Pro, which tests complex software engineering tasks; NL2Repo, which measures repository generation; and Terminal-Bench 2.0, which evaluates real-world terminal-based problem solving.

“These benchmarks are designed to test coding agents’ advanced coding capabilities, so topping those benchmarks reflects strong coding performance, such as reliability in planning-to-execution, less prompt rework, and faster delivery,” said Lian Jye Su, chief analyst at Omdia. “However, they are still detached from typical enterprise realities.”

Su said public benchmarks still do not capture the messiness of proprietary codebases, legacy systems, and code review workflows. He added that benchmark results come from controlled settings that differ from production, though the gap is closing as more teams adopt agentic setups.
