    Agents don’t know what good looks like. And that’s exactly the problem. – O’Reilly

    By Admin, April 10, 2026



    Luca Mezzalira, author of Building Micro-Frontends, originally shared the following article on LinkedIn. It’s being republished here with his permission.

    Every few years, something arrives that promises to change how we build software. And every few years, the industry splits predictably: One half declares the old rules dead; the other half folds its arms and waits for the hype to pass. Both camps are usually wrong, and both camps are usually loud. What’s rarer, and more useful, is someone standing in the middle of that noise and asking the structural questions: Not “What can this do?” but “What does it mean for how we design systems?”

    That’s what Neal Ford and Sam Newman did in their recent fireside chat on agentic AI and software architecture during O’Reilly’s Software Architecture Superstream. It’s a conversation worth pulling apart carefully, because some of what they surface is more uncomfortable than it first appears.

    The Dreyfus trap

    Neal opens with the Dreyfus model of skill acquisition, best known for its application to the nursing profession but applicable to any domain. The model maps learning across five stages:

    • Novice
    • Advanced beginner
    • Competent
    • Proficient
    • Expert

    His claim is that current agentic AI is stuck somewhere between novice and advanced beginner: It can follow recipes, it can even apply recipes from adjacent domains when it gets stuck, but it doesn’t understand why any of those recipes work. This isn’t a minor limitation. It’s structural.

    The canonical example Neal gives is beautiful in its simplicity: An agent tasked with making all tests pass encounters a failing unit test. One perfectly valid way to make a failing test pass is to replace its assertion with assert True. That’s not a hack in the agent’s mind. It’s a solution. There’s no ethical framework, no professional judgment, no instinct that says this isn’t what we meant. Sam extends this immediately with something he’d literally seen shared on LinkedIn that week: an agent that had modified the build file to silently ignore failed steps rather than fix them. The build passed. The problem remained. Congratulations all-round.

    What’s interesting here is that neither Ford nor Newman is dismissive of AI capability. The point is more subtle: The creativity that makes these agents genuinely useful, their ability to search solution space in ways humans wouldn’t think to, is inseparable from the same property that makes them dangerous. You can’t fully lobotomize the improvisation without destroying the value. This is a design constraint, not a bug to be patched.

    And when you zoom out, this is part of a broader signal. When experienced practitioners who’ve spent decades in this industry independently converge on calls for restraint and rigor rather than acceleration, that convergence is worth paying attention to. It’s not pessimism. It’s pattern recognition from people who’ve lived through enough cycles to know what the warning signs look like.

    Behavior versus capabilities

    One of the most important things Neal says, and I think it gets lost in the overall density of the conversation, is the distinction between behavioral verification and capability verification.

    Behavioral verification is what most teams default to: unit tests, functional tests, integration tests. Does the code do what it’s supposed to do according to the spec? This is the natural fit for agentic tooling, because agents are actually getting pretty good at implementing behavior against specs. Give an agent a well-defined interface contract and a clear set of acceptance criteria, and it will produce something that broadly satisfies them. This is real progress.

    Capability verification is harder. Much harder. Does the system exhibit the operational qualities it needs to exhibit at scale? Is it properly decoupled? Is the security model sound? What happens at 20,000 requests per second? Does it fail gracefully or catastrophically? These are things that most human developers struggle with too, and agents have been trained on human-generated code, which means they’ve inherited our failure modes as well as our successes.
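The distinction can be sketched in code. The thresholds and names below are hypothetical: behavioral verification asks "is this answer right?"; capability verification judges aggregate operational measurements (in practice fed by load tests or production telemetry) against declared budgets.

```python
# Behavioral verification vs. capability verification, as a sketch.
# All thresholds here are invented for illustration.

def handler(x: int) -> int:
    return x * 2

def verify_behavior() -> bool:
    # Behavioral: one input, one expected output, per the spec.
    return handler(21) == 42

def verify_capability(latencies_ms: list[float], errors: int, total: int,
                      p99_budget_ms: float = 250.0,
                      max_error_rate: float = 0.001) -> bool:
    # Capability: does the system hold up in aggregate?
    ranked = sorted(latencies_ms)
    idx = min(len(ranked) - 1, (99 * len(ranked)) // 100)
    p99 = ranked[idx]
    return p99 <= p99_budget_ms and (errors / total) <= max_error_rate
```

A system can pass the first check on every input and still fail the second one catastrophically, which is precisely the gap agents (and humans) tend to leave open.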

    This brings me to something Birgitta Boeckeler raised at QCon London that I haven’t been able to stop thinking about. The example everyone cites when making the case for AI’s coding capability is that Anthropic built a C compiler from scratch using agents. Impressive. But here’s the thing: C compiler documentation is extraordinarily well-specified and battle-tested over decades, and the test coverage for compiler behavior is some of the most rigorous in the entire software industry. That’s as close to a solved, well-bounded problem as you can get.

    Enterprise software is almost never like that. Enterprise software is ambiguous requirements, undocumented assumptions, tacit knowledge living in the heads of people who left three years ago, and test coverage that exists more as aspiration than reality. The gap between “can build a C compiler” and “can reliably modernize a legacy ERP” is not a gap of raw capability. It’s a gap of specification quality and domain legibility. That distinction matters enormously for how we think about where agentic tooling can safely operate.

    The current orthodoxy in agentic development is to throw more context at the problem: elaborate context files, architecture decision records, guidelines, rules about what not to do. Ford and Newman are appropriately skeptical. Sam makes the point that there’s now empirical evidence suggesting that as context file size increases, you see degradation in output quality, not improvement. You’re not guiding the agent toward better judgment. You’re just accumulating scar tissue from previous disasters. This isn’t unique to agentic workflows either. Anyone who has worked seriously with code assistants knows that summarization quality degrades as context grows, and that this degradation is only partially controllable. That has a direct impact on decisions made over time. Now imagine that degradation compounding across an enterprise codebase, with many teams working across different time zones. Don’t get me wrong, the tools help, but the help is bounded, and that boundary is often closer than we’d like to admit.

    The more honest framing, which Neal alludes to, is that we need deterministic guardrails around nondeterministic agents. Not more prompting. Architectural fitness functions, an idea Ford and Rebecca Parsons have been promoting since 2017, feel like they’re finally about to have their moment, precisely because the cost of not having them is now immediately visible.
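As a minimal sketch of what a deterministic guardrail can look like (the layer names and rule are hypothetical): a fitness function is a check, runnable in CI, that fails the build when code, generated or not, violates a declared architectural boundary, regardless of whether the behavioral tests pass. In practice the import graph would be extracted with `ast` or a tool such as import-linter; here it is passed in as data to keep the example self-contained.

```python
# A toy architectural fitness function: the "domain" layer must never
# depend on the "web" layer. Deterministic, so it constrains a
# nondeterministic agent without relying on prompting.

FORBIDDEN = {("domain", "web")}  # (importing layer, imported layer)

def layer_of(module: str) -> str:
    # Convention: the top-level package name is the layer, e.g. "web.views".
    return module.split(".")[0]

def boundary_violations(imports: dict[str, list[str]]) -> list[tuple[str, str]]:
    """imports maps each module to the modules it imports."""
    return [(mod, dep)
            for mod, deps in imports.items()
            for dep in deps
            if (layer_of(mod), layer_of(dep)) in FORBIDDEN]
```

Wired into CI as `assert not boundary_violations(...)`, this turns an architectural intention into something an agent cannot quietly route around.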

    What should an agent own then?

    This is where the conversation gets most interesting, and where I think the field is most confused.

    There’s a seductive logic to the microservice as the unit of agentic regeneration. It sounds small. The word micro is in the name. You can imagine handing an agent a service with a defined API contract and saying: implement this, test it, done. The scope feels manageable.

    Ford and Newman give this idea fair credit, but they’re also honest about the gap. The microservice level is attractive architecturally because it comes with an implied boundary: a process boundary, a deployment boundary, often a data boundary. You can put fitness functions around it. You can say this service must handle X load, maintain Y error rate, expose Z interface. In theory.

    In practice, we barely enforce this stuff ourselves. The agents have learned from a corpus of human-written microservices, which means they’ve learned from the vast majority of microservices that were written without proper decoupling, without real resilience thinking, without any rigorous capacity planning. They don’t have our aspirations. They have our habits.

    The deeper problem, which Neal raises and which I think deserves more attention than it gets, is transactional coupling. You can design five beautifully bounded services and still produce an architectural disaster if the workflow that ties them together isn’t thought through. Sagas, event choreography, compensation logic: This is the stuff that breaks real systems, and it’s also the stuff that’s hardest to specify, hardest to test, and hardest for an agent to reason about. We made exactly this mistake in the SOA era. We designed lovely little services and then discovered that the interesting complexity had simply migrated into the integration layer, which nobody owned and nobody tested.
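To make the workflow-level concern concrete, here is a toy saga sketch (the step names and structure are invented): each step pairs an action with a compensation, and on failure the compensations for completed steps run in reverse order. This cross-service logic is exactly what individual service contracts don't express, which is why it is so easy to omit and so hard for an agent to infer.

```python
# A toy saga with compensation. Each step is (name, action, compensation);
# if a step fails, undo the completed steps in reverse order.

def run_saga(steps, log):
    """steps: list of (name, action, compensation); log records what ran."""
    done = []
    for name, action, compensate in steps:
        try:
            action()
            log.append(f"did:{name}")
            done.append((name, compensate))
        except Exception:
            log.append(f"failed:{name}")
            for cname, comp in reversed(done):
                comp()
                log.append(f"undo:{cname}")
            return False
    return True
```

The mechanics fit in a dozen lines; deciding what each compensation must actually do, and proving the whole thing converges under partial failure, is where real systems break.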

    Sam’s line here is worth quoting directly, roughly: “To err is human, but it takes a computer to really screw things up.” I suspect we’re going to produce some genuinely legendary transaction management disasters before the field develops the muscle memory to avoid them.

    The sociotechnical gap nobody is talking about

    There’s a dimension to this conversation that Ford and Newman gesture toward but that I think deserves much more direct examination: the question of what happens to the humans on the other side of this generated software.

    It’s not completely accurate to say that all agentic work is happening on greenfield projects. There are tools already in production helping teams migrate legacy ERPs, modernize old codebases, and tackle the modernization challenge that has defeated conventional approaches for years. That’s real, and it matters.

    But the challenge in those cases isn’t merely the code. It’s whether the sociotechnical system, the teams, the processes, the engineering culture, the organizational structures built around the existing software are ready to inherit what gets built. And here’s the thing: Even if agents combined with deterministic guardrails could produce a well-structured microservice architecture or a clean modular monolith in a fraction of the time it would take a human team, that architectural output doesn’t automatically come with organizational readiness. The system can arrive before the people are prepared to own it.

    One of the underappreciated functions of iterative migration, the incremental strangler fig approach, the slow decomposition of a monolith over 18 months, is not primarily risk reduction, though it does that too. It’s learning. It’s the process by which a team internalizes a new way of working, makes mistakes in a bounded context, recovers, and builds the judgment that lets them operate confidently in the new world. Compress that journey too aggressively and you can end up with architecture whose operational complexity exceeds the organizational capacity to manage it. That gap tends to be expensive.
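The strangler fig mechanism itself is almost trivially simple, which is part of the point: the value is in the pacing, not the code. A sketch, with hypothetical route names: a thin facade sends migrated paths to the new system and everything else to the legacy one, so each slice is a bounded, reversible step the team can learn from.

```python
# A strangler-fig routing facade. MIGRATED grows one slice at a time;
# rolling a slice back is deleting one entry.

MIGRATED = {"/orders", "/invoices"}  # hypothetical migrated route prefixes

def route(path: str) -> str:
    prefix = "/" + path.strip("/").split("/")[0]
    return "new-service" if prefix in MIGRATED else "legacy-monolith"
```

An agent could generate the target architecture in a weekend; this facade is what lets the organization absorb it over months.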

    At QCon London, I asked Patrick Debois, after a talk covering best practices for AI-assisted development, whether applying all of those practices consistently would make him comfortable working on enterprise software with real complexity. His answer was: It depends. That felt like the honest answer. The tooling is improving. Whether the humans around it are keeping pace is a separate question, and one the industry is not spending nearly enough time on.

    Existing systems

    Ford and Newman close with a subject that almost never gets covered in these conversations: the vast, unglamorous majority of software that already exists and that our society depends on in ways that are easy to underestimate.

    Most of the discourse around agentic AI and software development is implicitly greenfield. It assumes you’re starting fresh, that you get to design your architecture sensibly from the beginning, that you have clean APIs and tidy service boundaries. The reality is that most valuable software in the world was written before any of this existed, runs on platforms and languages that aren’t the natural habitat of modern AI tooling, and contains decades of accumulated decisions that nobody fully understands anymore.

    Sam is working on a book about this: how to adapt existing architectures to enable AI-driven functionality in ways that are actually safe. He makes the interesting point that existing systems, despite their reputation, sometimes give you a head start. A well-structured relational schema carries implicit meaning about data ownership and referential integrity that an agent can actually reason from. There’s structure there, if you know how to read it.
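Sam's point about schemas carrying readable meaning can be demonstrated mechanically. In this toy example (the tables are invented), declared foreign keys are recovered by introspection, giving a dependency map, who owns what, who references whom, that a tool or an agent can reason from without any documentation.

```python
# Recovering data-ownership structure from a relational schema by
# introspecting declared foreign keys (SQLite, in-memory toy schema).

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id)
    );
""")

def dependencies(conn) -> list[tuple[str, str]]:
    """Return (child_table, parent_table) pairs from declared foreign keys."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'")]
    return [(t, fk[2])  # column 2 of PRAGMA foreign_key_list is the parent table
            for t in tables
            for fk in conn.execute(f"PRAGMA foreign_key_list({t})")]
```

A schema where that query comes back empty, everything joined by convention and tribal knowledge, is the legacy system with no head start at all.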

    The general lesson, which he states without much drama, is that you can’t just expose an existing system through an MCP server and call it done. The interface is not the architecture. The risks around security, data exposure, and vendor dependency don’t go away because you’ve wrapped something in a new protocol.
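One way to picture "the interface is not the architecture" in code, heavily hedged, since this is a generic sketch and not how any particular MCP server works: wrapping a legacy system for agent access still requires explicit policy. Here a hypothetical tool layer exposes only an allowlisted, vetted surface instead of passing arbitrary operations through to the backend.

```python
# A hypothetical policy layer in front of a legacy backend: only vetted,
# read-only operations are exposed; everything else is refused, not proxied.

ALLOWED_TOOLS = {"get_order_status", "list_open_invoices"}  # vetted surface

def dispatch(tool: str, backend: dict) -> str:
    if tool not in ALLOWED_TOOLS:
        raise PermissionError(f"tool not exposed: {tool}")
    return backend[tool]()
```

The protocol wrapper is the easy part; deciding what belongs in `ALLOWED_TOOLS`, and what the blast radius is when that judgment is wrong, is the architecture.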

    This matters more than it might seem, because the software that runs our financial systems, our healthcare infrastructure, our logistics and supply chains, is not greenfield and never will be. If we get the modernization of those systems wrong, the consequences are not abstract. They are social. The instinct to index heavily on what these tools can do in ideal conditions, on well-specified problems with good documentation and thorough test coverage, is understandable. But it’s exactly the wrong instinct when the systems in question are the ones our lives depend on. The architectural mindset that has served us well through previous paradigm shifts, the one that starts with trade-offs rather than capabilities, that asks what we are giving up rather than just what we are gaining, is not optional here. It’s the minimum requirement for doing this responsibly.

    What I take away from this

    Three things, mostly.

    The first is that introducing deterministic guardrails into nondeterministic systems is not optional. It’s imperative. We are still figuring out exactly where and how, but the framing needs to shift: The goal is control over outcomes, not just oversight of output. There’s a difference. Output is what the agent generates. Outcome is whether the system it generates actually behaves correctly under production conditions, stays within architectural boundaries, and remains operable by the humans responsible for it. Fitness functions, capability tests, boundary definitions: the boring infrastructure that connects generated code to the real constraints of the world it runs in. We’ve had the tools to build this for years.

    The second is that the people saying this is the future and the people saying this is just another hype cycle are both probably wrong in interesting ways. Ford and Newman are careful to say they don’t know what good looks like yet. Neither do I. But we have better prior art to draw on than the discourse usually acknowledges. The principles that made microservices work, when they worked, real decoupling, explicit contracts, operational ownership, apply here too. The principles that made microservices fail, leaky abstractions, distributed transactions handled badly, complexity migrating into integration layers, will cause exactly the same failures, just faster and at larger scale.

    The third is something I took away from QCon London this year, and I think it might be the most important of the three. Across two days of talks, including sessions that took diametrically opposite approaches to integrating AI into the software development lifecycle, one thing became clear: We are all beginners. Not in the dismissive sense but in the most literal application of the Dreyfus model. Nobody, regardless of experience, has figured out the right way to fit these tools inside a sociotechnical system. The recipes are still being written. The war stories that will eventually become the prior art are still happening to us right now.

    What got us here, collectively, was sharing what we saw, what worked, what failed, and why. That’s how the field moved from SOA disasters to microservices best practices. That’s how we built a shared vocabulary around fitness functions and evolutionary architecture. The same process has to happen again, and it will, but only if people with real experience are honest about the uncertainty rather than performing confidence they don’t have. The speed, ultimately, is both the opportunity and the danger. The technology is moving faster than the organizations, the teams, and the professional instincts that need to absorb it. The best response to that isn’t to pretend otherwise. It’s to keep comparing notes.

    If this resonated, the full fireside chat between Neal Ford and Sam Newman is worth watching in its entirety. They cover more ground than I’ve had space to react to here. And if you’d like to learn more from Neal, Sam, and Luca, check out their most recent O’Reilly books: Building Resilient Distributed Systems, Architecture as Code, and Building Micro-Frontends, second edition.


