
    AI Writes the Code. Humans Still Carry the Risk

    By Admin, June 29, 2026


    AI Made the First Draft Cheap: Correctness Is Still Expensive

    On June 16, Databricks introduced an AI agent that builds forecasting models, deploys apps, and writes its own documentation from a sentence of English, joining comparable agents already running at Snowflake, AWS, and GitHub. The open question isn’t whether an agent can write the code. It’s whether anyone can trust what it wrote.


    Freelance data scientist Longhow Lam described the same pattern from a practitioner's side on LinkedIn. He said plain-English instructions could direct an AI agent through data generation, forecasting, deployment, and documentation, yet every artifact still needed careful review before he trusted it.

    The gap between work generated and work confirmed correct has defined the past year of agentic data tools. Vendors measure how much an agent can produce. Few measure how much of that output survives contact with a reviewer who has to sign off on it.

    Call the missing number verified output: the share of generated code, models, or dashboards a qualified human approves without rework. It is the metric most productivity claims skip, and it is the one data leaders need most.
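    One way to pin the metric down, as a sketch: the symbols below are labels for this article's definition, not an established industry standard.

```latex
\text{verified output} = \frac{N_{\text{approved without rework}}}{N_{\text{generated}}}, \qquad 0 \le \text{verified output} \le 1
```

    With hypothetical numbers: if an agent produces 40 artifacts in a week and reviewers approve 10 of them unchanged, verified output is 10/40 = 0.25, however quickly the 40 were generated.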

    English Is Becoming an Interface to the Data Stack

    Programming has moved up a layer before. Until 1957, most programs were written in machine code or assembly language; that year, IBM’s John Backus led the team that delivered Fortran, the first widely used high-level language. Low-code platforms followed decades later: Forrester says it coined the term in 2014, and Microsoft announced PowerApps in November 2015 to let business users build applications through visual tools instead of code.

    Agentic AI extends the pattern, but the mechanism differs. A compiler applies fixed rules to source code and produces a predictable result every time. A large language model interprets an ambiguous instruction and produces a probable result, not a guaranteed one. English works as an interface to a code-producing system rather than as a replacement for the code, tests, and schemas underneath it.

    Four examples show how far the interface has moved. Snowflake’s Cortex Agents reached general availability on November 4, 2025, planning tasks and pulling from structured and unstructured data through Cortex Analyst and Cortex Search. AWS introduced AgentCore Code Interpreter in August 2025, letting agents write and run Python, JavaScript, and TypeScript for data analysis inside a sandboxed environment. GitHub’s Copilot coding agent became generally available on September 25, 2025, accepting a delegated task, opening a draft pull request, and asking a human to review it. Databricks’ Genie Code, now folded into the broader Genie One suite, plans and executes data science workflows from a written prompt.

    Each vendor frames its agent around a plain-language request. None removes the step where a person decides if the output is fit to ship.

    Generation and Verification Do Not Scale Together

    Benchmarks built specifically for data work show why plausible answers carry real risk. DSBench, presented at ICLR 2025, tested AI agents against 466 data-analysis questions and 74 end-to-end modeling tasks drawn from real competitions. The strongest agent in the original evaluation solved roughly a third of the analysis questions, well below sampled human performance, though the benchmark relied on 2024-era models and newer systems may score higher.

    Google Research published a counterpoint in November 2025. Its DS-STAR system raised accuracy on three data-science benchmarks, reaching 45.2% on DABStep, 44.7% on KramaBench, and 38.5% on DA-Code, ahead of the best alternative tested at the time. The hardest DABStep tasks still needed an average of 5.6 rounds of planning and verification before the system settled on an answer. Even a research system built to push past prior limits treats review as part of the work, not as cleanup performed afterward.
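    Treating review as part of the work can be sketched as a bounded loop that alternates drafting and checking. This is a generic illustration, not DS-STAR's published design: `draft` and `verify` are hypothetical stand-ins for a model call and an automated or human check, and `max_rounds` is an assumed budget.

```python
from typing import Callable, Optional, Tuple


def plan_and_verify(
    draft: Callable[[int, Optional[str]], str],
    verify: Callable[[str], Optional[str]],
    max_rounds: int = 10,
) -> Tuple[Optional[str], int]:
    """Alternate drafting and verification until a draft passes or the budget runs out.

    `draft(round_no, feedback)` returns a candidate answer (hypothetical model call).
    `verify(candidate)` returns None when the candidate passes, else feedback text.
    Returns (accepted answer or None, rounds used).
    """
    feedback: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        candidate = draft(round_no, feedback)
        feedback = verify(candidate)
        if feedback is None:
            return candidate, round_no
    return None, max_rounds


# Toy usage: the hypothetical "model" only gets the answer right on its third attempt.
def toy_draft(round_no: int, feedback: Optional[str]) -> str:
    return "42" if round_no >= 3 else f"guess-{round_no}"


def toy_verify(candidate: str) -> Optional[str]:
    return None if candidate == "42" else f"{candidate} failed the check"


answer, rounds = plan_and_verify(toy_draft, toy_verify)
print(answer, rounds)  # 42 3
```

    Returning the number of rounds alongside the answer keeps the verification cost visible instead of folding it silently into the result.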

    A 2024 study from Microsoft Research and the University of Washington, presented at CHI, watched 22 analysts work through AI-generated analyses. Participants leaned on procedure-level evidence, such as code and explanations, and on data-level evidence, such as tables and charts, to decide whether a result held up. Their checks sorted into five layers: did the code run, was the method appropriate, were joins and missing values handled correctly, did the result answer the real business question, and would the pipeline keep working on new data.
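    The third layer, whether joins and missing values were handled correctly, is the easiest to partly automate. A minimal standard-library sketch, assuming tables arrive as lists of dicts and that the join is meant to be one-to-one; `check_join` and its report fields are hypothetical names, not part of any tool mentioned above.

```python
from collections import Counter
from typing import Any, Dict, List

Row = Dict[str, Any]


def check_join(left: List[Row], right: List[Row], key: str) -> Dict[str, Any]:
    """Flag join problems that a plausible-looking generated query can hide.

    Assumes a one-to-one relationship: each side should hold unique `key` values.
    Rows without the key are counted as None.
    """
    left_counts = Counter(row.get(key) for row in left)
    right_counts = Counter(row.get(key) for row in right)

    def duplicates(counts: Counter) -> List[Any]:
        # Keys seen more than once would silently multiply rows after a join.
        return sorted(k for k, n in counts.items() if n > 1 and k is not None)

    def unmatched(mine: Counter, other: Counter) -> List[Any]:
        # Keys an inner join would drop without any warning.
        return sorted(k for k in mine if k is not None and k not in other)

    return {
        "duplicate_left_keys": duplicates(left_counts),
        "duplicate_right_keys": duplicates(right_counts),
        # Rows whose key is missing can never match anything.
        "null_keys": left_counts[None] + right_counts[None],
        "unmatched_left": unmatched(left_counts, right_counts),
        "unmatched_right": unmatched(right_counts, left_counts),
    }


# Hypothetical CRM and billing extracts that should align one-to-one on account_id.
crm = [{"account_id": 1}, {"account_id": 2}, {"account_id": None}]
billing = [{"account_id": 1}, {"account_id": 1}, {"account_id": 3}]
report = check_join(crm, billing, "account_id")
print(report)
```

    On this toy data the report flags account 1 as duplicated on the billing side, one row with a missing key, and accounts 2 and 3 as unmatched. None of those would stop a generated query from running, which is why "did the code run" is only the first layer.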

    Generation scales with compute. Verification scales with the number of qualified people available to look closely at an answer and decide if it can be trusted. The two rates rarely match, and the distance between them is where work piles up.
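    The mismatch is easy to see with a toy queue: when agents generate more drafts per day than reviewers can clear, the unreviewed backlog grows every day. The rates below are hypothetical and chosen only for illustration.

```python
from typing import List


def review_backlog(generated_per_day: float, reviewers: int,
                   reviews_per_reviewer_per_day: float, days: int) -> List[float]:
    """Return the unreviewed backlog at the end of each day (never negative)."""
    capacity = reviewers * reviews_per_reviewer_per_day
    backlog = 0.0
    history = []
    for _ in range(days):
        # New drafts arrive, reviewers clear up to their daily capacity.
        backlog = max(0.0, backlog + generated_per_day - capacity)
        history.append(backlog)
    return history


# Hypothetical team: agents produce 30 drafts a day; 3 reviewers clear 8 each.
print(review_backlog(30, 3, 8, 5))  # [6.0, 12.0, 18.0, 24.0, 30.0]
```

    Adding generation capacity without adding review capacity only makes the backlog grow faster; reviewed work stays capped at 24 drafts a day in this example.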

    The Productivity Evidence Depends on What Gets Counted

    Some of the strongest AI-productivity evidence comes from a 2023 controlled experiment, still widely cited, in which developers asked to build a JavaScript HTTP server finished 55.8% faster with GitHub Copilot than without it. The task was narrow, the goal was clear, and success was easy to judge. In that well-scoped setting, the AI assistant helped enormously.

    METR’s 2025 randomized trial points the other way. Sixteen experienced open-source developers worked through 246 tasks in large, mature repositories they already knew well. With AI access, completion took 19% longer. Participants had predicted a 24% speedup beforehand, and they still estimated a 20% speedup afterward, despite the slower outcome they had just lived through. METR frames the result as a snapshot of early-2025 tools in one setting, not a universal verdict on AI coding.
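    The perception gap is easier to feel in minutes. A worked example, assuming a hypothetical task that takes 60 minutes without AI and reading each percentage as a change in completion time; this treats the reported figures as simple multipliers on one task, which simplifies how METR actually estimated them.

```python
baseline_min = 60.0                     # hypothetical no-AI completion time
predicted = baseline_min * (1 - 0.24)   # forecast before the study: 24% faster
perceived = baseline_min * (1 - 0.20)   # estimate afterward: 20% faster
actual = baseline_min * (1 + 0.19)      # measured outcome: 19% longer

print(round(predicted, 1), round(perceived, 1), round(actual, 1))  # 45.6 48.0 71.4
print(round(actual - perceived, 1))     # 23.4 minutes between belief and outcome
```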

    Google’s 2025 DORA report surveyed software professionals and found AI use among 90% of them, with a median of two hours a day. Adoption tracked with higher output, and it tracked with lower delivery stability at the same time. DORA’s framing fits the pattern: AI amplifies what a team already does well, and amplifies what it does poorly just as fast.

    Stack Overflow’s 2025 developer survey adds a behavioral signal. Forty-six percent of respondents distrusted the accuracy of AI output, against 33% who trusted it, and only 3% reported high trust. The most common frustration, cited by 66%, was AI output that is almost right but not quite, the kind of code that costs extra time to find and fix. dbt Labs found 80% of data practitioners used AI daily in late 2024, up from 30% a year earlier, yet only 30% trusted an agent to answer natural-language questions directly against their data. Acceleration and confidence are not the same measurement, and the surveys keep finding gaps between them.

    The New Bottleneck Changes the Shape of the Data Team

    If English lowers the cost of asking a question, then the cost shifts toward judging the answer. Anaconda’s 2025 survey of practitioners found reported skill gaps concentrated in AI governance (30%), deep-learning engineering (23%), and prompt design (20%), a spread suggesting a wider mix of skills rather than one skill replacing the rest. LinkedIn data shows a 177% jump in members adding AI-related skills to their profiles since 2023, nearly five times the growth rate across all skills, though the figure tracks self-reported skills, not employer requirements written into job postings.

    Job-posting research covering 378 US public companies recruiting for generative-AI roles found higher demand for cognitive skills and a post-ChatGPT rise in social-skill requirements, though the dataset runs through 2023 and isn’t specific to data-science roles. Read together, the evidence supports a narrower claim than the one frequently repeated in headlines: domain framing, evaluation, governance, and orchestration are gaining value alongside coding ability, not replacing it. No dataset reviewed here shows employers dropping Python or statistics requirements in favor of prompt-writing skills.

    Inside a data team, the shift lands unevenly. A junior analyst can now produce a working draft model in an afternoon. A senior reviewer, a domain expert, or a data-quality owner still has to decide whether the draft deserves to influence a customer, an operational decision, or a dollar of spend. Junior staff produce drafts faster. Senior staff make more decisions per day, because the volume in front of them grew while their headcount stayed flat. Accountability concentrates around the people positioned to catch a wrong assumption before it reaches production, regardless of who wrote the first version.

    Opinion: Measure Verified Outcomes, Not Generated Volume

    Here is the take: counting generated artifacts as a productivity measure rewards the wrong behavior. A dashboard, model, or pull request an agent produces in seconds carries no value until a qualified person confirms it works and decides to keep it. A simple count of outputs tells a team how busy its agents stayed, not how much real progress it made.

    Data leaders should track verified outcomes instead. Acceptance rate measures the share of agent-generated work approved without rework. Review time measures how many human-hours each accepted artifact cost. Escaped-defect rate measures how often a problem reaches production anyway. Rework volume, model-monitoring incidents, and time to a validated decision round out a picture closer to reality than a count of lines written or queries answered. The clearest single number may be the simplest: the share of generated work reaching production unchanged.
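    A minimal sketch of how these verified-outcome metrics could be computed from a review log. The `Artifact` record and its fields are hypothetical; a real team would fill them from its own review and incident tooling, and would likely track rework volume and time to a validated decision as well.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Artifact:
    approved: bool         # did a qualified reviewer accept it at all?
    reworked: bool         # did it need changes before acceptance?
    review_hours: float    # human time spent reviewing this artifact
    escaped_defect: bool   # did a problem surface in production anyway?


def verified_outcome_metrics(artifacts: List[Artifact]) -> Dict[str, float]:
    total = len(artifacts)
    accepted = [a for a in artifacts if a.approved]
    unchanged = [a for a in accepted if not a.reworked]
    review_hours = sum(a.review_hours for a in artifacts)
    return {
        # Share of generated work approved without rework.
        "acceptance_rate": len(unchanged) / total if total else 0.0,
        # All review time, including rejected work, divided by accepted artifacts.
        "review_hours_per_accepted": review_hours / len(accepted) if accepted else float("inf"),
        # Share of accepted work that still caused a production problem.
        "escaped_defect_rate": sum(a.escaped_defect for a in accepted) / len(accepted) if accepted else 0.0,
        # Share of accepted work that needed rework first.
        "rework_share": (len(accepted) - len(unchanged)) / len(accepted) if accepted else 0.0,
    }


# Hypothetical week: four agent-generated artifacts.
log = [
    Artifact(approved=True, reworked=False, review_hours=0.5, escaped_defect=False),
    Artifact(approved=True, reworked=True, review_hours=2.0, escaped_defect=True),
    Artifact(approved=False, reworked=False, review_hours=1.0, escaped_defect=False),
    Artifact(approved=True, reworked=False, review_hours=0.5, escaped_defect=False),
]
print({k: round(v, 2) for k, v in verified_outcome_metrics(log).items()})
# {'acceptance_rate': 0.5, 'review_hours_per_accepted': 1.33, 'escaped_defect_rate': 0.33, 'rework_share': 0.33}
```

    Charging rejected work's review time to the accepted artifacts is a deliberate choice in this sketch: it makes a low acceptance rate show up as a higher cost per useful result.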

    Nothing above argues against agentic tools. Cortex Agents, AgentCore, and Copilot’s coding agent all lower the cost of a first draft, and a cheaper first draft is worth having. My take: the win gets overstated whenever a vendor or a headline conflates speed of generation with speed of delivery.

    Natural language will keep widening who can start a piece of data work. A marketing analyst, a finance lead, or an operations manager can now ask a question in plain words and get back a model, a chart, or a working app. What stays scarce is knowing which question to ask, how much evidence is enough before trusting an answer, and when to refuse one. The skill won’t show up in a model’s response time, and it won’t get cheaper just because the first draft did.


