Most Agentic Systems Are the Wrong Architecture

This is the third of three articles on what mid-market leaders need to understand about AI in its current form. The first examined what AI has become as a category. The second examined the strategic consequences of its collapsing cost. This article examines what determines whether the systems built on this new capability deliver value or fail in production.

Most organizations approaching agentic AI treat it as an engineering problem. They identify a use case, hand it to a development team or a vendor, and wait for results. The results often disappoint. Costs run higher than projected. The system handles the demo well and the production traffic poorly. Edge cases multiply faster than fixes. The instinct is to assume the technology is not ready. The more accurate reading is that the architecture was chosen before anyone understood the problem.

Agentic AI is not primarily an engineering problem. It is an architecture problem with engineering downstream. The decision that determines whether the investment delivers value is made before code is written, often without the people making it realizing they are making a decision at all. This article describes what that decision is, why it matters, and the questions executives should be asking before any agent project moves forward.

What an Agentic System Actually Is

The term "agent" is used loosely across the industry. A useful working definition: an agentic system is one in which an AI model is given a goal, allowed to choose its own steps, and permitted to take actions in the world through tools such as APIs, databases, or applications. The model is not following a script. It is deciding what to do next based on what it observes.

This is fundamentally different from the AI systems most organizations have deployed to date. A chatbot answers questions. A summarization tool produces output and stops. A document analyzer extracts fields and hands them off. These are useful, but they are not agents. They do one thing and conclude.

An agent operates over many steps. It calls a tool, observes the result, decides whether to call another tool or to deliver an answer, and continues until it determines the work is done. That additional autonomy is the source of both the capability and the risk. The same loop that allows an agent to handle messy, open-ended problems is the loop that, if misdesigned, runs up unexpected costs, takes irreversible actions, or fails in ways that are difficult to audit.
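For readers who want to see the loop itself, here is a minimal Python sketch, not a reference implementation. The `decide` function stands in for the model and is scripted so the example runs without one. `run_agent`, `Step`, and the `lookup_order` tool are hypothetical names. The `max_steps` cap is an added assumption: without a limit of this kind, nothing in the loop itself stops it from running up cost.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Step:
    tool: Optional[str]  # name of the tool to call, or None to deliver an answer
    argument: str = ""   # tool input, or the final answer when tool is None

def run_agent(goal: str,
              decide: Callable[[str, list[str]], Step],
              tools: dict[str, Callable[[str], str]],
              max_steps: int = 5) -> tuple[str, list[str]]:
    """Call a tool, observe the result, decide again, until done or out of budget."""
    observations: list[str] = []
    for _ in range(max_steps):
        step = decide(goal, observations)  # the model chooses the next step
        if step.tool is None:
            return step.argument, observations
        result = tools[step.tool](step.argument)  # act in the world
        observations.append(f"{step.tool}({step.argument}) -> {result}")
    return "stopped: step budget exhausted", observations

# Scripted stand-in for a model (hypothetical): look the order up once, then answer.
def scripted_decide(goal: str, observations: list[str]) -> Step:
    if not observations:
        return Step(tool="lookup_order", argument="A-1001")
    return Step(tool=None, argument=f"Answer based on: {observations[-1]}")

tools = {"lookup_order": lambda order_id: f"order {order_id} shipped"}
answer, trace = run_agent("Where is order A-1001?", scripted_decide, tools)
print(answer)
```

The step cap and the returned trace are design choices rather than requirements of the pattern, but they address two of the risks named above: runaway cost and behavior that is difficult to audit.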

Anthropic, in published guidance from its applied research team, draws a sharper distinction worth knowing: the one between workflows and agents. Workflows are systems in which AI models and tools are arranged in predefined sequences. Agents are systems in which the AI dynamically directs its own execution and tool use. Most production use cases that organizations describe as agents are actually workflows. The distinction matters because workflows are cheaper, more predictable, easier to monitor, and easier to govern. The reflex to build an agent when a workflow would do is one of the most expensive mistakes organizations are making in this category right now.

Why Pattern Selection Precedes Engineering

Every agentic system rests on a small set of architectural patterns. The patterns have names that engineers use, but the choice between them is a business decision before it is a technical one. Each pattern carries different implications for cost per request, response time, reliability, and the kinds of failures the system will produce when it gets things wrong.

A team that selects a multi-agent architecture for a problem a single well-prompted system could handle commits the organization to higher costs and more failure points than the problem requires. A team that selects a rigid workflow for a problem that requires adaptive reasoning will discover the limitation only in production, when the system meets edge cases the original design did not anticipate. Both mistakes are common. Both are expensive to correct after the fact.

The cost of getting pattern selection wrong is not the engineering rework, although that is real. It is the operational drag of running a system whose architecture does not match its workload. Higher per-transaction cost. Slower response times. More incidents. Harder audits. Harder to extend when the use case evolves. Most of this never surfaces as a single visible failure. It surfaces as a system that quietly underperforms expectations until someone is asked to justify the investment.

The decision to be deliberate about pattern selection is the executive contribution to the project. The engineering team can implement any of the patterns. The question of which pattern fits the problem requires the business context that engineers may not have, the cost discipline that engineering culture often underweights, and the risk awareness that lives at the leadership level.

Five Questions That Determine Outcome

A practical framework for pattern selection comes down to five questions about the work itself. Answering them before the project starts costs far less than discovering the answers in production.

The first question is whether the steps are known in advance. If the work decomposes into a predictable sequence (for example: extract data, validate it, store it, notify someone), a workflow is the right architecture: cheaper, faster, and more predictable. The instinct to layer reasoning on top of a process that does not require reasoning is the most common form of over-engineering in this category.
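A workflow of this kind needs no model deciding what comes next; the sequence is fixed in code. The sketch below assumes a toy `key=value` document format, and `extract`, `validate`, `store`, `notify`, and `run_workflow` are illustrative names. In a real system the extraction step might call a model, but the order of the steps would still not be up to the model.

```python
# Each step is ordinary code; no model chooses what happens next.
def extract(document: str) -> dict[str, str]:
    # Assumed toy format: "key=value" pairs separated by semicolons.
    return dict(pair.split("=", 1) for pair in document.split(";") if pair)

def validate(record: dict[str, str]) -> dict[str, str]:
    missing = {"invoice_id", "amount"} - set(record)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return record

def store(record: dict[str, str], database: list) -> None:
    database.append(record)

def notify(record: dict[str, str], outbox: list) -> None:
    outbox.append(f"Stored invoice {record['invoice_id']}")

def run_workflow(document: str, database: list, outbox: list) -> None:
    # The sequence is fixed: extract, validate, store, notify.
    record = validate(extract(document))
    store(record, database)
    notify(record, outbox)

database, outbox = [], []
run_workflow("invoice_id=INV-7;amount=120.00", database, outbox)
print(outbox)
```

Because the path is fixed, every run can be tested, costed, and audited in advance, which is the practical meaning of "cheaper, faster, and more predictable."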

The second question is whether the system needs to interact with external systems. Almost every useful agentic system does. The relevant follow-up is which external systems and with what permissions. The scope of what an agent can touch determines both its value and its risk profile.

The third question is whether the structure of the work can be planned before execution begins. If a project naturally decomposes into stages with clear dependencies, a planning pattern works well. If the work only reveals its structure through execution, a more exploratory pattern fits better. The cost of choosing wrong is wasted compute on plans that do not survive contact with reality.

The fourth question is whether quality matters more than speed. Systems can be configured to generate output, critique that output, and refine it iteratively. This produces better results at higher cost and longer response times. The configuration is appropriate for high-value outputs where a poor result is expensive. It is inappropriate for live interactions where users will not wait.
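The generate, critique, and refine configuration can be sketched as a short loop. The three functions passed in are hypothetical stand-ins for model calls, stubbed here so the example runs; the call counter shows where the extra cost and response time come from.

```python
from typing import Callable, Optional

def reflect(task: str,
            generate: Callable[[str], str],
            critique: Callable[[str, str], Optional[str]],
            revise: Callable[[str, str, str], str],
            max_rounds: int = 2) -> tuple[str, int]:
    """Generate a draft, then critique and revise it; return the draft and model-call count."""
    draft = generate(task)
    calls = 1
    for _ in range(max_rounds):
        feedback = critique(task, draft)
        calls += 1
        if feedback is None:  # the critic is satisfied, so stop early
            break
        draft = revise(task, draft, feedback)
        calls += 1
    return draft, calls

# Stubbed model calls (hypothetical): the critic asks for one change, then approves.
def generate(task: str) -> str:
    return "draft v1"

def critique(task: str, draft: str) -> Optional[str]:
    return None if draft.endswith("v2") else "add totals"

def revise(task: str, draft: str, feedback: str) -> str:
    return "draft v2"

print(reflect("quarterly summary", generate, critique, revise, max_rounds=1))
```

With `max_rounds=1` and a critic that asks for one change, the request makes three model calls where a single generation would make one. Allowing a second round adds a fourth call even when the critic only confirms the draft.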

The fifth question is whether the work genuinely requires multiple specialized agents working together. The honest answer is usually no. Multi-agent systems are appealing because they sound sophisticated. In practice, they are difficult to coordinate, expensive to operate, and harder to debug than single-agent systems. The trigger for multi-agent architecture should be a specific bottleneck that specialization actually solves, not architectural ambition.

Three Dimensions Most Teams Skip

The five questions above describe the work itself. Three additional questions, often skipped, describe the operating environment. These determine whether the architecture chosen for the work can survive in production.

The first is the cost and latency budget. Every agentic pattern carries a different cost profile per request. A reflection loop that generates, critiques, and revises once roughly triples token spend, and each additional round adds more. Planning adds upfront latency to every request whether the plan helps or not. Multi-agent coordination multiplies both. A pattern that is correct for the task but wrong for the budget is still the wrong pattern. The question to ask before a project begins is what the per-request cost ceiling is and what response time is acceptable. The answer shapes which patterns are viable.
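A rough per-request cost and latency model can be written down before any system is built. Every number in the sketch below (call counts, tokens per call, seconds per call, the price per million tokens, the monthly volume, and both ceilings) is a placeholder assumption, not a benchmark or vendor price; the point is the structure of the calculation. Latency assumes calls run one after another.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pattern:
    name: str
    model_calls: int         # model invocations per request
    tokens_per_call: int     # input plus output tokens per call (assumed average)
    seconds_per_call: float  # assumed average latency per call

def cost_per_request(p: Pattern, usd_per_million_tokens: float) -> float:
    return p.model_calls * p.tokens_per_call * usd_per_million_tokens / 1_000_000

def latency_seconds(p: Pattern) -> float:
    # Assumes calls run one after another; parallel calls would shorten this.
    return p.model_calls * p.seconds_per_call

def viable(patterns: list[Pattern], usd_per_million_tokens: float,
           cost_ceiling: float, latency_ceiling: float) -> list[str]:
    return [p.name for p in patterns
            if cost_per_request(p, usd_per_million_tokens) <= cost_ceiling
            and latency_seconds(p) <= latency_ceiling]

# Placeholder numbers for illustration only.
patterns = [
    Pattern("single call", model_calls=1, tokens_per_call=2_000, seconds_per_call=2.0),
    Pattern("reflection", model_calls=3, tokens_per_call=2_000, seconds_per_call=2.0),
    Pattern("multi-agent", model_calls=8, tokens_per_call=3_000, seconds_per_call=2.0),
]
PRICE = 5.0  # assumed USD per million tokens

for p in patterns:
    monthly = cost_per_request(p, PRICE) * 100_000  # assumed 100,000 requests per month
    print(f"{p.name}: ${cost_per_request(p, PRICE):.3f}/request, "
          f"{latency_seconds(p):.0f}s, ${monthly:,.0f}/month")

print(viable(patterns, PRICE, cost_ceiling=0.05, latency_ceiling=10.0))
```

Under these placeholder numbers, the multi-agent pattern fails both ceilings even if it would produce the best output, which is the sense in which a pattern correct for the task can still be wrong for the budget.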

The second is failure blast radius. The right question is not whether the system will fail, because all systems do. The question is what happens when it does and who absorbs the consequences. An agent that drafts emails for human review has a small blast radius. An agent that sends those emails autonomously has a larger one. An agent that can modify a production system, dispatch funds, or change customer records has a blast radius that should drive every other architectural decision. High blast radius pushes toward more conservative patterns, explicit approval gates, and human checkpoints at every state-changing action. These are not features added later. They are constraints that shape the architecture from the start.
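An approval gate can be made a structural property of how tools are invoked rather than a feature added later. In the sketch below, the tool names, the `approve` callback, and the two allow-lists are hypothetical; in practice the callback might route the request to a review queue rather than return immediately.

```python
from typing import Callable

# Hypothetical tool classification; a real deployment would define its own.
LOW_RISK = {"read_record", "draft_email"}
STATE_CHANGING = {"send_email", "update_record", "issue_refund"}

class ApprovalRequired(Exception):
    """Raised when a state-changing action has not been approved by a person."""

def execute(tool: str, args: dict, tools: dict[str, Callable[..., str]],
            approve: Callable[[str, dict], bool]) -> str:
    if tool in STATE_CHANGING:
        # Human checkpoint: nothing that changes state runs without approval.
        if not approve(tool, args):
            raise ApprovalRequired(f"{tool} blocked pending human approval")
    elif tool not in LOW_RISK:
        # Deny by default: tools on neither list are refused outright.
        raise PermissionError(f"{tool} is not on the allow-list")
    return tools[tool](**args)

tools = {
    "draft_email": lambda to, body: f"draft to {to} saved for review",
    "send_email": lambda to, body: f"email sent to {to}",
}
args = {"to": "customer@example.com", "body": "Your refund is on its way."}

def deny(tool: str, args: dict) -> bool:
    return False

print(execute("draft_email", args, tools, approve=deny))  # low risk: runs without approval
try:
    execute("send_email", args, tools, approve=deny)
except ApprovalRequired as err:
    print(err)
```

Refusing any tool that appears on neither list is one reasonable default for a system with a large blast radius: a new capability has to be classified before the agent can use it.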

The third is observability and audit. In regulated industries, security-sensitive contexts, or any environment where decisions must be defensible after the fact, the ability to reconstruct what the system did and why is not a feature. It is a constraint. Agentic systems generate substantially more audit surface than traditional software because every reasoning step and every tool call must be logged, indexed, and replayable. The systems that handle this well have the audit infrastructure designed in from the start. The systems that lack it usually cannot defend their decisions when asked to.
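One way to make every reasoning step and tool call replayable is an append-only log in which each record carries a hash of the previous one, so later edits to the trail are detectable. Hash chaining is one technique among several, chosen here for illustration; the `AuditLog` class and its record fields are assumptions, and a production system would write to durable storage rather than a Python list.

```python
import hashlib
import json
from datetime import datetime, timezone

def _digest(record: dict) -> str:
    # Hash a canonical JSON encoding of everything except the hash itself.
    body = {k: v for k, v in record.items() if k != "hash"}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()

class AuditLog:
    """Append-only record of reasoning steps and tool calls, chained by hash."""

    def __init__(self) -> None:
        self.records: list[dict] = []

    def append(self, kind: str, payload: dict) -> None:
        record = {
            "seq": len(self.records),
            "time": datetime.now(timezone.utc).isoformat(),
            "kind": kind,  # e.g. "reasoning", "tool_call", "tool_result"
            "payload": payload,
            "prev": self.records[-1]["hash"] if self.records else "",
        }
        record["hash"] = _digest(record)
        self.records.append(record)

    def verify(self) -> bool:
        """Return False if any record was edited, or records were removed or reordered ahead of the last entry."""
        prev = ""
        for i, record in enumerate(self.records):
            if record["seq"] != i or record["prev"] != prev or record["hash"] != _digest(record):
                return False
            prev = record["hash"]
        return True

    def to_jsonl(self) -> str:
        # One JSON object per line, suitable for indexing and replay.
        return "\n".join(json.dumps(r, sort_keys=True) for r in self.records)

log = AuditLog()
log.append("reasoning", {"text": "Customer asks about refund; look up the order first."})
log.append("tool_call", {"tool": "lookup_order", "args": {"order_id": "A-1001"}})
log.append("tool_result", {"tool": "lookup_order", "result": "refund issued"})
print(log.verify())
```

Chaining alone does not reveal records cut from the end of the log; storing the latest hash somewhere the system itself cannot modify is one common way to close that gap.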

What This Looks Like in Practice

For an executive evaluating an agentic AI project, whether internal or vendor-delivered, a small set of questions reveals whether pattern selection has been done deliberately.

Ask the engineering team or vendor to describe, in plain language, why they chose the pattern they chose. The answer should reference the structure of the work, the cost budget, and the failure consequences. If the answer is that the pattern is what they normally use, or what is recommended in a popular framework, the selection has not been made deliberately.

Ask what a single request will cost at production scale. If the answer is uncertain, the cost model has not been built and the system will surprise the organization in operation.

Ask what happens when the system gets it wrong. The answer should describe specific recovery paths, human checkpoints, and the worst plausible outcome. If the answer is generic, the failure modes have not been thought through.

Ask how the team will know whether the system is performing well after deployment. The answer should reference specific metrics, a test set, and a process for re-evaluating the design when results degrade. If the answer is that the model will improve over time, the evaluation discipline is not in place.

These four questions can be asked in any review meeting. The quality of the answers indicates whether the project is ready to proceed or whether the architectural decisions need more work before engineering accelerates.

The Thread

Agentic AI is not a single technology. It is a set of architectural choices about how AI systems are organized, what they are allowed to do, and how they are monitored when they do it. The choices are not interchangeable. Each one fits a particular kind of problem and produces a particular cost, risk, and reliability profile.

The organizations that will get value from agentic AI over the next several years are not the ones that adopt fastest. They are the ones that match architecture to problem deliberately, model the costs honestly, and build governance into the design rather than bolting it on after the fact. The engineering work is the easier half of this. The decision work is where the value is created or lost.

The question for executives is not whether to invest in agentic AI. The question is whether the architecture decisions behind those investments are being made by people who understand what they are choosing.