From chatbots to autonomous agents: what your board must understand before handing processes over to AI

In September 2025, Gartner published its first Hype Cycle dedicated exclusively to agentic AI. Its verdict, translated into board language, was blunt: the most aggressive adoption curve among emerging technologies, with only 17% of organisations actually running deployed agents and more than 40% of agentic projects on track to fail before 2027. Two months later, McKinsey published its State of AI 2025, based on a survey of 1,993 organisations across 105 countries. Only 6% of them earn the AI high performer label, with EBIT impact above 5%. Some 23% say they are scaling some kind of agent, almost always confined to one or two functions.

Two independent readings, same year, same conclusion: the wave is real, and most of what is landing on the board agenda as an autonomous agent isn't quite that yet.

What changes when you go from chatbot to agent

A chatbot answers; an agent acts. That single line captures the shift and hides the real consequence.

The chatbots mid-market companies have rolled out between 2023 and 2025 take an input (a customer question, a marketing brief), generate an output (a piece of text, an answer) and hand control back to a human. The human signs off, sends, and lives with the win or the mistake. The AI plays assistant; the executive or the employee keeps the last word at every step.

An agent, as both Gartner and McKinsey define it in their recent reports, is a system built on a foundation model capable of planning steps, executing them against real systems (CRM, ERP, email, files, external APIs), evaluating intermediate results and retrying. Every new problem hides inside that definition. Planning means deciding. Executing means acting with consequences. Evaluating means judging its own work. Each of those verbs used to belong to an employee or a manager.

The operational consequence: when a mid-market company moves from chatbot to agent, it is changing its organisational model. The assistant is replaced by a function with an implicit P&L, and nobody has yet signed a change to the org chart.

What the primary data actually says

Serious sources cool the conversation fast.

Gartner, Hype Cycle for Agentic AI (September 2025). Only 17% of organisations have agentic AI deployed. More than 60% plan to deploy it within the next two years. Over 40% of agentic AI projects will fail before 2027 due to governance, security and cost issues, according to Gartner's own forecast. The firm places the technology at the Peak of Inflated Expectations on the Hype Cycle, a position worth reading twice before signing a multi-year deal.

McKinsey, State of AI 2025 (November 2025, 1,993 organisations across 105 countries). 88% use AI in at least one business function. But only 23% say they are scaling an agentic system somewhere in the business, and the vast majority of those scaling do so in one or two functions, not across the organisation. Only 6% of companies qualify as AI high performers, with EBIT impact above 5%. The other 94% pile up board hours and vendor spend for a return that is either smaller or not reported at all.

The figure most worth committing to memory, because it frames the question the board will have to answer to shareholders or the chair, is McKinsey's: 39% of companies using AI report some EBIT impact, and only 6% report impact above 5%. Between that 6% and everyone else lies, quite literally, the difference between having decided well and having paid for expensive smoke.

"The adoption curve is the most aggressive among recent emerging technologies, and at the same time fully autonomous deployments are not ready for the majority of enterprise environments. What lives in production are agents scoped to specific tasks, not general-purpose agents."

Four levels of "agent" being sold to you

When a vendor walks into your boardroom with an "agentic" solution, the first useful question is which level they are actually playing at. Without that map, everything sounds the same and prices align to the top of the range.

1. Mature chatbot. A conversational model wired to your CRM or knowledge base. It answers, drafts, summarises. It doesn't act, doesn't decide. This is what most mid-market companies deployed between 2023 and 2025. Useful and cheap, no longer a competitive edge. If a vendor sells it to you as an "agent" in 2026, there is hype-inflated mark-up in the price.

2. Copilot with step-by-step approval. The system proposes a specific action (send an email, create an invoice, update a field) and a human approves each step. This is where the vast majority of "agents" actually running in European mid-market companies live today. The human is still the bottleneck, but no longer writing, just validating. Measurable return, contained risk, the sensible place to start.

3. Scoped agent with limited autonomy. The system acts inside a perimeter defined by hard rules (spending cap, allowed domains, allowed operation types, working hours). It decides and executes without asking for permission at every step, but it is supervised by exception. It works in tightly delimited cases: ticket triage, simple accounting reconciliations, first drafts of routine commercial replies. Gartner calls these task-specific agents and forecasts that they will be integrated into 40% of enterprise applications by the end of 2026, up from less than 5% today.

4. Fully autonomous, general-purpose agent. Takes abstract objectives ("grow share in segment X") and decides what steps to take, which external providers to talk to, and what budget to commit. It doesn't exist yet at commercial maturity, and the few public pilots live inside large corporations with dedicated oversight teams. Any vendor selling it to a mid-market company today is, at best, describing a roadmap intention.

This map serves two purposes. First, it lets you translate a vendor pitch into the actual level it belongs to. Second, it lets you demand the price of that level rather than the price of the trade-show buzz.
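The four levels reduce to three yes/no questions a board can put to any vendor. What follows is a minimal sketch in Python, not a standard: the class `VendorPitch` and its three flags are hypothetical names introduced here to encode the taxonomy above, and they are not part of any Gartner or McKinsey definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VendorPitch:
    """Hypothetical summary of what a proposed system actually does."""
    executes_actions: bool           # acts on real systems (CRM, ERP, email)
    human_approves_each_step: bool   # a person validates every single action
    takes_abstract_objectives: bool  # chooses its own steps, providers, budget


LEVEL_NAMES = {
    1: "mature chatbot",
    2: "copilot with step-by-step approval",
    3: "scoped agent with limited autonomy",
    4: "fully autonomous, general-purpose agent",
}


def agent_level(pitch: VendorPitch) -> int:
    """Map a pitch to one of the four levels described above."""
    if not pitch.executes_actions:
        return 1  # answers, drafts, summarises; never acts
    if pitch.human_approves_each_step:
        return 2  # proposes actions, a human approves each one
    if not pitch.takes_abstract_objectives:
        return 3  # acts alone, but only inside a fixed perimeter
    return 4      # picks its own steps from an abstract goal


pitch = VendorPitch(executes_actions=True,
                    human_approves_each_step=True,
                    takes_abstract_objectives=False)
print(agent_level(pitch), LEVEL_NAMES[agent_level(pitch)])
```

The order of the checks matters: a system that needs approval at every step is a copilot regardless of how ambitious the objectives in the brochure sound.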

The five questions before delegating a process to an agent

When the board is about to approve handing a process over to an autonomous system, before discussing vendor or model, five things need answers. There is no shortcut around these five; if you don't settle them first, you end up paying for them later.

1. Is the action we are about to delegate reversible? Booking a room, drafting an email, opening a ticket: reversible. Confirming a payment, signing an order, communicating a price to a client, processing a leaver in HR: irreversible. An agent's first perimeter inside a mid-market company should be built only on reversible actions. Autonomy over irreversible actions comes after months of clean history and exception controls that genuinely work, not during the pilot.

2. What authority and budget are we granting, in concrete numbers? If the agent can spend, there must be a ceiling in euros per operation, per day and per month, not a guideline in an internal memo. If the agent can negotiate terms, there must be explicit ranges it cannot cross. Without those hard limits, the board is approving an unlimited corporate card to an entity whose decision-making, as of today, nobody around the table fully understands.

3. How are we going to audit what it did and why? Traceability means the ability to answer to a client, a vendor, an auditor or a judge why the system did what it did and on what information. Before delegating, the board has to see, in a real demo, how a specific agent decision is reconstructed. If the vendor's answer is "we have JSON logs", the board's answer is no.

4. What happens if the agent loops or stalls? Agents can, and do, repeat costly calls against paid APIs (models, data, messaging) until somebody notices. Without a clear kill-switch and without anomaly-spend alerts configured before launch, a surprise invoice at month-end is the polite version of the problem.

5. Who is on the hook if the agent makes a contractual error with a client? The agent vendor almost always caps its own liability at the price of the service in its terms. Civil liability toward the end client, the sector regulator and the board still sits with the mid-market company. That boundary has to be written down and accepted by the board before the pilot, not discovered after the first incident.

These five questions aren't a methodology; they are a filter. A proposal that doesn't answer them goes back for more work. A proposal that does deserves, at the very least, the cost of evaluating it seriously.
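Questions 1 and 2 are the easiest to turn from a memo into a hard rule. The sketch below, in Python, shows one possible shape for that rule under stated assumptions: the action names in `REVERSIBLE_ACTIONS` are hypothetical labels for the reversible examples in question 1, and `SpendGuard` is an illustrative class, not a feature of any vendor product. Anything not on the reversible list is refused and routed to a human, and every spend is checked against ceilings per operation, per day and per month.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical action names, taken from the reversible examples in question 1.
REVERSIBLE_ACTIONS = frozenset({"book_room", "draft_email", "open_ticket"})


@dataclass
class SpendGuard:
    """Hard ceilings in euros per operation, per day and per month."""
    per_operation_eur: float
    per_day_eur: float
    per_month_eur: float
    _spent_by_day: dict = field(default_factory=dict, init=False, repr=False)
    _spent_by_month: dict = field(default_factory=dict, init=False, repr=False)

    def authorise(self, action: str, amount_eur: float, on: date) -> tuple[bool, str]:
        """Return (allowed, reason); record the spend only when allowed."""
        if action not in REVERSIBLE_ACTIONS:
            return False, "not on the reversible list: route to a human"
        if amount_eur < 0:
            return False, "negative amount"
        if amount_eur > self.per_operation_eur:
            return False, "per-operation ceiling exceeded"
        month = (on.year, on.month)
        day_total = self._spent_by_day.get(on, 0.0) + amount_eur
        month_total = self._spent_by_month.get(month, 0.0) + amount_eur
        if day_total > self.per_day_eur:
            return False, "daily ceiling exceeded"
        if month_total > self.per_month_eur:
            return False, "monthly ceiling exceeded"
        self._spent_by_day[on] = day_total
        self._spent_by_month[month] = month_total
        return True, "ok"


guard = SpendGuard(per_operation_eur=50, per_day_eur=100, per_month_eur=150)
print(guard.authorise("confirm_payment", 10, date(2026, 1, 5)))
```

The design choice worth noting is the default: an unknown action is treated as irreversible. The list of what the agent may do alone is written down, and everything else needs a person.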

Three new risks that didn't exist with chatbots

Moving from chatbot to agent opens three categories of risk that simply weren't on the map with passive assistants.

Silent data exfiltration. An agent that combines several internal systems may end up sending information to external APIs in patterns no classic DLP policy flags. The leak stops being a clumsy email and becomes a flow that is legitimate from the firewall's point of view and catastrophic from the client's. The control moves out of the firewall and into the agent's design, its effective permissions and its traceability. That is board work with legal advice, before implementation.

Unaudited irreversible action. A misused chatbot produces text a human could have corrected before sending. When a poorly scoped agent makes a mistake, it acts on its own: it cancels the wrong client's subscription, refunds a payment twice, or pushes a notification to the entire list. The damage is operational and reputational at once, and responsibility still rests with the executive who signed the delegation, not the model.

Open-ended cost. Inference costs, external-API usage and long-context storage can scale non-linearly when an agent retries or stacks several models. Without contractual caps with the vendor and without internal alerts, a three-month pilot can end up costing as much as a yearly service. The CFO has to be at the table from day one, not show up when the invoice lands.
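The looping behaviour behind open-ended cost can be stopped with very little logic, provided it is in place before launch. As an illustration only, here is a Python sketch of a kill-switch that trips when the same paid call is repeated too often or when the cost of a single run passes a ceiling. The class name `KillSwitch`, its two thresholds and the call signature string are assumptions made for the example; a real deployment would also need the alert to reach a person.

```python
from collections import Counter


class KillSwitch:
    """Stops an agent run on repeated identical calls or runaway spend."""

    def __init__(self, max_identical_calls: int, max_run_cost_eur: float):
        self.max_identical_calls = max_identical_calls
        self.max_run_cost_eur = max_run_cost_eur
        self.tripped = False
        self.reason = ""
        self._calls = Counter()
        self._cost_eur = 0.0

    def record(self, call_signature: str, cost_eur: float) -> bool:
        """Record one paid call; return True while the run may continue."""
        if self.tripped:
            return False
        self._calls[call_signature] += 1
        self._cost_eur += cost_eur
        if self._calls[call_signature] > self.max_identical_calls:
            self._trip(f"'{call_signature}' repeated more than "
                       f"{self.max_identical_calls} times")
        elif self._cost_eur > self.max_run_cost_eur:
            self._trip(f"run cost above {self.max_run_cost_eur} EUR")
        return not self.tripped

    def _trip(self, reason: str) -> None:
        self.tripped = True
        self.reason = reason  # in practice this would also alert a person


switch = KillSwitch(max_identical_calls=3, max_run_cost_eur=5.0)
for _ in range(10):  # simulated loop: the same paid call again and again
    if not switch.record("search_api:same-query", cost_eur=0.25):
        break
print(switch.reason)
```

Once tripped, the switch stays tripped: restarting the run is a human decision, not something the agent can retry its way into.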

None of these three categories fits on the IT agenda alone. All three belong to the board.

Where to start if you are a mid-market company

The easiest trap right now is jumping from chatbot to a level-(3) scoped agent in six months, without ever having spent serious time on the level-(2) copilot. The gap in return and risk between those two levels is large.

Quarter one. A serious inventory. Which cases in your company today are reversible, repetitive and carry a clear business metric. Invoice reconciliation, first drafts of typical commercial replies, meeting minutes, initial ticket triage. Those are your copilot-with-step-approval candidates. Nothing else.

Quarter two. Pick two cases from the inventory and deploy a copilot with three hard guardrails: cap on operations per day, mandatory human validation per step, traceability the board can review in five minutes. Measure time saved, error rate and inference cost against a real baseline, not against a vendor promise.
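Measuring against a real baseline is simple arithmetic once the same four figures are collected for the period before the copilot and the period with it. A rough sketch in Python, assuming hypothetical field names (`cases`, `minutes_spent`, `errors`, `inference_cost_eur`) that a company would replace with whatever its own process actually records:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Period:
    """Figures for one measurement period of a single process."""
    cases: int                       # cases handled in the period (must be > 0)
    minutes_spent: float             # total human minutes spent on those cases
    errors: int                      # cases that needed correcting afterwards
    inference_cost_eur: float = 0.0  # zero for the pre-copilot baseline


def compare(baseline: Period, pilot: Period) -> dict:
    """Per-case comparison of the pilot against the measured baseline."""
    baseline_minutes = baseline.minutes_spent / baseline.cases
    pilot_minutes = pilot.minutes_spent / pilot.cases
    return {
        "minutes_saved_per_case": baseline_minutes - pilot_minutes,
        "baseline_error_rate": baseline.errors / baseline.cases,
        "pilot_error_rate": pilot.errors / pilot.cases,
        "inference_cost_per_case_eur": pilot.inference_cost_eur / pilot.cases,
    }
```

Everything is expressed per case so that a busy month and a quiet month remain comparable, and so that the inference cost can be set directly against the minutes saved.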

Quarter three. Only if both cases clear the bar (genuine saving, contained error rate, predictable cost) do you evaluate moving one of them up to a level-(3) scoped agent. With a kill-switch, a spend ceiling and a documented incident-response plan. The other stays as a copilot or closes without drama.

Quarter four. Full read-out. Which processes gain from being delegated, which don't, and which internal decisions (org chart, departmental KPIs, profile of new hires) need adjusting next year. This is where you decide whether the company moves seriously up the curve or stays on copilot for another year, which is a perfectly valid answer if the data doesn't support otherwise.

Twelve months, an honest read, no bet whose failure jeopardises a business line. That cadence matches what the Gartner and McKinsey numbers actually support; the trade-show buzz asks for faster, the data asks for this.

The conversation Shift Directivo opens

The vendors arriving at boardrooms this year are trained to sell level (4) at level (4) prices. If the board has no map, it pays that price for something closer to level (2). If it has the map, it negotiates real scope and saves at least a year of unreturned spend.

That reading is how a leadership team trains itself to tell copilot from agent, scoped agent from fully autonomous agent, and commercial promise from proven capability. It is what we work on in our executive-mindset reset service and, specifically, in Shift Directivo, Stradiax's in-person programme for executives who want to walk out, in ninety days, with their own judgement on where and how to introduce agents in their company without signing the wrong contract.

Next time an "agentic" solution lands on your board, before asking which model it uses, run the proposal through the four-level taxonomy and the five questions. If the vendor fails on any of them, look for another vendor or postpone the decision until the Gartner and McKinsey numbers shift in a different direction from the current one.


Stradiax Editorial Board
