Two Thirds of Your C-Suite Cannot Confirm AI Is Working. That’s Not a Confidence Problem.
In the first quarter of 2026, Quaie asked 187 senior decision-makers across ten C-suite functions a direct question: can you confirm that AI is creating durable economic value in your organisation? Sixty-seven point four per cent could not. They were not sceptics. They were not laggards. They were executives running live AI programmes, allocating real budgets, and reporting to boards that had approved the investment. They simply had no instrument that told them whether it was working. That is not a confidence problem. It is a measurement problem, and the distinction determines everything that follows: which intervention an organisation commissions, which question its board asks at the next budget review, and whether the capital already committed compounds into durable advantage or accumulates quietly into a position that nobody can evaluate.
The instrument gap nobody is naming
Every large organisation investing in AI has built some version of a reporting infrastructure around it. Dashboards tracking deployment velocity. Steering committees reviewing programme milestones. Quarterly updates to the board on AI initiatives, headcount committed, and budget allocated. These instruments are not useless. They tell the organisation what is happening. They do not tell the organisation whether what is happening is working.
The distinction is structural. Activity metrics and value metrics are not the same class of measurement, and conflating them produces a reporting environment that looks comprehensive and is not. An organisation can have thirty AI initiatives in production, two hundred people working on them, and fifty million pounds committed, and still have no instrument that connects any of those inputs to a confirmed economic output. The reporting architecture measures the investment. It does not measure the return.
This is not unique to AI and it is not a new problem. In June 2024, Goldman Sachs published a research note titled “Gen AI: Too Much Spend, Too Little Benefit?” Jim Covello, the bank’s head of global equity research, put the central concern precisely: AI applications must solve extremely complex and important problems for enterprises to earn an appropriate return on investment, and the technology is not currently designed to do that at the cost levels being incurred.¹ In the same month, David Cahn at Sequoia Capital published what he called “AI’s $600B Question”: a calculation showing that the revenue gap between AI infrastructure investment and confirmed end-user value had grown from $200 billion to $600 billion in the space of nine months.² Neither Covello nor Cahn was arguing that AI has no value. Both were making a measurement point: investment was scaling at a rate that confirmed value was not matching, and the reporting infrastructure available to boards and investors was not designed to detect the gap.
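Cahn's arithmetic is worth restating, because it is the same shape of calculation a board could run against its own AI spend. His note gives only the two multipliers; the run-rate figure below is back-derived from the published $600 billion result rather than quoted from it:

$$
\text{implied end-user revenue} \;=\; \underbrace{\$150\text{B}}_{\text{Nvidia run-rate}} \times \underbrace{2}_{\text{data centre cost}} \times \underbrace{2}_{\text{end-user margin}} \;=\; \$600\text{B}
$$

The gap is the difference between that implied requirement and the end-user revenue anyone could actually confirm.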
What was true at the market level in June 2024 is true at the organisational level in April 2026. The gap between AI activity and confirmed AI value is not a question of whether the technology works. It is a question of whether the instrument exists to confirm that it does.
What confirmed value actually requires
The phrase “durable economic value” in Quaie’s survey instrument was chosen precisely because it raises the evidentiary standard above what most AI reporting currently reaches. Durable means repeatable and persistent: not a one-quarter efficiency gain that reverted when the champion left, not a cost saving that materialised in one function and was absorbed by increased spend in another, not a productivity improvement that showed up in individual output and disappeared at team level. Economic means measurable in the currency that boards and CFOs use: revenue, margin, cost, risk. Value means the output exceeds the input in a way that a sceptical finance director could verify with access to the numbers.
Most AI reporting does not reach this standard, not because the value is absent, but because the measurement system was not designed to capture it at this level of specificity. Pilot results are measured against pilot metrics. Programme progress is measured against programme milestones. Neither is designed to answer the question a board should be asking: is this investment producing a confirmed, repeatable economic return, and do we know which parts of the organisation are generating it and which are not?
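None of this requires exotic tooling; the standard is mechanically checkable once the inputs exist. The sketch below is a minimal illustration, not Quaie's instrument: every name, field, and threshold in it is hypothetical, and the hard work in practice is producing honest per-function output and input figures, not the comparison itself.

```python
from dataclasses import dataclass

@dataclass
class QuarterRecord:
    """One leadership function's AI economics for one quarter, in a common currency."""
    function: str      # the role accountable for the value, e.g. "CMO" (hypothetical field names)
    output_gbp: float  # measured economic output: revenue, margin, cost avoided, risk reduced
    input_gbp: float   # attributable input: licences, headcount, integration, run costs

def confirmed_durable_value(history: list[QuarterRecord],
                            min_quarters: int = 3) -> dict[str, bool]:
    """Apply the three-part standard from the text:
    economic -- measured in currency; value -- output exceeds input;
    durable -- the surplus persists across consecutive quarters.
    Returns a per-function verdict, not an activity count."""
    nets_by_function: dict[str, list[float]] = {}
    for q in history:
        nets_by_function.setdefault(q.function, []).append(q.output_gbp - q.input_gbp)
    return {
        fn: len(nets) >= min_quarters and all(n > 0 for n in nets[-min_quarters:])
        for fn, nets in nets_by_function.items()
    }
```

The point of the sketch is not the code. It is that each word in "durable economic value" maps to a testable condition, and a dashboard of initiative counts and milestones tests none of them.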
The Klarna case is the confirmed value story that confirmed value stories should be measured against. In early 2024, Klarna announced that its AI customer service assistant, built in partnership with OpenAI, was doing work equivalent to that of 700 full-time agents, handling two thirds of all customer service interactions in its first month of deployment. The CEO, Sebastian Siemiatkowski, was explicit: the system was resolving cases nine minutes faster than human agents, customer satisfaction matched human representative scores, and the programme was projected to deliver $40 million in additional profit in 2024.³ This was, by every measure available to the organisation at the time, confirmed AI value.
By May 2025, Siemiatkowski told Bloomberg that “cost unfortunately seems to have been a too predominant evaluation factor when organising this” and that the company had seen lower quality as a result. Klarna began rehiring human customer service agents.⁴ The programme had not failed in a conventional sense. The AI assistant had functioned as described. What had failed was the measurement framework used to confirm value in the first place: customer satisfaction scores and resolution time are activity metrics. They measure whether the interaction completed. They do not measure whether the interaction produced the outcome the customer required, whether quality held across interaction types, or whether the cost saving was sustained across the full range of service demands. The value that was confirmed was real but incomplete. The incompleteness only became visible after the commitment was irreversible.
This is the measurement failure the 67.4% finding is indexing. Not scepticism. Not resistance. The absence of an instrument capable of confirming value at the level of specificity that sustained commitment requires.
What the licence-to-value gap reveals
By the end of 2024, Microsoft had sold Copilot licences to 70 per cent of the Fortune 500, a figure Satya Nadella cited on the company’s FY25 Q1 investor call.⁵ The headline number describes purchase. It does not describe confirmed value. Lighthouse, a technology consulting firm that examined Copilot deployment patterns across enterprise clients, found that for most organisations, adoption meant pilots and phased rollouts rather than enterprise-wide deployment. The gap between licence acquisition and confirmed production value was not a technology problem. Copilot functioned as described. The gap was a measurement problem: organisations had no consistent instrument for confirming whether the tool was generating durable economic value at the role level before, during, or after deployment.
This matters because the licence decision and the value confirmation question operate on different timescales and involve different functions. The decision to purchase Copilot licences was typically made at the CTO or CEO level, on the basis of vendor capability assessment, competitive pressure, and board-level AI ambition. The question of whether those licences were generating confirmed economic value fell to a different set of functions: the CFO assessing return on the licence spend, the CHRO measuring workforce productivity impact, the CMO confirming whether AI-assisted content and campaign work was delivering commercial outcomes. Those functions were not involved in the licence decision. They were left to measure value using instruments that were not designed for the purpose.
The 70 per cent figure describes an AI commitment made at one level of the organisation. The measurement question it left unanswered was distributed across every other level.
The board question that is not being asked
Boards approving AI budgets are, in most cases, approving them on the basis of three inputs: a technology assessment confirming the tool is capable, a market comparison confirming that competitors are investing, and a management recommendation confirming that the organisation is ready. None of these inputs answers the question that should precede the approval: do we have an instrument capable of confirming whether this investment is working at the role level, and if not, are we building one?
The technology assessment tells the board what the tool can do. The market comparison tells the board what peers are spending. The management recommendation tells the board what the leadership team collectively says it believes. What none of them provides is a mechanism for confirming value after the investment has been made, disaggregated to the functions responsible for generating it.
Consider what a board conversation about AI investment typically contains. A slide showing the number of active AI initiatives. A slide comparing the organisation’s AI maturity against a sector benchmark. A management summary describing progress against the AI strategy approved twelve months earlier. What it rarely contains is a slide showing which specific leadership functions have confirmed lasting value from AI deployment, which have not, and what the gap between those two positions implies for the capital allocation being requested. That slide does not exist in most board packs because the measurement instrument required to produce it does not exist in most organisations.
This is not a governance failure in the conventional sense. Boards are not asking the wrong question because they are incurious or incompetent. They are asking the questions they have the instruments to answer. The instruments available to most boards were not designed to confirm role-level economic value from AI investment. They were designed to track programme activity and report it upward. The 67.4% finding is, among other things, a measure of how many organisations have reached significant AI investment without yet building the measurement layer that would tell them whether the investment is justified.
Why the gap compounds
The measurement problem does not stay static. It compounds, in two directions simultaneously.
In the first direction: the clock runs. Every quarter that passes without a confirmed value measurement is a quarter in which capital continues to be allocated against an unconfirmed return. The CFO who cannot confirm value today will be asked to approve a larger budget next quarter. The approval decision will be made on the same insufficient evidence base, because the measurement infrastructure has not changed. The investment grows. The confirmation gap grows with it.
Covello’s concern at Goldman Sachs was precisely this: that organisations were committing capital at a rate that the available evidence base could not support, and that the absence of a confirmed value measurement was not slowing the commitment. “Sustained corporate profitability will allow sustained experimentation with negative ROI projects,” he wrote in the June 2024 note.⁶ The organisations with the balance sheets to sustain the experimentation will do so regardless of whether the measurement infrastructure exists to confirm the return. The ones without that balance sheet flexibility will discover the gap at the point of reckoning, when the capital is already deployed and the confirmation is still missing.
In the second direction: the absence of confirmed value creates the conditions for a different kind of problem. Roles that are not generating confirmed value begin to diverge from roles that are. A CTO who has confirmed value in one domain and a CMO who has not will interpret the same organisational AI programme differently, because they are experiencing different things. That divergence, left unmeasured, hardens into the misalignment that stalls programmes at the point they appear to be succeeding. The measurement gap and the alignment gap are not separate problems. The first produces the second, and the second is significantly more expensive to resolve than the first.
The Klarna trajectory illustrates the sequence at compressed speed. Activity metrics confirmed value. Investment scaled. A different measurement standard, applied later, revealed that the confirmed value was incomplete. The correction required rehiring, reputational management, and a public recalibration by the CEO. The cost of that recalibration was materially higher than the cost of building a more complete measurement framework before the commitment scaled.
What the 67.4% is indexing
Two thirds of senior leaders in the Q1 2026 dataset cannot confirm that AI is creating durable economic value. At the market level, David Cahn at Sequoia had identified a $600 billion gap between AI infrastructure investment and confirmed end-user revenue as of June 2024. At the organisational level, the 67.4% finding identifies the same gap inside the leadership system: investment is scaling, confirmation is not keeping pace, and the measurement instrument required to close that gap does not yet exist in most organisations.
That is not a reason to slow down. It is not a reason to question the technology. It is a specific and actionable diagnosis: the organisations that will extract lasting value from AI over the next decade are not necessarily the ones investing most heavily right now. They are the ones building the measurement infrastructure to confirm where value is forming and where it is not, before the capital commitment becomes too large to redirect.
Organisations that treat 67.4% as a confidence problem will commission better communication, clearer leadership messaging, and more convincing evidence from early pilots. They will not build the instrument, because they have diagnosed the wrong problem. Jim Covello asked, in June 2024, what trillion-dollar problem AI would solve. Most organisations do not yet have an instrument that confirms whether the problem is being solved at all. Building it, before the next capital allocation cycle closes, is the most rational investment a leadership team can make in its own ability to know what it is doing.
This essay is part of Quaie’s Ongoing Research Series, examining how organisations decide to adopt AI, role by role, over time.
Notes and sources
¹ Goldman Sachs, “Gen AI: Too Much Spend, Too Little Benefit?” June 2024. Jim Covello, head of global equity research at Goldman Sachs, quoted: “My main concern is that the substantial cost to develop and run AI technology means that AI applications must solve extremely complex and important problems for enterprises to earn an appropriate return on investment.” The report estimated that tech giants and beyond were set to spend approximately $1 trillion on AI capital expenditure in the coming years. Source: Goldman Sachs Top of Mind series, June 2024. goldmansachs.com/insights/top-of-mind/gen-ai-too-much-spend-too-little-benefit.
² David Cahn, “AI’s $600B Question,” Sequoia Capital, June 20, 2024. Cahn calculated the revenue gap between AI infrastructure investment and confirmed end-user value by multiplying Nvidia’s run-rate revenue forecast by 2x to reflect total data centre costs, then by 2x again to reflect a 50% gross margin for end-users. The original analysis had placed the gap at $200 billion in September 2023. Source: sequoiacap.com/article/ais-600b-question.
³ Klarna AI customer service deployment: Klarna press release, February 2024. The company announced that its AI assistant, built in partnership with OpenAI, was handling two thirds of all customer service chats in its first month of operation, equivalent to the work of 700 full-time agents. Resolution time was reported as nine minutes faster than human agents. Customer satisfaction scores were described as matching those of human representatives. The programme was projected to deliver $40 million in additional profit in 2024.
⁴ Klarna recalibration: Sebastian Siemiatkowski, interview with Bloomberg, May 2025, quoted in CNBC and Fortune. Siemiatkowski acknowledged that cost had been too predominant an evaluation factor and that the AI-first customer service transition had resulted in lower quality. Klarna subsequently announced plans to rehire human customer service agents. Source: CNBC, “Klarna CEO says AI helped company shrink workforce by 40%,” May 14, 2025; Fortune, October 2025.
⁵ Microsoft Copilot Fortune 500 adoption: Satya Nadella, Microsoft FY25 Q1 earnings call. Microsoft stated that 70% of Fortune 500 companies had adopted Microsoft 365 Copilot. Lighthouse technology consulting analysis noted that for most organisations, adoption meant pilots and phased rollouts rather than enterprise-wide deployment. Source: Microsoft FY25 Q1 investor call; Lighthouse, “What Microsoft 365 Copilot Adoption Really Looks Like,” 2025. lighthouseglobal.com.
⁶ Covello, Goldman Sachs, June 2024: “Sustained corporate profitability will allow sustained experimentation with negative ROI projects.” Source: as note 1.
⁷ Quaie Role Layer Executive Survey, Q1 2026 (n=187). 67.4% of respondents (126 of 187) could not confirm that AI is creating durable economic value. Fieldwork conducted January to March 2026 across ten C-suite functions: CEO, CTO/CIO, COO, CFO, CMO, CRO/CSO, CDO, CISO, CHRO, CLO. Full methodology: quaie.io/p/methodology.