When AI is a single point of failure you can’t audit
And the eval frameworks won't save you
Much of resilience work centres on understanding failure. Sometimes that means asking what happens when this thing fails, so you can prepare. More often it means asking what happened after something already went wrong, so you can learn from it. Either way, the discipline assumes you can reason about the failure, that you can look at a component and say with some confidence what it does, how it broke, and what broke with it.
AI undoes that assumption in ways most organisations adopting it haven’t reckoned with yet. When a traditional component fails, the failure is legible because the system leaves evidence that humans can interpret, evidence whose logic was written by humans in the first place. When AI fails, that legibility vanishes. The model produced an output, the output was wrong, and the path from input to output runs through billions of parameters that nobody, including the people who trained the model, can fully explain. You can observe that it failed. You frequently cannot explain why, which means you can no longer say with confidence what conditions will trigger the same failure again.
There is research trying to change this. Mechanistic interpretability, most visibly pursued by Anthropic, aims to reverse-engineer the internal computation of large language models by tracing circuits from input to output. MIT Technology Review named it a breakthrough technology for 2026, and Anthropic has open-sourced tooling that lets researchers trace paths through a model’s features. The ambition is real, and so is the progress. But the honest state of the field is sobering: core concepts like “feature” still lack rigorous definitions, many interpretability queries are computationally intractable, and it currently takes hours of human effort to understand the circuits behind prompts of only tens of words. Even Anthropic’s CEO has framed this as a race between interpretability and model intelligence, one where interpretability is behind. And there’s a practical constraint that should feel familiar to anyone who has built observability into complex systems: someone has to decide which circuits to examine, which prompts to trace, which behaviours to investigate. Those decisions are shaped by what researchers already understand or expect, in the same way that monitoring coverage reflects what engineers anticipated being important. The parts of the model that nobody thought to look at are where the surprises live, and in a system with billions of parameters, that’s most of it.
This changes what containment means. Traditional blast radius engineering works because you can draw boundaries around components and reason about what crosses them. With AI, the boundaries blur because the model’s behaviour depends on context in ways no specification captures. The same model, given slightly different phrasing, might produce a completely different decision, and the difference between a correct decision and a catastrophic one can hinge on nuances that nobody documented. You can’t draw a blast radius around something whose failure modes shift with every input.
Organisations deal with this by treating AI the way they used to treat senior engineers: trusting it because it’s usually right. That works fine during normal operations, which is exactly why it’s dangerous. The trust builds invisibly over months of good performance, and by the time the AI makes a confidently wrong call, the organisational reflexes that would have caught a human making the same mistake have atrophied. Nobody double-checks the AI’s work anymore because it’s been right a thousand times, and the thousand-and-first time is when it matters.
The fallback problem compounds this. In well-designed systems, when a component fails you (hopefully) fall back to something simpler and more predictable. The fallback for AI-driven judgment requires the very skills that AI’s success has been quietly eroding: engineers who can still triage incidents manually, who can assess risk without a model score, and who remember how to build a judgment from first principles.
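The structural half of that fallback is at least buildable. A minimal sketch, where all the names, thresholds, and the simulated outage are illustrative rather than any real system: the model path is preferred, but a legible rule-based heuristic sits behind it, so a failed model call degrades to something a human can still audit. What the sketch cannot encode is the human half, the people who can judge whether the heuristic’s answer is sane.

```python
def model_risk_score(payload: dict) -> float:
    """Stand-in for an opaque model call; here it simulates an outage."""
    raise TimeoutError("model unavailable")

def heuristic_risk_score(payload: dict) -> float:
    """Simple, auditable rule: flag large first-time transactions."""
    if payload.get("new_customer", False) and payload.get("amount", 0.0) > 1000:
        return 0.9
    return 0.1

def assess_risk(payload: dict) -> tuple[float, str]:
    """Prefer the model; fall back to the legible heuristic, and say which path ran."""
    try:
        return model_risk_score(payload), "model"
    except Exception:
        return heuristic_risk_score(payload), "heuristic"
```

Returning which path produced the answer matters: if the heuristic quietly serves answers for weeks, nobody notices the fallback has become the primary, which is its own silent failure.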
Researchers at Aalto University have documented this dynamic in detail. A study of an accounting firm found that reliance on automation fostered complacency and progressively eroded staff awareness, competence, and the ability to assess outputs. When the automated system was removed, the firm discovered that employees could no longer perform core accounting tasks. A 2026 Springer chapter on AI competency erosion identifies four pathways through which this happens: individual skill atrophy, structural erosion of expertise development systems, systemic organisational vulnerability, and what the authors call “false expertise transitions,” where apparent competence masks underlying knowledge gaps. Separately, human factors researchers have introduced the concept of a “Cognitive Infrastructure Threat,” arguing that the problem goes beyond losing the ability to execute tasks manually. What erodes is the reconstructive reasoning capacity needed to regain control during anomalies. A vigilant human who lacks that reconstructive capacity may detect that something is wrong but still be unable to intervene effectively. Aviation has already confronted this pattern. After a rise in near misses linked to declining manual flying skills, the US Federal Aviation Administration recommended that pilots regularly practise hand-flying, essentially mandating the maintenance of fallback capability. Organisations routing critical decisions through AI face the same dynamic but have no equivalent mandate, and most will discover the gap only at the moment the fallback is needed, which is the worst possible time to learn that a capability has atrophied.
What makes AI a single point of failure rather than just another component is the breadth of decisions that organisations route through it. A failed database affects one service. A failed load balancer affects one cluster. But a failed AI that’s been integrated into incident response, capacity planning, pricing decisions, customer risk assessment, and operational monitoring affects all of them simultaneously. The failure is correlated because the same model, with the same blind spots, makes the same category of error across every domain it touches. Traditional single points of failure are well understood and heavily mitigated precisely because we learned the hard way what happens when they go down. AI as a single point of failure is new enough that most organisations haven’t even mapped it as one.
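The correlation is the part that’s easy to miss on an architecture diagram, because the consumers look independent. A toy sketch, with entirely invented names and a deliberately crude “model”: three services that appear to be separate checks all route through one shared model with one blind spot, so a single anomalous input sails past all of them at once.

```python
def shared_model(event: dict) -> str:
    # The blind spot: anything without a numeric "metric" field looks benign.
    return "alert" if event.get("metric", 0) > 100 else "ok"

# Three "independent" consumers, all wired to the same model:
def incident_triage(event: dict) -> str:
    return shared_model(event)

def capacity_planner(event: dict) -> str:
    return shared_model(event)

def fraud_monitor(event: dict) -> str:
    return shared_model(event)

# An anomaly the model was never shaped to see: the metrics pipeline itself is down.
anomaly = {"description": "traffic spike while the metrics pipeline is down"}
verdicts = [f(anomaly) for f in (incident_triage, capacity_planner, fraud_monitor)]
# Every consumer returns "ok" for the same input. Three independently written
# checks might fail differently; one shared model fails identically everywhere.
```

This is why mapping AI as a dependency matters: the diagram shows three arrows, but the failure domain is one.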
The audit problem ties all of this together. You can audit a rule-based system by reading the rules, and you can audit a human decision-maker by asking them to explain their reasoning, but you cannot meaningfully audit an AI system in the same way. The growing industry of AI evaluation frameworks is an attempt to close this gap, and those frameworks matter, but they share a fundamental limitation: every eval is Work-as-Imagined. Someone had to think of a test, which means the eval covers the scenarios its designer anticipated. The failure mode you care about is the one your test suite didn’t cover, the one that arose from a combination of context and input that nobody imagined, handled by the AI with the same confidence it brings to everything else. More sophisticated evals don’t escape this constraint; they just push it further out. The gap between the eval’s specification of how the system should behave and how it actually behaves in production under conditions nobody modelled is the Work-as-Imagined vs Work-as-Done gap applied to AI, and it’s where the failures live.
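The limitation is visible even in the smallest possible eval harness. A sketch, with a keyword rule standing in for an opaque model and invented test cases: the harness reports a perfect score, and the score is true, but it only measures the scenarios someone wrote down.

```python
def classify(text: str) -> str:
    # Pretend model: a keyword rule standing in for an opaque LLM.
    return "urgent" if "outage" in text.lower() else "routine"

# Every case here is one a designer imagined in advance.
EVAL_CASES = [
    ("Full outage in eu-west-1", "urgent"),
    ("Please update my billing address", "routine"),
]

def run_evals() -> float:
    passed = sum(classify(text) == expected for text, expected in EVAL_CASES)
    return passed / len(EVAL_CASES)

# run_evals() reports 100%. Yet an input nobody imagined, say
# "Customers report errors everywhere", contains no keyword the model
# knows and is confidently filed as "routine". The eval never notices.
```

A 100% pass rate is a statement about the test designer’s imagination, not about the model’s behaviour on the inputs production will actually send it.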
The EU AI Act’s transparency provisions take effect in August 2026, and regulators are beginning to require that organisations explain why their AI made a particular decision. The intent is sound, but the tooling isn’t ready. A CHI 2025 study surveyed AI audit tools and found the landscape fragmented, with transparency infrastructure still nascent and most tools focused narrowly on accuracy metrics, explainability, or fairness in isolation rather than addressing auditability as a coherent discipline. Even where tools exist, using them demands skills that most engineering teams don’t have. Mechanistic interpretability, circuit tracing, feature decomposition: these are specialised research techniques, not standard engineering practices. Debugging an LLM is nothing like debugging a microservice, and the people who know how to do it are concentrated in a handful of research labs. The broader engineering community is playing catch-up with a discipline that barely existed five years ago, and the gap between what regulators are about to require and what practitioners can actually deliver in production is significant.
This leaves organisations with a genuinely uncomfortable choice. Treating AI as a critical dependency with opaque failure modes is the right starting point: assume it will fail, assume the failure will be surprising, assume you won’t understand it immediately, and build the containment and fallback capability to operate without it. But that last part is where most strategies quietly fall apart, because maintaining the human expertise to operate without AI means investing continuously in exactly the capability that AI was supposed to replace. It means paying for engineers who can work without model scores, keeping people who can make judgment calls by hand, and training staff in tasks that the AI handles 99% of the time. That’s an ongoing cost with no visible return during normal operations, which makes it the first thing to get cut when budgets tighten, and the thing you most desperately need when the AI gets it wrong in a way nobody anticipated.
//Adrian
—
References used in the post
MIT Technology Review, “Mechanistic interpretability: 10 Breakthrough Technologies 2026” (January 2026) — Link
Anthropic interpretability research and circuit tracing — Link
Dario Amodei, “The Urgency of Interpretability” — Link
“Mechanistic interpretability: 2026 status report” (GitHub gist summarising field state, limitations, and open problems) — Link
Aalto University / TechRadar, “Researchers warn that skill erosion caused by AI could have a devastating and lasting impact on businesses” (September 2025), referencing “The Vicious Circles of Skill Erosion” (2023) — Link
Yadav, P.S. (2026), “AI Competency Erosion: Understanding Expertise Decay” in The AI Competency Paradox, Springer — Link
“Position: Human-Centric AI Requires a Minimum Viable Level of Human Understanding” (arxiv, January 2026) — the Cognitive Infrastructure Threat concept — Link
FAA recommendation on manual flying skills — referenced in “Does using artificial intelligence assistance accelerate skill decay?” (PMC) — Link
Ojewale et al., “Towards AI Accountability Infrastructure: Gaps and Opportunities in AI Audit Tooling,” CHI 2025 — Link