When the pair programmer is confidently wrong
Notes from migrating the Resilium Labs website and newsletter from Squarespace to Cloudflare, with Claude in the loop
A few weeks ago, I migrated the resiliumlabs.com website off Squarespace. That part you can probably guess at: the renewal was expiring, Squarespace had gotten expensive and bloated, and moving to a static site on Cloudflare Pages meant faster loads, something fun to do, and about a tenth of the monthly cost.
What I want to write about isn’t the migration process itself, but the fact that I did most of it with Claude as a pair programmer. And that somewhere around mid-day, we took the site down in public for about an hour. And that if you’ve read Chapter 14 of my book, you already know what happened.
The plan
The goal was fairly straightforward. A static Astro site deployed to Cloudflare Pages. DNS moved from AWS Route 53 to Cloudflare (needed because Route 53 doesn’t support apex CNAMEs to non-AWS targets). Newsletter kept on Substack at newsletter.resiliumlabs.com, rebranded to match the new design. Old LinkedIn-shared URLs 301-redirected to the right Substack posts so nothing that anyone had ever shared would break.
Claude was really useful for this. He created the redirect map, the CSP with sha256 hashes for inline scripts auto-computed as a postbuild step, security headers, OG tags, and Schema.org JSON-LD. The kind of boilerplate that I would know how to write but would take me a few days to assemble correctly, and that Claude produced in about twenty minutes with me reviewing and trimming.
That part of the day felt like what the AI marketing promises: I do the architecture and the judgment, the machine handles the drudgery, we ship faster than either of us alone.
Then we got to the cutover.
The outage
Here’s what you need to know about Squarespace: their custom domain feature is powered by Cloudflare for SaaS. When you connect a domain to Squarespace, they register it as a “Custom Hostname” in their Cloudflare account. That binding tells Cloudflare’s edge: “when you see a request for this hostname, route it to Squarespace.”
I didn’t know that. Claude didn’t know that either, or “forgot” to mention it.
The cutover sequence we executed was, in order: migrate DNS to Cloudflare, point the apex and www CNAMEs at the Pages project, add the domain as a custom domain in Pages, disconnect the domain from Squarespace.
Each step looked fine in isolation. DNS propagated. Pages showed the domains as “Active.” The apex returned a 200 when I tested it.
But for the public, the site was serving Domain Not Claimed from Squarespace’s edge.
The reason, which I understand now but didn’t at the time: even though I’d pointed my DNS at Pages, Cloudflare’s edge was seeing the hostname in two places — Squarespace’s Cloudflare for SaaS Custom Hostname registration, and my new Pages Custom Hostname registration. When Cloudflare gets a request, it has to pick one. It was picking Squarespace. And since I’d just disconnected the domain from my Squarespace site, Squarespace was responding “I don’t serve this domain anymore.”
Result: 404 “Domain Not Claimed” in every browser that wasn’t on my local DNS cache.
I noticed when I got the message: “it is mid day, your site is down!!!”
What AI looked like during an outage
I want to be specific about what happened next, because this is where the AI-assisted-work pattern gets interesting.
Claude’s first instinct was to wait for Squarespace’s backend sweep to release the Custom Hostname eventhough we had no idea how long that would take. Claude set up a polling loop to detect the moment it flipped. This is exactly the kind of thing Claude is good at: small, mechanical, instrumented.
It was also exactly the wrong suggestion. Waiting and hoping for something to happen fast enough is not a plan at midday. I needed the site back now, or I needed a rollback that would work in five minutes.
When I pushed back, Claude pivoted to the rollback path — delete the Pages custom domains, restore the old Squarespace A records in DNS, reconnect the domain in Squarespace. And then admitted, when I pushed back again: “Honest: I don’t know with certainty that the rollback will work fast either. The Squarespace reconnect flow: I don’t know if it lets you immediately re-add a just-disconnected domain. I haven’t done it.”
That very moment is the important “bite”.
Because Claude was confident about the cutover sequence when we planned it, confident during execution, and confident about the initial diagnosis, the uncertainty only surfaced when I explicitly pushed on it. And the sequence Claude confidently proposed was wrong in a “this will take your production site down for an unknown amount of time” way.
In the book, I wrote: “AI that drafts the postmortem outsources the sensemaking. AI that surfaces related incidents from organizational history while the team does their own analysis supports it.”
That’s exactly what I felt in real time. When I let Claude confidently drive the plan, I was outsourcing the sensemaking. When I stopped and asked “what do you actually know and what are you guessing,” the conversation got better immediately.
In the end, I decided to open a support ticket with Squarespace asking them to release the hostname from their Cloudflare for SaaS account. They released it within minutes. DNS settled. Cache purge. Site back up. Total outage: about an hour.
What I’d do differently
Three things.
Build an inventory before you execute. The Squarespace-uses-Cloudflare-for-SaaS fact was knowable. It’s in Squarespace’s docs. Neither of us looked. Claude happily generated a plan without that context. I got lazy and happily followed it because it looked reasonable. The failure mode of AI-assisted planning isn’t that the AI is stupid like many people still say; it’s that it’s plausible-sounding enough to skip the five-minute research step that would have surfaced the constraint.
Get the old vendor to release the hostname before you cut over, not during. The Squarespace Custom Hostname was the actual blocker. I should have opened the support ticket asking them to release it while the site was still live under Squarespace — wait for confirmation, then flip DNS to Pages and add the custom domains. The overlap window I was afraid of doesn’t exist if you treat “release the binding” as a separate, prerequisite step rather than something that happens automatically when you click Disconnect.
Support tickets are a tool, not a failure. My instinct during the outage was to try to fix it myself, with Claude’s help, using tools I already had. What actually worked was opening a Squarespace support ticket and asking them to do the one thing on their side that nobody else could do. Five minutes of writing the ticket, a few minutes for them to act. I wasted longer than that trying to figure out how to force Cloudflare’s edge to re-route on my own. The human on the other side of the vendor had the specific privileges I needed. The AI didn’t, and couldn’t know to suggest it.
That last one is worth sitting with. When you pair with an AI, you risk inheriting its frame of reference. AI tools operate in the shape of developer-tool problem-solving: commands, APIs, config edits. The longer I stayed in that loop, the more my own thinking stayed in that loop too — and “open a support ticket with the vendor” falls outside it. To be clear, that’s nobody’s fault but mine. It’s Work-as-Imagined versus Work-as-Done at a smaller scale: my imagined toolkit, shaped by the pairing session, didn’t include “ask a human at the other vendor.” Took me twenty minutes of trying to force the problem through code to remember that not every infrastructure problem has a code solution.
The irony
Few months ago, I wrote the final chapter of Why We Still Suck at Resilience. That chapter argues that AI risks outsourcing exactly the kind of learning that makes organizations resilient. That the danger isn’t that AI is bad, it’s that it’s good enough to let people stop engaging with how their systems actually work. That the productive struggle is where the understanding lives.
Then I used Claude to migrate my website, and at the first real incident, I caught myself doing the thing I’d just warned against. Taking the plan at face value. Not pushing on the uncertainty. Watching the outage tick by while the AI proposed an elegant polling loop.
I didn’t fire Claude… yet. It was actually really useful for most of the work. The fix was for me to stop treating Claude’s confidence as a signal about reality and start treating it as what it is: a stylistic default. The real information is what Claude doesn’t know, and you only get at that by asking what it actually knows and being an active participant in that conversation.
Resilience engineers call this the left-over principle. You automate the parts that are mechanical, and humans are left with the parts that aren’t — judgment, ambiguity, the phone call to the vendor, the gut sense that a plan smells wrong. The trap is that humans do those parts worse after prolonged exposure to automation, because the automation has taken the routine practice that built the competence. Bainbridge described this in her 1983 paper “Ironies of Automation.” AI didn’t change the lesson. It just raised the stakes of forgetting it, by expanding what automation looks like it can handle.
//Adrian


