The story behind my AI site editor
A personal R&D experiment: what happens when you point Claude Code and Codex at the question 'what if you could just tell a website what to change?' and let them build for a few months.
- AI
- Editor
- Product
- Founder Notes
It started as personal R&D
For a while I’d been watching two things in parallel.
The first was the new wave of agentic coding tools — Claude Code and Codex in particular. Not chat-with-an-LLM. Actual agents that read your repo, run your tests, edit multiple files, and iterate. I was curious how far they’d actually get on something non-trivial — not a to-do app, not a one-shot demo, but a real product with a real architecture, a real test suite, and real edge cases.
The second was an idea that wouldn’t go away. Every time I watched a founder try to update their website — change a heading, swap a hero image, add a testimonials section — the same thing happened. The idea takes ten seconds. The execution takes days. Wait for a developer, open a ticket, review a PR, deploy. Or fight with a page builder that technically lets you do it yourself but makes every change feel like defusing a bomb.
I kept wondering: what if you could just describe the change you want, and the website would update itself?
So I made it a personal R&D project. Two questions to answer in parallel: how good are coding agents at sustained work, and can this AI editor idea actually be made reliable enough to trust on a real site? One project, two experiments.
The first thing I tried
I started with the obvious prototype — a chat input wired to an LLM that could modify a page. You’d type “make the heading bigger” and it would rewrite the HTML. Demo gold. I showed it to a few people and they loved it.
Then I tried using it for real. Within a day it was obvious the approach was broken. The AI would sometimes change things I didn’t ask for. Styles would drift. There was no undo. You couldn’t predict what would happen next. The “wow” wore off fast, and what was left wasn’t trustworthy enough to use on a real site.
The question that changed the direction
I stepped back and asked: what do the big enterprise platforms actually do? Adobe AEM, Sitecore, Contentstack, Optimizely — they’re all racing to ship exactly this kind of AI-native editing. They call it “agentic content operations.” And the direction is clearly right.
But every one of these solutions ships behind the same model: six-figure annual licenses, multi-month implementations, dedicated teams. If you’re not a Fortune-1000 buyer, you don’t get in.
Meanwhile, there’s a massive group of teams running modern composable stacks — Next.js, Sanity, Vercel, Contentful — who have better architecture than most legacy DXPs but no AI editing layer to put on top. And below them, teams on WordPress or Squarespace who deserve a real upgrade path that isn’t “buy an enterprise platform.”
That gap is what I decided to build for.
What “reliable AI editing” actually meant
The breakthrough wasn’t a better model or a smarter prompt. It was realising that the AI should never touch the website directly.
Instead of letting the model rewrite HTML, I designed a system where every edit is a structured operation — add_block, update_block, remove_block, move_block. Each operation is validated against a schema before it’s applied. If the AI generates something invalid, it gets rejected automatically. The user sees every proposed change before it goes live, and can undo any individual operation.
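To make that concrete, here is a minimal sketch of what "structured operations plus schema validation" can look like in TypeScript, using zod for the schemas. The operation names match the ones above; everything else (field names, the validation helper) is illustrative rather than the actual Avocado Studio code.

```ts
import { z } from "zod";

// Every edit the AI proposes must be one of these shapes.
// Anything that doesn't parse is rejected before it reaches the page.
const EditOperation = z.discriminatedUnion("op", [
  z.object({
    op: z.literal("add_block"),
    blockType: z.string(),                    // e.g. "hero", "testimonials"
    props: z.record(z.string(), z.unknown()),
    position: z.number().int().nonnegative(),
  }),
  z.object({
    op: z.literal("update_block"),
    blockId: z.string(),
    props: z.record(z.string(), z.unknown()), // partial props to merge
  }),
  z.object({
    op: z.literal("remove_block"),
    blockId: z.string(),
  }),
  z.object({
    op: z.literal("move_block"),
    blockId: z.string(),
    toPosition: z.number().int().nonnegative(),
  }),
]);

type EditOperation = z.infer<typeof EditOperation>;

// The model's output is never trusted directly: parse first, apply later.
function validateProposal(raw: unknown): EditOperation[] {
  const parsed = z.array(EditOperation).safeParse(raw);
  if (!parsed.success) {
    // Invalid operations never touch the site.
    throw new Error(`Rejected proposal: ${parsed.error.message}`);
  }
  return parsed.data; // shown to the user for approval before applying
}
```

Because every applied operation is a discrete, validated record, per-operation undo falls out naturally: reverting an edit means reverting one record, not diffing two blobs of HTML.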
This made the editor feel less magical in demos but dramatically more trustworthy in practice. And trust is the thing that actually matters when someone is editing a real website.
What I learned about the agents along the way
The other half of the experiment — how good are coding agents at sustained work — turned out to be just as interesting as the product itself.
Claude Code and Codex are excellent at the well-scoped middle of a problem: refactor this file, add this endpoint, wire this adapter. Give them a clear architectural seam and they’ll do the work faster and cleaner than I would. They’re also a lot better at boring-but-necessary work than I am — writing tests, updating docs, keeping naming consistent across a hundred files.
The part they don’t do well is the part I expected: deciding what to build. The big architectural calls — “operations as the universal interchange format,” “every edit goes through a planner that emits validated ops,” “the site never knows which editor produced the change” — those still had to come from me. The agents executed them brilliantly once the shape was clear. They couldn’t conjure the shape.
So the answer to the first question turned out to be: agents are great force-multipliers on architecture you already understand, and dangerous on architecture you don’t. The interesting work isn’t writing the code anymore — it’s having a clear enough picture of the system that an agent can be trusted to execute against it.
Where the idea sits now
The R&D project has grown into something I’m calling Avocado Studio — an open, self-hostable content operations platform. Chat-driven AI editing, a visual drag-and-drop editor (with Puck integrated as the canvas), multi-model AI support, and a “bring your own stack” architecture that integrates with whatever CMS, DAM, and deploy target you already use.
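As a rough illustration of the "bring your own stack" idea, the integration surface can be thought of as one adapter per backend. The interface below is hypothetical, a sketch of the shape rather than the project's actual API:

```ts
// EditOperation: the validated operation type from the earlier sketch.
type EditOperation =
  | { op: "add_block"; blockType: string; props: Record<string, unknown>; position: number }
  | { op: "update_block"; blockId: string; props: Record<string, unknown> }
  | { op: "remove_block"; blockId: string }
  | { op: "move_block"; blockId: string; toPosition: number };

interface PageDocument {
  path: string;
  blocks: Array<{ id: string; type: string; props: Record<string, unknown> }>;
}

// Hypothetical adapter surface: the editor only ever talks to this interface,
// so any CMS, DAM, or deploy target can sit behind it.
interface ContentAdapter {
  loadPage(path: string): Promise<PageDocument>;
  applyOperations(path: string, ops: EditOperation[]): Promise<PageDocument>;
  publish(path: string): Promise<{ url: string }>;
}
```

In this shape, swapping one CMS or deploy target for another means writing another adapter, not changing the editor.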
It’s still evolving. The Next.js integration path is solid. Other frameworks are early. I’m building it in the open and learning as I go.
- Source: github.com/avocadostudio-ai/avocado
- Product site: avocadostudio.dev
The core belief hasn’t changed since that first prototype: people should be able to describe what they want and see it happen. I just had to learn that making it reliable matters more than making it impressive, and that the same lesson applies to the agents I used to build it.