
Building Avocado Studio: an AI editor that doesn't break your website

Enterprise DXPs (digital experience platforms) are racing to ship agentic editing. I'm building an open, self-hostable alternative; here's what I've learned so far.

  • AI
  • Architecture
  • Editor
  • Product

The problem I kept seeing

Every major content platform — Adobe AEM, Sitecore, Contentstack, Optimizely — is racing to add AI-native editing. They call it “agentic content operations.” The direction is right: let people describe what they want in plain language and have the system figure out how to make it happen.

But all of these new features ship behind the same procurement model as the platforms themselves. Six-figure annual licenses, multi-month implementations, dedicated teams to operate it all. If you’re not a Fortune-1000 buyer, you don’t get access.

Meanwhile, the teams running modern composable stacks — Next.js, Sanity, Contentful, Vercel — have better architecture than most legacy DXPs, but no AI editing layer to put on top. And the teams still on WordPress or Squarespace deserve a real upgrade path that isn’t “buy an enterprise platform.”

I wanted to build the thing that fills that gap.

What Avocado Studio actually is

Avocado Studio is an open, self-hostable content operations platform. You describe changes in natural language — “add a testimonials section below the hero” or “change the CTA to Book a demo” — and the system generates structured, schema-validated operations, shows you the plan, and applies approved changes with live preview.
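To make that concrete, a request like “add a testimonials section below the hero” comes back from the planner as something in this spirit (the exact field names are illustrative, not Avocado's published format):

```ts
// Illustrative only: the real operation shape is defined by the
// orchestrator's schemas. Field names here are guesses.
const plan = [
  {
    type: "add_block",
    pageId: "home",
    blockType: "testimonials",
    after: "hero", // "below the hero"
    fields: { heading: "What our customers say", items: [] },
  },
];
```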

It runs as three services: an orchestrator (Fastify API that handles AI planning, validation, undo/redo, publishing), a content studio (React UI for chat, plan review, settings), and your site (Next.js, rendered in an iframe with draft mode).
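The draft-mode half of that setup follows the standard Next.js pattern: a route handler the studio can hit before loading the iframe. The secret check and query params below are my assumptions about the wiring, not Avocado's actual endpoint:

```ts
// app/api/draft/route.ts: a minimal sketch of switching the site into
// Next.js draft mode so the studio iframe renders unpublished content.
// The DRAFT_SECRET check and ?path param are assumptions, not Avocado's
// actual endpoint.
import { draftMode } from "next/headers";
import { redirect } from "next/navigation";

export async function GET(request: Request) {
  const { searchParams } = new URL(request.url);

  // Shared secret so only the orchestrator/studio can enable drafts.
  if (searchParams.get("secret") !== process.env.DRAFT_SECRET) {
    return new Response("Invalid token", { status: 401 });
  }

  (await draftMode()).enable(); // sets the bypass cookie for this browser
  redirect(searchParams.get("path") ?? "/");
}
```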

The key design decision: Avocado doesn’t replace your CMS, your DAM, or your deploy target. It’s the AI editing layer that sits on top of whatever stack you already have.

The architecture that makes it reliable

Early versions let the AI generate free-form output and tried to apply it directly. Demos looked impressive. Real usage was a mess — inconsistent edits, broken layouts, no way to undo.

The fix was moving from raw generation to structured operations. Every edit is one of seven operation types (add_block, update_block, remove_block, move_block, add_page, duplicate_page, delete_page), and every operation is validated against Zod schemas before it touches content.
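Here's a sketch of what that validation layer can look like. The seven type tags come straight from the list above; the payload fields are my assumptions, not the project's actual schemas:

```ts
import { z } from "zod";

// Sketch: the seven operation types as a Zod discriminated union.
// Type tags match the post; payload fields are assumptions.
const AddBlock = z.object({
  type: z.literal("add_block"),
  pageId: z.string(),
  blockType: z.string(),
  after: z.string().nullable(), // block id to insert after; null = top of page
  fields: z.record(z.unknown()),
});

const UpdateBlock = z.object({
  type: z.literal("update_block"),
  blockId: z.string(),
  fields: z.record(z.unknown()), // checked again against the block's own schema
});

const RemoveBlock = z.object({
  type: z.literal("remove_block"),
  blockId: z.string(),
});

const MoveBlock = z.object({
  type: z.literal("move_block"),
  blockId: z.string(),
  toIndex: z.number().int().nonnegative(),
});

const AddPage = z.object({
  type: z.literal("add_page"),
  title: z.string(),
  slug: z.string(),
});

const DuplicatePage = z.object({
  type: z.literal("duplicate_page"),
  pageId: z.string(),
});

const DeletePage = z.object({
  type: z.literal("delete_page"),
  pageId: z.string(),
});

export const OperationSchema = z.discriminatedUnion("type", [
  AddBlock,
  UpdateBlock,
  RemoveBlock,
  MoveBlock,
  AddPage,
  DuplicatePage,
  DeletePage,
]);
export type Operation = z.infer<typeof OperationSchema>;
```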

This sounds obvious in retrospect, but it changed everything:

Intent detection comes first. A fast model (Claude Haiku) classifies each message before the full planner runs. Simple text changes get handled immediately — no need to spin up the heavy planner for “change the heading to Welcome.” Complex requests go to a planning model with full page context and block schemas.
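As a sketch, the routing step is a single cheap call. The two labels and the prompt below are my invention; the pattern (cheap classifier in front of the heavy planner) is the point:

```ts
import Anthropic from "@anthropic-ai/sdk";

// Sketch of the two-tier routing. Labels and prompt are my invention;
// the real classifier prompt will differ.
const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY

async function classifyIntent(
  message: string,
): Promise<"simple_edit" | "complex_plan"> {
  const res = await anthropic.messages.create({
    model: "claude-3-5-haiku-latest", // fast, cheap classifier
    max_tokens: 8,
    system:
      "Classify this website-editing request. Reply with exactly one " +
      "word: simple_edit (a single text change) or complex_plan " +
      "(layout, structure, or multi-step changes).",
    messages: [{ role: "user", content: message }],
  });
  const block = res.content[0];
  const label = block?.type === "text" ? block.text.trim() : "";
  // When unsure, fall through to the heavier planning path.
  return label === "simple_edit" ? "simple_edit" : "complex_plan";
}
```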

Validation catches bad output early. Every operation is checked against the block’s schema. If the AI tries to set a heading to a number or reference a block that doesn’t exist, it’s rejected automatically. The orchestrator attempts auto-repair by re-prompting with the specific error. If that fails too, the user gets a clear message — not a broken page.
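Sketched out, the repair loop is one retry that carries the exact Zod error back to the model. OperationSchema is from the schema sketch above; replan is a hypothetical callback that re-prompts with the error attached:

```ts
import { OperationSchema, type Operation } from "./operations"; // schema sketch above

// Sketch of validate-then-repair: one re-prompt with the specific
// validation error, then a clean user-facing failure. `replan` is a
// hypothetical callback that re-asks the model with the error attached.
async function validateWithRepair(
  raw: unknown,
  replan: (validationError: string) => Promise<unknown>,
): Promise<Operation> {
  const first = OperationSchema.safeParse(raw);
  if (first.success) return first.data;

  // Auto-repair: feed the exact failure back to the model once.
  const repaired = await replan(first.error.message);
  const second = OperationSchema.safeParse(repaired);
  if (second.success) return second.data;

  // Give up cleanly; the caller shows a message instead of a broken page.
  throw new Error(
    `Couldn't apply this change: ${second.error.issues[0]?.message ?? "invalid operation"}`,
  );
}
```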

Streaming makes it feel fast. Operations are validated and applied as they stream from the LLM, not after the full response. Each applied operation triggers a preview update, so users see changes appearing at ~800ms intervals instead of waiting for a loading spinner. Image lookups (Unsplash, DALL-E, Gemini) resolve in the background while text and structural changes apply immediately.
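A sketch of that loop, assuming an async iterable of candidate operations coming out of the LLM client's incremental JSON parsing (the helper names are hypothetical):

```ts
import { OperationSchema, type Operation } from "./operations"; // schema sketch above

// Sketch of stream-as-you-validate. `opStream` stands in for the LLM
// client's streaming output after incremental JSON parsing; `apply` and
// `refreshPreview` are hypothetical hooks into draft state and the iframe.
async function applyStreaming(
  opStream: AsyncIterable<unknown>,
  apply: (op: Operation) => Promise<void>,
  refreshPreview: () => void,
) {
  for await (const raw of opStream) {
    const parsed = OperationSchema.safeParse(raw);
    if (!parsed.success) continue; // invalid ops go to the repair path instead
    await apply(parsed.data); // apply immediately; don't wait for the full plan
    refreshPreview(); // the user sees each change as it lands
  }
}
```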

Undo is per-operation, not per-plan. If an AI plan contains three operations and the third one is wrong, you undo just that one. The history stack records every operation individually.
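Structurally, that implies something like the sketch below: each history entry carries its own inverse, computed at apply time, so any single step can be reverted without touching its siblings (the shapes are my assumptions):

```ts
import type { Operation } from "./operations"; // schema sketch above

// Sketch of a per-operation history stack. Each entry stores an inverse
// computed at apply time (e.g. add_block reverses to remove_block), so
// one step of a plan can be undone without rolling back the others.
interface HistoryEntry {
  op: Operation;      // what was applied
  inverse: Operation; // how to revert exactly this step
  planId: string;     // which AI plan it belongs to, for grouping in the UI
}

class OperationHistory {
  private entries: HistoryEntry[] = [];

  record(entry: HistoryEntry) {
    this.entries.push(entry);
  }

  // Undo one specific operation, not the whole plan.
  async undo(index: number, apply: (op: Operation) => Promise<void>) {
    const [entry] = this.entries.splice(index, 1);
    if (entry) await apply(entry.inverse);
  }
}
```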

What I’m still learning

The AI is the easy part. Most of the engineering work is in the deterministic systems around the model — schema validation, operation sequencing, draft state management, publishing pipelines. The model just needs to output structured JSON that matches the block schemas. The hard part is making sure everything before and after the model call is rock-solid.

Multi-model support matters more than I expected. The orchestrator supports Anthropic, OpenAI, and Google Gemini. Different users have different API keys, cost constraints, and model preferences. Claude is the most battle-tested for planning, but some users want GPT-4o for simple edits or Gemini Flash for speed. Making the model pluggable was more work upfront but has paid for itself.
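The pluggability boils down to something like this: planning code depends on a small interface, and each vendor SDK gets a thin adapter. The interface shape and registry keys here are my guesses:

```ts
// Sketch of a pluggable model layer. The interface shape and registry
// keys are guesses; the point is that planning code never imports a
// vendor SDK directly.
interface PlannerModel {
  /** Stream candidate operations for a user request plus page context. */
  plan(prompt: string, pageContext: string): AsyncIterable<unknown>;
}

type ProviderFactory = (apiKey: string, model: string) => PlannerModel;

const providers = new Map<string, ProviderFactory>();

function registerProvider(name: string, factory: ProviderFactory) {
  providers.set(name, factory);
}

function getPlanner(name: string, apiKey: string, model: string): PlannerModel {
  const factory = providers.get(name);
  if (!factory) throw new Error(`Unknown provider: ${name}`);
  return factory(apiKey, model);
}

// Each vendor SDK is wrapped in a thin adapter and registered once, e.g.
// registerProvider("anthropic", (key, model) => new AnthropicPlanner(key, model));
```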

“Bring your own stack” is a feature and a constraint. Avocado integrates with whatever CMS, PIM, DAM, and deploy target you already use. That’s the right product decision — it’s why people adopt it instead of ripping out their stack. But it means every integration surface is a potential failure point. The Site SDK abstracts most of it, but edge cases in Contentful vs. Sanity vs. Strapi vs. plain JSON files keep the testing matrix wide.
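Roughly, the SDK normalizes every backend to a small contract; one adapter per backend keeps the orchestrator identical across stacks. The method names below are a sketch, not the SDK's actual surface, and the plain-JSON-files case makes the contract concrete:

```ts
import { readFile, writeFile, rename } from "node:fs/promises";

// Sketch of the kind of interface a site SDK can normalize backends to.
// Method names are illustrative; a real surface also covers assets,
// localization, and richer publishing flows.
interface ContentBackend {
  getPage(pageId: string): Promise<unknown>;
  saveDraft(pageId: string, content: unknown): Promise<void>;
  publish(pageId: string): Promise<void>;
}

// Plain-JSON-files adapter: drafts live in a sidecar file and
// publishing promotes the draft over the live copy.
class JsonFileBackend implements ContentBackend {
  constructor(private dir: string) {}

  async getPage(pageId: string): Promise<unknown> {
    return JSON.parse(await readFile(`${this.dir}/${pageId}.json`, "utf8"));
  }

  async saveDraft(pageId: string, content: unknown): Promise<void> {
    await writeFile(
      `${this.dir}/${pageId}.draft.json`,
      JSON.stringify(content, null, 2),
    );
  }

  async publish(pageId: string): Promise<void> {
    await rename(`${this.dir}/${pageId}.draft.json`, `${this.dir}/${pageId}.json`);
  }
}
```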

Where it’s going

The project is in active development. The Next.js integration path is solid. There’s a visual drag-and-drop editor alongside the chat editor. The Site Assistant can migrate existing websites, integrate GitHub repos, or scaffold new sites from a description.

It’s open and self-hostable — no per-seat licenses, no annual contracts. The total cost is your cloud hosting plus LLM tokens.

If you’re running a modern composable stack and want AI-native editing without buying a DXP — that’s the gap I’m building for. More to come as the project matures.