
Building Multi-Agent Workflows with Claude Code

Zev Uhuru, Engineer, Esy
February 20, 2026

claude-code · multi-agent · architecture · LLM orchestration

How I orchestrate 34 specialized agents to produce research artifacts at Esy — the architecture decisions, failure modes, and why single-LLM approaches break down at scale.

Building a production workflow engine on top of LLMs sounds straightforward until you try it. A single prompt chain works for demos. It falls apart the moment you need reliable, cited, structured output across dozens of use cases. This is the story of how Esy's multi-agent architecture evolved from a single Claude API call to a 34-agent orchestration system — and what broke along the way.

The Single-Agent Trap

Most AI coding tools start here: one model, one prompt, one output. It works beautifully for small tasks. But the moment you need one agent to research and outline and draft and cite, the context window becomes the bottleneck. Not because it runs out of tokens, but because the model loses focus. A 4,000-word essay prompt that bundles research instructions, style guidelines, and citation rules produces mediocre output across every dimension.

Why Multi-Agent

The insight is simple: specialization works. An agent dedicated to citation verification doesn't need to know anything about narrative structure. An agent that designs infographic layouts doesn't need to parse DOIs. By decomposing the workflow into discrete stages — Intake, Research, Outline, Draft, Cite & Format — each agent can be small, focused, and testable.
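To make that concrete, here's a minimal sketch of what stage decomposition can look like. Every name here (WorkflowState, research_agent, and so on) is an illustrative assumption, not Esy's actual code; the point is that each agent is a narrow function over shared state that can be unit-tested on its own.

```python
from dataclasses import dataclass, field
from typing import Callable

# Shared state passed between stages; each agent reads what it needs
# and writes only the fields it owns.
@dataclass
class WorkflowState:
    topic: str
    research_notes: list[str] = field(default_factory=list)
    outline: list[str] = field(default_factory=list)
    draft: str = ""

# An "agent" here is just a function over the shared state.
Agent = Callable[[WorkflowState], WorkflowState]

def research_agent(state: WorkflowState) -> WorkflowState:
    # In production this stage would call the model with a research-only prompt.
    state.research_notes.append(f"notes on {state.topic}")
    return state

def outline_agent(state: WorkflowState) -> WorkflowState:
    # This stage sees research notes but knows nothing about citations or style.
    state.outline = [f"Section: {note}" for note in state.research_notes]
    return state

PIPELINE: list[Agent] = [research_agent, outline_agent]

state = WorkflowState(topic="multi-agent workflows")
for agent in PIPELINE:
    state = agent(state)
```

Because each agent's contract is just state in, state out, a unit test can feed one agent a hand-built state and assert on the single field it owns.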

The Architecture

Each workflow template at Esy maps to a pipeline of agents. The pipeline definition lives in a configuration file — not in code. This means adding a new workflow type (say, a grant proposal) doesn't require engineering work. It requires defining which agents participate and in what order.
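As a rough illustration of config-driven assembly (the schema and registry names below are assumptions; the post doesn't show Esy's actual format), the dict stands in for a parsed YAML/JSON file:

```python
# Stand-in for a parsed pipeline definition loaded from a config file.
GRANT_PROPOSAL = {
    "workflow": "grant_proposal",
    "stages": ["intake", "research", "outline", "draft", "cite_and_format"],
}

# Registry mapping stage names in the config to agent callables.
# Adding a workflow type means adding a config entry, not orchestration code.
AGENT_REGISTRY = {
    "intake": lambda state: state,            # placeholders for real agents
    "research": lambda state: state,
    "outline": lambda state: state,
    "draft": lambda state: state,
    "cite_and_format": lambda state: state,
}

def build_pipeline(config: dict) -> list:
    return [AGENT_REGISTRY[name] for name in config["stages"]]

pipeline = build_pipeline(GRANT_PROPOSAL)
```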

What Broke

Three things consistently broke during development:

  1. Agent handoff serialization — passing structured data between agents without losing context or introducing hallucinations in the intermediate state
  2. Citation grounding — ensuring the research agent's sources actually make it into the final artifact without being paraphrased into oblivion
  3. Error recovery — when agent 4 of 6 fails, you can't just restart the pipeline; the cost in time and API credits is too high. Partial recovery is essential (a checkpoint-based sketch follows this list).
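One way to get partial recovery, sketched under the assumption that each stage's output is JSON-serializable, is to checkpoint the shared state after every successful stage and skip completed stages on resume. The file layout and function names here are hypothetical:

```python
import json
from pathlib import Path

CHECKPOINT_DIR = Path("checkpoints")  # hypothetical location

def run_with_recovery(pipeline, state: dict, run_id: str) -> dict:
    CHECKPOINT_DIR.mkdir(exist_ok=True)
    for i, agent in enumerate(pipeline):
        ckpt = CHECKPOINT_DIR / f"{run_id}.stage{i}.json"
        if ckpt.exists():
            # Stage finished on a previous run: reuse its output instead of
            # paying for the API call again.
            state = json.loads(ckpt.read_text())
            continue
        state = agent(state)                # may raise; earlier stages stay cached
        ckpt.write_text(json.dumps(state))  # persist before moving on
    return state
```

Rerunning after a failure at stage 4 then replays stages 1 through 3 from disk and resumes the model calls where they stopped.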

Lessons

The biggest lesson: treat agents like microservices, not like a conversation. They don't need to know about each other. They read from a shared state, do their job, write back. The orchestrator manages sequencing, retries, and validation between stages.
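A minimal version of that orchestrator loop might look like the following; the retry policy and validator hook are illustrative assumptions, not Esy's production code:

```python
def orchestrate(pipeline, validators, state: dict, max_retries: int = 2) -> dict:
    for agent, validate in zip(pipeline, validators):
        for attempt in range(max_retries + 1):
            candidate = agent(state)    # the agent only ever sees shared state
            if validate(candidate):     # e.g. a schema check on the handoff
                state = candidate       # commit this stage's output
                break
            if attempt == max_retries:
                raise RuntimeError(f"stage {getattr(agent, '__name__', agent)} failed validation")
    return state
```

The design choice that matters is the commit point: a stage's output only becomes visible to the next agent after it passes validation, which keeps a hallucinated handoff from contaminating everything downstream.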