Claude Subagents: A Practical Guide with Working Code

How to orchestrate multiple AI agents to solve complex problems faster

What Is a Subagent?

A subagent is a Claude call that you launch from inside another Claude call—or from your own orchestration code—to handle a focused subtask. The parent agent breaks a problem into pieces, delegates them out, then assembles the results.

Think of it like managing a team: you don't do everything yourself. You break down the project, assign tasks to specialists who work in parallel, and integrate their outputs into a final deliverable.

In code, a subagent is just a regular Anthropic SDK call—there is no special API. What makes it a "subagent" is the pattern: parent spawns children, children return results, parent consolidates.

Why Subagents Matter

Three concrete benefits:

1. Parallel speed. If you need 4 independent analyses, running them in parallel with asyncio.gather() cuts wall-clock time from 4× to 1×.

2. Specialization. Each subagent gets a tightly focused system prompt. A "security reviewer" subagent performs better at security review than a generalist prompted with "also check security."

3. Context isolation. Subagents don't share context windows. The parent can fire off large parallel workloads without any single Claude call paying the token cost of everything combined.

The Three Core Patterns

Pattern	Shape	When to use
Fan-out + Consolidate	Parent → N parallel children → Parent synthesizes	Independent analyses on the same input
Pipeline	A → B → C (sequential)	Each step needs the prior step's output
Fire and Forget	Parent launches child, continues without waiting	Background tasks, non-critical enrichment

Pattern 1: Fan-out + Consolidate

Example: Reviewing a code snippet for security, style, and test coverage—simultaneously, in parallel.

Each specialist subagent receives the same code and returns a focused report. The parent synthesizes them into one final review.

                  ┌─── Security Agent ──┐
Parent (code) ───►├─── Style Agent ─────┼──► Parent (consolidates)
                  └─── Test Agent ───────┘

Full Working Script

See 01_fanout_code_review.py

The code can be seen at https://github.com/lachwaninitin/subagents-demo

Pattern 2: Pipeline

Example: Research → Draft → Edit

The output of each stage flows directly into the next. This is sequential by design—you can't draft before you research.

Research Agent → Draft Agent → Edit Agent → Final output

Full Working Script

See 02_pipeline_research.py

The code can be seen at https://github.com/lachwaninitin/subagents-demo

Pattern 3: Fire and Forget

Example: After generating a main response, kick off a background task (e.g., summarize for a log, translate to another language) without blocking the user-facing response.

Main response ──► user (immediately)
                  └──► Background agent (runs async, result logged)

Full Working Script

See 03_fire_and_forget.py

The code can be seen at https://github.com/lachwaninitin/subagents-demo

Prompt Caching with Subagents

When multiple subagents share the same large system prompt or context (e.g., all reviewing the same large document), prompt caching prevents re-paying that token cost on each call.

Mark stable content with cache_control: {"type": "ephemeral"}. The first call writes the cache; subsequent calls read from it at ~10% of the original cost.

See 04_cached_subagents.py for a working example where 4 subagents all analyze the same AIGP framework document.

The code can be seen at https://github.com/lachwaninitin/subagents-demo

My Learning Cafe

Monday, May 4, 2026