Sub-Agents
Decompose work across multiple specialized agents with a visible delegation log.
What is this?
Sub-agents are the canonical multi-agent pattern: a top-level supervisor LLM orchestrates one or more specialized sub-agents by exposing each of them as a tool. The supervisor decides what to delegate, the sub-agents do their narrow job, and their results flow back up to the supervisor's next step.
This is fundamentally the same shape as tool-calling, but each "tool" is itself a full-blown agent with its own system prompt and (often) its own tools, memory, and model.
When should I use this?
Reach for sub-agents when a task has distinct specialized sub-tasks that each benefit from their own focus:
- Research → Write → Critique pipelines, where each stage needs a different system prompt and temperature.
- Router + specialists, where one agent classifies the request and dispatches to the right expert.
- Divide-and-conquer — any problem that fits cleanly into parallel or sequential sub-problems.
The example below uses the canonical Research → Write → Critique shape.
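Before any framework enters the picture, the shape itself is simple. The sketch below is framework-free and illustrative: each "sub-agent" is a plain function standing in for an LLM call with its own system prompt, and the supervisor threads results forward through the pipeline.

```python
# Framework-free sketch of the supervisor / sub-agent shape.
# Each "sub-agent" is a callable with one narrow job; a real
# implementation backs each one with its own LLM + system prompt.

def research_agent(topic: str) -> str:
    # stand-in for an LLM call with the research system prompt
    return f"- fact about {topic}\n- another fact about {topic}"

def writing_agent(brief: str) -> str:
    # stand-in for the writing sub-agent
    return f"Draft based on: {brief}"

def critique_agent(draft: str) -> str:
    # stand-in for the critique sub-agent
    return f"Critique of: {draft!r}"

def supervisor(topic: str) -> dict[str, str]:
    """Sequential Research -> Write -> Critique pipeline."""
    facts = research_agent(topic)   # delegate research
    draft = writing_agent(facts)    # pass the facts forward
    notes = critique_agent(draft)   # review the draft
    return {"facts": facts, "draft": draft, "critique": notes}

result = supervisor("solar power")
```

The point of the sketch is only the data flow: each stage sees just the string the supervisor hands it, never the whole conversation.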
Setting up sub-agents

Each sub-agent is a full lr.ChatAgent built from its own ChatAgentConfig, with its own model, its own system prompt, and (optionally) its own tools. Sub-agents don't share memory or tools with the supervisor; the supervisor only ever sees what the sub-agent returns.
```python
from __future__ import annotations

import functools
import json
import logging
import os
import uuid
from typing import Annotated, Any, AsyncGenerator, Literal, TypedDict

from ag_ui.core import (
    EventType,
    RunAgentInput,
    RunErrorEvent,
    RunFinishedEvent,
    RunStartedEvent,
    StateSnapshotEvent,
    TextMessageContentEvent,
    TextMessageEndEvent,
    TextMessageStartEvent,
    ToolCallArgsEvent,
    ToolCallEndEvent,
    ToolCallStartEvent,
)
from fastapi import Request
from fastapi.responses import JSONResponse, StreamingResponse

import langroid as lr
import langroid.language_models as lm
from langroid.agent.tool_message import ToolMessage

logger = logging.getLogger(__name__)

# =====================================================================
# Shared state shape
# =====================================================================


class Delegation(TypedDict):
    id: str
    sub_agent: Literal["research_agent", "writing_agent", "critique_agent"]
    task: str
    status: Literal["running", "completed", "failed"]
    result: str


# =====================================================================
# Sub-agent system prompts (single-task, no tools)
# =====================================================================
# In Langroid, each sub-agent is a `lr.ChatAgent` with a single-task
# `system_message` and no tools. The supervisor only ever sees the
# sub-agent's final-message content — no shared memory, no shared tools.

_RESEARCH_SYSTEM = (
    "You are a research sub-agent. Given a topic, produce a concise "
    "bulleted list of 3-5 key facts. No preamble, no closing."
)
_WRITING_SYSTEM = (
    "You are a writing sub-agent. Given a brief and optional source facts, "
    "produce a polished 1-paragraph draft. Be clear and concrete. No preamble."
)
_CRITIQUE_SYSTEM = (
    "You are an editorial critique sub-agent. Given a draft, give 2-3 "
    "crisp, actionable critiques. No preamble."
)

_SUB_PROMPTS: dict[str, str] = {
    "research_agent": _RESEARCH_SYSTEM,
    "writing_agent": _WRITING_SYSTEM,
    "critique_agent": _CRITIQUE_SYSTEM,
}


def _resolve_sub_model() -> str:
    """Resolve the sub-agent model.

    Mirrors ``_resolve_a2ui_model`` in ``agents.agent``: bare model name
    (langroid passes the string literally to the OpenAI SDK, which
    rejects ``openai/gpt-4.1`` as "model not found").
    """
    return os.getenv("SUBAGENT_MODEL") or os.getenv("LANGROID_MODEL") or "gpt-4.1"


@functools.lru_cache(maxsize=8)
def _build_sub_llm_config(name: str) -> lm.OpenAIGPTConfig:
    """Build (and memoize) the immutable ``OpenAIGPTConfig`` for one sub-agent.

    Only the LLM config — which is stateless and credential-bearing — is
    cached. The ``ChatAgent`` itself is rebuilt per call (see
    ``_build_sub_agent``) because ``lr.ChatAgent`` accumulates
    ``message_history`` across ``llm_response`` / ``llm_response_async``
    calls and must NOT be shared across concurrent requests.
    """
    # ``name`` participates in the cache key indirectly via per-name
    # callsites; the config itself is identical across sub-agents today
    # but keeping the parameter makes the cache robust if a future
    # refactor varies model/temperature per sub-agent.
    del name  # currently unused — kept for cache-key shape stability
    model = _resolve_sub_model()
    return lm.OpenAIGPTConfig(
        chat_model=model,
        # Sub-agents are single-shot — non-streaming keeps the supervisor
        # turn deterministic (we want the full result before recording
        # the delegation as completed).
        stream=False,
    )


def _build_sub_agent(name: str) -> lr.ChatAgent:
    """Build a fresh ``ChatAgent`` for one sub-agent invocation.

    A new agent is constructed on every call. Caching the agent
    instance (e.g. via ``lru_cache``) would be unsafe: ``lr.ChatAgent``
    accumulates ``message_history`` across ``llm_response_async`` calls,
    so two concurrent users invoking the same sub-agent would
    cross-contaminate each other's conversation history and grow the
    token budget unboundedly across the process lifetime.

    The immutable LLM config is cached separately (see
    ``_build_sub_llm_config``) so we don't pay credential-resolution
    overhead per call.
    """
    system_prompt = _SUB_PROMPTS[name]
    llm_config = _build_sub_llm_config(name)
    agent_config = lr.ChatAgentConfig(
        llm=llm_config,
        system_message=system_prompt,
    )
    return lr.ChatAgent(agent_config)
```
Keep sub-agent system prompts narrow and focused. The point of this pattern is that each one does one thing well. If a sub-agent needs to know the whole user context to do its job, that's a signal the boundary is wrong.
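To make that smell concrete, here is an illustrative contrast (the broad prompt is a deliberate anti-pattern, not part of the example above):

```python
# Anti-pattern: this sub-agent can only act if the supervisor forwards
# the entire user context, so the delegation boundary adds no focus.
TOO_BROAD = (
    "You are a helpful assistant. Read the whole conversation so far "
    "and do whatever seems most useful next."
)

# Narrow: one job, one input, one output shape. Everything the
# sub-agent needs arrives inside the delegated task string.
NARROW = (
    "You are a research sub-agent. Given a topic, produce a concise "
    "bulleted list of 3-5 key facts. No preamble, no closing."
)
```

A useful test when writing a prompt: can the sub-agent do its job from the task string alone? If not, the boundary is in the wrong place.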
Exposing sub-agents as tools

The supervisor delegates by calling tools. Each tool call is handled by a thin wrapper around the matching sub-agent that:
- Runs the sub-agent on the supplied task string.
- Records the delegation into a delegations slot in shared agent state (so the UI can render a live log).
- Returns the sub-agent's final message, which the supervisor sees as a normal tool result on its next turn.
```python
async def _invoke_sub_agent(name: str, task: str) -> str:
    """Run a sub-agent on ``task`` and return its final-message content.

    Uses ``llm_response_async`` so the SSE writer stays cooperative —
    a synchronous ``sub.llm_response(task)`` would block the event loop
    for the entire LLM round-trip and stall any other concurrent SSE
    responses sharing this worker.

    Raises ``RuntimeError`` (with the exception class chained via
    ``__cause__``) on transport / SDK failures so the caller can record
    a ``failed`` delegation. The original exception is preserved
    server-side via ``logger.exception``.
    """
    sub = _build_sub_agent(name)
    try:
        response = await sub.llm_response_async(task)
    except Exception as exc:  # noqa: BLE001 — see docstring
        logger.exception("subagent %s call failed", name)
        # Match the google-adk surface: only the class name leaks; the
        # full traceback stays in server logs.
        raise RuntimeError(
            f"sub-agent call failed: {exc.__class__.__name__} "
            "(see server logs for details)"
        ) from exc
    if response is None:
        raise RuntimeError("sub-agent returned no response")
    content = getattr(response, "content", None) or ""
    if not content:
        raise RuntimeError("sub-agent returned empty content")
    return content


# =====================================================================
# Supervisor tools (langroid ToolMessage subclasses)
# =====================================================================
# In Langroid, the supervisor delegates by emitting a tool call against
# one of these `ToolMessage` subclasses. The SSE adapter intercepts the
# call (rather than letting Langroid dispatch to `.handle`), runs the
# matching sub-agent, records a `Delegation` into shared state, and
# returns the sub-agent's output as the tool result.


class _SubAgentTool(ToolMessage):
    """Base class for the three supervisor delegation tools.

    The actual sub-agent invocation happens in the SSE adapter (so we
    can record delegations into shared state); this ``handle`` is a
    placeholder that's never called in the normal flow — we intercept
    the tool call before langroid would dispatch to it. Logging here
    matches the frontend-tool pattern in ``agents.agent``.
    """

    request: str = "_subagent_base"  # overridden
    purpose: str = ""  # overridden
    task: Annotated[
        str,
        "The exact task for the sub-agent. Pass relevant facts/draft "
        "from prior delegations through this string.",
    ]

    def handle(self) -> str:
        logger.error(
            "%s.handle fired server-side — adapter dispatch regression; "
            "the supervisor sub-agent tool was not intercepted",
            self.__class__.__name__,
        )
        return f"{self.request} dispatched"


class ResearchAgentTool(_SubAgentTool):
    request: str = "research_agent"
    purpose: str = (
        "Delegate a research task to the research sub-agent. Use for: "
        "gathering facts, background, definitions, statistics. Returns a "
        "bulleted list of key facts."
    )


class WritingAgentTool(_SubAgentTool):
    request: str = "writing_agent"
    purpose: str = (
        "Delegate a drafting task to the writing sub-agent. Use for: "
        "producing a polished paragraph, draft, or summary. Pass relevant "
        "facts from prior research inside `task`."
    )


class CritiqueAgentTool(_SubAgentTool):
    request: str = "critique_agent"
    purpose: str = (
        "Delegate a critique task to the critique sub-agent. Use for: "
        "reviewing a draft and suggesting concrete improvements."
    )


_SUPERVISOR_TOOLS: tuple[type[ToolMessage], ...] = (
    ResearchAgentTool,
    WritingAgentTool,
    CritiqueAgentTool,
)
```

This is where CopilotKit's shared-state channel earns its keep: the
supervisor's tool calls mutate delegations as they happen, and the
frontend renders every new entry live.
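The adapter's bookkeeping around each intercepted tool call can be sketched roughly as follows. Note that record_delegation and complete_delegation are illustrative names, not functions from the example, and the real adapter also pushes a StateSnapshotEvent over SSE after each mutation so the frontend re-renders:

```python
import uuid

# Rough sketch of the delegation bookkeeping the SSE adapter performs
# around each intercepted sub-agent tool call.

def record_delegation(state: dict, sub_agent: str, task: str) -> dict:
    """Append a `running` entry to the shared `delegations` slot."""
    entry = {
        "id": uuid.uuid4().hex,
        "sub_agent": sub_agent,
        "task": task,
        "status": "running",
        "result": "",
    }
    state.setdefault("delegations", []).append(entry)
    return entry

def complete_delegation(entry: dict, result: str, ok: bool = True) -> None:
    """Flip the entry to `completed` (or `failed`) once the call returns."""
    entry["status"] = "completed" if ok else "failed"
    entry["result"] = result

state: dict = {}
entry = record_delegation(state, "research_agent", "Key facts about tides")
# ... result = await _invoke_sub_agent("research_agent", entry["task"]) ...
complete_delegation(entry, "- the Moon's gravity drives tides")
```

Because every entry is appended with status "running" before the sub-agent call and mutated in place afterwards, the frontend sees each delegation twice: once as an in-flight card, once as a finished one.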
Rendering a live delegation log
On the frontend, the delegation log is just a reactive render of the
delegations slot. Subscribe with useAgent({ updates: [OnStateChanged, OnRunStatusChanged] }), read agent.state.delegations,
and render one card per entry.
```tsx
/**
 * Live delegation log — renders the `delegations` slot of agent state.
 *
 * Each entry corresponds to one invocation of a sub-agent. The list
 * grows in real time as the supervisor fans work out to its children.
 * Entries first appear with `status: "running"` (while the secondary
 * Langroid ChatAgent call is in flight) and flip to `"completed"` or
 * `"failed"` once the sub-agent returns.
 */
export function DelegationLog({ delegations, isRunning }: DelegationLogProps) {
  return (
    <div
      data-testid="delegation-log"
      className="w-full h-full flex flex-col bg-white rounded-2xl shadow-sm border border-[#DBDBE5] overflow-hidden"
    >
      <div className="flex items-center justify-between px-6 py-3 border-b border-[#E9E9EF] bg-[#FAFAFC]">
        <div className="flex items-center gap-3">
          <span className="text-lg font-semibold text-[#010507]">
            Sub-agent delegations
          </span>
          {isRunning && (
            <span
              data-testid="supervisor-running"
              className="inline-flex items-center gap-1.5 px-2 py-0.5 rounded-full border border-[#BEC2FF] bg-[#BEC2FF1A] text-[#010507] text-[10px] font-semibold uppercase tracking-[0.12em]"
            >
              <span className="w-1.5 h-1.5 rounded-full bg-[#010507] animate-pulse" />
              Supervisor running
            </span>
          )}
        </div>
        <span
          data-testid="delegation-count"
          className="text-xs font-mono text-[#838389]"
        >
          {delegations.length} calls
        </span>
      </div>
      <div className="flex-1 overflow-y-auto p-4 space-y-3">
        {delegations.length === 0 ? (
          <p className="text-[#838389] italic text-sm">
            Ask the supervisor to complete a task. Every sub-agent it calls will
            appear here.
          </p>
        ) : (
          delegations.map((d, idx) => {
            const style = SUB_AGENT_STYLE[d.sub_agent];
            return (
              <div
                key={d.id}
                data-testid="delegation-entry"
                className="border border-[#E9E9EF] rounded-xl p-3 bg-[#FAFAFC]"
              >
                <div className="flex items-center justify-between mb-2">
                  <div className="flex items-center gap-2">
                    <span className="text-xs font-mono text-[#AFAFB7]">
                      #{idx + 1}
                    </span>
                    <span
                      className={`inline-flex items-center gap-1 px-2 py-0.5 rounded-full text-[10px] font-semibold uppercase tracking-[0.1em] border ${style.color}`}
                    >
                      <span>{style.emoji}</span>
                      <span>{style.label}</span>
                    </span>
                  </div>
                  <span
                    className={`text-[10px] uppercase tracking-[0.12em] font-semibold ${STATUS_COLOR[d.status]}`}
                  >
                    {d.status}
                  </span>
                </div>
                <div className="text-xs text-[#57575B] mb-2">
                  <span className="font-semibold text-[#010507]">Task: </span>
                  {d.task}
                </div>
                {d.result ? (
                  <div className="text-sm text-[#010507] whitespace-pre-wrap bg-white rounded-lg p-2.5 border border-[#E9E9EF]">
                    {d.result}
                  </div>
                ) : (
                  <div className="text-xs text-[#838389] italic">
                    Sub-agent running…
                  </div>
                )}
              </div>
            );
          })
        )}
      </div>
    </div>
  );
}
```

The result: as the supervisor fans work out to its sub-agents, the log grows in real time, giving the user visibility into a process that would otherwise be a long opaque spinner.
Related
- Shared State — the channel that makes the delegation log live.
- State streaming — stream individual sub-agent outputs token-by-token inside each log entry.
