
Voice

Real-time speech-to-text in the chat composer. The user speaks, the runtime transcribes, the agent runs the resulting prompt.

You have a working chat surface and you want users to be able to speak instead of type. By the end of this guide, the chat composer will sprout a mic button, recorded audio will be transcribed by the runtime, and the transcript will auto-send to the agent like any other message.

When to use this

  • Hands-free or accessibility flows where typing isn't the right input modality.
  • Mobile or kiosk surfaces where a long voice query is faster than thumb-typing.
  • Demo and test loops where you want canned audio to drive the chat without a microphone.

If you only need file uploads (audio, images, video, documents), use Multimodal Attachments instead. Voice is specifically about live transcription of recorded speech into chat input.

Live Demo: LangGraph (Python) — voice (Open full demo →)

Frontend

<CopilotChat /> renders the mic button automatically when the runtime advertises audioFileTranscriptionEnabled: true on its /info endpoint. There's nothing to wire up on the chat surface itself:

frontend/src/app/page.tsx — chat surface
L3–24
import { CopilotKit } from "@copilotkit/react-core/v2";
import { VoiceChat } from "./voice-chat";

export default function VoiceDemoPage() {
  return (
    <CopilotKit
      runtimeUrl="/api/copilotkit-voice"
      agent="voice-demo"
      useSingleEndpoint={false}
      // The dev-only `<cpk-web-inspector>` overlay (auto-enabled on localhost
      // via shouldShowDevConsole) can sit on top of the voice sample-audio
      // button and intercept pointer events, which breaks automated clicks in
      // Playwright. Production isn't localhost, so the inspector never mounts
      // there. Disable it explicitly so the demo behaves the same in both
      // environments.
      enableInspector={false}
    >
      <VoiceChat />
    </CopilotKit>
  );
}

When the user clicks the mic, the chat captures audio, POSTs it to the runtime's /transcribe endpoint, drops the resulting transcript into the composer, and submits.

Driving the demo without a mic

For Playwright runs, screenshots, or any flow where prompting for mic permissions is awkward, ship a button that injects a canned transcript straight into the composer. No microphone, no /transcribe round-trip:

frontend/src/app/sample-audio-button.tsx
L26–42
export function SampleAudioButton({
  onTranscribed,
  sampleText,
}: SampleAudioButtonProps) {
  return (
    <button
      type="button"
      data-testid="voice-sample-audio-button"
      onClick={() => onTranscribed(sampleText)}
      title={`Inserts: "${sampleText}"`}
      className="inline-flex w-fit items-center gap-2 rounded-md border border-black/10 bg-white px-3 py-1.5 text-xs font-medium hover:bg-black/5 dark:border-white/10 dark:bg-black/30 dark:hover:bg-white/10"
    >
      <span aria-hidden>🎙</span>
      <span>Try a sample audio</span>
    </button>
  );
}

The caller can drop the resulting text into the composer's textarea (matched via data-testid="copilot-chat-textarea") using the native value setter and a synthetic input event so React's managed state updates correctly.
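
A minimal sketch of that injection helper, assuming only what's stated above: the composer textarea is matched by data-testid="copilot-chat-textarea", and the helper name insertIntoComposer is illustrative.

frontend — composer injection helper (sketch)
// Pass this as the button's `onTranscribed` callback.
function insertIntoComposer(text: string) {
  const textarea = document.querySelector<HTMLTextAreaElement>(
    '[data-testid="copilot-chat-textarea"]',
  );
  if (!textarea) return;

  // React tracks controlled inputs through the element's `value` property,
  // so write through the prototype's native setter...
  const nativeSetter = Object.getOwnPropertyDescriptor(
    HTMLTextAreaElement.prototype,
    "value",
  )?.set;
  nativeSetter?.call(textarea, text);

  // ...then dispatch a synthetic input event so React re-reads the new value.
  textarea.dispatchEvent(new Event("input", { bubbles: true }));
}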

Backend

Wire up the V2 runtime with a TranscriptionService. The V1 wrapper drops the transcriptionService option, so use createCopilotRuntimeHandler from @copilotkit/runtime/v2 directly:

app/api/copilotkit-voice/[[...slug]]/route.ts
L25–125
import type { NextRequest } from "next/server";
import {
  CopilotRuntime,
  TranscriptionService,
  createCopilotRuntimeHandler,
} from "@copilotkit/runtime/v2";
import type { TranscribeFileOptions } from "@copilotkit/runtime/v2";
import { LangGraphAgent } from "@copilotkit/runtime/langgraph";
import { TranscriptionServiceOpenAI } from "@copilotkit/voice";
import OpenAI from "openai";

const LANGGRAPH_URL =
  process.env.LANGGRAPH_DEPLOYMENT_URL || "http://localhost:8123";

const voiceDemoAgent = new LangGraphAgent({
  deploymentUrl: LANGGRAPH_URL,
  graphId: "sample_agent",
});

/**
 * Transcription service wrapper that reports a clean, typed auth error when
 * OPENAI_API_KEY is not configured. When the key is present we delegate to
 * the real OpenAI-backed service; any upstream Whisper error keeps its
 * natural categorization.
 *
 * Note: We pin `baseURL` to real OpenAI (or `OPENAI_TRANSCRIPTION_BASE_URL`
 * when explicitly set) instead of falling through to `OPENAI_BASE_URL`. In
 * local docker / Railway preview environments `OPENAI_BASE_URL` points at
 * aimock so LLM completions stay deterministic, but aimock has a catchall
 * `endpoint: "transcription"` fixture that would otherwise intercept every
 * real mic recording and return the canned "What is the weather in Tokyo?"
 * phrase regardless of what the user actually said. The sample-audio button
 * is the deterministic affordance (synchronous text injection); the mic is
 * the only path that should exercise real Whisper.
 */
class GuardedOpenAITranscriptionService extends TranscriptionService {
  private delegate: TranscriptionServiceOpenAI | null;

  constructor() {
    super();
    const apiKey = process.env.OPENAI_API_KEY;
    const baseURL =
      process.env.OPENAI_TRANSCRIPTION_BASE_URL ?? "https://api.openai.com/v1";
    this.delegate = apiKey
      ? new TranscriptionServiceOpenAI({
          openai: new OpenAI({ apiKey, baseURL }),
        })
      : null;
  }

  async transcribeFile(options: TranscribeFileOptions): Promise<string> {
    if (!this.delegate) {
      // "api key" substring → handleTranscribe maps to AUTH_FAILED → 401.
      throw new Error(
        "OPENAI_API_KEY not configured for this deployment (api key missing). " +
          "Set OPENAI_API_KEY to enable voice transcription.",
      );
    }
    return this.delegate.transcribeFile(options);
  }
}

// Cache the runtime + handler across invocations so the transcription service
// is constructed once per Node process instead of per request. The guarded
// service tolerates a missing OPENAI_API_KEY at construction time (it only
// throws when transcribeFile is actually called), so deferring construction
// past module load is not required for cold-start safety.
let cachedHandler: ((req: Request) => Promise<Response>) | null = null;
function getHandler(): (req: Request) => Promise<Response> {
  if (cachedHandler) return cachedHandler;

  const runtime = new CopilotRuntime({
    // @ts-ignore -- Published CopilotRuntime agents type wraps Record in
    // MaybePromise<NonEmptyRecord<...>> which rejects plain Records; fixed in
    // source, pending release.
    agents: {
      // The page mounts <CopilotKit agent="voice-demo">; resolve that to
      // the neutral sample_agent graph.
      "voice-demo": voiceDemoAgent,
      // useAgent() with no args defaults to "default"; alias so any internal
      // default-agent lookups resolve against the same graph.
      default: voiceDemoAgent,
    },
    transcriptionService: new GuardedOpenAITranscriptionService(),
  });

  cachedHandler = createCopilotRuntimeHandler({
    runtime,
    basePath: "/api/copilotkit-voice",
  });
  return cachedHandler;
}

// Next.js App Router bindings. This file lives at
// `src/app/api/copilotkit-voice/[[...slug]]/route.ts` — the catchall slug
// pattern forwards every sub-path (`/info`, `/agent/:id/run`,
// `/transcribe`, ...) to the V2 handler so its URL router can dispatch.
export const POST = (req: NextRequest) => getHandler()(req);
export const GET = (req: NextRequest) => getHandler()(req);
export const PUT = (req: NextRequest) => getHandler()(req);
export const DELETE = (req: NextRequest) => getHandler()(req);

With transcriptionService set, the runtime advertises audioFileTranscriptionEnabled: true on /info (which is what tells the chat to render the mic button) and routes POST /transcribe to the service.
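
As a quick sanity check, you can hit /info yourself and look for the flag. A sketch, assuming the route from this guide on a local dev server and that the flag sits at the top level of the JSON payload:

sanity check — /info (sketch)
// Run in a Node script or the browser console; adjust host/port as needed.
const res = await fetch("http://localhost:3000/api/copilotkit-voice/info");
const info = await res.json();

// If this logs `true`, <CopilotChat /> will render the mic button.
console.log(info.audioFileTranscriptionEnabled);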

Custom transcription backends

TranscriptionService from @copilotkit/runtime/v2 is an abstract class. Subclass it to plug in any transcription provider — Whisper, AssemblyAI, Deepgram, your own model. The library ships TranscriptionServiceOpenAI as the canonical reference implementation.
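
As a sketch of what a custom backend can look like, here is a minimal subclass. The transcribeFile signature comes from the abstract class; callMyProvider is a hypothetical stand-in for whatever SDK or HTTP call your provider exposes:

backend — custom transcription service (sketch)
import { TranscriptionService } from "@copilotkit/runtime/v2";
import type { TranscribeFileOptions } from "@copilotkit/runtime/v2";

// Hypothetical helper: hand the audio to your provider of choice and
// resolve with the plain-text transcript.
declare function callMyProvider(options: TranscribeFileOptions): Promise<string>;

class MyTranscriptionService extends TranscriptionService {
  async transcribeFile(options: TranscribeFileOptions): Promise<string> {
    return callMyProvider(options);
  }
}

Register it the same way as the OpenAI-backed service above, via transcriptionService: new MyTranscriptionService() in the CopilotRuntime constructor.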

A useful pattern is wrapping your service in a guard that returns a clean 4xx when credentials aren't configured, instead of an opaque 5xx from the underlying SDK. That is exactly what GuardedOpenAITranscriptionService in the backend route above does: when OPENAI_API_KEY is missing it throws an error whose message contains "api key", which handleTranscribe maps to AUTH_FAILED and a 401.

Supported by
Built-in Agent (TanStack AI), LangGraph (Python), LangGraph (TypeScript), LangGraph (FastAPI), Google ADK, Mastra, CrewAI (Crews), PydanticAI, Claude Agent SDK (Python), Claude Agent SDK (TypeScript), Agno, AG2, LlamaIndex, AWS Strands, Langroid, MS Agent Framework (Python), MS Agent Framework (.NET), Spring AI