Multimodal Inputs
UserMessage.content accepts either plain text or an ordered array of
multimodal content parts.
const message: UserMessage = {
  id: "user-1",
  role: "user",
  content: [
    { type: "text", text: "Summarize this PDF and screenshot" },
    {
      type: "image",
      source: {
        type: "url",
        value: "https://example.com/screen.png",
        mimeType: "image/png",
      },
    },
    {
      type: "document",
      source: {
        type: "url",
        value: "https://example.com/report.pdf",
        mimeType: "application/pdf",
      },
    },
  ],
};
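Because content can be either a plain string or an array of parts, consumer code often normalizes it to one shape before processing. The sketch below shows one way to do that. The Source, ContentPart, and Content types are local stand-ins inferred from the examples on this page, and toParts and textOf are hypothetical helpers, not part of the library; the real type definitions may differ.

```typescript
// Local stand-in types inferred from the examples on this page (assumptions).
type Source =
  | { type: "data"; value: string; mimeType: string }
  | { type: "url"; value: string; mimeType?: string };

type TextPart = { type: "text"; text: string };

type MediaPart = {
  type: "image" | "audio" | "document";
  source: Source;
  metadata?: Record<string, unknown>;
};

type ContentPart = TextPart | MediaPart;
type Content = string | ContentPart[];

// Hypothetical helper: wrap plain text in a single text part so callers
// can always iterate over an ordered array.
function toParts(content: Content): ContentPart[] {
  return typeof content === "string"
    ? [{ type: "text", text: content }]
    : content;
}

// Hypothetical helper: collect only the text parts, in order,
// joined with newlines.
function textOf(content: Content): string {
  return toParts(content)
    .filter((part): part is TextPart => part.type === "text")
    .map((part) => part.text)
    .join("\n");
}
```

With this in place, textOf(message.content) would return only the prompt text, regardless of whether content was supplied as a string or as a parts array.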
Source Types
Use source.type to describe payload delivery:
data: Inline base64 payload with required mimeType
url: HTTP(S) or data URL, optional mimeType
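The two rules above can be checked before a message is sent. The following is a minimal sketch, assuming a loosely typed SourceLike input; SourceLike, protocolOf, and validateSource are hypothetical names introduced for illustration and are not part of the library. It relies only on the standard URL constructor.

```typescript
// Loosely typed input so the check is meaningful at runtime (assumption).
interface SourceLike {
  type: string;
  value: string;
  mimeType?: string;
}

// Return the URL scheme (for example "https:"), or null if value is not
// a parseable absolute URL.
function protocolOf(value: string): string | null {
  try {
    return new URL(value).protocol;
  } catch {
    return null;
  }
}

// Hypothetical validator: returns a list of problems, empty when the
// source satisfies the rules described above.
function validateSource(source: SourceLike): string[] {
  const errors: string[] = [];
  if (source.type === "data") {
    // Inline base64 payloads must declare their MIME type.
    if (!source.mimeType) {
      errors.push("data source requires mimeType");
    }
  } else if (source.type === "url") {
    // mimeType is optional here; only the URL scheme is checked.
    const protocol = protocolOf(source.value);
    if (protocol === null) {
      errors.push("url source value is not a valid URL");
    } else if (protocol !== "http:" && protocol !== "https:" && protocol !== "data:") {
      errors.push("url source must use http, https, or data scheme");
    }
  } else {
    errors.push("unknown source type: " + source.type);
  }
  return errors;
}
```

Returning a list of messages rather than throwing lets a caller report every invalid part in a message at once.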
Common Use Cases
Visual QA
{
  id: "q1",
  role: "user",
  content: [
    { type: "text", text: "What issue do you see in this UI?" },
    {
      type: "image",
      source: { type: "url", value: "https://example.com/ui.png", mimeType: "image/png" },
      metadata: { detail: "high" },
    },
  ],
}
Audio transcription
{
  id: "q2",
  role: "user",
  content: [
    { type: "text", text: "Transcribe this recording." },
    {
      type: "audio",
      source: { type: "url", value: "https://example.com/meeting.wav", mimeType: "audio/wav" },
    },
  ],
}
Mixed media comparison
{
  id: "q3",
  role: "user",
  content: [
    { type: "text", text: "Compare the screenshot with the spec." },
    {
      type: "image",
      source: { type: "data", value: "iVBORw0KGgo...", mimeType: "image/png" },
    },
    {
      type: "document",
      source: { type: "url", value: "https://example.com/spec.pdf", mimeType: "application/pdf" },
    },
  ],
}