Create a ChatCompletion

Overview

Create chat completions using multiple LLM providers (OpenAI, Anthropic, Google). Supports text and multimodal input (images, audio, video, files), streaming responses via SSE, async mode for long-running thinking models, tool calling (function calling), and structured output (JSON schema).
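As a quick orientation, a minimal request body can be assembled as below. This is a sketch: the model name and message content are placeholders, and only fields documented on this page are used.

```python
import json

# Minimal chat completion request body (sketch).
# "model" uses the provider/model format; "stream": False asks for a
# single JSON response instead of SSE.
payload = {
    "model": "openai/gpt-4o",
    "messages": [
        {"role": "user", "content": "Summarize SSE in one sentence."}
    ],
    "stream": False,
}

body = json.dumps(payload)
```

The serialized `body` is what would be sent as the JSON request body with a Bearer token in the `Authorization` header.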

Supported Models

Provider    Models
OpenAI      openai/gpt-5, openai/gpt-4o, openai/gpt-4o-mini, openai/o3-mini, openai/o1
Anthropic   anthropic/claude-sonnet-4-5-20250929, anthropic/claude-haiku-4-5-20251001, anthropic/claude-3-5-sonnet-latest, anthropic/claude-3-5-haiku-latest
Google      google/gemini-2.5-pro, google/gemini-2.0-flash, google/gemini-2.0-pro

Streaming

By default, responses are streamed as Server-Sent Events (SSE). Set stream: false for a single JSON response.
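A streamed response arrives as newline-delimited SSE frames. The helper below is a sketch of client-side parsing under the common convention that each chunk is carried in a `data:` line and the stream ends with a `[DONE]` sentinel; the exact event schema is provider-specific and not specified here.

```python
def iter_sse_data(lines):
    """Yield the payload of each `data:` line from an SSE stream.

    Assumes the common `data: <json>` framing with a `[DONE]`
    end-of-stream sentinel; both are conventions, not guarantees.
    """
    for line in lines:
        line = line.strip()
        if line.startswith("data:"):
            data = line[len("data:"):].strip()
            if data == "[DONE]":
                return
            yield data

# Demonstration with a hand-written stream:
chunks = list(iter_sse_data([
    'data: {"delta": "Hel"}',
    "",
    'data: {"delta": "lo"}',
    "data: [DONE]",
]))
```

In practice `lines` would come from iterating over the `text/event-stream` HTTP response body.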

Async Mode

Set async: true to queue the request and receive a run UID. Poll GET /v3/ai/chat_completion_runs/{uid} for the result. Recommended for thinking models.

Async mode is available for Staff tokens.

Body Params
model
string
required

Model identifier in provider/model format. Supported providers: openai, anthropic, google (e.g., "openai/gpt-4o", "anthropic/claude-sonnet-4-5-20250929", "google/gemini-2.5-pro").

messages
array of objects
required

Array of messages representing the conversation history. Must include at least one message with role 'user'.
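A minimal valid `messages` array looks like the sketch below; the system prompt is optional, but at least one entry must have role 'user'.

```python
# Conversation history for the "messages" param; the content strings
# are placeholders.
messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "What is nucleus sampling?"},
]

# The request is only valid if at least one message has role "user".
has_user = any(m["role"] == "user" for m in messages)
```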

stream
boolean
Defaults to true

Whether to stream the response as Server-Sent Events (SSE). When true, the response is delivered incrementally as text/event-stream. Defaults to true.

async
boolean
Defaults to false

Process the request asynchronously. Returns a run UID immediately that can be polled via GET /v3/ai/chat_completion_runs/{uid}. Recommended for thinking models (e.g., openai/o1, openai/o3-mini) that may take longer to respond.

temperature
number
0 to 2

Sampling temperature between 0 and 2. Higher values (e.g., 0.8) make the output more random, while lower values (e.g., 0.2) make it more focused and deterministic.

max_tokens
integer
≥ 1

Maximum number of tokens to generate in the completion. The total length of input tokens and generated tokens is limited by the model's context length.

top_p
number
0 to 1

Top-p (nucleus) sampling parameter between 0 and 1. The model considers the results of the tokens with top_p probability mass. For example, 0.1 means only tokens comprising the top 10% probability mass are considered.
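The sampling knobs above combine in the request body like this (a sketch; the values are illustrative, and as with most sampling APIs you would typically tune either temperature or top_p, not both):

```python
# Request body with the sampling parameters documented above.
payload = {
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "Name three colours."}],
    "temperature": 0.2,  # 0 to 2; lower = more deterministic
    "max_tokens": 64,    # cap on generated tokens (>= 1)
    "top_p": 0.9,        # 0 to 1; nucleus sampling probability mass
}
```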

tools
array of objects

Array of tool definitions available for the model to call during the completion. The model may choose to call zero or more of these tools based on the conversation context.
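A tool definition might look like the sketch below, which follows the widely used function-tool shape (a `function` entry with a JSON Schema `parameters` object); the exact schema accepted here is an assumption, and `get_weather` is a hypothetical tool name.

```python
# Hypothetical tool definition for the "tools" param.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool name
            "description": "Get the current weather for a city.",
            "parameters": {          # JSON Schema for the arguments
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                },
                "required": ["city"],
            },
        },
    }
]
```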

response_format
object

Response format specification for structured output. Use this to constrain the model's output to a specific format such as JSON. When not specified, the model returns free-form text.
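For structured output, a `response_format` value might look like the sketch below, modeled on the common JSON-schema response-format shape; the exact structure accepted here is an assumption, and `sentiment` is a hypothetical schema name.

```python
# Hypothetical JSON-schema response_format constraining the output to a
# {"label": str, "score": number} object.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "sentiment",  # hypothetical schema name
        "schema": {
            "type": "object",
            "properties": {
                "label": {"type": "string"},
                "score": {"type": "number"},
            },
            "required": ["label", "score"],
        },
    },
}
```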

stop
array of strings

Up to 4 sequences where the model will stop generating further tokens. The returned text will not contain the stop sequence.

Headers

Accept
string
Defaults to application/json

Allowed: application/json, text/event-stream
Responses

Authentication: Bearer token (JWT).

Response content types: application/json, text/event-stream