Create chat completion
Generate text, vision, tool-calling, and streaming responses in OpenAI-compatible format
Models you can use
Pass any chat-capable model ID inmodel, for example:
| Model ID | Notes |
|---|---|
gpt-5.4 | GPT-5 flagship — top reasoning / coding / agentic, 1M context |
gpt-5.4-mini | Lightweight, balanced — great for high-volume and fallback |
gemini-3.1-pro-preview | Gemini flagship — strong multimodal, 1M context |
deepseek-v4-pro | DeepSeek cost-effective reasoning model |
GET /v1/models or the
pricing page for the full list.
Streaming
Setstream: true to receive Server-Sent Events (SSE). Each line is
data: {json} and the stream ends with data: [DONE]:
Reasoning models
For reasoning-capable models, usereasoning_effort (low /
medium / high) to control reasoning depth. The model returns its
reasoning in the reasoning_content field — render it collapsed in
your UI.
Tool calling
Define functions as JSON Schema intools. The model returns
structured tool_calls that your application executes and feeds back
in a follow-up request. Use tool_choice to control the strategy
(auto / none / required, or a specific function).Authorizations
Body
Model ID, e.g. gpt-5.4. See GET /v1/models for the full list.
"gpt-5.4"
The messages comprising the conversation so far, in order.
Sampling temperature between 0 and 2. Higher values (e.g. 0.8)
make output more random; lower values (e.g. 0.2) make it more
focused and deterministic. Tune this or top_p, not both.
0 <= x <= 2Nucleus sampling. The model considers only tokens within the top
top_p cumulative probability mass — e.g. 0.1 means only the top
10%. Tune this or temperature, not both.
0 <= x <= 1Number of completions to generate for each input message.
x >= 1Whether to stream the response as Server-Sent Events.
Options for streaming, only used when stream=true.
Up to 4 stop sequences. Generation stops at any of them.
Maximum tokens to generate in the completion (legacy). Use
max_completion_tokens for reasoning models.
Maximum tokens to generate, including reasoning tokens.
Between -2.0 and 2.0. Positive values penalize tokens that have already appeared, increasing the model's likelihood to talk about new topics.
-2 <= x <= 2Between -2.0 and 2.0. Positive values penalize tokens based on their existing frequency, decreasing verbatim repetition.
-2 <= x <= 2Bias map adjusting token likelihoods; keys are token IDs, values -100 to 100.
A unique identifier for your end user, useful for abuse monitoring.
A list of tools the model may call. Currently only function is supported.
Controls whether and how the model calls tools. none disables
calls, auto lets the model decide, required forces at least one
call; or pass an object to force a specific function.
none, auto, required Controls the format of the model output.
Random seed. The same seed and params return results as consistent as possible.
Reasoning depth, only effective for reasoning-capable models.
low, medium, high The output modalities you want the model to return.
text, audio Audio output parameters, used when modalities includes audio.
Response
Successful response
Unique identifier for this completion.
"chat.completion"
Unix timestamp (seconds) of creation.
The model that actually processed the request.
The list of completions generated by the model.
Token usage statistics for the request.