POST /v1/chat/completions
curl --request POST \
  --url https://api.getinfinityblue.com/v1/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "gpt-5.4",
  "messages": [
    {
      "role": "user",
      "content": "Introduce yourself in one sentence."
    }
  ]
}
'
{
  "id": "<string>",
  "object": "<string>",
  "created": 123,
  "model": "<string>",
  "choices": [
    {
      "index": 123,
      "message": {
        "content": "<string>",
        "name": "<string>",
        "tool_calls": [
          {
            "id": "<string>",
            "type": "<string>",
            "function": {
              "name": "<string>",
              "arguments": "<string>"
            }
          }
        ],
        "tool_call_id": "<string>",
        "reasoning_content": "<string>"
      }
    }
  ],
  "usage": {
    "prompt_tokens": 123,
    "completion_tokens": 123,
    "total_tokens": 123,
    "prompt_tokens_details": {
      "cached_tokens": 123,
      "text_tokens": 123,
      "audio_tokens": 123,
      "image_tokens": 123
    },
    "completion_tokens_details": {
      "text_tokens": 123,
      "audio_tokens": 123,
      "reasoning_tokens": 123
    }
  },
  "system_fingerprint": "<string>"
}
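
To show how the response shape above might be consumed, here is a minimal Python sketch that pulls the assistant text out of the first choice. The sample values are made up for illustration, and the helper name first_message_text is hypothetical rather than part of any SDK.

```python
import json

# Sample payload shaped like the response schema above; all values are made up.
raw = json.dumps({
    "id": "chatcmpl-example",
    "object": "chat.completion",
    "created": 0,
    "model": "gpt-5.4",
    "choices": [{"index": 0, "message": {"content": "Hello."}}],
    "usage": {"prompt_tokens": 5, "completion_tokens": 2, "total_tokens": 7},
})


def first_message_text(response: dict) -> str:
    """Return the text of the first choice, or "" when it is missing."""
    choices = response.get("choices") or []
    if not choices:
        return ""
    message = choices[0].get("message") or {}
    return message.get("content") or ""


response = json.loads(raw)
print(first_message_text(response))
```

Falling back to an empty string keeps callers from failing on responses whose message carries only tool_calls and no content.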

Models you can use

Pass any chat-capable model ID in model, for example:
Model ID                 Notes
gpt-5.4                  GPT-5 flagship: top reasoning, coding, and agentic performance; 1M context
gpt-5.4-mini             Lightweight and balanced; well suited to high-volume and fallback use
gemini-3.1-pro-preview   Gemini flagship: strong multimodal, 1M context
deepseek-v4-pro          DeepSeek cost-effective reasoning model
See GET /v1/models or the pricing page for the full list.

Streaming

Set stream: true to receive Server-Sent Events (SSE). Each event is a line of the form data: {json}, and the stream ends with data: [DONE]:
data: {"id":"chatcmpl-...","choices":[{"delta":{"content":"He"}}]}
data: {"id":"chatcmpl-...","choices":[{"delta":{"content":"llo"}}]}
data: [DONE]
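
The stream format above can be consumed with a small line parser. The following Python sketch works on any iterable of text lines (for example, lines read from an HTTP response body) and joins the delta.content fragments; the function names are hypothetical, and the sketch assumes each event fits on a single data: line as in the sample.

```python
import json
from typing import Iterable, Iterator


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the decoded JSON payload of each data: line until [DONE]."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything else
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        yield json.loads(payload)


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate the delta.content fragments of a streamed completion."""
    parts = []
    for chunk in iter_sse_chunks(lines):
        for choice in chunk.get("choices", []):
            content = (choice.get("delta") or {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)


sample = [
    'data: {"id":"chatcmpl-1","choices":[{"delta":{"content":"He"}}]}',
    "",
    'data: {"id":"chatcmpl-1","choices":[{"delta":{"content":"llo"}}]}',
    "",
    "data: [DONE]",
]
print(collect_text(sample))
```

In a UI you would usually render each fragment as it arrives instead of joining them at the end.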

Reasoning models

For reasoning-capable models, use reasoning_effort (low / medium / high) to control reasoning depth. The model returns its reasoning in the reasoning_content field — render it collapsed in your UI.
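
As an illustration, the Python sketch below builds a request body with reasoning_effort and separates reasoning_content from the final answer in a response. The helper names are hypothetical, and the sample response is invented to match the schema on this page.

```python
from typing import Tuple

ALLOWED_EFFORTS = {"low", "medium", "high"}


def build_reasoning_request(model: str, prompt: str, effort: str = "medium") -> dict:
    """Build a request body that sets reasoning_effort for a reasoning-capable model."""
    if effort not in ALLOWED_EFFORTS:
        raise ValueError(f"reasoning_effort must be one of {sorted(ALLOWED_EFFORTS)}")
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,
    }


def split_reasoning(response: dict) -> Tuple[str, str]:
    """Return (reasoning, answer) from the first choice; either may be empty."""
    message = response["choices"][0]["message"]
    return message.get("reasoning_content") or "", message.get("content") or ""


body = build_reasoning_request("gpt-5.4", "What is 17 * 3?", effort="high")

# Invented response, shaped like the schema documented on this page.
sample_response = {
    "choices": [
        {"index": 0, "message": {"content": "51", "reasoning_content": "17 * 3 = 51"}}
    ]
}
reasoning, answer = split_reasoning(sample_response)
print(answer)
```

Keeping the two strings separate makes it straightforward to show the answer by default and reveal the reasoning only on demand.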

Tool calling

Define functions as JSON Schema in tools. The model returns structured tool_calls that your application executes and feeds back in a follow-up request. Use tool_choice to control the strategy (auto / none / required, or a specific function).
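
The round trip described above might look like the following Python sketch. It assumes the common OpenAI-compatible layout for a tools entry (type: function with a nested function object) and for the follow-up message (role: tool with tool_call_id); this page does not spell out either, so treat both as assumptions. get_weather and REGISTRY are hypothetical names for illustration.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical local tool; a real one would query a weather service."""
    return f"Sunny in {city}"


# Assumed OpenAI-compatible tool definition, with parameters as JSON Schema.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

REGISTRY = {"get_weather": get_weather}


def run_tool_calls(message: dict) -> list:
    """Execute each tool call and build the messages for the follow-up request."""
    results = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        args = json.loads(fn["arguments"] or "{}")  # arguments arrive as a JSON string
        output = REGISTRY[fn["name"]](**args)
        results.append({"role": "tool", "tool_call_id": call["id"], "content": output})
    return results


sample_message = {
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"city\": \"Oslo\"}"},
    }]
}
followups = run_tool_calls(sample_message)
print(followups)
```

The follow-up request would append the assistant message and these results to messages. Production code should also handle unknown function names and malformed argument strings.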

Authorizations

Authorization
string
header
required

Bearer token authentication, format: Authorization: Bearer sk-xxxxxx. Get your API key in the console.
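
For reference, this Python sketch builds the same authenticated request as the curl example using only the standard library. The environment variable name INFINITYBLUE_API_KEY is an assumption made for illustration; the request is constructed but not sent.

```python
import json
import os
import urllib.request

API_URL = "https://api.getinfinityblue.com/v1/chat/completions"


def build_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build a POST request carrying the Bearer token and a JSON body."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical variable name; falls back to the placeholder key format shown above.
api_key = os.environ.get("INFINITYBLUE_API_KEY", "sk-xxxxxx")
req = build_request(api_key, {
    "model": "gpt-5.4",
    "messages": [{"role": "user", "content": "Introduce yourself in one sentence."}],
})
print(req.get_method(), req.full_url)

# To send it: urllib.request.urlopen(req), then json.load() the response body.
```

Reading the key from the environment keeps it out of source control.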

Body

application/json
model
string
required

Model ID, e.g. gpt-5.4. See GET /v1/models for the full list.

Example:

"gpt-5.4"

messages
object[]
required

The messages comprising the conversation so far, in order.

temperature
number
default:1

Sampling temperature between 0 and 2. Higher values (e.g. 0.8) make output more random; lower values (e.g. 0.2) make it more focused and deterministic. Tune this or top_p, not both.

Required range: 0 <= x <= 2

top_p
number
default:1

Nucleus sampling. The model considers only the tokens that make up the top top_p cumulative probability mass; for example, 0.1 means only the tokens in the top 10% of probability mass are considered. Tune this or temperature, not both.

Required range: 0 <= x <= 1

n
integer
default:1

Number of completions to generate for each input message.

Required range: x >= 1

stream
boolean
default:false

Whether to stream the response as Server-Sent Events.

stream_options
object

Options for streaming, only used when stream=true.

stop

Up to 4 stop sequences. Generation stops at any of them.

max_tokens
integer

Maximum tokens to generate in the completion (legacy). Use max_completion_tokens for reasoning models.

max_completion_tokens
integer

Maximum tokens to generate, including reasoning tokens.

presence_penalty
number
default:0

Between -2.0 and 2.0. Positive values penalize tokens that have already appeared, increasing the model's likelihood to talk about new topics.

Required range: -2 <= x <= 2

frequency_penalty
number
default:0

Between -2.0 and 2.0. Positive values penalize tokens based on their existing frequency, decreasing verbatim repetition.

Required range: -2 <= x <= 2

logit_bias
object

Bias map adjusting token likelihoods; keys are token IDs, values -100 to 100.
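
The ranges documented for the parameters above can be checked on the client before a request is sent. The Python sketch below is a hypothetical helper, not part of the API; it encodes only the limits stated on this page, and the server remains the authority on validation.

```python
def validate_sampling_params(params: dict) -> list:
    """Return a list of error strings; an empty list means nothing was flagged."""
    errors = []

    def check_range(name, low, high):
        value = params.get(name)
        if value is not None and not (low <= value <= high):
            errors.append(f"{name} must be between {low} and {high}")

    check_range("temperature", 0, 2)
    check_range("top_p", 0, 1)
    check_range("presence_penalty", -2, 2)
    check_range("frequency_penalty", -2, 2)

    n = params.get("n")
    if n is not None and n < 1:
        errors.append("n must be >= 1")

    stop = params.get("stop")
    if isinstance(stop, (list, tuple)) and len(stop) > 4:
        errors.append("stop accepts at most 4 sequences")

    for token_id, bias in (params.get("logit_bias") or {}).items():
        if not -100 <= bias <= 100:
            errors.append(f"logit_bias[{token_id}] must be between -100 and 100")

    return errors


print(validate_sampling_params({"temperature": 2.5, "top_p": 0.9}))
```

Returning every problem at once, instead of raising on the first, lets a caller report all invalid fields together.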

user
string

A unique identifier for your end user, useful for abuse monitoring.

tools
object[]

A list of tools the model may call. Currently only function is supported.

tool_choice

Controls whether and how the model calls tools. none disables calls, auto lets the model decide, required forces at least one call; or pass an object to force a specific function.

Available options: none, auto, required

response_format
object

Controls the format of the model output.

seed
integer

Random seed. Repeated requests with the same seed and parameters return results that are as consistent as possible.

reasoning_effort
enum<string>

Reasoning depth, only effective for reasoning-capable models.

Available options: low, medium, high

modalities
enum<string>[]

The output modalities you want the model to return.

Available options: text, audio

audio
object

Audio output parameters, used when modalities includes audio.

Response

Successful response

id
string

Unique identifier for this completion.

object
string

Example:

"chat.completion"

created
integer

Unix timestamp (seconds) of creation.

model
string

The model that actually processed the request.

choices
object[]

The list of completions generated by the model.

usage
object

Token usage statistics for the request.

system_fingerprint
string
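
As a closing illustration of the usage object documented above, this Python sketch summarizes token counts from a response. It assumes cached_tokens is a subset of prompt_tokens, which this page does not state, and the helper name summarize_usage is hypothetical.

```python
def summarize_usage(usage: dict) -> dict:
    """Condense the usage object into a few headline numbers."""
    prompt = usage.get("prompt_tokens", 0)
    completion = usage.get("completion_tokens", 0)
    prompt_details = usage.get("prompt_tokens_details") or {}
    completion_details = usage.get("completion_tokens_details") or {}
    cached = prompt_details.get("cached_tokens", 0)
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": usage.get("total_tokens", prompt + completion),
        # Assumption: cached tokens are counted inside prompt_tokens.
        "cached_fraction": cached / prompt if prompt else 0.0,
        "reasoning_tokens": completion_details.get("reasoning_tokens", 0),
    }


# Invented numbers, shaped like the usage schema in the response example.
sample_usage = {
    "prompt_tokens": 100,
    "completion_tokens": 50,
    "total_tokens": 150,
    "prompt_tokens_details": {"cached_tokens": 25},
    "completion_tokens_details": {"reasoning_tokens": 20},
}
summary = summarize_usage(sample_usage)
print(summary)
```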