Skip to main content
POST
/
v1
/
responses
curl --request POST \
  --url https://api.getinfinityblue.com/v1/responses \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "gpt-5.4",
  "input": "Introduce yourself in one sentence."
}
'
{
  "id": "<string>",
  "object": "<string>",
  "created_at": 123,
  "model": "<string>",
  "output": [
    {
      "type": "<string>",
      "id": "<string>",
      "status": "<string>",
      "role": "<string>",
      "content": [
        {
          "type": "<string>",
          "text": "<string>"
        }
      ]
    }
  ],
  "usage": {
    "prompt_tokens": 123,
    "completion_tokens": 123,
    "total_tokens": 123,
    "prompt_tokens_details": {
      "cached_tokens": 123,
      "text_tokens": 123,
      "audio_tokens": 123,
      "image_tokens": 123
    },
    "completion_tokens_details": {
      "text_tokens": 123,
      "audio_tokens": 123,
      "reasoning_tokens": 123
    }
  }
}

Models you can use

Model IDNotes
gpt-5.4GPT-5 flagship — top reasoning / coding / agentic, 1M context
gpt-5.4-miniLightweight, balanced — great for high-volume and fallback
deepseek-v4-proDeepSeek cost-effective reasoning model
See GET /v1/models for the full list.

Multi-turn continuation

Pass the id from a previous response as previous_response_id to continue the conversation without resending the full history.

Reasoning control

For reasoning-capable models, use reasoning.effort (low / medium / high) to set reasoning depth, and reasoning.summary (auto / concise / detailed) to control how much reasoning detail is returned.

Context truncation

Set truncation to auto to let the system automatically drop older context when the window is exceeded. Set to disabled to return an error instead.

Authorizations

Authorization
string
header
required

Bearer token authentication, format: Authorization: Bearer sk-xxxxxx. Get your API key in the console.

Body

application/json

OpenAI Responses API request body.

model
string
required

Model ID, e.g. gpt-5.4. See GET /v1/models for the full list.

Example:

"gpt-5.4"

input

Input content — either a plain text string or an array of messages. Omit when using previous_response_id to continue a prior turn.

instructions
string

System-level instructions, equivalent to a system message in Chat Completions.

max_output_tokens
integer

Maximum number of tokens the model may generate in this response, including reasoning tokens.

temperature
number

Sampling temperature between 0 and 2, controlling output randomness.

Required range: 0 <= x <= 2
top_p
number

Nucleus sampling threshold. Tune this or temperature, not both.

Required range: 0 <= x <= 1
stream
boolean
default:false

Whether to stream the response as Server-Sent Events.

tools
object[]

A list of tools the model may call.

tool_choice

Tool calling strategy — auto, none, or required as a string, or an object specifying a particular tool.

Available options:
auto,
none,
required
reasoning
object

Reasoning configuration, only effective for reasoning-capable models.

previous_response_id
string

The id of a prior response. When set, the conversation continues from that point without resending history.

truncation
enum<string>

Context truncation strategy. auto drops older context when the window is exceeded; disabled returns an error instead.

Available options:
auto,
disabled

Response

Successful response

OpenAI Responses API response body.

id
string

Unique identifier for this response, usable as previous_response_id in the next turn.

object
string

Object type, value is response.

Example:

"response"

created_at
integer

Unix timestamp (seconds) of creation.

status
enum<string>

Response status.

Available options:
completed,
failed,
in_progress,
incomplete
model
string

The model that actually processed the request.

output
object[]

List of output blocks generated by the model.

usage
object

Token usage statistics for the request.