Chat / Text
OpenAI Protocol (GPT)
Compatible with the OpenAI Chat Completions and Responses APIs. Use it for GPT models as well as DeepSeek, GLM, MiMo, and others.
Endpoints
| Usage | Method | Path |
|---|---|---|
| Chat (REST + SSE) | POST | /v1/chat/completions |
| Codex (Responses) | POST | /v1/responses |
| Model list | GET | /v1/models |
Authentication
Authorization: Bearer <API_KEY>
Content-Type: application/json
Supported Models
gpt-* (e.g. gpt-4o), deepseek-*, ZHIPU/GLM-*, xiaomi/mimo-*, and others. Call GET /v1/models for the full list.
Request Parameters
| Param | Type | Required | Description |
|---|---|---|---|
| model | string | Required | Model ID, e.g. gpt-4o |
| messages | array | Required | Conversation messages, each with a role and content |
| stream | boolean | — | Set to true for SSE streaming; add stream_options.include_usage=true to receive token usage |
| temperature / top_p | number | — | Sampling controls |
| max_tokens / max_completion_tokens | integer | — | Maximum number of output tokens |
| tools / tool_choice | — | — | Tool calling |
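The parameters above can be assembled into a request along the following lines. This is a minimal sketch, not an official client: the helper name `build_chat_request` is hypothetical, the host is taken from the curl examples on this page, and the request is only built, not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.cqtai.com"  # host taken from the curl examples on this page


def build_chat_request(api_key, model, messages, stream=False, **options):
    """Build (but do not send) a POST request for /v1/chat/completions.

    Hypothetical helper for illustration; not part of any official SDK.
    """
    if not model or not messages:
        raise ValueError("model and messages are required")
    payload = {"model": model, "messages": messages}
    if stream:
        payload["stream"] = True
        # Ask the server to report token usage in the final streamed chunk.
        payload["stream_options"] = {"include_usage": True}
    # Optional parameters such as temperature, top_p, max_tokens, tools.
    payload.update(options)
    return urllib.request.Request(
        BASE_URL + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send the request, pass the returned object to `urllib.request.urlopen` and decode the JSON body of the response.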
Streaming (SSE)
Set "stream": true and "stream_options": {"include_usage": true}, and read the SSE stream with curl -N. Each event arrives as data: {chunk}; the stream ends with data: [DONE], and token usage is reported in the final chunk.
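A client consuming this stream might look roughly like the sketch below. It assumes each chunk follows the OpenAI chat.completion.chunk shape (text under `choices[].delta.content`), which this page does not spell out; the function name `parse_sse_lines` is hypothetical.

```python
import json


def parse_sse_lines(lines):
    """Collect streamed text and usage from an iterable of SSE lines.

    Assumes OpenAI-style chunks: text in choices[].delta.content and an
    optional "usage" object on the final chunk.
    """
    text_parts = []
    usage = None
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices") or []:
            delta = choice.get("delta") or {}
            if delta.get("content"):
                text_parts.append(delta["content"])
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(text_parts), usage
```

The function accepts any iterable of `str` or `bytes` lines, so it can be fed an HTTP response object opened in streaming mode or a list of recorded lines.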
Request Example
Non-streaming
curl -X POST "https://api.cqtai.com/v1/chat/completions" \
-H "Authorization: Bearer <API_KEY>" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [{"role":"user","content":"Hello, please introduce yourself"}]
}'
Streaming (SSE with usage)
curl -N -X POST "https://api.cqtai.com/v1/chat/completions" \
-H "Authorization: Bearer <API_KEY>" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"stream": true,
"stream_options": {"include_usage": true},
"messages": [{"role":"user","content":"Write a short poem about summer"}]
}'
Response Example
{
"id": "chatcmpl-xxx",
"object": "chat.completion",
"model": "gpt-4o",
"choices": [
{ "index": 0, "message": { "role": "assistant", "content": "Hello! …" }, "finish_reason": "stop" }
],
"usage": { "prompt_tokens": 9, "completion_tokens": 12, "total_tokens": 21 }
}
Billing & Credits
Usage is billed per token: input tokens × input price + output tokens × output price, with multipliers of ×1.25 for cache writes and ×0.1 for cache reads; the total is then multiplied by your rate. count_tokens is free. See the Intro page for model prices.
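The billing rule can be sketched as a small function. This is an illustration under stated assumptions, not the provider's billing code: it assumes the cache multipliers apply to the input price, that cached tokens are counted separately from regular input tokens, and that prices are expressed per token. The name `estimate_cost` and the prices used below are hypothetical.

```python
def estimate_cost(input_tokens, output_tokens, input_price, output_price,
                  rate=1.0, cache_write_tokens=0, cache_read_tokens=0):
    """Estimate the charge for one request; prices are per token."""
    base = input_tokens * input_price + output_tokens * output_price
    # Assumption: cache multipliers apply to the input price, and cached
    # tokens are counted separately from input_tokens.
    cache = (cache_write_tokens * input_price * 1.25
             + cache_read_tokens * input_price * 0.1)
    return (base + cache) * rate


# Token counts from the "usage" object in the response example;
# the prices and rate are made-up placeholders.
cost = estimate_cost(input_tokens=9, output_tokens=12,
                     input_price=2.0, output_price=8.0, rate=0.5)
```

Real prices come from the Intro page, and the rate is specific to your account.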