Chat Completions — API Reference

Package: dodil.ignite.v1 · Service: ModelService

Generate chat-style completions from any chat/instruct model in the catalog. The HTTP surface is OpenAI-compatible: the path is the top-level /v1/chat/completions (not under /v1/ignite/), and the JSON body matches the OpenAI Chat Completions request exactly. Field names such as max_tokens, top_p, and response_format stay snake_case on both HTTP and gRPC. To use the OpenAI SDK, point it at the base URL; see Using OpenAI & Cohere SDKs and the Model Catalog.

Several fields are polymorphic and accept raw OpenAI JSON verbatim — content (a string or a multimodal array), tool_choice, and response_format. On the wire these are carried as google.protobuf.Value, so you can paste the exact object the OpenAI API expects.
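As an illustration of the polymorphic fields, the sketch below assembles one request body in which content is a multimodal array, tool_choice is an object, and response_format is an object. The part shapes follow the public OpenAI Chat Completions format. The helper build_chat_request, the get_weather tool, and the image URL are hypothetical names used only for this example, and whether a given catalog model accepts image input or tools is an assumption.

```python
import json


def build_chat_request(model, messages, **fields):
    """Hypothetical helper (not part of any SDK): assemble an OpenAI-shaped body.

    Polymorphic fields (content, tool_choice, response_format) are passed
    through untouched, so any JSON value the OpenAI API accepts can be used.
    """
    body = {"model": model, "messages": messages}
    # Drop unset optional fields so they are omitted from the JSON entirely.
    body.update({k: v for k, v in fields.items() if v is not None})
    return body


# content as a plain string.
system_message = {"role": "system", "content": "You are a concise assistant."}

# content as a multimodal array (OpenAI content-part format).
# Assumption: the chosen model supports image input; the URL is a placeholder.
user_message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
    ],
}

# A minimal tool definition; get_weather is a made-up function name.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "parameters": {"type": "object", "properties": {}},
        },
    }
]

body = build_chat_request(
    "kimi-k2.5",
    [system_message, user_message],
    tools=tools,
    # tool_choice also accepts plain strings such as "auto" or "none".
    tool_choice={"type": "function", "function": {"name": "get_weather"}},
    response_format={"type": "json_object"},
    max_tokens=256,
)

# The serialized payload is what would be sent as the HTTP request body.
payload = json.dumps(body)
```

Because the helper never inspects these values, switching content back to a plain string or tool_choice to "auto" needs no other change.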

RPC	HTTP	Streaming
`ChatCompletion`	`POST /v1/chat/completions`	unary
`StreamChatCompletion`	`POST /v1/chat/completions/stream`	server-stream

gRPC reaches both methods at dodil.ignite.v1.ModelService/<Method> on $IGNITE_GRPC. See Conventions → Using gRPC for grpcurl setup. Both transports use the same OpenAI-native snake_case JSON body.

`ChatCompletion`

Unary completion. Returns the full response once generation finishes.

Request

HTTP


curl -sS -X POST "https://api.dev.dodil.io/v1/chat/completions" \
  -H "Authorization: Bearer $DODIL_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kimi-k2.5",
    "messages": [
      { "role": "system", "content": "You are a concise assistant." },
      { "role": "user", "content": "Summarize the CAP theorem in one sentence." }
    ],
    "max_tokens": 256,
    "temperature": 0.7,
    "top_p": 0.95,
    "response_format": { "type": "text" }
  }'

Response

HTTP


{
  "id": "chatcmpl-0a1b2c3d",
  "object": "chat.completion",
  "created": 1748390400,
  "model": "kimi-k2.5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "A distributed store can guarantee at most two of consistency, availability, and partition tolerance at once."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 28, "completion_tokens": 19, "total_tokens": 47 }
}
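The same unary call can be made from Python with only the standard library. This is a minimal sketch rather than an official client: the host and headers are copied from the curl example above, and make_request, chat, and first_choice_text are hypothetical helper names.

```python
import json
import urllib.request

# Host taken from the curl example above; adjust for your environment.
BASE_URL = "https://api.dev.dodil.io"


def make_request(token, body):
    """Build (but do not send) the POST request for /v1/chat/completions."""
    return urllib.request.Request(
        BASE_URL + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def chat(token, body, timeout=60):
    """Send the request and return the decoded chat.completion object."""
    with urllib.request.urlopen(make_request(token, body), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def first_choice_text(response):
    """Return the assistant text of the first choice."""
    return response["choices"][0]["message"]["content"]


# Example usage (performs a network call, so it is left commented out):
#
#   import os
#   reply = chat(os.environ["DODIL_TOKEN"], {
#       "model": "kimi-k2.5",
#       "messages": [{"role": "user", "content": "Summarize the CAP theorem."}],
#       "max_tokens": 256,
#   })
#   print(first_choice_text(reply))
#   print(reply["usage"]["total_tokens"])
```

Keeping request construction separate from sending makes the URL, headers, and body easy to inspect without touching the network.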

`StreamChatCompletion`

Server-streaming variant. Emits a sequence of ChatCompletionChunk messages, each carrying an incremental delta per choice — the same Server-Sent-Events shape OpenAI clients expect. Set stream: true when using an OpenAI SDK; the dedicated HTTP path is /v1/chat/completions/stream.

Request

HTTP


curl -sS -N -X POST "https://api.dev.dodil.io/v1/chat/completions/stream" \
  -H "Authorization: Bearer $DODIL_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kimi-k2.5",
    "messages": [
      { "role": "user", "content": "Write a haiku about distributed systems." }
    ],
    "stream": true,
    "max_tokens": 128
  }'

Response

A stream of ChatCompletionChunk messages. Each choices[].delta is a partial ChatMessage; concatenate the delta.content fragments to reconstruct the message. finish_reason is set on the terminal chunk.


{ "id": "chatcmpl-0a1b2c3d", "object": "chat.completion.chunk", "model": "kimi-k2.5", "choices": [ { "index": 0, "delta": { "content": "Nodes" } } ] }
{ "id": "chatcmpl-0a1b2c3d", "object": "chat.completion.chunk", "model": "kimi-k2.5", "choices": [ { "index": 0, "delta": { "content": " drift apart" } } ] }
{ "id": "chatcmpl-0a1b2c3d", "object": "chat.completion.chunk", "model": "kimi-k2.5", "choices": [ { "index": 0, "delta": {}, "finish_reason": "stop" } ] }
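The reconstruction described above (concatenate delta.content, read finish_reason from the terminal chunk) can be sketched as follows. The function names are hypothetical. One assumption is labeled in the code: the parser tolerates an SSE-style "data: " prefix and a "[DONE]" sentinel as OpenAI streams use, in addition to the bare JSON lines shown above; the exact framing of this endpoint is not confirmed here.

```python
import json


def parse_stream_lines(lines):
    """Yield chunk dicts from raw text lines of a streamed response.

    Assumption: lines may be bare JSON objects (as in the sample above) or
    SSE-framed as 'data: {...}' with a final 'data: [DONE]' sentinel.
    Anything that is not a JSON object is skipped.
    """
    for raw in lines:
        line = raw.strip()
        if line.startswith("data:"):
            line = line[len("data:"):].strip()
        if not line.startswith("{"):
            continue  # blank line, [DONE] sentinel, or other non-JSON framing
        yield json.loads(line)


def accumulate(chunks):
    """Concatenate delta.content per choice index and collect finish reasons."""
    texts = {}
    finish_reasons = {}
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            index = choice.get("index", 0)
            delta = choice.get("delta") or {}
            # The terminal chunk carries an empty delta, so default to "".
            texts[index] = texts.get(index, "") + (delta.get("content") or "")
            if choice.get("finish_reason") is not None:
                finish_reasons[index] = choice["finish_reason"]
    return texts, finish_reasons


# The three chunks from the sample above, trimmed to the fields that matter.
sample_lines = [
    '{ "choices": [ { "index": 0, "delta": { "content": "Nodes" } } ] }',
    '{ "choices": [ { "index": 0, "delta": { "content": " drift apart" } } ] }',
    '{ "choices": [ { "index": 0, "delta": {}, "finish_reason": "stop" } ] }',
]

texts, finish_reasons = accumulate(parse_stream_lines(sample_lines))
print(texts[0])           # Nodes drift apart
print(finish_reasons[0])  # stop
```

Tracking text by choice index keeps the fragments of each choice separate if a response ever carries more than one.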
