Chat Completions — API Reference

Package: dodil.ignite.v1 · Service: ModelService

Generate chat-style completions from any chat/instruct model in the catalog. The HTTP surface is OpenAI-compatible: the path is the top-level /v1/chat/completions (not under /v1/ignite/), and the JSON body matches the OpenAI Chat Completions request exactly. Fields such as max_tokens, top_p, and response_format stay snake_case on both HTTP and gRPC. To use the OpenAI SDK, point it at the base URL; see Using OpenAI & Cohere SDKs and the Model Catalog.

Several fields are polymorphic and accept raw OpenAI JSON verbatim — content (a string or a multimodal array), tool_choice, and response_format. On the wire these are carried as google.protobuf.Value, so you can paste the exact object the OpenAI API expects.
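As an illustration of the polymorphic fields described above, the sketch below builds three request bodies as plain Python dicts. It assumes the standard OpenAI shapes for multimodal content parts and for tools; the `tools` field, the `get_weather` function, the image URL, and the `build_body` helper are hypothetical and used only for illustration. Whether a given catalog model accepts image input or tool calls is not stated here.

```python
import json


def build_body(model, content, **extra):
    """Assemble a Chat Completions body; `extra` carries optional fields verbatim."""
    body = {"model": model, "messages": [{"role": "user", "content": content}]}
    body.update(extra)
    return body


# 1. `content` as a plain string, `response_format` as in the curl example.
text_body = build_body(
    "kimi-k2.5",
    "Summarize the CAP theorem in one sentence.",
    response_format={"type": "text"},
)

# 2. `content` as a multimodal array (OpenAI content-part shape, assumed).
multimodal_body = build_body(
    "kimi-k2.5",
    [
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
    ],
)

# 3. `tool_choice` as an object forcing one (hypothetical) function.
tool_body = build_body(
    "kimi-k2.5",
    "What is the weather in Oslo?",
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    tool_choice={"type": "function", "function": {"name": "get_weather"}},
)

# Each body serializes to ordinary JSON, ready to send as the request payload.
print(json.dumps(tool_body, indent=2))
```

Because the bodies are ordinary JSON, the same dicts should be usable as the `-d` payload of the curl examples in this page.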

| RPC | HTTP | Streaming |
| --- | --- | --- |
| ChatCompletion | POST /v1/chat/completions | unary |
| StreamChatCompletion | POST /v1/chat/completions/stream | server-stream |

Over gRPC, both methods are available at dodil.ignite.v1.ModelService/<Method> on $IGNITE_GRPC. See Conventions → Using gRPC for grpcurl setup. Both transports use the same OpenAI-native snake_case JSON body.

ChatCompletion

Unary completion. Returns the full response once generation finishes.

Request

    curl -sS -X POST "https://api.dev.dodil.io/v1/chat/completions" \
      -H "Authorization: Bearer $DODIL_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "kimi-k2.5",
        "messages": [
          { "role": "system", "content": "You are a concise assistant." },
          { "role": "user", "content": "Summarize the CAP theorem in one sentence." }
        ],
        "max_tokens": 256,
        "temperature": 0.7,
        "top_p": 0.95,
        "response_format": { "type": "text" }
      }'

Response

    {
      "id": "chatcmpl-0a1b2c3d",
      "object": "chat.completion",
      "created": 1748390400,
      "model": "kimi-k2.5",
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "A distributed store can guarantee at most two of consistency, availability, and partition tolerance at once."
          },
          "finish_reason": "stop"
        }
      ],
      "usage": {
        "prompt_tokens": 28,
        "completion_tokens": 19,
        "total_tokens": 47
      }
    }
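For callers who prefer plain Python over curl or an SDK, the unary request and response above might be handled roughly as follows. This is a minimal standard-library sketch: the helper names `build_chat_request`, `send`, and `first_choice` are illustrative rather than part of the API, and the host is simply the one used in the curl example.

```python
import json
import urllib.request

BASE_URL = "https://api.dev.dodil.io"  # host taken from the curl example


def build_chat_request(token, body, base_url=BASE_URL):
    """Build (but do not send) a POST to the top-level /v1/chat/completions path."""
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request):
    """Perform the call and decode the JSON response (requires network access)."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


def first_choice(response):
    """Return (content, finish_reason) for the first choice of a unary response."""
    choice = response["choices"][0]
    return choice["message"]["content"], choice["finish_reason"]


# The sample response from this section, used here instead of a live call.
SAMPLE_RESPONSE = {
    "id": "chatcmpl-0a1b2c3d",
    "object": "chat.completion",
    "created": 1748390400,
    "model": "kimi-k2.5",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "A distributed store can guarantee at most two of "
                "consistency, availability, and partition tolerance at once.",
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 28, "completion_tokens": 19, "total_tokens": 47},
}

text, reason = first_choice(SAMPLE_RESPONSE)
print(reason, "-", text)
```

With a real token, `send(build_chat_request(token, body))` would return a dict of the same shape as `SAMPLE_RESPONSE`.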

StreamChatCompletion

Server-streaming variant. Emits a sequence of ChatCompletionChunk messages, each carrying an incremental delta per choice — the same Server-Sent-Events shape OpenAI clients expect. Set stream: true when using an OpenAI SDK; the dedicated HTTP path is /v1/chat/completions/stream.

Request

    curl -sS -N -X POST "https://api.dev.dodil.io/v1/chat/completions/stream" \
      -H "Authorization: Bearer $DODIL_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "kimi-k2.5",
        "messages": [
          { "role": "user", "content": "Write a haiku about distributed systems." }
        ],
        "stream": true,
        "max_tokens": 128
      }'
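This section describes the stream as having the Server-Sent-Events shape OpenAI clients expect. A hand-rolled client could decode such a stream along the lines below. The sketch assumes the usual OpenAI framing, which is not confirmed on this page: each event arrives as a `data: <json>` line, and a literal `data: [DONE]` line marks the end. The function name `iter_sse_chunks` is illustrative.

```python
import json


def iter_sse_chunks(lines):
    """Yield decoded JSON chunks from an iterable of SSE lines (str or bytes).

    Assumes OpenAI-style framing: `data: <json>` per event and a final
    `data: [DONE]` sentinel. Blank lines and `:` comments are skipped.
    """
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line or line.startswith(":"):
            continue  # event separator or keep-alive comment
        if not line.startswith("data:"):
            continue  # other SSE fields (event:, id:, retry:) are ignored here
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # assumed end-of-stream sentinel
        yield json.loads(payload)


# Canned lines standing in for a live HTTP response body.
example_lines = [
    b'data: {"choices": [{"index": 0, "delta": {"content": "Nodes"}}]}',
    b"",
    b'data: {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]}',
    b"",
    b"data: [DONE]",
]

for chunk in iter_sse_chunks(example_lines):
    print(chunk["choices"][0])
```

An HTTP response object that iterates over lines, such as the one returned by `urllib.request.urlopen`, could be passed in place of `example_lines`.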

Response

A stream of ChatCompletionChunk messages. Each choices[].delta is a partial ChatMessage; concatenate the delta.content fragments to reconstruct the message. finish_reason is set on the terminal chunk.

    { "id": "chatcmpl-0a1b2c3d", "object": "chat.completion.chunk", "model": "kimi-k2.5", "choices": [ { "index": 0, "delta": { "content": "Nodes" } } ] }
    { "id": "chatcmpl-0a1b2c3d", "object": "chat.completion.chunk", "model": "kimi-k2.5", "choices": [ { "index": 0, "delta": { "content": " drift apart" } } ] }
    { "id": "chatcmpl-0a1b2c3d", "object": "chat.completion.chunk", "model": "kimi-k2.5", "choices": [ { "index": 0, "delta": {}, "finish_reason": "stop" } ] }
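The reconstruction rule stated above (concatenate the delta.content fragments, read finish_reason from the terminal chunk) can be sketched as a small reducer over already-decoded chunks. The function name `assemble_stream` is illustrative, and the chunks are the three sample messages from this section.

```python
def assemble_stream(chunks):
    """Fold ChatCompletionChunk dicts into {choice_index: (text, finish_reason)}."""
    fragments = {}  # choice index -> list of content fragments, in arrival order
    finish = {}     # choice index -> finish_reason from the terminal chunk
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            idx = choice["index"]
            parts = fragments.setdefault(idx, [])
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
            if choice.get("finish_reason"):
                finish[idx] = choice["finish_reason"]
    return {idx: ("".join(parts), finish.get(idx)) for idx, parts in fragments.items()}


sample_chunks = [
    {"id": "chatcmpl-0a1b2c3d", "object": "chat.completion.chunk", "model": "kimi-k2.5",
     "choices": [{"index": 0, "delta": {"content": "Nodes"}}]},
    {"id": "chatcmpl-0a1b2c3d", "object": "chat.completion.chunk", "model": "kimi-k2.5",
     "choices": [{"index": 0, "delta": {"content": " drift apart"}}]},
    {"id": "chatcmpl-0a1b2c3d", "object": "chat.completion.chunk", "model": "kimi-k2.5",
     "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]},
]

result = assemble_stream(sample_chunks)
print(result)  # {0: ('Nodes drift apart', 'stop')}
```

Keying by choice index keeps the fragments of different choices separate if a response carries more than one choice.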

See also