Chat Completions — API Reference
Package: dodil.ignite.v1 · Service: ModelService
Generate chat-style completions from any chat/instruct model in the catalog. The HTTP surface is OpenAI-compatible: the path is the top-level /v1/chat/completions (not under /v1/ignite/), and the JSON body matches the OpenAI Chat Completions request exactly — max_tokens, top_p, response_format, and the rest stay snake_case on both HTTP and gRPC. Drop in the OpenAI SDK and point it at the base URL: see Using OpenAI & Cohere SDKs and the Model Catalog.
Several fields are polymorphic and accept raw OpenAI JSON verbatim — content (a string or a multimodal array), tool_choice, and response_format. On the wire these are carried as google.protobuf.Value, so you can paste the exact object the OpenAI API expects.
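As a rough illustration of the polymorphic fields described above, the following Python sketch builds two request bodies: one with `content` as a plain string, and one with `content` as a multimodal array plus object-valued `tool_choice` and `response_format`. The shapes follow OpenAI's public request format. The helper `build_chat_request`, the `get_weather` tool, and the image URL are hypothetical, and whether a given catalog model accepts image parts or tools is an assumption to verify against the Model Catalog.

```python
import json


def build_chat_request(model, user_content, tools=None, tool_choice=None,
                       response_format=None):
    """Assemble a request body; polymorphic values are passed through as-is."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
    }
    if tools is not None:
        body["tools"] = tools
    if tool_choice is not None:
        body["tool_choice"] = tool_choice
    if response_format is not None:
        body["response_format"] = response_format
    return body


# content as a plain string
plain = build_chat_request("kimi-k2.5", "Summarize the CAP theorem in one sentence.")

# Hypothetical tool definition, used only to make tool_choice meaningful.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# content as a multimodal array, with object-valued tool_choice and response_format
multimodal = build_chat_request(
    "kimi-k2.5",
    [
        {"type": "text", "text": "Describe this diagram as JSON."},
        {"type": "image_url", "image_url": {"url": "https://example.com/diagram.png"}},
    ],
    tools=[weather_tool],
    tool_choice={"type": "function", "function": {"name": "get_weather"}},
    response_format={"type": "json_object"},
)

print(json.dumps(multimodal, indent=2))
```

Because the values are plain JSON, the same dictionaries can be serialized for either transport without any field renaming.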
| RPC | HTTP | Streaming |
|---|---|---|
| ChatCompletion | POST /v1/chat/completions | unary |
| StreamChatCompletion | POST /v1/chat/completions/stream | server-stream |
Over gRPC, both methods are reachable at dodil.ignite.v1.ModelService/<Method> on $IGNITE_GRPC. See Conventions → Using gRPC for grpcurl setup. Both transports use the same OpenAI-native snake_case JSON body.
ChatCompletion
Unary completion. Returns the full response once generation finishes.
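For callers who prefer not to use an SDK, a minimal standard-library Python sketch of the unary call might look like the following. The host and bearer-token header are taken from the curl example on this page. The helper names (`build_request`, `chat`, `first_message_text`) and the 60-second timeout are hypothetical choices, not part of the API.

```python
import json
import urllib.request

BASE_URL = "https://api.dev.dodil.io"  # host from the curl example on this page


def build_request(body, token):
    """Build (but do not send) the POST request for /v1/chat/completions."""
    return urllib.request.Request(
        BASE_URL + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def chat(body, token, timeout=60):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(build_request(body, token), timeout=timeout) as resp:
        return json.load(resp)


def first_message_text(response):
    """Return the assistant text of the first choice in a parsed response."""
    return response["choices"][0]["message"]["content"]


body = {
    "model": "kimi-k2.5",
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize the CAP theorem in one sentence."},
    ],
    "max_tokens": 256,
}
request = build_request(body, "example-token")  # placeholder token, nothing is sent
print(request.get_method(), request.full_url)
```

Calling `chat(body, token)` with a real token performs the request; `first_message_text` then pulls the assistant text out of `choices[0].message.content`.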
Request
HTTP
curl -sS -X POST "https://api.dev.dodil.io/v1/chat/completions" \
-H "Authorization: Bearer $DODIL_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"model": "kimi-k2.5",
"messages": [
{ "role": "system", "content": "You are a concise assistant." },
{ "role": "user", "content": "Summarize the CAP theorem in one sentence." }
],
"max_tokens": 256,
"temperature": 0.7,
"top_p": 0.95,
"response_format": { "type": "text" }
}'
Response
HTTP
{
"id": "chatcmpl-0a1b2c3d",
"object": "chat.completion",
"created": 1748390400,
"model": "kimi-k2.5",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "A distributed store can guarantee at most two of consistency, availability, and partition tolerance at once."
},
"finish_reason": "stop"
}
],
"usage": { "prompt_tokens": 28, "completion_tokens": 19, "total_tokens": 47 }
}
StreamChatCompletion
Server-streaming variant. Emits a sequence of ChatCompletionChunk messages, each carrying an incremental delta per choice — the same Server-Sent-Events shape OpenAI clients expect. Set stream: true when using an OpenAI SDK; the dedicated HTTP path is /v1/chat/completions/stream.
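If you read the HTTP stream without an SDK, the Server-Sent-Events framing has to be unwrapped before the JSON can be parsed. The sketch below assumes OpenAI-style framing: one `data: {json}` line per chunk and an optional `data: [DONE]` sentinel at the end. This page does not confirm either detail for this endpoint, so treat both as assumptions. `iter_sse_chunks` is a hypothetical helper name.

```python
import json


def iter_sse_chunks(lines):
    """Yield parsed JSON chunks from an iterable of SSE text lines.

    Assumes each event is a single 'data: ...' line and that a literal
    '[DONE]' payload (OpenAI convention) may mark the end of the stream.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        yield json.loads(payload)


# Sample lines shaped like the chunks documented on this page.
sample_lines = [
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "Nodes"}}]}',
    "",
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]}',
    "",
    "data: [DONE]",
]
parsed = list(iter_sse_chunks(sample_lines))
print(len(parsed))
```

The function accepts any iterable of strings, so it can be fed decoded lines from an HTTP response object as they arrive.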
Request
HTTP
curl -sS -N -X POST "https://api.dev.dodil.io/v1/chat/completions/stream" \
-H "Authorization: Bearer $DODIL_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"model": "kimi-k2.5",
"messages": [
{ "role": "user", "content": "Write a haiku about distributed systems." }
],
"stream": true,
"max_tokens": 128
}'
Response
A stream of ChatCompletionChunk messages. Each choices[].delta is a partial ChatMessage; concatenate the delta.content fragments to reconstruct the message. finish_reason is set on the terminal chunk.
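The reassembly described above can be sketched in Python as follows. The sample data mirrors the chunks documented for this endpoint; the `assemble` helper is a hypothetical name, and it assumes the chunks have already been parsed into dictionaries.

```python
def assemble(chunks):
    """Concatenate delta.content per choice index and record finish_reason."""
    texts, finish = {}, {}
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            idx = choice.get("index", 0)
            delta = choice.get("delta") or {}
            # The terminal chunk carries an empty delta, so default to "".
            texts[idx] = texts.get(idx, "") + (delta.get("content") or "")
            if choice.get("finish_reason"):
                finish[idx] = choice["finish_reason"]
    return texts, finish


chunks = [
    {"id": "chatcmpl-0a1b2c3d", "object": "chat.completion.chunk",
     "choices": [{"index": 0, "delta": {"content": "Nodes"}}]},
    {"id": "chatcmpl-0a1b2c3d", "object": "chat.completion.chunk",
     "choices": [{"index": 0, "delta": {"content": " drift apart"}}]},
    {"id": "chatcmpl-0a1b2c3d", "object": "chat.completion.chunk",
     "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]},
]

texts, finish = assemble(chunks)
print(texts[0])   # Nodes drift apart
print(finish[0])  # stop
```

Keying by `index` keeps the fragments of each choice separate if a response ever carries more than one choice.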
{ "id": "chatcmpl-0a1b2c3d", "object": "chat.completion.chunk", "model": "kimi-k2.5", "choices": [ { "index": 0, "delta": { "content": "Nodes" } } ] }
{ "id": "chatcmpl-0a1b2c3d", "object": "chat.completion.chunk", "model": "kimi-k2.5", "choices": [ { "index": 0, "delta": { "content": " drift apart" } } ] }
{ "id": "chatcmpl-0a1b2c3d", "object": "chat.completion.chunk", "model": "kimi-k2.5", "choices": [ { "index": 0, "delta": {}, "finish_reason": "stop" } ] }
See also
- Using OpenAI & Cohere SDKs — point an OpenAI client at this endpoint
- Model Catalog — chat/instruct models and their ids
- Conventions — transport, auth, streaming