Using OpenAI & Cohere SDKs
Ignite’s Model API speaks the OpenAI and Cohere wire formats verbatim — same paths, same JSON, same response shapes. So you don’t need a Dodil SDK: point the official OpenAI or Cohere client at Ignite, use a Dodil token as the API key, and pass a catalog model id as the model name.
Two settings are all it takes:
| Setting | Value |
|---|---|
| Base URL | https://api.dev.dodil.io/v1 (staging) · https://api.dodil.io/v1 (production) |
| API key | your Dodil token ($DODIL_TOKEN) |
Cohere’s client takes the host without the `/v1` suffix; it appends `/v1/rerank` itself. See the rerank example below.
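The two settings, plus the Cohere host rule, can be kept in one place. This is a minimal sketch that assumes the token is exported as `DODIL_TOKEN` (as in the settings table). `ignite_settings` and `BASE_URLS` are hypothetical names for illustration, not part of any SDK:

```python
import os
from typing import Mapping

# Base URLs from the settings table (hypothetical constant name).
BASE_URLS = {
    "staging": "https://api.dev.dodil.io/v1",
    "production": "https://api.dodil.io/v1",
}


def ignite_settings(env: str = "staging", environ: Mapping[str, str] = os.environ) -> dict:
    """Return client settings for one Ignite environment (hypothetical helper).

    `openai_base_url` keeps the /v1 suffix, because the OpenAI SDK appends paths
    such as /chat/completions to it. `cohere_base_url` drops it, because the
    Cohere SDK appends /v1/rerank itself.
    """
    base_url = BASE_URLS[env]
    return {
        "openai_base_url": base_url,
        "cohere_base_url": base_url.removesuffix("/v1"),  # Python 3.9+
        "api_key": environ["DODIL_TOKEN"],  # your Dodil token
    }
```

Pass `settings["openai_base_url"]` as `base_url` to `OpenAI(...)` and `settings["cohere_base_url"]` as `base_url` to `cohere.Client(...)`, with `settings["api_key"]` as the key for both.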
What maps to what
| Surface | OpenAI / Cohere call | Ignite path | Models |
|---|---|---|---|
| Chat | client.chat.completions.create(...) | /v1/chat/completions | kimi-k2.5, kimi-k2, moonshot-v1-auto |
| Embeddings | client.embeddings.create(...) | /v1/embeddings | jina-embeddings-v4, arctic-embed-m-v2, … |
| Transcription | client.audio.transcriptions.create(...) | /v1/audio/transcriptions | whisper-large-v3-turbo |
| Rerank | cohere.rerank(...) | /v1/rerank | jina-reranker-v2 |
Models without a standard SDK shape (classification, detection, OCR, NER) use the generic Infer surface — call it over plain HTTP or gRPC; there’s no OpenAI/Cohere SDK method for them.
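The paths and JSON match OpenAI's, so each SDK call in the table above is an ordinary HTTPS request. The sketch below builds the chat request with Python's standard library. It is not an official client. It assumes the Dodil token travels in an `Authorization: Bearer` header, which is how the OpenAI SDK sends its `api_key`. `build_chat_request` is a hypothetical helper name. The request is only constructed here, not sent:

```python
import json
import urllib.request


def build_chat_request(base_url: str, token: str, model: str, prompt: str,
                       max_tokens: int = 200) -> urllib.request.Request:
    """Build (but do not send) a POST to <base_url>/chat/completions.

    The body uses the OpenAI chat-completions JSON shape. The Dodil token goes
    in a Bearer Authorization header, as the OpenAI SDK sends its api_key.
    """
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send the request, pass it to `urllib.request.urlopen(...)` and read `choices[0].message.content` from the decoded JSON, just as you would from an OpenAI response.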
Chat completions
Using the OpenAI Python SDK:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.dev.dodil.io/v1",  # staging; https://api.dodil.io/v1 for production
    api_key="<your-dodil-token>",            # your Dodil token
)

resp = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[{"role": "user", "content": "Explain serverless inference in one sentence."}],
    max_tokens=200,
)
print(resp.choices[0].message.content)
```

Streaming (`stream=True`), tool calling, and `response_format` work as they do against OpenAI. See Chat Completions for the full field set.
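With `stream=True`, the OpenAI SDK returns an iterator of chunks. In each chunk, `choices[0].delta.content` holds the next piece of text, or `None`. The sketch below collects that text. It assumes Ignite's stream chunks follow OpenAI's `chat.completion.chunk` shape, as stated above. `collect_stream` is a hypothetical helper. It is duck-typed, so it also accepts stand-in objects in tests:

```python
from typing import Iterable


def collect_stream(chunks: Iterable, echo: bool = False) -> str:
    """Join the text deltas from an OpenAI-style chat completion stream."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # e.g. a trailing usage-only chunk has no choices
            continue
        text = chunk.choices[0].delta.content
        if text:  # role-only and final chunks carry content=None
            parts.append(text)
            if echo:
                print(text, end="", flush=True)
    return "".join(parts)
```

For example, `collect_stream(client.chat.completions.create(model="kimi-k2.5", messages=[...], stream=True), echo=True)` prints the answer as it arrives and returns the full text.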
Embeddings
Using the OpenAI Python SDK, with the `client` from the chat example:

```python
resp = client.embeddings.create(
    model="jina-embeddings-v4",
    input=["first document", "second document"],
)
vectors = [d.embedding for d in resp.data]  # one vector per input string
```

Ignite-specific options such as `dimensions` (Matryoshka truncation) and `task_type` are accepted as extra request fields. See Embeddings.
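In the OpenAI Python SDK, `dimensions` is a regular argument to `embeddings.create`. A field the SDK does not define, such as `task_type`, has to go through `extra_body`, which merges it into the JSON request body. The sketch below builds those arguments. `embedding_request_kwargs` is a hypothetical helper. Valid `task_type` values are not listed on this page, so the test uses a placeholder string. See Embeddings for the real ones:

```python
def embedding_request_kwargs(model, inputs, dimensions=None, task_type=None):
    """Build keyword arguments for client.embeddings.create(...) (hypothetical helper)."""
    kwargs = {"model": model, "input": list(inputs)}
    if dimensions is not None:
        # A standard embeddings.create() argument in the OpenAI SDK.
        kwargs["dimensions"] = dimensions
    if task_type is not None:
        # Not an SDK argument; extra_body adds it to the top level of the JSON body.
        kwargs["extra_body"] = {"task_type": task_type}
    return kwargs
```

For example, `client.embeddings.create(**embedding_request_kwargs("jina-embeddings-v4", docs, dimensions=256))` requests truncated vectors. The value 256 is arbitrary here; check the Model Catalog for the sizes a model supports.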
Transcription
Using the OpenAI Python SDK, with the `client` from the chat example:

```python
with open("audio.mp3", "rb") as f:
    resp = client.audio.transcriptions.create(
        model="whisper-large-v3-turbo",
        file=f,
    )
print(resp.text)
```

Rerank (Cohere)
The Cohere client points at the host without `/v1`:

```python
import cohere

co = cohere.Client(
    api_key="<your-dodil-token>",
    base_url="https://api.dev.dodil.io",  # no /v1: the SDK appends /v1/rerank itself
)

resp = co.rerank(
    model="jina-reranker-v2",
    query="What is serverless inference?",
    documents=[
        "Serverless runs code without managing servers.",
        "A rerank model scores documents against a query.",
        "Embeddings map text to vectors.",
    ],
    top_n=2,
)
for r in resp.results:
    print(r.index, r.relevance_score)  # position in `documents`, and its score
```

See Rerank for the full request/response shape.
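Each rerank result carries an `index` into the `documents` list you sent, not the document text itself. The sketch below pairs results back with their documents, assuming the Cohere-style response shape used above. `top_documents` and its `min_score` cutoff are hypothetical; a sensible threshold depends on the model:

```python
from typing import Sequence


def top_documents(documents: Sequence[str], results, min_score: float = 0.0):
    """Pair rerank results with the documents they refer to (hypothetical helper).

    `documents` must be the same list sent in the rerank request. Results
    scoring below `min_score` are dropped.
    """
    ranked = [
        (documents[r.index], r.relevance_score)
        for r in results
        if r.relevance_score >= min_score
    ]
    # Sort by score, highest first, instead of relying on the response order.
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked
```

Keep the documents in a variable (for example `docs`) so you can call `top_documents(docs, resp.results)` after `co.rerank(..., documents=docs)`.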
See also
- Model Catalog: model ids, specs, and billing
- API Reference: the raw HTTP and gRPC surface
- Conventions: endpoints, auth, `google.protobuf.Value` polymorphic fields