
Using OpenAI & Cohere SDKs

Ignite’s Model API speaks the OpenAI and Cohere wire formats verbatim — same paths, same JSON, same response shapes. So you don’t need a Dodil SDK: point the official OpenAI or Cohere client at Ignite, use a Dodil token as the API key, and pass a catalog model id as the model name.

Two settings are all it takes:

| Setting  | Value |
|----------|-------|
| Base URL | https://api.dev.dodil.io/v1 (staging) · https://api.dodil.io/v1 (production) |
| API key  | your Dodil token ($DODIL_TOKEN) |

Cohere’s client takes the host without the /v1 suffix — it appends /v1/rerank itself. See the rerank example below.
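As a minimal sketch of the two settings, the helpers below return the base URL each SDK expects and read the token from the environment. The function names and the `staging` / `production` keys are illustrative assumptions, not part of Ignite or either SDK; the URLs are the ones listed in the settings table.

```python
import os

# Base URLs copied from the settings table; the dict keys are illustrative.
BASE_URLS = {
    "staging": "https://api.dev.dodil.io/v1",
    "production": "https://api.dodil.io/v1",
}


def openai_base_url(env: str = "staging") -> str:
    """Return the base URL for the OpenAI client (includes /v1)."""
    return BASE_URLS[env]


def cohere_base_url(env: str = "staging") -> str:
    """Return the host for the Cohere client, which appends /v1/rerank itself."""
    return BASE_URLS[env].removesuffix("/v1")  # requires Python 3.9+


def dodil_token() -> str:
    """Read the token from DODIL_TOKEN and fail early if it is unset."""
    token = os.environ.get("DODIL_TOKEN")
    if not token:
        raise RuntimeError("Set DODIL_TOKEN to your Dodil token")
    return token
```

With these in place, `OpenAI(base_url=openai_base_url(), api_key=dodil_token())` and `cohere.Client(base_url=cohere_base_url(), api_key=dodil_token())` would both target the same environment.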

What maps to what

| Surface       | OpenAI / Cohere call                    | Ignite path              | Models |
|---------------|-----------------------------------------|--------------------------|--------|
| Chat          | client.chat.completions.create(...)     | /v1/chat/completions     | kimi-k2.5, kimi-k2, moonshot-v1-auto |
| Embeddings    | client.embeddings.create(...)           | /v1/embeddings           | jina-embeddings-v4, arctic-embed-m-v2, … |
| Transcription | client.audio.transcriptions.create(...) | /v1/audio/transcriptions | whisper-large-v3-turbo |
| Rerank        | cohere.rerank(...)                      | /v1/rerank               | jina-reranker-v2 |

Models without a standard SDK shape (classification, detection, OCR, NER) use the generic Infer surface — call it over plain HTTP or gRPC; there’s no OpenAI/Cohere SDK method for them.

Chat completions

    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.dev.dodil.io/v1",
        api_key="<your-dodil-token>",
    )

    resp = client.chat.completions.create(
        model="kimi-k2.5",
        messages=[{"role": "user", "content": "Explain serverless inference in one sentence."}],
        max_tokens=200,
    )
    print(resp.choices[0].message.content)

Streaming (stream=True), tool calling, and response_format work as they do against OpenAI — see Chat Completions for the full field set.
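A hedged sketch of the streaming case, assuming Ignite emits the same chunk shape as OpenAI (`choices[0].delta.content`), as the paragraph above suggests. The helper names `iter_text` and `stream_answer` are illustrative; the SDK import sits inside the function so the text-extraction helper can be used without the SDK installed.

```python
import os
from typing import Iterable, Iterator


def iter_text(chunks: Iterable) -> Iterator[str]:
    """Yield the text fragments from a stream of chat completion chunks."""
    for chunk in chunks:
        # Some chunks carry no choices or an empty delta (for example the last one).
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content


def stream_answer(prompt: str) -> str:
    """Stream a reply from Ignite, printing as it arrives, and return the full text."""
    from openai import OpenAI  # imported lazily so iter_text stays SDK-free

    client = OpenAI(
        base_url="https://api.dev.dodil.io/v1",
        api_key=os.environ["DODIL_TOKEN"],
    )
    stream = client.chat.completions.create(
        model="kimi-k2.5",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    parts = []
    for text in iter_text(stream):
        print(text, end="", flush=True)
        parts.append(text)
    return "".join(parts)
```

Calling `stream_answer("Explain serverless inference in one sentence.")` would print fragments as they arrive and return the assembled reply.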

Embeddings

    # Reuses the `client` created in the chat completions example.
    resp = client.embeddings.create(
        model="jina-embeddings-v4",
        input=["first document", "second document"],
    )
    vectors = [d.embedding for d in resp.data]

Ignite-specific options like dimensions (Matryoshka truncation) and task_type are accepted as extra fields — see Embeddings.
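One way to pass these options from the OpenAI Python SDK, sketched under assumptions: `dimensions` is already a named parameter of `client.embeddings.create`, while `task_type` is not in the OpenAI schema, so it is sent through the SDK's `extra_body` argument, which merges extra keys into the request JSON. The helper names are illustrative, and the accepted `task_type` values are not listed here, so the usage below keeps a placeholder.

```python
import os


def embedding_kwargs(texts, dimensions=None, task_type=None):
    """Build keyword arguments for client.embeddings.create."""
    kwargs = {"model": "jina-embeddings-v4", "input": list(texts)}
    if dimensions is not None:
        kwargs["dimensions"] = dimensions  # standard OpenAI SDK parameter
    if task_type is not None:
        # Not part of the OpenAI schema, so it rides along in extra_body.
        kwargs["extra_body"] = {"task_type": task_type}
    return kwargs


def embed(texts, **options):
    """Embed texts with Ignite, forwarding optional dimensions / task_type."""
    from openai import OpenAI  # imported lazily so embedding_kwargs stays SDK-free

    client = OpenAI(
        base_url="https://api.dev.dodil.io/v1",
        api_key=os.environ["DODIL_TOKEN"],
    )
    resp = client.embeddings.create(**embedding_kwargs(texts, **options))
    return [d.embedding for d in resp.data]
```

For example, `embed(["first document"], dimensions=256, task_type="<task-type>")` would request truncated vectors; see Embeddings for the values Ignite actually accepts.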

Transcription

    # Reuses the `client` created in the chat completions example.
    with open("audio.mp3", "rb") as f:
        resp = client.audio.transcriptions.create(
            model="whisper-large-v3-turbo",
            file=f,
        )
    print(resp.text)

Rerank (Cohere)

The Cohere client points at the host without /v1:

    import cohere

    co = cohere.Client(
        api_key="<your-dodil-token>",
        base_url="https://api.dev.dodil.io",
    )

    resp = co.rerank(
        model="jina-reranker-v2",
        query="What is serverless inference?",
        documents=[
            "Serverless runs code without managing servers.",
            "A rerank model scores documents against a query.",
            "Embeddings map text to vectors.",
        ],
        top_n=2,
    )
    for r in resp.results:
        print(r.index, r.relevance_score)

See Rerank for the full request/response shape.
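Each rerank result carries an `index` into the request's `documents` list rather than the text itself. The helper below is a small sketch of mapping results back to their documents; the function name and the optional `min_score` cutoff are illustrative additions, not part of the Cohere SDK or Ignite.

```python
from typing import Iterable, List, Sequence, Tuple


def ranked_documents(
    documents: Sequence[str], results: Iterable, min_score: float = 0.0
) -> List[Tuple[str, float]]:
    """Pair each rerank result with its source text, dropping low scores."""
    pairs = []
    for r in results:
        if r.relevance_score >= min_score:
            # r.index is the position of the document in the request list.
            pairs.append((documents[r.index], r.relevance_score))
    return pairs
```

Given the `documents` list passed to `co.rerank(...)`, `ranked_documents(documents, resp.results)` would return `(text, score)` pairs in the order the service ranked them.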

