Workflow: Run Model Inference

Last validated: 2026-05-20

Goal

Use the dodil ignite CLI to list, inspect, and run inference against Ignite models, and fall back to direct API calls when chat, rerank, or transcribe requests need full control over the request body.

Inputs

  • Token
  • API endpoint
  • Model name
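
Before running the commands in this workflow, a script can confirm that all three inputs are present. The following is a minimal sketch, assuming the inputs arrive as environment variables. TOKEN matches the variable used in the API examples on this page; API_ENDPOINT, MODEL, and the require_inputs helper are hypothetical names chosen for illustration.

```shell
# Sketch: check that the workflow inputs are set before calling the CLI or API.
# TOKEN matches the variable used in the curl examples on this page.
# API_ENDPOINT, MODEL and require_inputs are hypothetical names.

# Return 0 when every named variable is set and non-empty.
# Otherwise print each missing name to stderr and return 1.
require_inputs() {
  _missing=0
  for _name in "$@"; do
    # Indirect lookup that also works in plain POSIX sh.
    eval "_value=\${${_name}:-}"
    if [ -z "$_value" ]; then
      echo "missing input: ${_name}" >&2
      _missing=1
    fi
  done
  return "$_missing"
}
```

A script might start with `require_inputs TOKEN API_ENDPOINT MODEL || exit 1` so that a missing value fails fast instead of producing an authentication or routing error later.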

CLI Path (Current)

    # List and inspect models
    dodil ignite models list
    dodil ignite models get kimi-k2-5

    # Embeddings
    dodil ignite models embed kimi-k2-5 --input "hello world"

    # Generic infer
    dodil ignite models infer kimi-k2-5 --prompt '{"input":"hello"}'

API Fallback for Full Request Control

Chat completion

    curl -sS https://api.dev.dodil.io/v1/chat/completions \
      -H "Authorization: Bearer ${TOKEN}" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "kimi-k2-5",
        "messages": [{"role": "user", "content": "Summarize this text"}]
      }'

Rerank

    curl -sS https://api.dev.dodil.io/v1/rerank \
      -H "Authorization: Bearer ${TOKEN}" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "bge-reranker-v2",
        "query": "ignite gateway architecture",
        "documents": ["doc A", "doc B"]
      }'
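
Because the CLI has no rerank command, scripts that rerank often may want a small wrapper around this call. The following is a minimal sketch, not part of the dodil CLI: json_escape, rerank_payload, and rerank are hypothetical helper names, and the escaping covers only backslashes and double quotes.

```shell
# Sketch: a shell wrapper around the rerank API call shown above.
# json_escape, rerank_payload and rerank are hypothetical helpers.

# Escape backslashes and double quotes so a string can sit inside JSON.
# Control characters such as newlines are not handled in this sketch.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Print the request body. Usage: rerank_payload MODEL QUERY DOC...
rerank_payload() {
  _model=$(json_escape "$1")
  _query=$(json_escape "$2")
  shift 2
  _docs=""
  for _doc in "$@"; do
    # Append each document as a JSON string, comma-separated.
    _docs="${_docs}${_docs:+,}\"$(json_escape "$_doc")\""
  done
  printf '{"model":"%s","query":"%s","documents":[%s]}\n' \
    "$_model" "$_query" "$_docs"
}

# Send the request. Expects TOKEN to be set, as in the examples above.
rerank() {
  curl -sS https://api.dev.dodil.io/v1/rerank \
    -H "Authorization: Bearer ${TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$(rerank_payload "$@")"
}
```

With these helpers, `rerank bge-reranker-v2 "ignite gateway architecture" "doc A" "doc B"` sends the same body as the curl example. Keeping payload construction in its own function makes it possible to inspect the body without sending a request.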

Transcribe

For transcription, call the Transcribe API directly. This gives automation environments full control over how real audio payloads are sent.
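
This page does not show the Transcribe request shape, so the following is only a sketch of what a direct call could look like. Everything specific in it is an assumption: the endpoint must be supplied through the hypothetical TRANSCRIBE_URL variable, and the multipart upload with "file" and "model" form fields is modeled on common transcription APIs rather than confirmed Ignite behavior. Check the Transcribe API reference before relying on it.

```shell
# Sketch only. Assumptions, none confirmed by this page:
#   - TRANSCRIBE_URL holds the full transcribe endpoint URL
#   - the API accepts a multipart upload with "file" and "model" fields
# CURL can be overridden (for example CURL=echo) to print the command
# instead of sending a request.

# Usage: transcribe AUDIO_FILE MODEL
transcribe() {
  _audio="$1"
  _model="$2"
  "${CURL:-curl}" -sS "${TRANSCRIBE_URL:?set TRANSCRIBE_URL to the transcribe endpoint}" \
    -H "Authorization: Bearer ${TOKEN}" \
    -F "file=@${_audio}" \
    -F "model=${_model}"
}
```

Uploading with `-F "file=@..."` streams the audio file from disk as is, which is the kind of payload control the CLI path does not currently offer.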

Current CLI Caveats

  1. The CLI has no first-class rerank command; call the rerank API instead.
  2. The models chat and models transcribe commands currently have ergonomic gaps that make them awkward for production use.
  3. Prefer API calls when the exact request structure matters (tools, response_format, advanced options).
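
As an illustration of the third caveat, the sketch below sends a chat request that sets response_format through the API. It assumes the endpoint accepts an OpenAI-style `{"type": "json_object"}` value, which this page does not confirm; chat_payload and chat_structured are hypothetical helper names.

```shell
# Sketch: a chat request with an explicit response_format, sent via the API.
# Assumption: the endpoint accepts the OpenAI-style response_format object.
# chat_payload and chat_structured are hypothetical helpers.

# Print the request body exactly as it will be sent.
chat_payload() {
  cat <<'JSON'
{
  "model": "kimi-k2-5",
  "messages": [{"role": "user", "content": "Summarize this text"}],
  "response_format": {"type": "json_object"}
}
JSON
}

# Post the body from stdin so nothing is re-quoted on the command line.
chat_structured() {
  chat_payload | curl -sS https://api.dev.dodil.io/v1/chat/completions \
    -H "Authorization: Bearer ${TOKEN}" \
    -H "Content-Type: application/json" \
    --data-binary @-
}
```

Keeping the body in a quoted heredoc means the JSON reaches the server byte for byte, so fields such as tools or other advanced options can be added without fighting shell quoting.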