Workflow: Run Model Inference
Last validated: 2026-05-20
Goal
Use Ignite model endpoints for list/get/infer operations, with API fallback for full-control chat/rerank/transcribe flows.
Inputs
- Token
- API endpoint
- Model name
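The three inputs can be checked up front so a missing value fails fast instead of producing an authentication or routing error later. A minimal sketch, assuming the inputs are passed as environment variables: `TOKEN` matches the curl examples in this document, while `API_ENDPOINT`, `MODEL`, and the helper name `require_inputs` are hypothetical names chosen for illustration.

```shell
# Hypothetical helper: verify the workflow inputs are set before calling Ignite.
# TOKEN is the variable the curl examples use; API_ENDPOINT and MODEL are assumed names.
require_inputs() {
  missing=""
  for name in TOKEN API_ENDPOINT MODEL; do
    # Indirect lookup of the variable named by $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing inputs:$missing" >&2
    return 1
  fi
  return 0
}
```

Calling `require_inputs || exit 1` at the top of an automation script stops the run before any request is sent.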
CLI Path (Current)
# List and inspect models
dodil ignite models list
dodil ignite models get kimi-k2-5
# Embeddings
dodil ignite models embed kimi-k2-5 --input "hello world"
# Generic infer
dodil ignite models infer kimi-k2-5 --prompt '{"input":"hello"}'
API Fallback for Full Request Control
Chat completion
curl -sS https://api.dev.dodil.io/v1/chat/completions \
-H "Authorization: Bearer ${TOKEN}" \
-H "Content-Type: application/json" \
-d '{
"model":"kimi-k2-5",
"messages":[{"role":"user","content":"Summarize this text"}]
}'
Rerank
curl -sS https://api.dev.dodil.io/v1/rerank \
-H "Authorization: Bearer ${TOKEN}" \
-H "Content-Type: application/json" \
-d '{
"model":"bge-reranker-v2",
"query":"ignite gateway architecture",
"documents":["doc A","doc B"]
}'
Transcribe
Call the Transcribe API directly when automation needs full control over the real audio payload.
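This document does not show the Transcribe request itself, so the following is only a sketch of what a direct call might look like. The path `/v1/audio/transcriptions` and the `file` and `model` form fields are assumptions borrowed from OpenAI-style APIs, and `transcribe_audio` and `TRANSCRIBE_PATH` are hypothetical names. Confirm all of them against the Transcribe API reference before relying on this.

```shell
# Sketch only: the path and form field names are ASSUMPTIONS (OpenAI-style),
# not confirmed by this document.
TRANSCRIBE_PATH="${TRANSCRIBE_PATH:-/v1/audio/transcriptions}"  # assumed path

transcribe_audio() {
  audio_file="$1"
  model="$2"
  # Fail before any network call if the audio file is missing.
  if [ ! -f "$audio_file" ]; then
    echo "audio file not found: $audio_file" >&2
    return 2
  fi
  # -F sends multipart/form-data; file=@path uploads the file contents unchanged.
  curl -sS --fail "https://api.dev.dodil.io${TRANSCRIBE_PATH}" \
    -H "Authorization: Bearer ${TOKEN}" \
    -F "file=@${audio_file}" \
    -F "model=${model}"
}
```

Usage would take the form `transcribe_audio ./sample.wav "$MODEL"`, where the model name is whichever transcription model `dodil ignite models list` reports.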
Current CLI Caveats
- CLI has no first-class rerank command.
- The models chat and models transcribe commands currently have ergonomic gaps in their implementation that limit production use.
- Prefer API calls when request structure fidelity matters (tools, response_format, advanced options).
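One way to preserve request structure exactly is to keep the body in a file and send it byte for byte. The sketch below reuses the chat completion endpoint and model from this document. The `response_format` shape is an OpenAI-style assumption that this document does not confirm, and `write_chat_request` and `send_chat_request` are hypothetical helper names.

```shell
# Write the exact request body to a file so no intermediate layer rewrites it.
# The response_format shape is an ASSUMPTION (OpenAI-style), not confirmed here.
write_chat_request() {
  out_file="$1"
  cat > "$out_file" <<'JSON'
{
  "model": "kimi-k2-5",
  "messages": [{"role": "user", "content": "Summarize this text"}],
  "response_format": {"type": "json_object"}
}
JSON
}

send_chat_request() {
  # --data-binary @file sends the bytes unchanged; -d @file would strip newlines.
  curl -sS --fail https://api.dev.dodil.io/v1/chat/completions \
    -H "Authorization: Bearer ${TOKEN}" \
    -H "Content-Type: application/json" \
    --data-binary "@$1"
}
```

Keeping the body in a file also lets the same payload be reviewed, versioned, and replayed, for example with `write_chat_request request.json && send_chat_request request.json`.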