Model Catalog
Every model below is callable through ModelService. A model’s declared input/output format routes you to the right surface — chat, embeddings, rerank, transcription, or the generic Infer. Model names (the id column) are global — no org prefix — and are exactly what you pass as "model" in a request.
This page is the human-readable view. The same data is available live and unauthenticated from GET /v1/models (dodil ignite models list).
How models are billed
Each model declares a billing mode, surfaced as billing_mode on GetModel:
-
token— billed per token. Usage is reported asprompt_tokensandcompletion_tokens, and each is weighted by the model’s cost multiplier to produce billable cost units:prompt cost units = prompt_tokens × prompt_cost_multiplier completion cost units = completion_tokens × completion_cost_multiplierA multiplier of
1.0means one cost unit per token; a higher multiplier marks a more expensive model. Kimi K2.5’s15.6 / 8.125, for example, bills each prompt token as 15.6 units and each completion token as 8.125. The multiplier is a relative cost factor — the price per unit is set by your plan, not by the model. -
cpu_time— billed by the compute a request consumes, independent of token count. Two quantities are metered: vCPU-seconds and GB-seconds of the request’s execution. Longer or heavier inputs cost more.
In short: token models scale with text volume × multiplier; cpu_time models scale with how long inference runs. The tables below show each model’s mode, and the prompt/completion multipliers where it is token-billed.
Chat & generation
OpenAI-compatible chat completions — POST /v1/chat/completions. Multimodal-in / text-out where noted.
| Model | What it does | Provider | Context window | Billing |
|---|---|---|---|---|
kimi-k2.5 | Flagship chat with reasoning and vision (image + text in), extended thinking | Moonshot AI | 1,048,576 | token · ×15.6 / ×8.125 |
kimi-k2 | Text chat with step-by-step reasoning | Moonshot AI | 131,072 | token · ×15.6 / ×8.125 |
moonshot-v1-auto | Auto-routing chat — picks the right context window for the input | Moonshot AI | 131,072 | token · ×15.6 / ×8.125 |
Embeddings
OpenAI-compatible embeddings — POST /v1/embeddings.
| Model | What it does | Provider | Dimensions | Billing |
|---|---|---|---|---|
jina-embeddings-v4 | Multimodal embeddings (text, code, image), task-aware, Matryoshka truncation | Jina AI | 2048 (truncatable) | token · ×1.0 |
arctic-embed-m-v2 | Text embeddings for semantic search / RAG (English) | Snowflake | 768 | cpu_time |
embeddinggemma-300m | Lightweight general-purpose text embeddings | 768 | cpu_time | |
arcface-r100 | Face embeddings for identity verification and clustering (image in) | InsightFace | 512 | cpu_time |
Rerank
Cohere-compatible reranking — POST /v1/rerank. Returns relevance scores in [0.0, 1.0].
| Model | What it does | Provider | Context window | Billing |
|---|---|---|---|---|
jina-reranker-v2 | Cross-encoder reranking of documents against a query, 100+ languages | Jina AI | 32,768 | token · ×1.0 |
Transcription
OpenAI-compatible speech-to-text — POST /v1/audio/transcriptions.
| Model | What it does | Provider | Languages | Billing |
|---|---|---|---|---|
whisper-large-v3-turbo | Multilingual transcription with per-segment timestamps | OpenAI | 99 | token · ×1.0 |
Infer (generic)
Models without a standard OpenAI/Cohere shape are reached through the generic POST /v1/infer surface — classification, detection, OCR, NER, and translation. Discover each one’s exact request/response schema with GetModel (input.json_schema / output.json_schema).
| Model | What it does | Provider | Key spec | Billing |
|---|---|---|---|---|
seed-x-ppo-7b | Multilingual translation, 28 languages bidirectional | ByteDance | 7B params | token · ×1.0 |
mm-gdino-large | Open-vocabulary object detection — pass text labels at call time | OpenMMLab | 341M params | token · ×1.0 |
paddleocr-vl-gpu | End-to-end document OCR — text, tables, formulas, layout | Baidu / PaddlePaddle | ~1B params | token · ×1.0 |
gliner-multi-v2.1 | Zero-shot NER — define entity types at call time, 100+ languages | Urchade Zaratiana | 278M params | cpu_time |
distilbert-sst2 | Sentiment classification (positive / negative) | HuggingFace | 66M params | cpu_time |
toxic-bert | Multi-label toxicity detection (6 categories) | Unitary | 66M params | cpu_time |
clip-vit-b32 | Zero-shot image classification via image/text encoders | OpenAI | 151M params | cpu_time |
mobilenet-v3-large | Image classification, 1000 ImageNet classes | 5.4M params | cpu_time | |
yamnet | Audio event classification, 521 sound categories | 3.7M params | cpu_time | |
yolov8m | Object detection, 80 COCO classes | Ultralytics | 25.9M params | cpu_time |
scrfd-10g | Face detection with 5-point landmarks | InsightFace | 16M params | cpu_time |
scrfd-34g | Higher-accuracy face detection with landmarks | InsightFace | 9.8M params | cpu_time |
pp-doclayout-s | Document layout detection (text blocks, tables, figures) | PaddlePaddle | 4M params | cpu_time |
paddleocr-v5-mobile | Two-stage OCR (detect + recognize), 50+ languages | Baidu / PaddlePaddle | 21M params | cpu_time |
See also
- API Reference — every ModelService RPC, with OpenAI/Cohere-style HTTP paths
- Using OpenAI & Cohere SDKs — point the official SDKs at Ignite
- Models overview