Model Catalog

Every model below is callable through ModelService. A model’s declared input/output format routes you to the right surface — chat, embeddings, rerank, transcription, or the generic Infer. Model names (the id column) are global — no org prefix — and are exactly what you pass as "model" in a request.

This page is the human-readable view. The same data is available live and unauthenticated from GET /v1/models (dodil ignite models list).

How models are billed

Each model declares a billing mode, surfaced as billing_mode on GetModel:

token — billed per token. Usage is reported as prompt_tokens and completion_tokens, and each is weighted by the model’s cost multiplier to produce billable cost units:
```
prompt cost units     = prompt_tokens     × prompt_cost_multiplier
completion cost units = completion_tokens × completion_cost_multiplier
```
A multiplier of 1.0 means one cost unit per token; a higher multiplier marks a more expensive model. Kimi K2.5’s 15.6 / 8.125, for example, bills each prompt token as 15.6 units and each completion token as 8.125. The multiplier is a relative cost factor — the price per unit is set by your plan, not by the model.
cpu_time — billed by the compute a request consumes, independent of token count. Two quantities are metered: vCPU-seconds and GB-seconds of the request’s execution. Longer or heavier inputs cost more.

In short: token models scale with text volume × multiplier; cpu_time models scale with how long inference runs. The tables below show each model’s mode, and the prompt/completion multipliers where it is token-billed.

Chat & generation

OpenAI-compatible chat completions — POST /v1/chat/completions. Multimodal-in / text-out where noted.

Model	What it does	Provider	Context window	Billing
`kimi-k2.5`	Flagship chat with reasoning and vision (image + text in), extended thinking	Moonshot AI	1,048,576	`token` · ×15.6 / ×8.125
`kimi-k2`	Text chat with step-by-step reasoning	Moonshot AI	131,072	`token` · ×15.6 / ×8.125
`moonshot-v1-auto`	Auto-routing chat — picks the right context window for the input	Moonshot AI	131,072	`token` · ×15.6 / ×8.125

Embeddings

OpenAI-compatible embeddings — POST /v1/embeddings.

Model	What it does	Provider	Dimensions	Billing
`jina-embeddings-v4`	Multimodal embeddings (text, code, image), task-aware, Matryoshka truncation	Jina AI	2048 (truncatable)	`token` · ×1.0
`arctic-embed-m-v2`	Text embeddings for semantic search / RAG (English)	Snowflake	768	`cpu_time`
`embeddinggemma-300m`	Lightweight general-purpose text embeddings	Google	768	`cpu_time`
`arcface-r100`	Face embeddings for identity verification and clustering (image in)	InsightFace	512	`cpu_time`

Rerank

Cohere-compatible reranking — POST /v1/rerank. Returns relevance scores in [0.0, 1.0].

Model	What it does	Provider	Context window	Billing
`jina-reranker-v2`	Cross-encoder reranking of documents against a query, 100+ languages	Jina AI	32,768	`token` · ×1.0

Transcription

OpenAI-compatible speech-to-text — POST /v1/audio/transcriptions.

Model	What it does	Provider	Languages	Billing
`whisper-large-v3-turbo`	Multilingual transcription with per-segment timestamps	OpenAI	99	`token` · ×1.0

Infer (generic)

Models without a standard OpenAI/Cohere shape are reached through the generic POST /v1/infer surface — classification, detection, OCR, NER, and translation. Discover each one’s exact request/response schema with GetModel (input.json_schema / output.json_schema).

Model	What it does	Provider	Key spec	Billing
`seed-x-ppo-7b`	Multilingual translation, 28 languages bidirectional	ByteDance	7B params	`token` · ×1.0
`mm-gdino-large`	Open-vocabulary object detection — pass text labels at call time	OpenMMLab	341M params	`token` · ×1.0
`paddleocr-vl-gpu`	End-to-end document OCR — text, tables, formulas, layout	Baidu / PaddlePaddle	~1B params	`token` · ×1.0
`gliner-multi-v2.1`	Zero-shot NER — define entity types at call time, 100+ languages	Urchade Zaratiana	278M params	`cpu_time`
`distilbert-sst2`	Sentiment classification (positive / negative)	HuggingFace	66M params	`cpu_time`
`toxic-bert`	Multi-label toxicity detection (6 categories)	Unitary	66M params	`cpu_time`
`clip-vit-b32`	Zero-shot image classification via image/text encoders	OpenAI	151M params	`cpu_time`
`mobilenet-v3-large`	Image classification, 1000 ImageNet classes	Google	5.4M params	`cpu_time`
`yamnet`	Audio event classification, 521 sound categories	Google	3.7M params	`cpu_time`
`yolov8m`	Object detection, 80 COCO classes	Ultralytics	25.9M params	`cpu_time`
`scrfd-10g`	Face detection with 5-point landmarks	InsightFace	16M params	`cpu_time`
`scrfd-34g`	Higher-accuracy face detection with landmarks	InsightFace	9.8M params	`cpu_time`
`pp-doclayout-s`	Document layout detection (text blocks, tables, figures)	PaddlePaddle	4M params	`cpu_time`
`paddleocr-v5-mobile`	Two-stage OCR (detect + recognize), 50+ languages	Baidu / PaddlePaddle	21M params	`cpu_time`