Skip to Content
We are live but in Staging 🎉
ModelsModel Catalog

Model Catalog

Every model below is callable through ModelService. A model’s declared input/output format routes you to the right surface — chat, embeddings, rerank, transcription, or the generic Infer. Model names (the id column) are global — no org prefix — and are exactly what you pass as "model" in a request.

This page is the human-readable view. The same data is available live and unauthenticated from GET /v1/models (dodil ignite models list).

How models are billed

Each model declares a billing mode, surfaced as billing_mode on GetModel:

  • token — billed per token. Usage is reported as prompt_tokens and completion_tokens, and each is weighted by the model’s cost multiplier to produce billable cost units:

    prompt cost units = prompt_tokens × prompt_cost_multiplier completion cost units = completion_tokens × completion_cost_multiplier

    A multiplier of 1.0 means one cost unit per token; a higher multiplier marks a more expensive model. Kimi K2.5’s 15.6 / 8.125, for example, bills each prompt token as 15.6 units and each completion token as 8.125. The multiplier is a relative cost factor — the price per unit is set by your plan, not by the model.

  • cpu_time — billed by the compute a request consumes, independent of token count. Two quantities are metered: vCPU-seconds and GB-seconds of the request’s execution. Longer or heavier inputs cost more.

In short: token models scale with text volume × multiplier; cpu_time models scale with how long inference runs. The tables below show each model’s mode, and the prompt/completion multipliers where it is token-billed.


Chat & generation

OpenAI-compatible chat completions — POST /v1/chat/completions. Multimodal-in / text-out where noted.

ModelWhat it doesProviderContext windowBilling
kimi-k2.5Flagship chat with reasoning and vision (image + text in), extended thinkingMoonshot AI1,048,576token · ×15.6 / ×8.125
kimi-k2Text chat with step-by-step reasoningMoonshot AI131,072token · ×15.6 / ×8.125
moonshot-v1-autoAuto-routing chat — picks the right context window for the inputMoonshot AI131,072token · ×15.6 / ×8.125

Embeddings

OpenAI-compatible embeddings — POST /v1/embeddings.

ModelWhat it doesProviderDimensionsBilling
jina-embeddings-v4Multimodal embeddings (text, code, image), task-aware, Matryoshka truncationJina AI2048 (truncatable)token · ×1.0
arctic-embed-m-v2Text embeddings for semantic search / RAG (English)Snowflake768cpu_time
embeddinggemma-300mLightweight general-purpose text embeddingsGoogle768cpu_time
arcface-r100Face embeddings for identity verification and clustering (image in)InsightFace512cpu_time

Rerank

Cohere-compatible reranking — POST /v1/rerank. Returns relevance scores in [0.0, 1.0].

ModelWhat it doesProviderContext windowBilling
jina-reranker-v2Cross-encoder reranking of documents against a query, 100+ languagesJina AI32,768token · ×1.0

Transcription

OpenAI-compatible speech-to-text — POST /v1/audio/transcriptions.

ModelWhat it doesProviderLanguagesBilling
whisper-large-v3-turboMultilingual transcription with per-segment timestampsOpenAI99token · ×1.0

Infer (generic)

Models without a standard OpenAI/Cohere shape are reached through the generic POST /v1/infer surface — classification, detection, OCR, NER, and translation. Discover each one’s exact request/response schema with GetModel (input.json_schema / output.json_schema).

ModelWhat it doesProviderKey specBilling
seed-x-ppo-7bMultilingual translation, 28 languages bidirectionalByteDance7B paramstoken · ×1.0
mm-gdino-largeOpen-vocabulary object detection — pass text labels at call timeOpenMMLab341M paramstoken · ×1.0
paddleocr-vl-gpuEnd-to-end document OCR — text, tables, formulas, layoutBaidu / PaddlePaddle~1B paramstoken · ×1.0
gliner-multi-v2.1Zero-shot NER — define entity types at call time, 100+ languagesUrchade Zaratiana278M paramscpu_time
distilbert-sst2Sentiment classification (positive / negative)HuggingFace66M paramscpu_time
toxic-bertMulti-label toxicity detection (6 categories)Unitary66M paramscpu_time
clip-vit-b32Zero-shot image classification via image/text encodersOpenAI151M paramscpu_time
mobilenet-v3-largeImage classification, 1000 ImageNet classesGoogle5.4M paramscpu_time
yamnetAudio event classification, 521 sound categoriesGoogle3.7M paramscpu_time
yolov8mObject detection, 80 COCO classesUltralytics25.9M paramscpu_time
scrfd-10gFace detection with 5-point landmarksInsightFace16M paramscpu_time
scrfd-34gHigher-accuracy face detection with landmarksInsightFace9.8M paramscpu_time
pp-doclayout-sDocument layout detection (text blocks, tables, figures)PaddlePaddle4M paramscpu_time
paddleocr-v5-mobileTwo-stage OCR (detect + recognize), 50+ languagesBaidu / PaddlePaddle21M paramscpu_time

See also