Transcription — API Reference

Package: dodil.ignite.v1 · Service: ModelService

Transcribe speech to text. The HTTP surface is OpenAI-compatible: the path is the top-level /v1/audio/transcriptions (not under /v1/ignite/), and the JSON body matches the OpenAI Audio Transcriptions request exactly — language, response_format, and the rest stay snake_case on both HTTP and gRPC. Use the OpenAI SDK directly: see Using OpenAI & Cohere SDKs and the Model Catalog.

| RPC | HTTP | Streaming |
| --- | --- | --- |
| `Transcribe` | `POST /v1/audio/transcriptions` | unary |

gRPC reaches the method at dodil.ignite.v1.ModelService/Transcribe on $IGNITE_GRPC. See Conventions → Using gRPC for grpcurl setup. Both transports use the same OpenAI-native snake_case JSON body.

`Transcribe`

Request

audio is the raw audio bytes. Over JSON (both the HTTP gateway and grpcurl), bytes fields are base64-encoded, so pass the audio as a base64 string. Set language to an ISO 639-1 code (e.g. "en", "nl") or leave it empty for auto-detection. Set response_format to "verbose_json" to receive per-segment timestamps.

HTTP


# Encode the audio as single-line base64: GNU base64 wraps its output, and a raw newline would break the JSON string.
curl -sS -X POST "https://api.dev.dodil.io/v1/audio/transcriptions" \
  -H "Authorization: Bearer $DODIL_TOKEN" \
  -H "Content-Type: application/json" \
  -d "{
    \"model\": \"whisper-large-v3-turbo\",
    \"audio\": \"$(base64 < meeting.wav | tr -d '\n')\",
    \"language\": \"en\",
    \"response_format\": \"verbose_json\"
  }"
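The same request can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper names build_transcription_request and transcribe_file are hypothetical, and the endpoint, model name, and DODIL_TOKEN variable are copied from the curl example above.

```python
import base64
import json
import os
import urllib.request

# Endpoint copied from the curl example; adjust for your environment.
API_URL = "https://api.dev.dodil.io/v1/audio/transcriptions"


def build_transcription_request(audio_bytes, token, model="whisper-large-v3-turbo",
                                language="", response_format="json"):
    """Build (but do not send) the POST request for the transcription endpoint."""
    body = {
        "model": model,
        # bytes fields travel as base64 text inside the JSON body
        "audio": base64.b64encode(audio_bytes).decode("ascii"),
        # an empty string leaves language detection to the service
        "language": language,
        "response_format": response_format,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def transcribe_file(path, **options):
    """Read an audio file, send it, and return the decoded JSON response."""
    with open(path, "rb") as f:
        request = build_transcription_request(
            f.read(), os.environ["DODIL_TOKEN"], **options
        )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling transcribe_file("meeting.wav", language="en", response_format="verbose_json") would mirror the curl request. Building the request separately from sending it keeps the payload easy to inspect without touching the network.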

Response

HTTP


{
  "text": "Welcome everyone, let's start with the quarterly review.",
  "language": "en",
  "duration": 4.82,
  "segments": [
    { "start": 0.0, "end": 2.1, "text": "Welcome everyone," },
    { "start": 2.1, "end": 4.82, "text": "let's start with the quarterly review." }
  ]
}

segments is populated only when response_format is "verbose_json"; with "json" you get text (and language / duration) alone.
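Because segments may be absent, client code usually needs to handle both response shapes. The sketch below shows one way to do that; format_segments is a hypothetical helper, and the sample payload is the response shown above.

```python
import json


def format_segments(response):
    """Return one timestamped line per segment, or the full text if there are none."""
    segments = response.get("segments") or []
    if not segments:
        # "json" responses carry no segments, so fall back to the whole transcript
        return [response["text"]]
    return [f"[{s['start']:.2f}-{s['end']:.2f}] {s['text']}" for s in segments]


# Sample "verbose_json" response taken from the example above.
sample = json.loads("""
{
  "text": "Welcome everyone, let's start with the quarterly review.",
  "language": "en",
  "duration": 4.82,
  "segments": [
    { "start": 0.0, "end": 2.1, "text": "Welcome everyone," },
    { "start": 2.1, "end": 4.82, "text": "let's start with the quarterly review." }
  ]
}
""")

for line in format_segments(sample):
    print(line)
# [0.00-2.10] Welcome everyone,
# [2.10-4.82] let's start with the quarterly review.
```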
