Transcription — API Reference
Package: dodil.ignite.v1 · Service: ModelService
Transcribe speech to text. The HTTP surface is OpenAI-compatible: the path is the top-level /v1/audio/transcriptions (not under /v1/ignite/), and the JSON body matches the OpenAI Audio Transcriptions request exactly: language, response_format, and the other fields stay snake_case on both HTTP and gRPC. To use the OpenAI SDK directly, see Using OpenAI & Cohere SDKs and the Model Catalog.
| RPC | HTTP | Streaming |
|---|---|---|
| Transcribe | POST /v1/audio/transcriptions | unary |
gRPC reaches the method at dodil.ignite.v1.ModelService/Transcribe on $IGNITE_GRPC. See Conventions → Using gRPC for grpcurl setup. Both transports use the same OpenAI-native snake_case JSON body.
Transcribe
Request
audio is the raw audio bytes. Over JSON (both the HTTP gateway and grpcurl), bytes fields are base64-encoded, so pass the base64 string. Set language to an ISO 639-1 code (e.g. "en", "nl"), or leave it empty for auto-detection. Set response_format to "verbose_json" to receive per-segment timestamps.
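The encoding rules above can be sketched in Python with only the standard library. This is a minimal illustration, not an official client: the helper names build_transcribe_body and transcribe are hypothetical, the default response_format of "json" is an assumption, and the URL, header, and DODIL_TOKEN variable are taken from the curl example on this page.

```python
import base64
import json
import os
import urllib.request

# Endpoint as shown in the curl example on this page.
URL = "https://api.dev.dodil.io/v1/audio/transcriptions"


def build_transcribe_body(audio_bytes, model, language="", response_format="json"):
    """Return the JSON request body as a string (hypothetical helper)."""
    body = {
        "model": model,
        # bytes fields travel as base64 text over JSON
        "audio": base64.b64encode(audio_bytes).decode("ascii"),
        # ISO 639-1 code, or empty for auto-detection
        "language": language,
        "response_format": response_format,
    }
    return json.dumps(body)


def transcribe(path, model, language="", response_format="json"):
    """POST an audio file and return the decoded JSON response (hypothetical helper)."""
    with open(path, "rb") as f:
        payload = build_transcribe_body(f.read(), model, language, response_format)
    req = urllib.request.Request(
        URL,
        data=payload.encode("utf-8"),
        headers={
            # Assumes the token is exported as DODIL_TOKEN, as in the curl example.
            "Authorization": "Bearer " + os.environ["DODIL_TOKEN"],
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

A call such as transcribe("meeting.wav", "whisper-large-v3-turbo", language="en", response_format="verbose_json") would mirror the curl request shown in this section.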
HTTP
curl -sS -X POST "https://api.dev.dodil.io/v1/audio/transcriptions" \
  -H "Authorization: Bearer $DODIL_TOKEN" \
  -H "Content-Type: application/json" \
  -d "{
    \"model\": \"whisper-large-v3-turbo\",
    \"audio\": \"$(base64 < meeting.wav | tr -d '\n')\",
    \"language\": \"en\",
    \"response_format\": \"verbose_json\"
}"
Response
HTTP
{
  "text": "Welcome everyone, let's start with the quarterly review.",
  "language": "en",
  "duration": 4.82,
  "segments": [
    { "start": 0.0, "end": 2.1, "text": "Welcome everyone," },
    { "start": 2.1, "end": 4.82, "text": "let's start with the quarterly review." }
  ]
}
segments is populated only when response_format is "verbose_json"; with "json" you get text (and language / duration) alone.
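Because segments may be absent, client code should treat it as optional. The sketch below shows one way to do that in Python; format_segments is a hypothetical helper, and the field names are those in the sample response above.

```python
def format_segments(response):
    """Return one '[start - end] text' line per segment.

    Returns an empty list when the response has no segments,
    for example when response_format was "json".
    """
    lines = []
    for seg in response.get("segments") or []:
        lines.append(f"[{seg['start']:.2f} - {seg['end']:.2f}] {seg['text']}")
    return lines


# Sample data copied from the response shown on this page.
sample = {
    "text": "Welcome everyone, let's start with the quarterly review.",
    "language": "en",
    "duration": 4.82,
    "segments": [
        {"start": 0.0, "end": 2.1, "text": "Welcome everyone,"},
        {"start": 2.1, "end": 4.82, "text": "let's start with the quarterly review."},
    ],
}

for line in format_segments(sample):
    print(line)
```

Using response.get("segments") or [] keeps the same code path working for both response formats, so callers do not need to branch on which format they requested.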
See also
- Using OpenAI & Cohere SDKs — point an OpenAI client at this endpoint
- Model Catalog — transcription models and supported languages
- Conventions — transport, auth, wire format