Transcription — API Reference

Package: dodil.ignite.v1 · Service: ModelService

Transcribe speech to text. The HTTP surface is OpenAI-compatible: the path is the top-level /v1/audio/transcriptions (not under /v1/ignite/), and the JSON body matches the OpenAI Audio Transcriptions request exactly. Field names such as language and response_format stay snake_case on both HTTP and gRPC. To use the OpenAI SDK directly, see Using OpenAI & Cohere SDKs and the Model Catalog.

RPC | HTTP | Streaming
Transcribe | POST /v1/audio/transcriptions | unary

gRPC reaches the method at dodil.ignite.v1.ModelService/Transcribe on $IGNITE_GRPC. See Conventions → Using gRPC for grpcurl setup. Both transports use the same OpenAI-native snake_case JSON body.

Transcribe

Request

audio is the raw audio bytes. Over JSON (both the HTTP gateway and grpcurl), bytes fields are base64-encoded, so pass the audio as a base64 string. Set language to an ISO 639-1 code (e.g. "en", "nl") or leave it empty for auto-detection. Set response_format to "verbose_json" to receive per-segment timestamps.

# tr strips the line wraps that GNU base64 inserts, which would otherwise break the JSON string.
curl -sS -X POST "https://api.dev.dodil.io/v1/audio/transcriptions" \
  -H "Authorization: Bearer $DODIL_TOKEN" \
  -H "Content-Type: application/json" \
  -d "{
    \"model\": \"whisper-large-v3-turbo\",
    \"audio\": \"$(base64 -i meeting.wav | tr -d '\n')\",
    \"language\": \"en\",
    \"response_format\": \"verbose_json\"
  }"
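For callers that prefer not to shell out, the same request can be assembled with the Python standard library. This is a minimal sketch rather than an official client: the helper name build_transcription_request is hypothetical, the URL and the DODIL_TOKEN variable are taken from the curl example, and omitting language when it is unset is an assumption about how "leave it empty" maps to JSON.

```python
import base64
import json
import os
import urllib.request

# URL copied from the curl example; adjust it for your environment.
TRANSCRIBE_URL = "https://api.dev.dodil.io/v1/audio/transcriptions"


def build_transcription_request(audio_bytes, model, language=None,
                                response_format="json", token=None):
    """Build (but do not send) a POST request for the Transcribe endpoint.

    Hypothetical helper: the field names follow the request described above.
    """
    body = {
        "model": model,
        # bytes fields travel as base64 text over JSON
        "audio": base64.b64encode(audio_bytes).decode("ascii"),
        "response_format": response_format,
    }
    if language:
        # Assumption: leaving the key out is equivalent to "empty" (auto-detect).
        body["language"] = language
    if token is None:
        token = os.environ.get("DODIL_TOKEN", "")
    return urllib.request.Request(
        TRANSCRIBE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, read the audio file in binary mode, pass the bytes to the helper, and hand the returned request to urllib.request.urlopen; the response body can then be decoded with json.load.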

Response

{
  "text": "Welcome everyone, let's start with the quarterly review.",
  "language": "en",
  "duration": 4.82,
  "segments": [
    { "start": 0.0, "end": 2.1, "text": "Welcome everyone," },
    { "start": 2.1, "end": 4.82, "text": "let's start with the quarterly review." }
  ]
}

segments is populated only when response_format is "verbose_json"; with "json" you get text (and language / duration) alone.
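Because segments may be absent, client code should treat it as optional. The sketch below shows one way to turn a parsed response into timestamped lines; format_segments is a hypothetical helper name, and the field names are the ones shown in the sample response.

```python
def format_segments(response):
    """Return one "[start - end] text" line per segment.

    Yields an empty list when the response has no segments,
    as is the case with response_format "json".
    """
    lines = []
    for seg in response.get("segments") or []:
        lines.append(f"[{seg['start']:.2f} - {seg['end']:.2f}] {seg['text']}")
    return lines


# Parsed form of the sample response above.
sample = {
    "text": "Welcome everyone, let's start with the quarterly review.",
    "language": "en",
    "duration": 4.82,
    "segments": [
        {"start": 0.0, "end": 2.1, "text": "Welcome everyone,"},
        {"start": 2.1, "end": 4.82, "text": "let's start with the quarterly review."},
    ],
}

for line in format_segments(sample):
    print(line)
```

With a "json" response the helper returns an empty list, so callers can fall back to the top-level text field.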


See also