DataKind
DataKind describes the type of input content you are ingesting, in support of multimodal embeddings.
It helps the pipeline pick the right extraction + chunking strategy (text parsing vs media decoding), and it improves performance by avoiding expensive detection steps.
You can provide it as an optional hint via Input.kind_hint.
Supported values
DATA_KIND_TEXT
Meaning: Plain text provided directly in the request.
Resembles: “I already have the text, just embed it.”
Typical locator: TextLocator
Examples:
- Chat transcripts
- Notes
- Small JSON/text blobs you want embedded as-is
DATA_KIND_FILE
Meaning: A general file whose content must be extracted before chunking/embedding.
Resembles: “Fetch this file and extract its text/content.”
Typical locators: FileLocator, UrlLocator, S3Locator, BytesLocator
Examples:
- PDF, DOCX
- HTML files
- CSV and other document-like formats
Notes:
- Use this when the input is a document/container format and you expect text extraction.
- If you already know it’s an image/audio/video, prefer those kinds instead.
DATA_KIND_IMAGE
Meaning: An image input.
Resembles: “Embed what is visible in this image.”
Typical locators: FileLocator, UrlLocator, S3Locator, BytesLocator
Examples:
- PNG/JPEG/WebP
- Scanned pages (image-based PDFs may still be DATA_KIND_FILE, depending on your extractor)
Notes:
- Image inputs may be processed via tiling (RectSpan) and vision/OCR, depending on your pipeline configuration.
DATA_KIND_AUDIO
Meaning: An audio input.
Resembles: “Embed / index this audio content (often via transcription first).”
Typical locators: FileLocator, UrlLocator, S3Locator, BytesLocator
Examples:
- MP3/WAV/M4A
- Call recordings
- Podcast clips
Notes:
- Audio chunking typically produces TimeSpan references back to the original media.
DATA_KIND_VIDEO
Meaning: A video input.
Resembles: “Embed the video (often via frames + optional audio transcript).”
Typical locators: FileLocator, UrlLocator, S3Locator, BytesLocator
Examples:
- MP4/MKV
- Product demos
- Recorded meetings
Notes:
- Video chunking can reference frames and/or time ranges using RectSpan and TimeSpan.
DATA_KIND_UNSPECIFIED
Meaning: No hint provided.
When the hint is omitted, the system detects the kind from available signals (e.g., MIME type, file extension, magic bytes) and falls back to conservative defaults.
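To make the detection order concrete, here is a minimal Python sketch of how a pipeline might resolve a kind from the signals listed above. It is not the backend's actual logic: the DataKind enum, the detect_kind function, the small magic-byte table, and the choice of DATA_KIND_FILE as the conservative default are all assumptions made for illustration.

```python
import mimetypes
from enum import Enum
from typing import Optional


class DataKind(Enum):
    # Mirrors the documented values; this Python enum is illustrative only.
    UNSPECIFIED = "DATA_KIND_UNSPECIFIED"
    TEXT = "DATA_KIND_TEXT"
    FILE = "DATA_KIND_FILE"
    IMAGE = "DATA_KIND_IMAGE"
    AUDIO = "DATA_KIND_AUDIO"
    VIDEO = "DATA_KIND_VIDEO"


# Top-level MIME types that map directly to a media kind.
_MEDIA_MAJOR = {
    "image": DataKind.IMAGE,
    "audio": DataKind.AUDIO,
    "video": DataKind.VIDEO,
}

# A few well-known file signatures (magic bytes). Deliberately not exhaustive.
_MAGIC = [
    (b"\x89PNG\r\n\x1a\n", DataKind.IMAGE),  # PNG
    (b"\xff\xd8\xff", DataKind.IMAGE),       # JPEG
    (b"%PDF-", DataKind.FILE),               # PDF
    (b"ID3", DataKind.AUDIO),                # MP3 with an ID3 tag
]


def detect_kind(
    mime_type: Optional[str] = None,
    filename: Optional[str] = None,
    head: bytes = b"",
) -> DataKind:
    """Resolve a kind from MIME type, then file extension, then magic bytes."""
    # 1-2. Prefer an explicit MIME type; otherwise guess one from the extension.
    if mime_type is None and filename:
        mime_type, _ = mimetypes.guess_type(filename)
    if mime_type:
        major = mime_type.split("/", 1)[0].lower()
        # Anything that is not image/audio/video is treated as a document.
        return _MEDIA_MAJOR.get(major, DataKind.FILE)
    # 3. Fall back to sniffing the first bytes of the content.
    for signature, kind in _MAGIC:
        if head.startswith(signature):
            return kind
    # 4. Assumed conservative default: treat unknown content as a generic file.
    return DataKind.FILE
```

Calling detect_kind(filename="clip.mp4") resolves through the extension, while detect_kind(head=b"%PDF-1.7") has to sniff the bytes, which is the more expensive path that an explicit hint avoids.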
Choosing the right DataKind
Use this quick guide:
- You have the text in your request → TEXT
- You have a document (PDF/DOCX/PPTX/HTML/CSV) → FILE
- You have an image (PNG/JPEG/WebP) → IMAGE
- You have audio (MP3/WAV/M4A) → AUDIO
- You have video (MP4/MKV) → VIDEO
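The quick guide can also be written as a small client-side lookup. The Python sketch below is only an illustration: the suggest_kind helper and the EXTENSION_TO_KIND table are hypothetical names, and the extensions are limited to the formats listed in the guide.

```python
from pathlib import PurePath
from typing import Optional

# Extensions taken from the quick guide; extend as needed for your own inputs.
EXTENSION_TO_KIND = {
    ".pdf": "DATA_KIND_FILE",
    ".docx": "DATA_KIND_FILE",
    ".pptx": "DATA_KIND_FILE",
    ".html": "DATA_KIND_FILE",
    ".csv": "DATA_KIND_FILE",
    ".png": "DATA_KIND_IMAGE",
    ".jpg": "DATA_KIND_IMAGE",
    ".jpeg": "DATA_KIND_IMAGE",
    ".webp": "DATA_KIND_IMAGE",
    ".mp3": "DATA_KIND_AUDIO",
    ".wav": "DATA_KIND_AUDIO",
    ".m4a": "DATA_KIND_AUDIO",
    ".mp4": "DATA_KIND_VIDEO",
    ".mkv": "DATA_KIND_VIDEO",
}


def suggest_kind(path: Optional[str] = None, inline_text: Optional[str] = None) -> str:
    """Return a DATA_KIND_* name to use as a hint (hypothetical helper)."""
    # Text supplied directly in the request is always TEXT.
    if inline_text is not None:
        return "DATA_KIND_TEXT"
    if path is not None:
        suffix = PurePath(path).suffix.lower()
        # Unknown extensions leave the decision to backend detection.
        return EXTENSION_TO_KIND.get(suffix, "DATA_KIND_UNSPECIFIED")
    return "DATA_KIND_UNSPECIFIED"
```

Returning DATA_KIND_UNSPECIFIED for unknown extensions is a deliberate choice in this sketch: a wrong hint is less useful than no hint, since detection still runs when the kind is unspecified.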
Practical tips
- If you can, set Meta.mime_type (e.g. application/pdf, image/png). It makes detection cheaper and more accurate.
- If your input uses a BytesLocator, setting either kind_hint or Meta.mime_type is strongly recommended.
- DataKind is a hint: the backend may still override it if the content clearly doesn’t match.
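As an illustration of the BytesLocator tip, the Python sketch below assembles a request body for raw bytes and fills in a MIME type when a filename is available. The payload layout (the "locator", "bytes", and "meta" keys, and base64 encoding of the data) is an assumption, not the documented wire format; only kind_hint, mime_type, and BytesLocator come from this page. build_bytes_input is a hypothetical helper.

```python
import base64
import mimetypes
import warnings
from typing import Any, Dict, Optional


def build_bytes_input(
    data: bytes,
    filename: Optional[str] = None,
    kind_hint: Optional[str] = None,
) -> Dict[str, Any]:
    """Build an input dict for raw bytes (assumed shape, for illustration)."""
    # Derive a MIME type from the filename when one is known.
    mime_type = None
    if filename:
        mime_type, _ = mimetypes.guess_type(filename)

    # Raw bytes carry no extension or headers, so warn when both hints are missing.
    if kind_hint is None and mime_type is None:
        warnings.warn(
            "BytesLocator input has neither kind_hint nor mime_type; "
            "the backend will have to detect the kind from the content."
        )

    payload: Dict[str, Any] = {
        "locator": {"bytes": base64.b64encode(data).decode("ascii")},
        "meta": {},
    }
    if mime_type:
        payload["meta"]["mime_type"] = mime_type
    if kind_hint:
        payload["kind_hint"] = kind_hint
    return payload
```

The helper warns rather than raises because the hint is recommended, not required; the backend can still detect the kind and may override a hint that does not match the content.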