
DataKind

DataKind describes the type of input content you are ingesting, to support multimodal embeddings.

It helps the pipeline pick the right extraction and chunking strategy (text parsing vs. media decoding), and it improves performance by letting the pipeline skip expensive content-type detection.

You can provide it as an optional hint via Input.kind_hint.
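This page does not show the client API for setting the hint. The stand-in Python types below are a minimal sketch. They reuse the documented names (DataKind, Input.kind_hint, TextLocator), but their fields, constructors, and defaults are assumptions for illustration and do not come from the real SDK:

```python
from dataclasses import dataclass
from enum import Enum, auto


class DataKind(Enum):
    # Hypothetical stand-in for the documented enum; numeric values are not
    # specified on this page, so auto() is used purely for illustration.
    DATA_KIND_UNSPECIFIED = auto()
    DATA_KIND_TEXT = auto()
    DATA_KIND_FILE = auto()
    DATA_KIND_IMAGE = auto()
    DATA_KIND_AUDIO = auto()
    DATA_KIND_VIDEO = auto()


@dataclass
class TextLocator:
    # Assumed field name: the inline text to embed.
    text: str


@dataclass
class Input:
    # Assumed shape: a locator plus an optional kind hint that defaults to
    # "no hint", in which case the backend detects the kind itself.
    locator: TextLocator
    kind_hint: DataKind = DataKind.DATA_KIND_UNSPECIFIED


# The text is already in the request, so hint TEXT and skip detection.
hinted = Input(locator=TextLocator(text="Standup notes: ship v2 on Friday."),
               kind_hint=DataKind.DATA_KIND_TEXT)

# Omitting the hint leaves it UNSPECIFIED.
unhinted = Input(locator=TextLocator(text="Same text, no hint."))
```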


Supported values

DATA_KIND_TEXT

Meaning: Plain text provided directly in the request.

Resembles: “I already have the text, just embed it.”

Typical locator: TextLocator

Examples:

  • Chat transcripts
  • Notes
  • Small JSON/text blobs you want embedded as-is

DATA_KIND_FILE

Meaning: A general file whose content must be extracted before chunking/embedding.

Resembles: “Fetch this file and extract its text/content.”

Typical locators: FileLocator, UrlLocator, S3Locator, BytesLocator

Examples:

  • PDF, DOCX
  • HTML files
  • CSV and other document-like formats

Notes:

  • Use this when the input is a document/container format and you expect text extraction.
  • If you already know it’s an image/audio/video, prefer those kinds instead.

DATA_KIND_IMAGE

Meaning: An image input.

Resembles: “Embed what is visible in this image.”

Typical locators: FileLocator, UrlLocator, S3Locator, BytesLocator

Examples:

  • PNG/JPEG/WebP
  • Scanned pages (image-based PDFs may still be DATA_KIND_FILE depending on your extractor)

Notes:

  • Image inputs may be processed via tiling (RectSpan) and vision/OCR depending on your pipeline configuration.
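This page does not say how tiles are laid out. The sketch below is only an illustration: it splits an image of known size into a grid of fixed-size tiles and records each tile as a rectangle. The RectSpan fields (x, y, width, height in pixels) and the 512-pixel default tile size are assumptions:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RectSpan:
    # Hypothetical stand-in: a pixel rectangle within the source image.
    x: int
    y: int
    width: int
    height: int


def tile_image(img_width: int, img_height: int, tile: int = 512) -> List[RectSpan]:
    """Split an image into a grid of tiles; edge tiles are clipped to the image."""
    if tile <= 0:
        raise ValueError("tile size must be positive")
    spans = []
    for y in range(0, img_height, tile):
        for x in range(0, img_width, tile):
            spans.append(RectSpan(x, y,
                                  min(tile, img_width - x),
                                  min(tile, img_height - y)))
    return spans


# A 1200x800 image with 512px tiles gives 3 columns x 2 rows = 6 tiles.
tiles = tile_image(1200, 800)
```

The tiles never overlap and together cover every pixel exactly once. A real pipeline may instead use overlapping tiles, so that objects cut by a tile edge still appear whole in some tile.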

DATA_KIND_AUDIO

Meaning: An audio input.

Resembles: “Embed/index this audio content (often by transcribing it first).”

Typical locators: FileLocator, UrlLocator, S3Locator, BytesLocator

Examples:

  • MP3/WAV/M4A
  • Call recordings
  • Podcast clips

Notes:

  • Audio chunking typically produces TimeSpan references back to the original media.
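The audio chunking strategy depends on configuration and is not specified here. The sketch below assumes fixed-length windows with overlap, which is a common approach but not necessarily what this pipeline uses. It also assumes a hypothetical TimeSpan with millisecond start/end fields:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TimeSpan:
    # Hypothetical stand-in: a [start_ms, end_ms) range in the original media.
    start_ms: int
    end_ms: int


def audio_windows(duration_ms: int, window_ms: int = 30_000,
                  overlap_ms: int = 5_000) -> List[TimeSpan]:
    """Cover the recording with fixed-length, overlapping windows."""
    if overlap_ms >= window_ms:
        raise ValueError("overlap must be shorter than the window")
    step = window_ms - overlap_ms
    spans = []
    start = 0
    while start < duration_ms:
        end = min(start + window_ms, duration_ms)  # last window may be shorter
        spans.append(TimeSpan(start, end))
        if end == duration_ms:
            break
        start += step
    return spans


# A 70 s clip yields windows 0-30 s, 25-55 s, and 50-70 s.
windows = audio_windows(70_000)
```

With overlap, a sentence that straddles a window boundary still appears intact in at least one chunk. Each chunk's TimeSpan then points a search hit back to the exact stretch of the recording.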

DATA_KIND_VIDEO

Meaning: A video input.

Resembles: “Embed the video (often via frames + optional audio transcript).”

Typical locators: FileLocator, UrlLocator, S3Locator, BytesLocator

Examples:

  • MP4/MKV
  • Product demos
  • Recorded meetings

Notes:

  • Video chunking can reference frames and/or time ranges using RectSpan and TimeSpan.
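This page does not document how frames are selected. The sketch below assumes one sampled frame per fixed interval. Each chunk pairs that frame's time range (a TimeSpan) with a full-frame region (a RectSpan). The TimeSpan, RectSpan, and FrameChunk types and the 10-second interval are hypothetical:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TimeSpan:
    # Hypothetical stand-in: a [start_ms, end_ms) range in the video.
    start_ms: int
    end_ms: int


@dataclass(frozen=True)
class RectSpan:
    # Hypothetical stand-in: a pixel rectangle within a frame.
    x: int
    y: int
    width: int
    height: int


@dataclass(frozen=True)
class FrameChunk:
    # One sampled frame: the interval it represents and the region embedded.
    time: TimeSpan
    region: RectSpan


def sample_frames(duration_ms: int, width: int, height: int,
                  every_ms: int = 10_000) -> List[FrameChunk]:
    """Sample one frame per interval; each chunk covers that interval."""
    full_frame = RectSpan(0, 0, width, height)
    return [FrameChunk(TimeSpan(t, min(t + every_ms, duration_ms)), full_frame)
            for t in range(0, duration_ms, every_ms)]


# A 25 s, 1080p video yields chunks for 0-10 s, 10-20 s, and 20-25 s.
chunks = sample_frames(25_000, 1920, 1080)
```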

DATA_KIND_UNSPECIFIED

Meaning: No hint provided.

When omitted, the system detects the kind from available signals (e.g., MIME type, file extension, magic bytes) and falls back to conservative defaults.
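This page does not describe the detector itself. The sketch below covers only the magic-bytes signal: it checks a handful of well-known file signatures at the start of a payload. The signature table is deliberately incomplete. For example, MP3 files without an ID3 tag are not recognized, and every ZIP container is treated as a document. The stand-in DataKind enum is hypothetical:

```python
from enum import Enum, auto


class DataKind(Enum):
    # Hypothetical stand-in for the documented enum values.
    DATA_KIND_UNSPECIFIED = auto()
    DATA_KIND_TEXT = auto()
    DATA_KIND_FILE = auto()
    DATA_KIND_IMAGE = auto()
    DATA_KIND_AUDIO = auto()
    DATA_KIND_VIDEO = auto()


# A few well-known signatures found at the very start of a file.
SIGNATURES = [
    (b"%PDF-", DataKind.DATA_KIND_FILE),               # PDF
    (b"PK\x03\x04", DataKind.DATA_KIND_FILE),          # ZIP container (DOCX, PPTX, ...)
    (b"\x89PNG\r\n\x1a\n", DataKind.DATA_KIND_IMAGE),  # PNG
    (b"\xff\xd8\xff", DataKind.DATA_KIND_IMAGE),       # JPEG
    (b"ID3", DataKind.DATA_KIND_AUDIO),                # MP3 with an ID3v2 tag
    (b"\x1a\x45\xdf\xa3", DataKind.DATA_KIND_VIDEO),   # Matroska (MKV) / WebM
]


def sniff_kind(head: bytes) -> DataKind:
    """Guess a DataKind from the first bytes of a payload (16 bytes suffice here)."""
    for magic, kind in SIGNATURES:
        if head.startswith(magic):
            return kind
    # RIFF containers carry a format tag at offset 8.
    if head[:4] == b"RIFF":
        if head[8:12] == b"WAVE":
            return DataKind.DATA_KIND_AUDIO
        if head[8:12] == b"WEBP":
            return DataKind.DATA_KIND_IMAGE
    # ISO base media files (MP4, M4A) have an "ftyp" box at offset 4;
    # the "M4A " brand marks audio-only files.
    if head[4:8] == b"ftyp":
        if head[8:12] == b"M4A ":
            return DataKind.DATA_KIND_AUDIO
        return DataKind.DATA_KIND_VIDEO
    # Unknown: this sketch leaves the fallback decision to the caller.
    return DataKind.DATA_KIND_UNSPECIFIED
```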


Choosing the right DataKind

Use this quick guide:

  • You have the text in your request → TEXT
  • You have a document (PDF/DOCX/PPTX/HTML/CSV) → FILE
  • You have an image (PNG/JPEG/WebP) → IMAGE
  • You have audio (MP3/WAV/M4A) → AUDIO
  • You have video (MP4/MKV) → VIDEO
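Callers that already know whether their text is inline and what the MIME type is can apply the quick guide with a small client-side helper. The sketch below is hypothetical. The function name, the MIME table, and the stand-in enum are assumptions and are not part of the documented API:

```python
from enum import Enum, auto
from typing import Optional


class DataKind(Enum):
    # Hypothetical stand-in for the documented enum values.
    DATA_KIND_UNSPECIFIED = auto()
    DATA_KIND_TEXT = auto()
    DATA_KIND_FILE = auto()
    DATA_KIND_IMAGE = auto()
    DATA_KIND_AUDIO = auto()
    DATA_KIND_VIDEO = auto()


# Document formats named in the guide: PDF, DOCX, PPTX, HTML, CSV.
DOCUMENT_MIME_TYPES = {
    "application/pdf",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    "text/html",
    "text/csv",
}


def choose_kind(text_is_inline: bool, mime_type: Optional[str]) -> DataKind:
    """Apply the quick guide: inline text first, then the MIME type."""
    if text_is_inline:
        return DataKind.DATA_KIND_TEXT
    if not mime_type:
        return DataKind.DATA_KIND_UNSPECIFIED  # let the backend detect it
    # Drop parameters such as "; charset=utf-8" and normalize case.
    mime = mime_type.split(";", 1)[0].strip().lower()
    if mime in DOCUMENT_MIME_TYPES:
        return DataKind.DATA_KIND_FILE
    by_top_level = {
        "image": DataKind.DATA_KIND_IMAGE,
        "audio": DataKind.DATA_KIND_AUDIO,
        "video": DataKind.DATA_KIND_VIDEO,
    }
    return by_top_level.get(mime.split("/", 1)[0], DataKind.DATA_KIND_UNSPECIFIED)
```

When the type is unknown, the helper returns UNSPECIFIED rather than guessing, which leaves detection to the backend.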

Practical tips

  • If you can, set Meta.mime_type (e.g. application/pdf, image/png). It makes detection cheaper and more accurate.
  • If your input uses a BytesLocator, setting kind_hint or Meta.mime_type is strongly recommended: raw bytes come without a file name or URL, so there is no extension to detect from.
  • DataKind is a hint: the backend may still override it if the content clearly doesn’t match.
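The MIME-type tips can be sketched with Python's standard `mimetypes` module, which derives the type from the filename before the content is wrapped as raw bytes. The Meta, BytesLocator, and Input types below are hypothetical stand-ins. In particular, this page does not document where Meta attaches to a request:

```python
import mimetypes
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Meta:
    # Hypothetical stand-in; only mime_type is named on this page.
    mime_type: Optional[str] = None


@dataclass
class BytesLocator:
    # Assumed field: the raw payload, with no filename or URL attached.
    data: bytes


@dataclass
class Input:
    # Assumed shape: placing Meta on Input is an illustration only.
    locator: BytesLocator
    meta: Meta = field(default_factory=Meta)


def input_from_upload(data: bytes, filename: str) -> Input:
    """Record the MIME type while the filename is still known."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    return Input(locator=BytesLocator(data=data), meta=Meta(mime_type=mime_type))


pdf_input = input_from_upload(b"%PDF-1.7 ...", "q3-report.pdf")
png_input = input_from_upload(b"\x89PNG\r\n\x1a\n...", "diagram.png")
```

An extensionless filename yields no MIME type (`mime_type` stays `None`). In that case, setting kind_hint directly is the other option.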