
Image (DataKind)

Use DATA_KIND_IMAGE when your input is an image (PNG/JPEG/WebP, etc.).

In this mode, the pipeline treats the input as visual content and proceeds to image chunking and embedding. This differs from DATA_KIND_FILE, where the system assumes a document container (such as PDF or DOCX) that may require text extraction.


What this resembles

Choosing Image tells the pipeline:

“This input is an image. Please embed what’s visible in it, and optionally split it into tiles if needed.”

Common examples:

  • Product screenshots
  • Diagrams and architecture images
  • Scanned receipts or forms (as images)
  • Photos used for visual search

How to send Image inputs

Image inputs typically use one of these locators:

  • UrlLocator (image downloadable via HTTP/HTTPS)
  • S3Locator (image stored in object storage)

Optionally, also set:

  • Input.kind_hint = DATA_KIND_IMAGE

Why the hint helps:

  • Faster routing (no type detection)
  • More predictable behavior

If you are sending bytes, setting either kind_hint or Meta.mime_type (e.g. image/png) is strongly recommended.
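
As a rough illustration, the sketch below assembles a URL-based image input with the kind hint set and a MIME type guessed from the file extension. The dict layout, the `locator`/`meta` key names, and the `build_image_input` helper are assumptions made for this example; only `kind_hint`, `mime_type`, and `DATA_KIND_IMAGE` come from the text above, so check the actual request schema before relying on this shape.

```python
import mimetypes
from typing import Optional


def build_image_input(url: str, mime_type: Optional[str] = None) -> dict:
    """Describe an image input as a plain dict (hypothetical wire shape)."""
    if mime_type is None:
        # Guess from the file extension in the URL; this may return None.
        mime_type, _ = mimetypes.guess_type(url)
    payload = {
        "locator": {"url": url},         # stands in for a UrlLocator (assumed layout)
        "kind_hint": "DATA_KIND_IMAGE",  # lets the pipeline route without type detection
    }
    if mime_type is not None:
        payload["meta"] = {"mime_type": mime_type}  # assumed placement of Meta.mime_type
    return payload


example = build_image_input("https://example.com/diagrams/architecture.png")
```

Passing `mime_type` explicitly is useful when the URL has no file extension, since the guess then yields nothing and the `meta` entry is omitted.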


Image chunking technique

Chunking for images decides whether the system embeds:

  • the whole image as one unit, or
  • tiles (smaller cropped regions) to improve retrieval for large images.

This is controlled by the image chunk policy.


Supported chunking techniques

Whole (IMAGE_CHUNK_TECHNIQUE_WHOLE)

Embeds the image as a single chunk.

Use Whole when:

  • the image is already reasonably sized
  • you expect search to retrieve the full image
  • you want fewer chunks (lower cost, simpler results)

What you get:

  • one embedded chunk
  • a span that covers the full image area (a “rect span”)

Tile (IMAGE_CHUNK_TECHNIQUE_TILE)

Splits the image into square tiles and embeds each tile.

Use Tile when:

  • the image is large (e.g., big screenshots, dense diagrams)
  • the important detail may be localized (a small UI element, a small section of a diagram)
  • you want retrieval to return the most relevant region instead of the entire image

Key options:

  • tile_px: tile size in pixels (required)
  • overlap_px: number of pixels shared by adjacent tiles (optional)

What you get:

  • multiple embedded chunks (one per tile)
  • each chunk includes a rect span that points to the tile location inside the original image
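
To make the interaction between tile_px and overlap_px concrete, here is a minimal sketch of one way to lay out tile rectangles over an image. It is not the pipeline's actual algorithm: the `tile_rects` function is hypothetical, and it assumes that tiles advance by `tile_px - overlap_px` and that tiles on the right and bottom edges are clipped to the image bounds (a real implementation might pad or shift them to keep every tile square).

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height)


def tile_rects(width: int, height: int, tile_px: int, overlap_px: int = 0) -> List[Rect]:
    """Return tile rectangles in row-major order, clipping edge tiles."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    if tile_px <= 0:
        raise ValueError("tile_px must be positive")
    if not 0 <= overlap_px < tile_px:
        raise ValueError("overlap_px must be in [0, tile_px)")

    stride = tile_px - overlap_px  # distance between neighbouring tile origins
    rects: List[Rect] = []
    y = 0
    while True:
        tile_h = min(tile_px, height - y)  # clip the bottom row
        x = 0
        while True:
            tile_w = min(tile_px, width - x)  # clip the right column
            rects.append((x, y, tile_w, tile_h))
            if x + tile_px >= width:  # this tile already reaches the right edge
                break
            x += stride
        if y + tile_px >= height:  # this row already reaches the bottom edge
            break
        y += stride
    return rects
```

Under these assumptions, a 1000×600 image with `tile_px=512` and no overlap yields four tiles, and adding `overlap_px=64` yields six, which is why overlap increases chunk count and cost.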

What you get after chunking

Each produced image chunk includes:

  • the image data for that chunk (whole image or tile)
  • a reference back to where it came from in the original image (a rect span)
  • metadata you can use for filtering and display (name, source id, mime type)

This allows you to retrieve a matching tile from vector search and still know exactly where it belongs inside the original image.
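
The sketch below shows how a rect span could be used to translate a position inside a retrieved tile back into coordinates of the original image, for example to draw a highlight box. The `RectSpan` class and its field names are assumptions for illustration; the text above only establishes that a span identifies a region of the original image.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RectSpan:
    """Region of the original image covered by a chunk (hypothetical type)."""
    x: int
    y: int
    width: int
    height: int


def to_original(span: RectSpan, local_x: int, local_y: int) -> Tuple[int, int]:
    """Convert tile-local pixel coordinates to original-image coordinates."""
    if not (0 <= local_x < span.width and 0 <= local_y < span.height):
        raise ValueError("point lies outside the tile")
    return (span.x + local_x, span.y + local_y)


def as_box(span: RectSpan) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom), a common box format for drawing or cropping."""
    return (span.x, span.y, span.x + span.width, span.y + span.height)
```

For a Whole chunk the span starts at the origin and covers the full image, so both helpers leave coordinates unchanged.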


Practical guidance

For general knowledge base ingestion:

  • Start with Whole for most images
  • Use Tile when images are very large or detail-heavy
  • If using Tile:
    • choose a tile_px that captures meaningful regions
    • use a small overlap_px if boundaries often cut through important details
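
One way to encode this guidance is a small helper that picks a technique from the image dimensions. This is only a sketch: `pick_chunk_policy`, the dict it returns, and the `max_whole_side` cutoff are all hypothetical, and the cutoff has no default because the source does not state what counts as "very large". Choose it from your own images and embedding model.

```python
def pick_chunk_policy(
    width: int,
    height: int,
    max_whole_side: int,
    tile_px: int,
    overlap_px: int = 0,
) -> dict:
    """Pick Whole for modest images and Tile for large ones (illustrative heuristic)."""
    if max(width, height) <= max_whole_side:
        # Small enough to embed as a single chunk.
        return {"technique": "IMAGE_CHUNK_TECHNIQUE_WHOLE"}
    # Large image: split into tiles so localized detail can be retrieved.
    return {
        "technique": "IMAGE_CHUNK_TECHNIQUE_TILE",
        "tile_px": tile_px,
        "overlap_px": overlap_px,
    }
```

This heuristic looks only at size; detail-heavy images below the cutoff may still benefit from Tile, so treat the result as a starting point.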

Image vs File

Use Image when:

  • the input is a true image (PNG/JPEG/WebP)
  • you want visual chunking (whole/tiles)

Use File when:

  • the input is a document container (PDF/DOCX/PPTX)
  • you want text extraction and text chunking

Mental model

  • DataKind answers: “What is this input?” → image
  • Chunking answers: “How do we split it?” → whole image or tiles
  • Rect spans answer: “Where did this chunk come from?” → an (x, y, width, height) region in the original