Image (DataKind)

Use DATA_KIND_IMAGE when your input is an image (PNG/JPEG/WebP, etc.).

In this mode, the pipeline treats the input as visual content and proceeds to image chunking and embedding. This differs from DATA_KIND_FILE, where the system assumes a document container (PDF/DOCX/PPTX) that may require text extraction.

What this resembles

Choosing Image basically means:

“This input is an image. Please embed what’s visible in it, and optionally split it into tiles if needed.”

Common examples:

Product screenshots
Diagrams and architecture images
Scanned receipts or forms (as images)
Photos used for visual search

How to send Image inputs

Image inputs typically use one of these locators:

UrlLocator (image downloadable via HTTP/HTTPS)
S3Locator (image stored in object storage)

Optionally, also set:

Input.kind_hint = DATA_KIND_IMAGE

Why the hint helps:

Faster routing (the system can skip type detection)
More predictable behavior

If you are sending bytes, setting either kind_hint or Meta.mime_type (e.g. image/png) is strongly recommended.
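If you only have raw bytes and no reliable MIME type, you can derive one yourself before sending. The sketch below is a hypothetical client-side helper (not part of the pipeline's API) that reads the standard PNG, JPEG, and WebP magic bytes and returns the pair of values you would put in Input.kind_hint and Meta.mime_type. The constants are plain strings mirroring the enum names in this page; how the real request objects are built is not shown here.

```python
from typing import Optional, Tuple

# Well-known file signatures ("magic bytes") for the image formats named above.
_SIGNATURES = (
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
)


def sniff_image_mime(data: bytes) -> Optional[str]:
    """Return an image MIME type based on the leading bytes, or None."""
    for magic, mime in _SIGNATURES:
        if data.startswith(magic):
            return mime
    # WebP is a RIFF container: "RIFF" <4-byte size> "WEBP"
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    return None


def hints_for_bytes(data: bytes) -> Tuple[str, str]:
    """Hypothetical helper: (kind_hint, mime_type) for an image payload."""
    mime = sniff_image_mime(data)
    if mime is None:
        raise ValueError("payload is not a recognized PNG/JPEG/WebP image")
    return "DATA_KIND_IMAGE", mime
```

Sending both values means the pipeline never has to guess, which is exactly the "predictable behavior" benefit listed above.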

Image chunking technique

Chunking for images decides whether the system embeds:

the whole image as one unit, or
tiles (smaller cropped regions) to improve retrieval for large images.

This is controlled by the image chunk policy.

Supported chunking techniques

Whole (`IMAGE_CHUNK_TECHNIQUE_WHOLE`)

Embeds the image as a single chunk.

Use Whole when:

the image is already reasonably sized
you expect search to retrieve the full image
you want fewer chunks (lower cost, simpler results)

What you get:

one embedded chunk
a span that covers the full image area (a “rect span”)
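As a minimal sketch of what Whole produces, the snippet below models a rect span as a hypothetical `RectSpan` dataclass in original-image pixel coordinates (origin at the top-left corner, which is an assumption) and returns the single span that Whole yields.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RectSpan:
    """Hypothetical rect span: a region of the original image, in pixels."""
    x: int
    y: int
    width: int
    height: int


def whole_chunk_spans(image_width: int, image_height: int) -> List[RectSpan]:
    """Whole technique: exactly one span covering the entire image."""
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")
    return [RectSpan(0, 0, image_width, image_height)]
```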

Tile (`IMAGE_CHUNK_TECHNIQUE_TILE`)

Splits the image into square tiles and embeds each tile.

Use Tile when:

the image is large (e.g., big screenshots, dense diagrams)
the important detail may be localized (a small UI element, a small section of a diagram)
you want retrieval to return the most relevant region instead of the entire image

Key options:

tile_px: tile size in pixels (required)
overlap_px: how many pixels overlap between tiles (optional)
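The two options interact: tiles advance by `tile_px - overlap_px` pixels, so the overlap has to be smaller than the tile for tiling to make progress. The sketch below is a hypothetical stand-in for the Tile policy (the class name and the validation rules are assumptions, not the pipeline's actual checks) that makes those constraints explicit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TilePolicy:
    """Hypothetical stand-in for the Tile chunk policy options."""
    tile_px: int          # required: side length of each square tile
    overlap_px: int = 0   # optional: pixels shared by neighbouring tiles

    def __post_init__(self) -> None:
        if self.tile_px <= 0:
            raise ValueError("tile_px is required and must be positive")
        if self.overlap_px < 0:
            raise ValueError("overlap_px cannot be negative")
        # Assumption: the overlap must leave a positive step between tiles.
        if self.overlap_px >= self.tile_px:
            raise ValueError("overlap_px must be smaller than tile_px")

    @property
    def stride_px(self) -> int:
        """Distance between the top-left corners of adjacent tiles."""
        return self.tile_px - self.overlap_px
```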

What you get:

multiple embedded chunks (one per tile)
each chunk includes a rect span that points to the tile location inside the original image
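To make the tile layout concrete, here is one way to compute the rect spans for a grid of square tiles with overlap. This is a sketch under stated assumptions, not the pipeline's algorithm: when the grid does not divide the image evenly, it shifts the last tile inward so it ends flush with the edge (keeping tiles square), and an axis shorter than `tile_px` gets a single clipped tile. The real implementation may pad or clip edge tiles instead.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RectSpan:
    x: int
    y: int
    width: int
    height: int


def _starts(length: int, tile_px: int, stride: int) -> List[int]:
    """Tile start offsets along one axis."""
    if length <= tile_px:
        return [0]  # axis shorter than a tile: one (clipped) tile
    starts = list(range(0, length - tile_px + 1, stride))
    if starts[-1] + tile_px < length:
        starts.append(length - tile_px)  # assumption: shift last tile inward
    return starts


def tile_spans(width: int, height: int, tile_px: int, overlap_px: int = 0) -> List[RectSpan]:
    """Row-major rect spans for square tiles covering a width x height image."""
    stride = tile_px - overlap_px
    if tile_px <= 0 or overlap_px < 0 or stride <= 0:
        raise ValueError("need tile_px > 0 and 0 <= overlap_px < tile_px")
    spans = []
    for y in _starts(height, tile_px, stride):
        for x in _starts(width, tile_px, stride):
            spans.append(RectSpan(x, y, min(tile_px, width - x), min(tile_px, height - y)))
    return spans
```

For example, `tile_spans(1024, 300, 512)` yields two 512x300 spans starting at x=0 and x=512, one chunk per tile.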

What you get after chunking

Each produced image chunk includes:

the image data for that chunk (whole image or tile)
a reference back to where it came from in the original image (a rect span)
metadata you can use for filtering and display (name, source id, mime type)

This allows you to retrieve a matching tile from vector search and still know exactly where it belongs inside the original image.
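The sketch below models a chunk with the fields listed above and converts a position inside a retrieved tile back to coordinates in the original image. The `ImageChunk` class and its field names are hypothetical illustrations of the described contents, not the pipeline's actual schema.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RectSpan:
    x: int
    y: int
    width: int
    height: int


@dataclass(frozen=True)
class ImageChunk:
    """Hypothetical shape of one chunk; field names are illustrative."""
    data: bytes      # encoded pixels for this chunk (whole image or tile)
    span: RectSpan   # where the chunk sits in the original image
    name: str
    source_id: str
    mime_type: str


def to_original(chunk: ImageChunk, local_x: int, local_y: int) -> Tuple[int, int]:
    """Map a pixel position inside the chunk to original-image coordinates."""
    if not (0 <= local_x < chunk.span.width and 0 <= local_y < chunk.span.height):
        raise ValueError("point lies outside this chunk")
    return chunk.span.x + local_x, chunk.span.y + local_y
```

A hit on a tile whose span starts at (488, 512) that points 10 px right and 20 px down inside the tile therefore lands at (498, 532) in the original image.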

Practical guidance

Recommended defaults

For general knowledge base ingestion:

Start with Whole for most images
Use Tile when images are very large or detail-heavy
If using Tile:
- choose a tile_px that captures meaningful regions
- use a small overlap_px if boundaries often cut through important details
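If you want to apply the "Whole by default, Tile for large or detail-heavy images" rule automatically, a simple size check is enough. The threshold below is a hypothetical value chosen for illustration, not a pipeline default; tune it for your embedding model and content.

```python
# Assumed threshold (not from the pipeline): images whose longer side exceeds
# this many pixels are treated as "very large".
LARGE_SIDE_PX = 2048


def choose_technique(width: int, height: int, detail_heavy: bool = False) -> str:
    """Return the chunk technique name following the defaults above."""
    if detail_heavy or max(width, height) > LARGE_SIDE_PX:
        return "IMAGE_CHUNK_TECHNIQUE_TILE"
    return "IMAGE_CHUNK_TECHNIQUE_WHOLE"
```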

Image vs File

Use Image when:

the input is a true image (PNG/JPEG/WebP)
you want visual chunking (whole/tiles)

Use File when:

the input is a document container (PDF/DOCX/PPTX)
you want text extraction and text chunking
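When your inputs arrive with a MIME type, the Image-vs-File decision can be made mechanically. This hypothetical client-side helper maps the formats named above to the corresponding DataKind name; the MIME strings are the standard registered types, while the function and the decision to reject everything else are assumptions for illustration.

```python
IMAGE_MIME_TYPES = {"image/png", "image/jpeg", "image/webp"}
FILE_MIME_TYPES = {
    "application/pdf",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",    # DOCX
    "application/vnd.openxmlformats-officedocument.presentationml.presentation",  # PPTX
}


def kind_for_mime(mime_type: str) -> str:
    """Hypothetical mapping from a MIME type to a DataKind name."""
    # Drop parameters such as "; charset=..." and normalise case.
    base = mime_type.split(";", 1)[0].strip().lower()
    if base in IMAGE_MIME_TYPES:
        return "DATA_KIND_IMAGE"
    if base in FILE_MIME_TYPES:
        return "DATA_KIND_FILE"
    raise ValueError(f"no DataKind mapping for {mime_type!r}")
```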

Mental model

DataKind answers: “What is this input?” → image
Chunking answers: “How do we split it?” → whole image or tiles
Rect spans answer: “Where did this chunk come from?” → an (x, y, width, height) region in the original image
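Many imaging tools describe regions as corner boxes rather than (x, y, width, height). For example, Pillow's `Image.crop` takes a `(left, upper, right, lower)` box. This small sketch converts a rect span into that form, assuming pixel coordinates with the origin at the top-left and an exclusive right/lower edge.

```python
from typing import Tuple


def rect_span_to_box(x: int, y: int, width: int, height: int) -> Tuple[int, int, int, int]:
    """(x, y, width, height) -> (left, upper, right, lower); right/lower exclusive."""
    if width <= 0 or height <= 0:
        raise ValueError("rect span must have a positive size")
    return x, y, x + width, y + height
```

With Pillow installed, `original.crop(rect_span_to_box(488, 0, 512, 300))` would cut out the same region a tile chunk was embedded from, which is handy for highlighting search hits.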