Image (DataKind)
Use DATA_KIND_IMAGE when your input is an image (PNG/JPEG/WebP, etc.).
In this mode, the pipeline treats the input as visual content and proceeds to image chunking + embedding. This is different from DATA_KIND_FILE, where the system assumes a document container (PDF/DOCX) that may require text extraction.
What this means
Choosing Image tells the pipeline:
“This input is an image. Please embed what’s visible in it, and optionally split it into tiles if needed.”
Common examples:
- Product screenshots
- Diagrams and architecture images
- Scanned receipts or forms (as images)
- Photos used for visual search
How to send Image inputs
Image inputs typically use one of these locators:
- UrlLocator (image downloadable via HTTP/HTTPS)
- S3Locator (image stored in object storage)
And (optionally) set:
Input.kind_hint = DATA_KIND_IMAGE
Why the hint helps:
- Faster routing (no type detection)
- More predictable behavior
If you are sending bytes, setting either kind_hint or Meta.mime_type (e.g. image/png) is strongly recommended.
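As a rough illustration of the hint, the sketch below builds an image input with an explicit kind_hint. The dataclasses are hypothetical stand-ins, not the real client types: only the names UrlLocator, Input.kind_hint, Meta.mime_type, and DATA_KIND_IMAGE come from this page, while the fields url, locator, and meta are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical stand-ins for the real message types. Only the names
# UrlLocator, Input.kind_hint, Meta.mime_type and DATA_KIND_IMAGE
# appear in this page; every other field name is an assumption.
DATA_KIND_IMAGE = "DATA_KIND_IMAGE"


@dataclass
class UrlLocator:
    url: str  # assumed field name


@dataclass
class Meta:
    mime_type: Optional[str] = None


@dataclass
class Input:
    locator: UrlLocator  # assumed field name
    kind_hint: Optional[str] = None
    meta: Meta = field(default_factory=Meta)  # assumed field name


def image_input_from_url(url: str, mime_type: Optional[str] = None) -> Input:
    """Build an image input that carries an explicit kind hint."""
    return Input(
        locator=UrlLocator(url=url),
        kind_hint=DATA_KIND_IMAGE,
        meta=Meta(mime_type=mime_type),
    )


example = image_input_from_url("https://example.com/diagram.png", "image/png")
```

Setting both the hint and the MIME type is redundant for a URL that ends in .png, but it keeps routing explicit when the locator gives no clue about the content type.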
Image chunking technique
Chunking for images decides whether the system embeds:
- the whole image as one unit, or
- tiles (smaller cropped regions) to improve retrieval for large images.
This is controlled by the image chunk policy.
Supported chunking techniques
Whole (IMAGE_CHUNK_TECHNIQUE_WHOLE)
Embeds the image as a single chunk.
Use Whole when:
- the image is already reasonably sized
- you expect search to retrieve the full image
- you want fewer chunks (lower cost, simpler results)
What you get:
- one embedded chunk
- a span that covers the full image area (a “rect span”)
Tile (IMAGE_CHUNK_TECHNIQUE_TILE)
Splits the image into square tiles and embeds each tile.
Use Tile when:
- the image is large (e.g., big screenshots, dense diagrams)
- the important detail may be localized (a small UI element, a small section of a diagram)
- you want retrieval to return the most relevant region instead of the entire image
Key options:
- tile_px: tile size in pixels (required)
- overlap_px: how many pixels overlap between tiles (optional)
What you get:
- multiple embedded chunks (one per tile)
- each chunk includes a rect span that points to the tile location inside the original image
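To make the effect of tile_px and overlap_px concrete, here is a minimal sketch of one way tile rectangles could be laid out. It is not the service's actual tiling code: the stride of tile_px minus overlap_px and the clipping of edge tiles to the image bounds are assumptions, and the real system may pad or shift edge tiles instead.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height)


def tile_rects(width: int, height: int, tile_px: int, overlap_px: int = 0) -> List[Rect]:
    """Lay out tiles over a width x height image, row by row.

    Assumptions (not confirmed by the source): neighbouring tiles start
    tile_px - overlap_px pixels apart, and tiles on the right and bottom
    edges are clipped to the image bounds.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    if tile_px <= 0:
        raise ValueError("tile_px must be positive")
    if not 0 <= overlap_px < tile_px:
        raise ValueError("overlap_px must satisfy 0 <= overlap_px < tile_px")

    stride = tile_px - overlap_px

    def starts(extent: int) -> List[int]:
        # Add tile origins until the last tile reaches the far edge.
        positions = [0]
        while positions[-1] + tile_px < extent:
            positions.append(positions[-1] + stride)
        return positions

    rects: List[Rect] = []
    for y in starts(height):
        for x in starts(width):
            rects.append((x, y, min(tile_px, width - x), min(tile_px, height - y)))
    return rects


rects = tile_rects(1000, 600, tile_px=512)
```

Under these assumptions a 1000 x 600 image with tile_px=512 and no overlap produces four tiles, and the bottom-right one is clipped to 488 x 88 pixels.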
What you get after chunking
Each produced image chunk includes:
- the image data for that chunk (whole image or tile)
- a reference back to where it came from in the original image (a rect span)
- metadata you can use for filtering and display (name, source id, mime type)
This allows you to retrieve a matching tile from vector search and still know exactly where it belongs inside the original image.
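The sketch below shows how a rect span could be used to map a position inside a retrieved tile back to the original image. RectSpan and to_original are hypothetical names for illustration; the only thing taken from this page is that a span describes an (x, y, width, height) region of the original.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RectSpan:
    """Hypothetical rect span: a region of the original image, in pixels."""
    x: int
    y: int
    width: int
    height: int


def to_original(span: RectSpan, local_x: int, local_y: int) -> Tuple[int, int]:
    """Translate a pixel position inside a tile into original-image coordinates."""
    if not (0 <= local_x < span.width and 0 <= local_y < span.height):
        raise ValueError("point lies outside the tile")
    return (span.x + local_x, span.y + local_y)


# A tile whose top-left corner sits at (512, 1024) in the original image.
tile_span = RectSpan(x=512, y=1024, width=512, height=512)
point = to_original(tile_span, 10, 20)
```

The same offset arithmetic applies when drawing a highlight box over the original image for a matched tile.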
Practical guidance
Recommended defaults
For general knowledge base ingestion:
- Start with Whole for most images
- Use Tile when images are very large or detail-heavy
- If using Tile:
  - choose a tile_px that captures meaningful regions
  - use a small overlap_px if boundaries often cut through important details
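One way to encode these defaults is a small helper that picks a policy from the image dimensions. This is only a sketch: the function name, the policy dictionary shape, and all three numeric defaults (2048, 1024, 64) are illustrative assumptions, not values recommended or enforced by the system.

```python
from typing import Dict, Union

Policy = Dict[str, Union[str, int]]


def choose_image_chunk_policy(
    width: int,
    height: int,
    max_whole_side_px: int = 2048,  # illustrative threshold, not a documented limit
    tile_px: int = 1024,            # illustrative tile size
    overlap_px: int = 64,           # illustrative overlap
) -> Policy:
    """Start with Whole and switch to Tile only for very large images."""
    if max(width, height) <= max_whole_side_px:
        return {"technique": "IMAGE_CHUNK_TECHNIQUE_WHOLE"}
    return {
        "technique": "IMAGE_CHUNK_TECHNIQUE_TILE",
        "tile_px": tile_px,
        "overlap_px": overlap_px,
    }


small = choose_image_chunk_policy(800, 600)
large = choose_image_chunk_policy(4000, 3000)
```

Pixel size is only a proxy for "detail-heavy"; a dense diagram below the threshold may still benefit from tiling, so treat the threshold as a starting point to tune.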
Image vs File
Use Image when:
- the input is a true image (PNG/JPEG/WebP)
- you want visual chunking (whole/tiles)
Use File when:
- the input is a document container (PDF/DOCX/PPTX)
- you want text extraction + text chunking
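The Image-versus-File decision can be sketched as a lookup on the MIME type. The function below is a hypothetical client-side helper, not the pipeline's own type detection: it assumes any image/* type maps to DATA_KIND_IMAGE and that the three document formats named above map to DATA_KIND_FILE.

```python
from typing import Optional

# Standard MIME types for the document containers named in this section.
DOCUMENT_MIME_TYPES = {
    "application/pdf",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",    # DOCX
    "application/vnd.openxmlformats-officedocument.presentationml.presentation",  # PPTX
}


def kind_hint_for(mime_type: str) -> Optional[str]:
    """Suggest a kind hint from a MIME type, or None when it is unclear."""
    normalized = mime_type.strip().lower()
    if normalized.startswith("image/"):
        return "DATA_KIND_IMAGE"
    if normalized in DOCUMENT_MIME_TYPES:
        return "DATA_KIND_FILE"
    return None  # leave the decision to the pipeline


png_kind = kind_hint_for("image/png")
pdf_kind = kind_hint_for("application/pdf")
```

A scanned receipt saved as a JPEG goes down the Image path, while the same receipt wrapped in a PDF goes down the File path and is treated as a document.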
Mental model
- DataKind answers: “What is this input?” → image
- Chunking answers: “How do we split it?” → whole image or tiles
- Rect spans answer: “Where did this chunk come from?” → an (x, y, width, height) region in the original