Locator

A Locator tells VNG where the input content lives and how to load it.

Each Input must include exactly one locator (a oneof), such as a URL, a local file path, an S3 key, raw bytes, or inline text.

Use Locator to point the system at the original data; the pipeline will then inspect it, extract/parse if needed, chunk it, and embed it.
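The "exactly one locator" rule above can be sketched as a small validation class. This is only an illustration of the oneof constraint: `LocatorSketch` and its field names are hypothetical stand-ins, not the real proto message or an SDK class.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class LocatorSketch:
    """Hypothetical stand-in for the Locator oneof (not the real proto class)."""

    url: Optional[str] = None      # would map to UrlLocator
    file: Optional[str] = None     # would map to FileLocator
    s3: Optional[str] = None       # would map to S3Locator
    data: Optional[bytes] = None   # would map to BytesLocator
    text: Optional[str] = None     # would map to TextLocator

    def _populated(self) -> list:
        # Names of all fields that were given a value.
        return [f.name for f in fields(self) if getattr(self, f.name) is not None]

    def __post_init__(self) -> None:
        # Each Input must carry exactly one locator.
        populated = self._populated()
        if len(populated) != 1:
            raise ValueError(f"expected exactly one locator, got {populated or 'none'}")

    def which(self) -> str:
        """Return the name of the single populated field."""
        return self._populated()[0]


print(LocatorSketch(text="Hello from text").which())  # text
```

Constructing `LocatorSketch()` with no fields, or with both `url` and `text`, raises `ValueError` in this sketch.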

Python SDK usage

In the Python SDK, you can specify inputs in two ways:

  1. Pass strings directly — the SDK will auto-detect and create a VngInput behind the scenes.
  2. Build VngInput explicitly — best when you want full control over locator type, kind hints, and name hints.

Option 1: Pass strings (auto-detection)

The SDK detects common patterns:

  • https://... → UrlLocator
  • s3://... → S3Locator
  • an existing local path → FileLocator
  • otherwise → TextLocator

    # Strings are accepted directly
    vecs = vng.embed(
        inputs=[
            "Hello from text",                   # -> TextLocator
            "https://example.com/report.pdf",    # -> UrlLocator
            "s3://my-bucket/docs/handbook.pdf",  # -> S3Locator
            "/mnt/shared/data/image.png",        # -> FileLocator (if path exists)
        ]
    )
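To make the detection rules concrete, here is a minimal sketch of how such string classification could work. It is not the SDK's actual implementation: `guess_locator_type` is a hypothetical helper, and it checks only the patterns named on this page (whether the SDK also detects `http://` is not stated here).

```python
import os


def guess_locator_type(value: str) -> str:
    """Return the locator name a string would map to under the rules above.

    Hypothetical helper for illustration; the real SDK logic may differ.
    """
    if value.startswith("https://"):
        return "UrlLocator"
    if value.startswith("s3://"):
        return "S3Locator"
    # The example above treats a string as a file only if the path exists.
    if os.path.exists(value):
        return "FileLocator"
    # Anything else is treated as inline text.
    return "TextLocator"


for candidate in ["Hello from text", "https://example.com/report.pdf",
                  "s3://my-bucket/docs/handbook.pdf"]:
    print(candidate, "->", guess_locator_type(candidate))
```

One consequence of this kind of rule: a mistyped or missing file path silently becomes text. Building the input explicitly avoids that ambiguity.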

Option 2: Build VngInput explicitly

Use this when you want to be explicit about the source type.

    from dodil.vng_client import VngInput

    inputs = [
        VngInput.as_text("Hello from text"),
        VngInput.as_url("https://example.com/report.pdf", kind="FILE"),
        VngInput.as_s3("s3://my-bucket/docs/handbook.pdf", kind="FILE"),
    ]

    vecs = vng.embed(inputs=inputs)

Name hints for better detection

When you send bytes (or ambiguous inputs), a name hint and/or Meta.mime_type helps the platform detect the correct format.

    from dodil.vng_client import VngInput

    # Note: bytes input support depends on the transport. If your SDK transport does not
    # support bytes yet, use Url/S3 locators instead.
    img = VngInput.as_bytes(data=b"...", name_hint="screenshot.png", kind="IMAGE")

    vecs = vng.embed(inputs=[img])
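If you want to set Meta.mime_type yourself, one option is to derive it from the same file name you pass as the name hint. The sketch below uses only Python's standard `mimetypes` module. `mime_from_name_hint` is a hypothetical helper, and how the platform performs its own detection is not described here.

```python
import mimetypes
from typing import Optional


def mime_from_name_hint(name_hint: str) -> Optional[str]:
    """Guess a MIME type from a file name; None when the extension is unknown."""
    mime_type, _encoding = mimetypes.guess_type(name_hint)
    return mime_type


print(mime_from_name_hint("screenshot.png"))  # image/png
print(mime_from_name_hint("report.pdf"))      # application/pdf
print(mime_from_name_hint("blob"))            # None (no extension to go on)
```

A name without an extension gives no signal, which is the case where an explicit MIME type matters most.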

Supported values

UrlLocator

What it is: A public or private URL to fetch content over HTTP/HTTPS.

Resembles: “Download this file from the web.”

Proto:

  • UrlLocator { url }

Good for:

  • Public files (docs, images, audio/video, etc.)
  • Pre-signed URLs
  • Private URLs when used together with SourceAccess (when supported)

Notes:

  • For private endpoints, you will typically provide credentials/headers via SourceAccess (or use a pre-signed URL).
  • Set Meta.mime_type when possible to avoid extra sniffing.

S3Locator

What it is: A pointer to an object stored in S3 or S3-compatible storage.

Resembles: “Load this object from my bucket.”

Proto:

  • S3Locator { key }

Good for:

  • AWS S3
  • S3-compatible providers (MinIO, Ceph RGW, etc.)

Notes:

  • key is a single string that should match what your backend expects (commonly bucket/path/to/object).
  • For private buckets, provide credentials using SourceAccess.
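If your code holds `s3://` URIs but your backend expects the `bucket/path/to/object` form mentioned above, a small conversion helper could look like the following. This is a sketch under that assumption: `s3_uri_to_key` is hypothetical, and you should confirm the key format your backend actually expects.

```python
def s3_uri_to_key(uri: str) -> str:
    """Convert 's3://bucket/path/to/object' into 'bucket/path/to/object'.

    Hypothetical helper; assumes the backend wants the bucket name
    followed by the object path in a single string.
    """
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an s3:// URI: {uri!r}")
    # Split once on the first slash: bucket name, then the object path.
    bucket, _sep, object_path = uri[len(prefix):].partition("/")
    if not bucket or not object_path:
        raise ValueError(f"URI must include a bucket and an object path: {uri!r}")
    return f"{bucket}/{object_path}"


print(s3_uri_to_key("s3://my-bucket/docs/handbook.pdf"))  # my-bucket/docs/handbook.pdf
```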

BytesLocator

What it is: Raw bytes embedded directly in the request.

Resembles: “Here is the file content; ingest it without downloading.”

Proto:

  • BytesLocator { bytes, name_hint? }

Good for:

  • Small payloads
  • Cases where you already have the data in memory
  • Testing / prototyping

Notes:

  • This is not recommended for large files (request size + memory pressure).
  • Provide name_hint (e.g. "report.pdf") and/or Meta.mime_type to improve detection.
  • In some SDK modes/transports, bytes inputs may not be supported yet. If you hit that limitation, use UrlLocator, S3Locator, or FileLocator instead.

TextLocator

What it is: Inline text provided directly in the request.

Resembles: “Embed this text.”

Proto:

  • TextLocator { text }

Good for:

  • Search queries (often paired with EMBED_TASK_QUERY)
  • Short documents or snippets
  • Metadata fields you want indexed as text

Notes:

  • Prefer TextLocator over BytesLocator when your input is already plain text.

Choosing the right Locator

  • You already have plain text → TextLocator
  • You have a file on the same machine/pod → FileLocator
  • You have a downloadable link → UrlLocator
  • You have an object in S3/MinIO → S3Locator
  • You have small content in memory and want to pass it directly → BytesLocator
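The checklist above can be expressed as a small dispatch on the Python type of what you already have in hand. This is an illustrative sketch only: `choose_locator` and the `MAX_INLINE_BYTES` cap are assumptions made for the example, since this page gives no concrete size limit for bytes inputs.

```python
from pathlib import Path
from typing import Union

# Assumed cap for inline bytes; the page gives no concrete limit.
MAX_INLINE_BYTES = 1024 * 1024


def choose_locator(source: Union[str, bytes, Path]) -> str:
    """Map a source value to a locator name, following the checklist above."""
    if isinstance(source, bytes):
        # Small in-memory content can be sent inline; large payloads should not be.
        if len(source) > MAX_INLINE_BYTES:
            raise ValueError("payload too large to inline; upload it and use a URL or S3 locator")
        return "BytesLocator"
    if isinstance(source, Path):
        # A file on the same machine/pod.
        return "FileLocator"
    if source.startswith("s3://"):
        return "S3Locator"
    if source.startswith(("http://", "https://")):
        return "UrlLocator"
    # Any other string is treated as plain text.
    return "TextLocator"


print(choose_locator(Path("/mnt/shared/data/image.png")))  # FileLocator
print(choose_locator(b"\x89PNG"))                          # BytesLocator
print(choose_locator("What is a locator?"))                # TextLocator
```

Dispatching on type rather than on string patterns keeps a local path from ever being mistaken for text, at the cost of requiring callers to wrap paths in `Path`.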

Practical tips

  • Combine Locator with DataKind (Input.kind_hint) and Meta.mime_type for faster, more reliable ingestion.
  • If the source is private (a private S3 bucket, a private URL, or, in the future, private GitHub/Drive), attach credentials using SourceAccess.
  • The backend may validate that the locator is reachable and return a per-input error if it cannot be loaded.