Locator
A Locator tells VNG where the input content lives and how to load it.
Each Input must include exactly one locator (a oneof), such as a URL, a local file path, an S3 key, raw bytes, or inline text.
Use Locator to point the system at the original data; the pipeline will then inspect it, extract/parse if needed, chunk it, and embed it.
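The "exactly one locator" rule can be pictured with a small stand-in model. This is only a sketch: `InputSketch` and its field names are hypothetical and are not the real VNG message types. It shows what a oneof constraint means in practice.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InputSketch:
    """Illustrative stand-in for an Input; NOT the real VNG message type."""

    # Hypothetical field names, one per locator kind listed above.
    url: Optional[str] = None
    file_path: Optional[str] = None
    s3_key: Optional[str] = None
    raw_bytes: Optional[bytes] = None
    text: Optional[str] = None

    def locator_field(self) -> str:
        """Return the name of the single populated locator field.

        Raises ValueError when zero or several fields are set, which is
        the situation a oneof is meant to rule out.
        """
        populated = [name for name, value in vars(self).items() if value is not None]
        if len(populated) != 1:
            raise ValueError(
                f"expected exactly one locator, got {len(populated)}: {populated}"
            )
        return populated[0]
```

An input with only `text` set passes the check, while an empty input or one with both `url` and `text` set is rejected.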
Python SDK usage
In the Python SDK, you can specify inputs in two ways:
- Pass strings directly: the SDK auto-detects the locator type and creates a VngInput behind the scenes.
- Build VngInput explicitly: best when you want full control over locator type, kind hints, and name hints.
Option 1: Pass strings (auto-detection)
The SDK detects common patterns:
- https://... → UrlLocator
- s3://... → S3Locator
- an existing local file path → FileLocator
- otherwise → TextLocator
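The detection rules above can be approximated in a few lines. This is a sketch of the documented behaviour, not the SDK's actual implementation: `guess_locator_type` is a hypothetical helper, and the order in which the checks run is an assumption.

```python
import os


def guess_locator_type(value: str) -> str:
    """Approximate the SDK's string auto-detection (hypothetical helper)."""
    # Only the https:// prefix is documented; whether plain http:// is
    # also detected is not stated, so it is not handled here.
    if value.startswith("https://"):
        return "UrlLocator"
    if value.startswith("s3://"):
        return "S3Locator"
    # A string is treated as a file only if the path exists locally.
    if os.path.exists(value):
        return "FileLocator"
    # Everything else falls back to inline text.
    return "TextLocator"
```

A consequence worth noting: a mistyped file path does not exist, so it would fall through to TextLocator and be embedded as literal text. Building the input explicitly avoids that ambiguity.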
# Strings are accepted directly
vecs = vng.embed(
inputs=[
"Hello from text", # -> TextLocator
"https://example.com/report.pdf", # -> UrlLocator
"s3://my-bucket/docs/handbook.pdf", # -> S3Locator
"/mnt/shared/data/image.png", # -> FileLocator (if path exists)
]
)
Option 2: Build VngInput explicitly
Use this when you want to be explicit about the source type.
from dodil.vng_client import VngInput
inputs = [
VngInput.as_text("Hello from text"),
VngInput.as_url("https://example.com/report.pdf", kind="FILE"),
VngInput.as_s3("s3://my-bucket/docs/handbook.pdf", kind="FILE")
]
vecs = vng.embed(inputs=inputs)
Name hints for better detection
When you send bytes (or ambiguous inputs), a name hint and/or Meta.mime_type helps the platform detect the correct format.
from dodil.vng_client import VngInput
# Note: bytes input support depends on the transport. If your SDK transport does not
# support bytes yet, use Url/S3 locators instead.
img = VngInput.as_bytes(data=b"...", name_hint="screenshot.png", kind="IMAGE")
vecs = vng.embed(inputs=[img])
Supported values
UrlLocator
What it is: A public or private URL to fetch content over HTTP/HTTPS.
Resembles: “Download this file from the web.”
Proto:
UrlLocator { url }
Good for:
- Public files (docs, images, audio/video, etc.)
- Pre-signed URLs
- Private URLs when used together with SourceAccess (when supported)
Notes:
- For private endpoints, you will typically provide credentials/headers via SourceAccess (or use a pre-signed URL).
- Set Meta.mime_type when possible to avoid extra sniffing.
S3Locator
What it is: A pointer to an object stored in S3 or S3-compatible storage.
Resembles: “Load this object from my bucket.”
Proto:
S3Locator { key }
Good for:
- AWS S3
- S3-compatible providers (MinIO, Ceph RGW, etc.)
Notes:
- key is a single string that should match what your backend expects (commonly bucket/path/to/object).
- For private buckets, provide credentials using SourceAccess.
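If your backend expects the bucket/path/to/object form and you are starting from an s3:// URI, the conversion is a simple string split. The helper below is a hypothetical sketch (`s3_uri_to_key` is not part of the SDK), and it assumes the bucket name is the first path segment of the key.

```python
def s3_uri_to_key(uri: str) -> str:
    """Convert 's3://bucket/path/to/object' to 'bucket/path/to/object'."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an s3:// URI: {uri!r}")
    # Split only on the first slash so the object path keeps its own slashes.
    bucket, _, object_path = uri[len(prefix):].partition("/")
    if not bucket or not object_path:
        raise ValueError(f"URI must include a bucket and an object path: {uri!r}")
    return f"{bucket}/{object_path}"
```

Plain string handling is used instead of a URL parser because object paths may contain characters such as `?` or `#`, which a URL parser would split off as a query or fragment.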
BytesLocator
What it is: Raw bytes embedded directly in the request.
Resembles: “Here is the file content; ingest it without downloading.”
Proto:
BytesLocator { bytes, name_hint? }
Good for:
- Small payloads
- Cases where you already have the data in memory
- Testing / prototyping
Notes:
- This is not recommended for large files (request size + memory pressure).
- Provide name_hint (e.g. "report.pdf") and/or Meta.mime_type to improve detection.
- In some SDK modes/transports, bytes inputs may not be supported yet. If you hit that limitation, use UrlLocator, S3Locator, or FileLocator instead.
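Before sending bytes, it can help to derive a MIME type from the name hint and check the payload size. The sketch below uses only the Python standard library. `describe_bytes_payload` and the 1 MiB threshold are illustrative assumptions, not documented VNG limits.

```python
import mimetypes

# Hypothetical cut-off for "small payloads"; the real limit depends on
# your transport and deployment and is not specified in this document.
MAX_INLINE_BYTES = 1 * 1024 * 1024


def describe_bytes_payload(data: bytes, name_hint: str) -> dict:
    """Summarise a bytes payload before passing it as a BytesLocator."""
    # guess_type works from the file extension only; it does not inspect data.
    mime_type, _encoding = mimetypes.guess_type(name_hint)
    return {
        "name_hint": name_hint,
        "mime_type": mime_type,  # None when the extension is unknown
        "size": len(data),
        "inline_ok": len(data) <= MAX_INLINE_BYTES,
    }
```

When `inline_ok` is false, or `mime_type` comes back as `None`, switching to a URL or S3 locator or setting the MIME type by hand is likely the safer choice.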
TextLocator
What it is: Inline text provided directly in the request.
Resembles: “Embed this text.”
Proto:
TextLocator { text }
Good for:
- Search queries (often paired with EMBED_TASK_QUERY)
- Short documents or snippets
- Metadata fields you want indexed as text
Notes:
- Prefer TextLocator over BytesLocator when your input is already plain text.
Choosing the right Locator
- You already have plain text → TextLocator
- You have a file on the same machine/pod → FileLocator
- You have a downloadable link → UrlLocator
- You have an object in S3/MinIO → S3Locator
- You have small content in memory and want to pass it directly → BytesLocator
Practical tips
- Combine Locator with DataKind (Input.kind_hint) and Meta.mime_type for faster, more reliable ingestion.
- If the source is private (S3 private bucket, private URL, private GitHub/Drive in future), attach credentials using SourceAccess.
- The backend may validate that the locator is reachable and return a per-input error if it cannot be loaded.