Search — API Reference

Package: dodil.k3.vector.v1 · Service: VectorService

The single RPC for vector retrieval. One method, three query shapes, three search modes, optional rerank, metadata pre-filter, multi-collection fan-out. The richest surface in the Vector service — worth reading end-to-end before building on it.

| RPC | HTTP |
| --- | --- |
| Search | POST /:bucket/vector/search |

gRPC setup — grpcurl, endpoints, reflection, and field-name casing — is covered once in Conventions → Using gRPC.

Request / response

curl -sS -X POST "https://k3.dev.dodil.io/kb-prod/vector/search" \
  -H "Authorization: Bearer $DODIL_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "bucket": "kb-prod",
    "text": "what is multi-head attention",
    "collectionName": "docs",
    "topK": 10,
    "searchMode": "SEARCH_MODE_AUTO",
    "rerank": true,
    "includeContent": true
  }'
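The same request can be assembled from a script. The sketch below only builds the HTTP request, using Python's standard library, and does not send it; the helper name `build_search_request` and the token placeholder are illustrative assumptions, while the URL layout, headers, and body fields are taken from the curl example above.

```python
import json
import urllib.request


def build_search_request(base_url: str, bucket: str, token: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to /:bucket/vector/search."""
    url = f"{base_url.rstrip('/')}/{bucket}/vector/search"
    # The curl example repeats the bucket in the JSON body, so do the same here.
    payload = json.dumps({"bucket": bucket, **body}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


req = build_search_request(
    "https://k3.dev.dodil.io",
    "kb-prod",
    "example-token",  # placeholder; read the real token from your environment
    {"text": "what is multi-head attention", "collectionName": "docs", "topK": 10},
)
# To send it: urllib.request.urlopen(req), then json.load() the response body.
```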

Three query shapes (oneof query)

Exactly one of text / vector / s3_key must be set per request.
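A client-side check can catch a malformed request before it reaches the server. This is a minimal sketch, assuming the JSON field names used in the examples on this page (`text`, `vector`, `s3Key`); the function name `query_shape` is hypothetical and the server remains the source of truth for validation.

```python
QUERY_FIELDS = ("text", "vector", "s3Key")


def query_shape(body: dict) -> str:
    """Return the single query field set in body, or raise ValueError."""
    # Treat missing, None, empty-string, and empty-object values as "not set".
    present = [f for f in QUERY_FIELDS if body.get(f) not in (None, "", {})]
    if len(present) != 1:
        raise ValueError(
            f"expected exactly one of {QUERY_FIELDS}, got {present or 'none'}"
        )
    return present[0]
```

For example, `query_shape({"bucket": "kb-prod", "text": "attention", "topK": 10})` returns `"text"`, while a body that sets both `text` and `vector` raises `ValueError`.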

Shape 1 — text query (server-side embed)

K3 routes the string through the collection’s embed_model (via Ignite’s embedding service), then searches.

{ "bucket": "kb-prod", "text": "what is multi-head attention", "collectionName": "docs", "topK": 10 }

Requires that the collection has embed_model set. For PIPELINE-mode collections this comes from the template’s ScriptContract. For EXTERNAL-mode collections, set it explicitly at AddVectorCollection time; otherwise text queries return FAILED_PRECONDITION, and you have to pre-embed and use the vector shape instead.

Shape 2 — vector query (pre-embedded fast lane)

You supply the embedding(s); K3 goes straight to Milvus. Bypasses Scriptum entirely.

Single dense vector:

{ "bucket": "kb-prod", "collectionName": "docs", "vector": { "values": [0.12, -0.04, 0.91, /* ... 1021 more floats ... */] }, "topK": 10 }

Batch — three queries in one Milvus round-trip:

{ "bucket": "kb-prod", "collectionName": "docs", "vector": { "vectors": [ { "values": [/* dense vector 1 */] }, { "values": [/* dense vector 2 */] }, { "values": [/* dense vector 3 */] } ], "outputFields": ["file_id", "text", "metadata"] }, "topK": 5 }

Each result carries queryIndex: 0|1|2 so the caller can regroup by input query.
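Regrouping a batch response on the client might look like the sketch below. It assumes each result is a JSON object with `queryIndex` and `score` keys, as in the examples on this page; treating a missing `queryIndex` as 0 is an assumption made here for robustness, not documented behavior.

```python
from collections import defaultdict


def group_by_query(results: list[dict]) -> dict[int, list[dict]]:
    """Bucket batch results by queryIndex, best score first within each bucket."""
    grouped: dict[int, list[dict]] = defaultdict(list)
    for result in results:
        # Assumption: fall back to 0 if the field is absent.
        grouped[int(result.get("queryIndex", 0))].append(result)
    for hits in grouped.values():
        hits.sort(key=lambda r: r["score"], reverse=True)
    return dict(grouped)
```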

Caller-supplied sparse (collection’s sparseMode must be EXTERNAL):

{ "bucket": "kb-prod", "collectionName": "splade-vectors", "vector": { "values": [/* dense 768 floats */], "sparseVectors": [ { "indices": [42, 137, 1024], "values": [0.9, 0.4, 0.1] } ] }, "searchMode": "SEARCH_MODE_HYBRID", "topK": 20 }

With Milvus-native tuning (HNSW ef higher than the K3 default):

{ "bucket": "kb-prod", "collectionName": "docs", "vector": { "values": [/* ... */], "searchParams": { "ef": "256", "metric_type": "COSINE" }, "partitionNames": ["2026"], "outputFields": ["file_id", "metadata"] }, "topK": 50 }

When to use the vector shape:

  • You have your own embedding pipeline (third-party model, custom-tuned model) — pre-compute embeddings and search directly
  • You’re doing batch retrieval — N queries in one Milvus round-trip is materially faster than N single calls
  • You need Milvus-native tuning (e.g. raising ef for HNSW collections under recall-sensitive workloads)
  • You want to restrict to specific Milvus partitions for sharded retrieval

Shape 3 — file query (multimodal)

For collections built from a visual_embedding_index / face_embedding_index / object_embedding_index template, you can search by an object in your bucket — K3 fetches it, embeds it server-side, then searches.

{ "bucket": "kb-prod", "collectionName": "product-images", "s3Key": "queries/example-bag.jpg", "contentType": "image/jpeg", "topK": 20 }

The contentType hint helps K3 pick the right embedding pipeline when the object’s stored content-type is ambiguous or wrong. Optional — K3 will fall back to the object’s stored content-type or extension.

Multimodal queries support all the same knobs as text queries — pre-filter, rerank, multi-collection, search modes.


Three search modes (SearchMode)

enum SearchMode {
  SEARCH_MODE_UNSPECIFIED = 0;  // → VECTOR
  SEARCH_MODE_VECTOR = 1;       // dense only
  SEARCH_MODE_HYBRID = 2;       // dense + BM25, RRF k=60
  SEARCH_MODE_AUTO = 3;         // HYBRID where collection has sparse, else VECTOR
}

How modes interact with the collection’s sparse_mode:

| Request search_mode | Collection sparse_mode = NONE | Collection sparse_mode = BM25 | Collection sparse_mode = EXTERNAL |
| --- | --- | --- | --- |
| VECTOR (default) | dense | dense | dense |
| HYBRID | warning + dense | dense + BM25 (RRF k=60) | dense + caller sparse (RRF k=60) |
| AUTO | dense | dense + BM25 | dense + caller sparse |
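The table cites RRF with k=60 for hybrid fusion. As a rough illustration of what reciprocal rank fusion does, the sketch below applies the textbook formula, score(d) = Σ 1 / (k + rank(d)), to two ranked lists of IDs. This is the generic algorithm only; how K3 breaks ties or weights the dense and sparse lists is not stated on this page, and the function name `rrf_fuse` is hypothetical.

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked ID lists with reciprocal rank fusion (ranks start at 1)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


dense_hits = ["a", "b", "c"]   # illustrative dense ranking
sparse_hits = ["b", "d", "a"]  # illustrative BM25 ranking
fused = rrf_fuse([dense_hits, sparse_hits])
```

Documents that appear in both lists accumulate two terms, so they tend to outrank documents that appear high in only one list.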

AUTO is the most caller-friendly: it picks the best mode per collection without you needing to know each one’s sparse config. Use VECTOR when you specifically want to skip BM25 (e.g. pure-semantic recall comparisons). Use HYBRID when you want hybrid forced even on dense-only collections (you’ll get a warning plus dense results).

The response’s search_mode_used reports what actually ran: "vector" or "hybrid".
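To predict what search_mode_used will report for a given collection, the mode table can be encoded as a small function. This is a client-side sketch of the documented behavior, not K3's code; the short sparse-mode names (`NONE`, `BM25`, `EXTERNAL`) follow the table's column headers, and the warning string is a placeholder.

```python
def resolve_search_mode(requested: str, sparse_mode: str) -> tuple[str, list[str]]:
    """Return (search_mode_used, warnings) for a request mode and a collection sparse mode."""
    has_sparse = sparse_mode in ("BM25", "EXTERNAL")
    if requested in ("SEARCH_MODE_UNSPECIFIED", "SEARCH_MODE_VECTOR"):
        return "vector", []
    if requested == "SEARCH_MODE_AUTO":
        return ("hybrid" if has_sparse else "vector"), []
    if requested == "SEARCH_MODE_HYBRID":
        if has_sparse:
            return "hybrid", []
        # Placeholder text; the real warning wording is not documented here.
        return "vector", ["hybrid requested on a dense-only collection"]
    raise ValueError(f"unknown search mode: {requested}")
```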


Multi-collection fan-out

Leave collection_name empty to search all matching collections in the bucket.

{ "bucket": "kb-prod", "text": "transformer architecture", "topK": 20 }

Compatibility group key

When collection_name is empty, K3 groups collections by (dimensions, embedding_type, embed_model). The query’s dense vector (or the vector embedded from the query text) must be compatible with the group. Collections that share dimensions but use a different embed_model are never mixed, because their vectors live in different embedding spaces and the scores are not comparable.

Per-collection compatibility:

| Match | Comparable? |
| --- | --- |
| Same embed_model + same dimensions + same embedding_type | ✅ co-search together |
| Same dimensions + same embedding_type, different embed_model | ❌ separate groups, never fused |
| Different dimensions | ❌ never |
| Different embedding_type (e.g. FLOAT vs INT8) | ❌ never |

For a text query, K3 picks the group that matches the query’s embedding model, which it takes from the embed_model of the first compatible collection. If you have collections from multiple model families (e.g. jina-embeddings-v4 and openai/text-embedding-3-large) and want results from both, run two queries.
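To see ahead of time which collections can be co-searched, the group key can be computed client-side. The sketch below assumes you already have a list of collection descriptions as dictionaries with `name`, `dimensions`, `embedding_type`, and `embed_model` keys; that input shape and the model names in the example are hypothetical.

```python
from collections import defaultdict


def compatibility_groups(collections: list[dict]) -> dict[tuple, list[str]]:
    """Group collection names by (dimensions, embedding_type, embed_model)."""
    groups: dict[tuple, list[str]] = defaultdict(list)
    for coll in collections:
        key = (coll["dimensions"], coll["embedding_type"], coll["embed_model"])
        groups[key].append(coll["name"])
    return dict(groups)


example = [
    {"name": "docs", "dimensions": 1024, "embedding_type": "FLOAT", "embed_model": "model-a"},
    {"name": "wiki", "dimensions": 1024, "embedding_type": "FLOAT", "embed_model": "model-a"},
    {"name": "code-repo", "dimensions": 1024, "embedding_type": "FLOAT", "embed_model": "model-b"},
]
groups = compatibility_groups(example)
```

Here `docs` and `wiki` land in one group, and `code-repo` is isolated even though its dimensions match.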

Per-collection observability

The response carries collection_statuses[] so you can debug partial failures:

{ "results": [/* merged across collections, sorted by score */], "tookMs": "247", "searchModeUsed": "hybrid", "warnings": [], "collectionStatuses": [ { "collection": "docs", "embeddingCompleted": true, "searchCompleted": true, "failReason": "" }, { "collection": "code-repo", "embeddingCompleted": false, "searchCompleted": false, "failReason": "incompatible embed_model: docs uses jina-embeddings-v4, code-repo uses sentence-transformers/all-MiniLM" } ] }

fail_reason per collection is the place to look when a multi-collection search returns fewer results than expected.
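A small helper can surface those failures automatically. This sketch assumes the camelCase JSON field names shown in the response example above; the function name `failed_collections` is illustrative.

```python
def failed_collections(response: dict) -> dict[str, str]:
    """Map each collection that did not complete to its failReason."""
    return {
        status["collection"]: status.get("failReason", "")
        for status in response.get("collectionStatuses", [])
        if not (status.get("embeddingCompleted") and status.get("searchCompleted"))
    }
```

Logging the returned mapping whenever it is non-empty makes partial fan-out failures visible without inspecting every response by hand.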

Narrowing without losing fan-out

To search a subset of collections without naming each one, lean on metadata in pre_filter:

{ "bucket": "kb-prod", "text": "..." /* no collectionName — fan out to all matching */, "preFilter": { "op": "LOGICAL_OP_AND", "filters": [ { "field": "tags", "op": "FILTER_OP_IN", "value": "docs,internal" } ] } }

K3 pushes the filter to each collection in the group; records that lack the metadata field are excluded.

For the full walkthrough — multi-collection setup, group keys, observability patterns — see Recipes → Multi-collection Search.


Pre-filter (FilterGroup)

Filter records on their metadata fields before vector retrieval. This reduces the candidate set Milvus has to score.

message MetadataFilter {
  string field = 1;
  FilterOp op = 2;
  string value = 3;
}

message FilterGroup {
  LogicalOp op = 1;                     // AND (default) | OR
  repeated MetadataFilter filters = 2;
  repeated FilterGroup groups = 3;      // nested for arbitrary boolean expressions
}

Operators

| FilterOp | Meaning | Value shape |
| --- | --- | --- |
| FILTER_OP_EQ | field == value | any scalar |
| FILTER_OP_NEQ | field != value | any scalar |
| FILTER_OP_GT / _GTE / _LT / _LTE | comparison | numeric / string |
| FILTER_OP_IN | field ∈ {values} | comma-separated string: "a,b,c" |
| FILTER_OP_CONTAINS | substring on string fields | string |
| FILTER_OP_EXISTS | field is present (any value) | ignored |

Examples

Simple AND:

{ "preFilter": { "op": "LOGICAL_OP_AND", "filters": [ { "field": "source_key", "op": "FILTER_OP_CONTAINS", "value": "papers/" }, { "field": "page", "op": "FILTER_OP_GTE", "value": "2" } ] } }

Nested — (source contains papers/ AND page ≥ 2) OR tags has any of {transformer, attention}:

{ "preFilter": { "op": "LOGICAL_OP_OR", "groups": [ { "op": "LOGICAL_OP_AND", "filters": [ { "field": "source_key", "op": "FILTER_OP_CONTAINS", "value": "papers/" }, { "field": "page", "op": "FILTER_OP_GTE", "value": "2" } ] }, { "op": "LOGICAL_OP_OR", "filters": [ { "field": "tags", "op": "FILTER_OP_IN", "value": "transformer,attention" } ] } ] } }

FILTER_OP_IN uses comma-separated values, not arrays. Quote in JSON: "value": "a,b,c".
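Building these payloads by hand is error-prone, especially the comma-joined IN values. The helpers below are a minimal sketch that produces the same JSON as the examples above; the names `flt`, `is_in`, and `group` are hypothetical. Rejecting values that themselves contain a comma is a cautious assumption, since this page does not describe an escaping mechanism.

```python
def flt(field: str, op: str, value="") -> dict:
    """One MetadataFilter; values are always sent as strings."""
    return {"field": field, "op": f"FILTER_OP_{op}", "value": str(value)}


def is_in(field: str, values) -> dict:
    """FILTER_OP_IN with the values joined into one comma-separated string."""
    items = [str(v) for v in values]
    if any("," in item for item in items):
        # Assumption: no documented way to escape commas, so refuse them.
        raise ValueError("FILTER_OP_IN values must not contain commas")
    return flt(field, "IN", ",".join(items))


def group(op: str, filters=(), groups=()) -> dict:
    """One FilterGroup; op is 'AND' or 'OR'."""
    out: dict = {"op": f"LOGICAL_OP_{op}"}
    if filters:
        out["filters"] = list(filters)
    if groups:
        out["groups"] = list(groups)
    return out


# (source_key contains papers/ AND page >= 2) OR tags in {transformer, attention}
pre_filter = group("OR", groups=[
    group("AND", filters=[flt("source_key", "CONTAINS", "papers/"), flt("page", "GTE", 2)]),
    group("OR", filters=[is_in("tags", ["transformer", "attention"])]),
])
```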

What metadata fields exist

Depends on where the vectors came from:

  • EXTERNAL collections — whatever you set in VectorRecord.metadata (free-form JSON Struct)
  • PIPELINE-mode collections — whatever the Scriptum index template emits. For text_embedding_index: typically source_key, chunk_index, chunk_total, mime_type, extracted_at, plus any caller-supplied template inputs

Introspection of a collection’s metadata schema (a describe-style call) isn’t exposed today. To discover the fields, inspect the metadata map on a search result.
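One way to do that inspection is to run a small search and take the union of metadata keys across the results. The sketch assumes each result carries a `metadata` object as in the result shape on this page; the function name is illustrative, and fields that no returned record happens to carry will not show up.

```python
def metadata_fields(results: list[dict]) -> list[str]:
    """Sorted union of metadata keys seen across a list of search results."""
    fields: set[str] = set()
    for result in results:
        fields.update(result.get("metadata", {}).keys())
    return sorted(fields)
```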


Rerank (Jina via Ignite)

Set rerank: true to run a cross-encoder reranker over the top-K candidates before returning. Improves precision on the top-3–5 results — important for RAG / agent grounding where the “best” result matters more than recall.

{ "bucket": "kb-prod", "text": "what is the difference between BERT and GPT?", "collectionName": "papers", "topK": 20, "rerank": true }

Cost: ~50–200 ms added latency per query (depends on top_k and corpus chunk length). Skip rerank for:

  • Bulk retrieval where you’ll filter / aggregate downstream anyway
  • Latency-sensitive paths (interactive search-as-you-type)
  • Collections where vector recall is already strong (e.g. domain-tuned embedders)

rerank_text for binary queries

For image / audio / video queries (no inherent text query), supply rerank_text so the reranker has text to score against:

{ "bucket": "kb-prod", "s3Key": "queries/example-bag.jpg", "collectionName": "product-images", "topK": 50, "rerank": true, "rerankText": "leather handbag with gold hardware" }

K3 reranks the image-search results against the supplied text — useful when you want “images that look like X AND match this description Y.”

For the full hybrid + rerank walkthrough, see Recipes → Hybrid + Rerank.


Result shape

message SearchResult {
  dodil.k3.common.v1.ObjectRef object = 1;  // bucket + key
  float score = 2;                          // normalized similarity
  map<string, string> metadata = 3;         // Milvus output fields, stringified
  SearchResultSource source = 4;            // VECTOR | FULLTEXT | HYBRID
  optional string chunk_id = 5;
  optional int32 chunk_index = 6;
  optional string content = 7;              // include_content=true
  optional string highlight = 8;            // include_highlights=true
  optional int32 query_index = 9;           // batch vector queries
}

Key fields:

  • object — bucket + key the result points back to (the source S3 object)
  • score — higher = more relevant; normalized by Milvus per metric
  • source — which signal produced the result. VECTOR for dense-only, FULLTEXT for BM25-only (unusual on its own), HYBRID for fused. Useful when debugging hybrid scoring.
  • chunk_id / chunk_index — when the collection holds chunked content (pipeline-mode), points to the specific chunk
  • content — the chunk text itself, opt-in via include_content: true. Default off to keep responses small.
  • highlight — query-aware snippet (when supported by the collection), opt-in via include_highlights: true
  • metadata — Milvus output fields as a string map (numbers/booleans stringified). For typed metadata access, use VectorInput.output_fields to control which fields come back.
  • query_index — only populated when the query was VectorInput.vectors[] (batch); regroup results by this field client-side
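Because metadata arrives as a string map, clients that need numbers or booleans have to convert them back. The sketch below is a best-effort coercion and rests on two assumptions not confirmed by this page: that booleans are stringified as `true` / `false`, and that it is acceptable to turn any numeric-looking string into a number (which would be wrong for an ID field such as `"00123"`).

```python
def coerce(value: str):
    """Best-effort conversion of a stringified metadata value to bool, int, or float."""
    if value in ("true", "false"):  # assumed boolean spelling
        return value == "true"
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value  # leave anything else as a string


def typed_metadata(metadata: dict[str, str]) -> dict:
    """Apply coerce() to every value in a result's metadata map."""
    return {key: coerce(val) for key, val in metadata.items()}
```

If a field must stay a string, skip it explicitly rather than relying on the guess.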

Worked examples

Simple RAG retrieval

curl -sS -X POST "https://k3.dev.dodil.io/kb-prod/vector/search" \
  -H "Authorization: Bearer $DODIL_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "bucket": "kb-prod",
    "text": "explain the attention mechanism",
    "collectionName": "docs",
    "topK": 5,
    "searchMode": "SEARCH_MODE_AUTO",
    "includeContent": true,
    "rerank": true
  }'

Top-5 reranked chunks with text, hybrid mode if the collection has BM25.

High-recall analytic retrieval (no rerank)

curl -sS -X POST "https://k3.dev.dodil.io/kb-prod/vector/search" \
  -H "Authorization: Bearer $DODIL_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "bucket": "kb-prod",
    "text": "attention",
    "collectionName": "docs",
    "topK": 500,
    "searchMode": "SEARCH_MODE_AUTO",
    "preFilter": {
      "op": "LOGICAL_OP_AND",
      "filters": [
        { "field": "year", "op": "FILTER_OP_GTE", "value": "2017" }
      ]
    }
  }'

Up to 500 candidates, restricted to 2017 and later, in hybrid mode if the collection has a sparse index. No rerank, since you’ll aggregate downstream.

Multi-collection search

curl -sS -X POST "https://k3.dev.dodil.io/kb-prod/vector/search" \
  -H "Authorization: Bearer $DODIL_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "bucket": "kb-prod",
    "text": "transformer architecture",
    "topK": 20,
    "searchMode": "SEARCH_MODE_AUTO",
    "rerank": true,
    "includeContent": true
  }'

No collectionName → searches all matching collections. Inspect collection_statuses[] in the response to confirm which collections participated.

Batch retrieval (5 queries in one round-trip)

curl -sS -X POST "https://k3.dev.dodil.io/kb-prod/vector/search" \
  -H "Authorization: Bearer $DODIL_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "bucket": "kb-prod",
    "collectionName": "docs",
    "vector": {
      "vectors": [
        { "values": [/* query 1 dense vector */] },
        { "values": [/* query 2 */] },
        { "values": [/* query 3 */] },
        { "values": [/* query 4 */] },
        { "values": [/* query 5 */] }
      ],
      "outputFields": ["file_id", "metadata"]
    },
    "topK": 5
  }'

Returns up to 25 results in total (5 queries × up to 5 results each); regroup by query_index client-side.

Batch + hybrid is one-at-a-time. Batching is for the dense fast path. Combining a multi-vector batch with caller-supplied sparse_vectors (i.e. multiple dense+sparse pairs in one call) returns UNIMPLEMENTED — send those as separate requests. Single-query hybrid is fully supported.
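Since multiple dense+sparse pairs cannot share one call, a client can expand them into one request body per pair. This sketch mirrors the single-query hybrid body shown earlier on this page; the function name `hybrid_requests` and the shape of the `pairs` argument are assumptions made for illustration.

```python
def hybrid_requests(bucket: str, collection: str, pairs: list[tuple[list[float], dict]], top_k: int) -> list[dict]:
    """Build one single-query hybrid request body per (dense, sparse) pair."""
    return [
        {
            "bucket": bucket,
            "collectionName": collection,
            # Exactly one dense vector and one sparse vector per request.
            "vector": {"values": dense, "sparseVectors": [sparse]},
            "searchMode": "SEARCH_MODE_HYBRID",
            "topK": top_k,
        }
        for dense, sparse in pairs
    ]
```

Each returned body is sent as its own Search call; the caller keeps track of which response belongs to which pair, because query_index is only populated for batch vector queries.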

Milvus-native tuning with a partition restriction

curl -sS -X POST "https://k3.dev.dodil.io/kb-prod/vector/search" \
  -H "Authorization: Bearer $DODIL_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "bucket": "kb-prod",
    "collectionName": "docs",
    "vector": {
      "values": [/* ... */],
      "searchParams": { "ef": "256" },
      "partitionNames": ["2026"]
    },
    "topK": 100
  }'

HNSW ef raised from K3’s default (top_k * 2) for higher recall; restricted to the 2026 Milvus partition.

Multimodal image-by-key

curl -sS -X POST "https://k3.dev.dodil.io/kb-prod/vector/search" \
  -H "Authorization: Bearer $DODIL_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "bucket": "kb-prod",
    "s3Key": "queries/example-bag.jpg",
    "contentType": "image/jpeg",
    "collectionName": "product-images",
    "topK": 20,
    "rerank": true,
    "rerankText": "leather handbag with gold hardware",
    "includeContent": false
  }'

K3 fetches the image, embeds it server-side via the collection’s embed_model, searches, then reranks against the supplied text.

For the full walkthrough, see Recipes → Multimodal Search.


When the request fails

Common errors:

| Symptom | Cause | Fix |
| --- | --- | --- |
| FAILED_PRECONDITION on text query | Collection has no embed_model set (EXTERNAL-mode without embedModel) | Set embed_model at AddVectorCollection time, or pre-embed and use vector shape |
| INVALID_ARGUMENT “vector dimension mismatch” | VectorInput.values length ≠ collection’s dimensions | Check dodil k3 vector collection get <id> |
| INVALID_ARGUMENT “sparse_vectors not supported on this collection” | Supplied sparse_vectors but collection’s sparse_mode ≠ EXTERNAL | Either drop sparse from the request or change the collection’s sparse mode (requires recreate) |
| Empty results, collection_statuses shows fail_reason: "incompatible embed_model" | Multi-collection search across model families | Run separate queries per model family OR set collection_name to one specific collection |
| Hybrid mode returns dense-only results + warning | Collection’s sparse_mode = NONE | Add SPARSE_MODE_BM25 or _EXTERNAL to the collection (requires recreate), or use SEARCH_MODE_VECTOR to suppress the warning |

VBase escape hatch

K3 doesn’t surface every Milvus feature. For the features below, use VBase directly against the same VBase endpoint your engine is configured for:

  • Custom index types (DiskANN, IVF_PQ variants, …)
  • Partition lifecycle (create_partition, drop_partition, release_partition)
  • Alter collection schema
  • Raw collection.search() with bypass of K3’s pre-filter normalization

If you need those features regularly, configure the bucket’s engine in external mode pointing at your own VBase cluster — K3 doesn’t lock concurrent VBase writes.


See also