Search — API Reference
Package: dodil.k3.vector.v1 · Service: VectorService
The single RPC for vector retrieval. One method, three query shapes, three search modes, optional rerank, metadata pre-filter, multi-collection fan-out. The richest surface in the Vector service — worth reading end-to-end before building on it.
| RPC | HTTP |
|---|---|
| Search | POST /:bucket/vector/search |
gRPC setup (grpcurl, endpoints, reflection, and field-name casing) is covered once in Conventions → Using gRPC.
Request / response
HTTP
curl -sS -X POST "https://k3.dev.dodil.io/kb-prod/vector/search" \
-H "Authorization: Bearer $DODIL_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"bucket": "kb-prod",
"text": "what is multi-head attention",
"collectionName": "docs",
"topK": 10,
"searchMode": "SEARCH_MODE_AUTO",
"rerank": true,
"includeContent": true
}'
Three query shapes (oneof query)
Exactly one of text / vector / s3_key must be set per request.
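A client can catch a malformed request before it leaves the process. The sketch below checks the "exactly one" rule on a plain request dict; the function name `query_shape` is hypothetical, and the JSON field names are the camelCase ones used in the HTTP examples in this document. How the server itself reports a violation is not covered here.

```python
from typing import Any, Mapping

# JSON names of the three query shapes, as they appear in the HTTP examples.
QUERY_FIELDS = ("text", "vector", "s3Key")


def query_shape(request: Mapping[str, Any]) -> str:
    """Return the one query field that is set, or raise ValueError."""
    present = [name for name in QUERY_FIELDS if request.get(name) is not None]
    if len(present) != 1:
        raise ValueError(
            f"exactly one of {QUERY_FIELDS} must be set, got {present or 'none'}"
        )
    return present[0]


# Hypothetical usage: a text query passes, a mixed query is rejected.
print(query_shape({"bucket": "kb-prod", "text": "attention", "topK": 10}))
```

Running the check client-side keeps obviously invalid requests off the wire; it does not replace server-side validation.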
Shape 1 — text query (server-side embed)
K3 routes the string through the collection’s embed_model (via Ignite’s embedding service), then searches.
{
"bucket": "kb-prod",
"text": "what is multi-head attention",
"collectionName": "docs",
"topK": 10
}
Requires the collection to have embed_model set. For PIPELINE-mode collections this comes from the template’s ScriptContract. For EXTERNAL-mode collections, set it explicitly at AddVectorCollection time; otherwise text queries return FAILED_PRECONDITION (you’d have to pre-embed and use the vector shape instead).
Shape 2 — vector query (pre-embedded fast lane)
You supply the embedding(s); K3 goes straight to Milvus. Bypasses Scriptum entirely.
HTTP
Single dense vector:
{
"bucket": "kb-prod",
"collectionName": "docs",
"vector": {
"values": [0.12, -0.04, 0.91, /* ... 1021 more floats ... */]
},
"topK": 10
}
Batch: three queries in one Milvus round-trip:
{
"bucket": "kb-prod",
"collectionName": "docs",
"vector": {
"vectors": [
{ "values": [/* dense vector 1 */] },
{ "values": [/* dense vector 2 */] },
{ "values": [/* dense vector 3 */] }
],
"outputFields": ["file_id", "text", "metadata"]
},
"topK": 5
}
Each result carries queryIndex (0, 1, or 2 here) so the caller can regroup by input query.
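Regrouping a batch response is a few lines of client code. This is a minimal sketch that assumes each result is a dict with `queryIndex` and `score` keys, as in the JSON shown in this document; the helper name and the sample results are made up for illustration.

```python
from collections import defaultdict
from typing import Any


def regroup_by_query(results: list[dict[str, Any]]) -> dict[int, list[dict[str, Any]]]:
    """Bucket batch results by queryIndex, best score first within each bucket."""
    grouped: dict[int, list[dict[str, Any]]] = defaultdict(list)
    for hit in results:
        # Assumption: a missing queryIndex is treated as query 0.
        grouped[int(hit.get("queryIndex", 0))].append(hit)
    for hits in grouped.values():
        hits.sort(key=lambda hit: hit["score"], reverse=True)
    return dict(grouped)


# Hypothetical merged response for a two-query batch.
sample = [
    {"queryIndex": 1, "score": 0.71, "object": {"key": "b.pdf"}},
    {"queryIndex": 0, "score": 0.64, "object": {"key": "a.pdf"}},
    {"queryIndex": 0, "score": 0.88, "object": {"key": "c.pdf"}},
]
for index, hits in sorted(regroup_by_query(sample).items()):
    print(index, [hit["object"]["key"] for hit in hits])
```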
Caller-supplied sparse (collection’s sparseMode must be EXTERNAL):
{
"bucket": "kb-prod",
"collectionName": "splade-vectors",
"vector": {
"values": [/* dense 768 floats */],
"sparseVectors": [
{
"indices": [42, 137, 1024],
"values": [0.9, 0.4, 0.1]
}
]
},
"searchMode": "SEARCH_MODE_HYBRID",
"topK": 20
}
With Milvus-native tuning (HNSW ef higher than the K3 default):
{
"bucket": "kb-prod",
"collectionName": "docs",
"vector": {
"values": [/* ... */],
"searchParams": {
"ef": "256",
"metric_type": "COSINE"
},
"partitionNames": ["2026"],
"outputFields": ["file_id", "metadata"]
},
"topK": 50
}
When to use the vector shape:
- You have your own embedding pipeline (third-party model, custom-tuned model) — pre-compute embeddings and search directly
- You’re doing batch retrieval — N queries in one Milvus round-trip is materially faster than N single calls
- You need Milvus-native tuning (e.g. raising ef for HNSW collections under recall-sensitive workloads)
- You want to restrict to specific Milvus partitions for sharded retrieval
Shape 3 — file query (multimodal)
For collections built from a visual_embedding_index / face_embedding_index / object_embedding_index template, you can search by an object in your bucket — K3 fetches it, embeds it server-side, then searches.
{
"bucket": "kb-prod",
"collectionName": "product-images",
"s3Key": "queries/example-bag.jpg",
"contentType": "image/jpeg",
"topK": 20
}
The contentType hint helps K3 pick the right embedding pipeline when the object’s stored content-type is ambiguous or wrong. It is optional: K3 falls back to the object’s stored content-type or extension.
Multimodal queries support all the same knobs as text queries — pre-filter, rerank, multi-collection, search modes.
Three search modes (SearchMode)
enum SearchMode {
SEARCH_MODE_UNSPECIFIED = 0; // → VECTOR
SEARCH_MODE_VECTOR = 1; // dense only
SEARCH_MODE_HYBRID = 2; // dense + BM25, RRF k=60
SEARCH_MODE_AUTO = 3; // HYBRID where collection has sparse, else VECTOR
}
How modes interact with the collection’s sparse_mode:
| Request search_mode | Collection sparse_mode = NONE | Collection sparse_mode = BM25 | Collection sparse_mode = EXTERNAL |
|---|---|---|---|
| VECTOR (default) | dense | dense | dense |
| HYBRID | warning + dense | dense + BM25 (RRF k=60) | dense + caller sparse (RRF k=60) |
| AUTO | dense | dense + BM25 | dense + caller sparse |
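The table can be read as a small decision function. The sketch below mirrors it for client-side reasoning or tests; the short mode names (enum prefixes stripped) and the function name are illustrative assumptions, and the real decision is made by the server.

```python
def effective_mode(search_mode: str, sparse_mode: str) -> tuple[str, bool]:
    """Return (mode that would run, whether a warning is expected).

    search_mode: "UNSPECIFIED", "VECTOR", "HYBRID" or "AUTO"
    sparse_mode: "NONE", "BM25" or "EXTERNAL"
    """
    has_sparse = sparse_mode in ("BM25", "EXTERNAL")
    if search_mode in ("UNSPECIFIED", "VECTOR"):
        # UNSPECIFIED falls through to VECTOR: dense only, whatever the collection has.
        return "vector", False
    if search_mode == "HYBRID":
        # Hybrid on a dense-only collection degrades to dense with a warning.
        return ("hybrid", False) if has_sparse else ("vector", True)
    if search_mode == "AUTO":
        return ("hybrid" if has_sparse else "vector"), False
    raise ValueError(f"unknown search mode: {search_mode!r}")


for mode in ("VECTOR", "HYBRID", "AUTO"):
    print(mode, [effective_mode(mode, s) for s in ("NONE", "BM25", "EXTERNAL")])
```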
AUTO is the most caller-friendly: it picks the best mode per collection without you having to know each one’s sparse config. Use VECTOR when you specifically want to skip BM25 (e.g. pure-semantic recall comparisons). Use HYBRID when you want to request hybrid explicitly; on dense-only collections you’ll get a warning plus dense results.
The response’s search_mode_used reports what actually ran: "vector" or "hybrid".
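The hybrid cells in the table fuse the dense and sparse rankings with RRF at k=60. This document does not spell out the formula, so the sketch below uses the standard reciprocal rank fusion definition, score(d) = Σ 1 / (k + rank), with 1-based ranks. The function name and the tie-break on id are illustrative assumptions, not a description of K3 or Milvus internals.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked id lists with reciprocal rank fusion.

    Each inner list is one ranking, best hit first. An id absent from a
    ranking simply contributes nothing for that ranking.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is stable.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


dense = ["a", "b", "c"]   # hypothetical dense ranking
sparse = ["b", "d", "a"]  # hypothetical BM25 ranking
for doc_id, score in rrf_fuse([dense, sparse]):
    print(doc_id, round(score, 5))
```

Because RRF only looks at ranks, the dense and sparse scores never need to be on the same scale, which is why it is a common choice for fusing heterogeneous retrievers.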
Multi-collection search
Leave collection_name empty to search all matching collections in the bucket.
{
"bucket": "kb-prod",
"text": "transformer architecture",
"topK": 20
}
Compatibility group key
When collection_name is empty, K3 groups collections by (dimensions, embedding_type, embed_model). The query’s dense vector / text-embedded vector must be compatible with the group — collections sharing dimensions but different embed_model never co-mingle (you’d be comparing apples to oranges in embedding space).
Per-collection compatibility:
| Match | Comparable? |
|---|---|
| Same embed_model + same dimensions + same embedding_type | ✅ co-search together |
| Same dimensions + same embedding_type, different embed_model | ❌ separate groups, never fused |
| Different dimensions | ❌ never |
| Different embedding_type (e.g. FLOAT vs INT8) | ❌ never |
For a text query, K3 picks the group matching the embedding model implied by the query (it uses the embed_model of the first compatible collection). If you have collections from multiple model families (e.g. jina-embeddings-v4 AND openai/text-embedding-3-large) and want results from both, run two queries.
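To predict which collections a fan-out query can reach together, you can reproduce the group key locally. The sketch below groups collection descriptors by (dimensions, embedding_type, embed_model); the `Collection` type, the function name, and the sample values (including the model names) are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Collection(NamedTuple):
    name: str
    dimensions: int
    embedding_type: str
    embed_model: str


def compatibility_groups(
    collections: list[Collection],
) -> dict[tuple[int, str, str], list[str]]:
    """Group collection names by the (dimensions, embedding_type, embed_model) key."""
    groups: dict[tuple[int, str, str], list[str]] = defaultdict(list)
    for c in collections:
        groups[(c.dimensions, c.embedding_type, c.embed_model)].append(c.name)
    return dict(groups)


# Hypothetical bucket: same dimensions everywhere, but two model families.
bucket = [
    Collection("docs", 1024, "FLOAT", "model-a"),
    Collection("papers", 1024, "FLOAT", "model-a"),
    Collection("code-repo", 1024, "FLOAT", "model-b"),
]
for key, names in compatibility_groups(bucket).items():
    print(key, names)
```

With this sample, "docs" and "papers" land in one group and "code-repo" in another, so covering all three takes two queries.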
Per-collection observability
The response carries collection_statuses[] so you can debug partial failures:
{
"results": [/* merged across collections, sorted by score */],
"tookMs": "247",
"searchModeUsed": "hybrid",
"warnings": [],
"collectionStatuses": [
{
"collection": "docs",
"embeddingCompleted": true,
"searchCompleted": true,
"failReason": ""
},
{
"collection": "code-repo",
"embeddingCompleted": false,
"searchCompleted": false,
"failReason": "incompatible embed_model: docs uses jina-embeddings-v4, code-repo uses sentence-transformers/all-MiniLM"
}
]
}
fail_reason per collection is the place to look when a multi-collection search returns fewer results than expected.
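A small helper makes that check routine. This sketch assumes the parsed JSON response is a dict shaped like the example above; the function name is hypothetical.

```python
from typing import Any


def failed_collections(response: dict[str, Any]) -> dict[str, str]:
    """Map each collection that did not finish both stages to its failReason."""
    failed: dict[str, str] = {}
    for status in response.get("collectionStatuses", []):
        finished = status.get("embeddingCompleted") and status.get("searchCompleted")
        if not finished:
            failed[status["collection"]] = status.get("failReason", "")
    return failed


# Trimmed copy of the example response above.
response = {
    "results": [],
    "collectionStatuses": [
        {"collection": "docs", "embeddingCompleted": True,
         "searchCompleted": True, "failReason": ""},
        {"collection": "code-repo", "embeddingCompleted": False,
         "searchCompleted": False, "failReason": "incompatible embed_model"},
    ],
}
print(failed_collections(response))
```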
Narrowing without losing fan-out
To search a subset of collections without naming each one, lean on metadata in pre_filter:
{
"bucket": "kb-prod",
"text": "...",
/* no collectionName: fan out to all matching */
"preFilter": {
"op": "LOGICAL_OP_AND",
"filters": [
{ "field": "tags", "op": "FILTER_OP_IN", "value": "docs,internal" }
]
}
}
K3 pushes the filter to each collection in the group; records that lack the metadata field are excluded.
For the full walkthrough — multi-collection setup, group keys, observability patterns — see Recipes → Multi-collection Search.
Pre-filter (FilterGroup)
Filter records on their metadata fields before vector retrieval. This reduces the candidate set Milvus has to score.
message MetadataFilter {
string field = 1;
FilterOp op = 2;
string value = 3;
}
message FilterGroup {
LogicalOp op = 1; // AND (default) | OR
repeated MetadataFilter filters = 2;
repeated FilterGroup groups = 3; // nested for arbitrary boolean expressions
}
Operators
| FilterOp | Meaning | Value shape |
|---|---|---|
| FILTER_OP_EQ | field == value | any scalar |
| FILTER_OP_NEQ | field != value | any scalar |
| FILTER_OP_GT / _GTE / _LT / _LTE | comparison | numeric / string |
| FILTER_OP_IN | field ∈ {values} | comma-separated string: "a,b,c" |
| FILTER_OP_CONTAINS | substring on string fields | string |
| FILTER_OP_EXISTS | field is present (any value) | ignored |
Examples
Simple AND:
{
"preFilter": {
"op": "LOGICAL_OP_AND",
"filters": [
{ "field": "source_key", "op": "FILTER_OP_CONTAINS", "value": "papers/" },
{ "field": "page", "op": "FILTER_OP_GTE", "value": "2" }
]
}
}
Nested: (source_key contains papers/ AND page ≥ 2) OR tags is any of {transformer, attention}:
{
"preFilter": {
"op": "LOGICAL_OP_OR",
"groups": [
{
"op": "LOGICAL_OP_AND",
"filters": [
{ "field": "source_key", "op": "FILTER_OP_CONTAINS", "value": "papers/" },
{ "field": "page", "op": "FILTER_OP_GTE", "value": "2" }
]
},
{
"op": "LOGICAL_OP_OR",
"filters": [
{ "field": "tags", "op": "FILTER_OP_IN", "value": "transformer,attention" }
]
}
]
}
}
FILTER_OP_IN takes a single comma-separated string, not a JSON array: "value": "a,b,c".
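Nested groups are easy to get wrong by hand, so a few builder helpers can help. The sketch below produces the same JSON as the nested example; the helper names (`leaf`, `any_of`, `group`) are hypothetical, and rejecting values that contain a comma is an assumption, since this document does not describe an escape syntax for FILTER_OP_IN.

```python
from typing import Any, Iterable


def leaf(field: str, op: str, value: Any = "") -> dict[str, str]:
    """One MetadataFilter; values are always sent as strings."""
    return {"field": field, "op": f"FILTER_OP_{op}", "value": str(value)}


def any_of(field: str, values: Iterable[Any]) -> dict[str, str]:
    """FILTER_OP_IN with the comma-separated value encoding."""
    items = [str(v) for v in values]
    if any("," in item for item in items):
        # Assumption: no escaping is documented, so refuse ambiguous values.
        raise ValueError("FILTER_OP_IN values must not contain commas")
    return leaf(field, "IN", ",".join(items))


def group(op: str, filters: Iterable[dict] = (), groups: Iterable[dict] = ()) -> dict[str, Any]:
    """One FilterGroup; op is "AND" or "OR"."""
    body: dict[str, Any] = {"op": f"LOGICAL_OP_{op}"}
    if filters:
        body["filters"] = list(filters)
    if groups:
        body["groups"] = list(groups)
    return body


# (source_key contains papers/ AND page >= 2) OR tags in {transformer, attention}
pre_filter = group("OR", groups=[
    group("AND", filters=[
        leaf("source_key", "CONTAINS", "papers/"),
        leaf("page", "GTE", 2),
    ]),
    group("OR", filters=[any_of("tags", ["transformer", "attention"])]),
])
print(pre_filter)
```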
What metadata fields exist
Depends on where the vectors came from:
- EXTERNAL collections: whatever you set in VectorRecord.metadata (free-form JSON Struct)
- PIPELINE-mode collections: whatever the Scriptum index template emits. For text_embedding_index: typically source_key, chunk_index, chunk_total, mime_type, extracted_at, plus any caller-supplied template inputs
Describe-style introspection of a collection’s metadata schema isn’t exposed today; inspect a search result’s metadata map to discover the fields.
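One way to do that inspection is to tally metadata keys over a page of results. This is a minimal sketch, assuming each result is a dict with a `metadata` map as in the result shape described in this document; the function name and sample data are hypothetical.

```python
from collections import Counter
from typing import Any


def metadata_fields(results: list[dict[str, Any]]) -> dict[str, int]:
    """Count how many results carry each metadata key."""
    counts: Counter[str] = Counter()
    for hit in results:
        counts.update(hit.get("metadata", {}).keys())
    return dict(counts)


# Hypothetical results; the second one has no mime_type.
sample_hits = [
    {"metadata": {"source_key": "papers/a.pdf", "chunk_index": "0", "mime_type": "application/pdf"}},
    {"metadata": {"source_key": "papers/b.pdf", "chunk_index": "3"}},
]
print(metadata_fields(sample_hits))
```

A key that appears on only some results is worth noting before you filter on it, since records without the field are excluded by a pre-filter.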
Rerank (Jina via Ignite)
Set rerank: true to run a cross-encoder reranker over the top-K candidates before returning. Improves precision on the top-3–5 results — important for RAG / agent grounding where the “best” result matters more than recall.
{
"bucket": "kb-prod",
"text": "what is the difference between BERT and GPT?",
"collectionName": "papers",
"topK": 20,
"rerank": true
}
Cost: ~50–200 ms added latency per query (depends on top_k and corpus chunk length). Skip rerank for:
- Bulk retrieval where you’ll filter / aggregate downstream anyway
- Latency-sensitive paths (interactive search-as-you-type)
- Collections where vector recall is already strong (e.g. domain-tuned embedders)
rerank_text for binary queries
For image / audio / video queries (no inherent text query), supply rerank_text so the reranker has text to score against:
{
"bucket": "kb-prod",
"s3Key": "queries/example-bag.jpg",
"collectionName": "product-images",
"topK": 50,
"rerank": true,
"rerankText": "leather handbag with gold hardware"
}
K3 reranks the image-search results against the supplied text. This is useful when you want “images that look like X AND match this description Y.”
For the full hybrid + rerank walkthrough, see Recipes → Hybrid + Rerank.
Result shape
message SearchResult {
dodil.k3.common.v1.ObjectRef object = 1; // bucket + key
float score = 2; // normalized similarity
map<string, string> metadata = 3; // Milvus output fields, stringified
SearchResultSource source = 4; // VECTOR | FULLTEXT | HYBRID
optional string chunk_id = 5;
optional int32 chunk_index = 6;
optional string content = 7; // include_content=true
optional string highlight = 8; // include_highlights=true
optional int32 query_index = 9; // batch vector queries
}
Key fields:
- object: bucket + key the result points back to (the source S3 object)
- score: higher = more relevant; normalized by Milvus per metric
- source: which signal produced the result. VECTOR for dense-only, FULLTEXT for BM25-only (unusual on its own), HYBRID for fused. Useful when debugging hybrid scoring.
- chunk_id / chunk_index: when the collection holds chunked content (pipeline-mode), points to the specific chunk
- content: the chunk text itself, opt-in via include_content: true. Default off to keep responses small.
- highlight: query-aware snippet (when supported by the collection), opt-in via include_highlights: true
- metadata: Milvus output fields as a string map (numbers/booleans stringified). For typed metadata access, use VectorInput.output_fields to control which fields come back.
- query_index: only populated when the query was VectorInput.vectors[] (batch); regroup results by this field client-side
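Since metadata comes back as a string map, callers who want numbers and booleans have to convert them. The sketch below is one hedged way to do it, assuming values are stringified in a JSON-like form ("3", "0.5", "true"); the exact stringification K3 uses is not specified here, and the function names are hypothetical.

```python
import json
from typing import Any


def coerce(value: str) -> Any:
    """Turn "3", "0.5", "true" into int, float, bool; leave other strings alone."""
    try:
        parsed = json.loads(value)
    except json.JSONDecodeError:
        return value  # plain text such as "papers/a.pdf"
    # Only accept scalars we expect; quoted strings, null, lists stay as-is.
    return parsed if isinstance(parsed, (bool, int, float)) else value


def typed_metadata(metadata: dict[str, str]) -> dict[str, Any]:
    """Apply coerce() to every value in a result's metadata map."""
    return {key: coerce(value) for key, value in metadata.items()}


# Hypothetical metadata map from one search result.
print(typed_metadata({
    "source_key": "papers/a.pdf",
    "chunk_index": "3",
    "is_public": "true",
    "ratio": "0.5",
}))
```

Blanket coercion will also convert identifier-like strings such as "123", so in practice you may prefer to coerce only the fields you know to be numeric.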
Worked examples
Simple RAG retrieval
curl -sS -X POST "https://k3.dev.dodil.io/kb-prod/vector/search" \
-H "Authorization: Bearer $DODIL_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"bucket": "kb-prod",
"text": "explain the attention mechanism",
"collectionName": "docs",
"topK": 5,
"searchMode": "SEARCH_MODE_AUTO",
"includeContent": true,
"rerank": true
}'
Top-5 reranked chunks with text; hybrid mode if the collection has BM25.
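The same call can be made from Python with only the standard library. The sketch below builds the request object without sending it; the host, path, headers, and body are copied from the curl example, while the helper name `build_search_request` is hypothetical and the response handling in the closing comment assumes the JSON shape shown under Per-collection observability.

```python
import json
import os
import urllib.request

BASE_URL = "https://k3.dev.dodil.io"  # host used by the curl examples


def build_search_request(bucket: str, body: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST /:bucket/vector/search request."""
    payload = {"bucket": bucket, **body}
    return urllib.request.Request(
        url=f"{BASE_URL}/{bucket}/vector/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_search_request(
    "kb-prod",
    {
        "text": "explain the attention mechanism",
        "collectionName": "docs",
        "topK": 5,
        "searchMode": "SEARCH_MODE_AUTO",
        "includeContent": True,
        "rerank": True,
    },
    token=os.environ.get("DODIL_TOKEN", ""),
)
print(req.get_method(), req.full_url)

# To actually send it:
#   with urllib.request.urlopen(req, timeout=10) as resp:
#       results = json.load(resp)["results"]
```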
High-recall analytic retrieval (no rerank)
curl -sS -X POST "https://k3.dev.dodil.io/kb-prod/vector/search" \
-H "Authorization: Bearer $DODIL_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"bucket": "kb-prod",
"text": "attention",
"collectionName": "docs",
"topK": 500,
"searchMode": "SEARCH_MODE_AUTO",
"preFilter": {
"op": "LOGICAL_OP_AND",
"filters": [
{ "field": "year", "op": "FILTER_OP_GTE", "value": "2017" }
]
}
}'
Up to 500 candidates, hybrid mode where the collection has a sparse index, 2017 onward only. No rerank: you’ll aggregate downstream.
Multi-collection search
curl -sS -X POST "https://k3.dev.dodil.io/kb-prod/vector/search" \
-H "Authorization: Bearer $DODIL_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"bucket": "kb-prod",
"text": "transformer architecture",
"topK": 20,
"searchMode": "SEARCH_MODE_AUTO",
"rerank": true,
"includeContent": true
}'
No collectionName → searches all matching collections. Inspect collection_statuses[] in the response to confirm which collections participated.
Batch retrieval (5 queries in one round-trip)
curl -sS -X POST "https://k3.dev.dodil.io/kb-prod/vector/search" \
-H "Authorization: Bearer $DODIL_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"bucket": "kb-prod",
"collectionName": "docs",
"vector": {
"vectors": [
{ "values": [/* query 1 dense vector */] },
{ "values": [/* query 2 */] },
{ "values": [/* query 3 */] },
{ "values": [/* query 4 */] },
{ "values": [/* query 5 */] }
],
"outputFields": ["file_id", "metadata"]
},
"topK": 5
}'
Returns up to 25 results total (5 queries × 5 results each); regroup by query_index client-side.
Batch + hybrid is one-at-a-time. Batching is for the dense fast path. Combining a multi-vector batch with caller-supplied sparse_vectors (i.e. multiple dense+sparse pairs in one call) returns UNIMPLEMENTED; send those as separate requests. Single-query hybrid is fully supported.
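When you do have several dense+sparse pairs, the workaround is one request body per pair. The sketch below builds those bodies in the shape of the caller-supplied sparse example earlier on this page; the helper names are hypothetical, and sorting sparse indices is a tidy-up choice rather than a documented requirement.

```python
from typing import Any, Sequence


def to_sparse(weights: dict[int, float]) -> dict[str, list]:
    """Convert {index: weight} into the indices/values pair used by sparseVectors."""
    indices = sorted(weights)
    return {"indices": indices, "values": [weights[i] for i in indices]}


def hybrid_requests(
    bucket: str,
    collection: str,
    pairs: Sequence[tuple[Sequence[float], dict[int, float]]],
    top_k: int,
) -> list[dict[str, Any]]:
    """One single-query hybrid request body per (dense, sparse) pair."""
    return [
        {
            "bucket": bucket,
            "collectionName": collection,
            "vector": {
                "values": list(dense),
                "sparseVectors": [to_sparse(sparse)],
            },
            "searchMode": "SEARCH_MODE_HYBRID",
            "topK": top_k,
        }
        for dense, sparse in pairs
    ]


# Two hypothetical pairs; the dense vectors are shortened for readability.
bodies = hybrid_requests(
    "kb-prod",
    "splade-vectors",
    [([0.1, 0.2], {137: 0.4, 42: 0.9, 1024: 0.1}), ([0.3, 0.4], {7: 1.0})],
    top_k=20,
)
print(len(bodies), bodies[0]["vector"]["sparseVectors"])
```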
Tuned HNSW search
curl -sS -X POST "https://k3.dev.dodil.io/kb-prod/vector/search" \
-H "Authorization: Bearer $DODIL_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"bucket": "kb-prod",
"collectionName": "docs",
"vector": {
"values": [/* ... */],
"searchParams": {
"ef": "256"
},
"partitionNames": ["2026"]
},
"topK": 100
}'
HNSW ef raised above K3’s default (top_k * 2) for higher recall; restricted to the 2026 Milvus partition.
Multimodal image-by-key
curl -sS -X POST "https://k3.dev.dodil.io/kb-prod/vector/search" \
-H "Authorization: Bearer $DODIL_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"bucket": "kb-prod",
"s3Key": "queries/example-bag.jpg",
"contentType": "image/jpeg",
"collectionName": "product-images",
"topK": 20,
"rerank": true,
"rerankText": "leather handbag with gold hardware",
"includeContent": false
}'
K3 fetches the image, embeds it server-side via the collection’s embed_model, searches, then reranks against the supplied text.
For the full walkthrough, see Recipes → Multimodal Search.
When the request fails
Common errors:
| Symptom | Cause | Fix |
|---|---|---|
| FAILED_PRECONDITION on text query | Collection has no embed_model set (EXTERNAL-mode without embedModel) | Set embed_model at AddVectorCollection time, or pre-embed and use vector shape |
| INVALID_ARGUMENT “vector dimension mismatch” | VectorInput.values length ≠ collection’s dimensions | Check dodil k3 vector collection get <id> |
| INVALID_ARGUMENT “sparse_vectors not supported on this collection” | Supplied sparse_vectors but collection’s sparse_mode ≠ EXTERNAL | Either drop sparse from the request or change the collection’s sparse mode (requires recreate) |
| Empty results, collection_statuses shows fail_reason: "incompatible embed_model" | Multi-collection search across model families | Run separate queries per model family OR set collection_name to one specific collection |
| Hybrid mode returns dense-only results + warning | Collection’s sparse_mode = NONE | Add SPARSE_MODE_BM25 or _EXTERNAL to the collection (requires recreate), or use SEARCH_MODE_VECTOR to suppress the warning |
VBase escape hatch
K3 doesn’t surface every Milvus feature. For these, use VBase directly against the same VBase endpoint your engine is configured for:
- Custom index types (DiskANN, IVF_PQ variants, …)
- Partition lifecycle (create_partition, drop_partition, release_partition)
- Alter collection schema
- Raw collection.search() with bypass of K3’s pre-filter normalization
If you need those features regularly, configure the bucket’s engine in external mode pointing at your own VBase cluster — K3 doesn’t lock concurrent VBase writes.
See also
- Collections — the collections you search
- Vectors — direct writes (EXTERNAL-mode collections)
- Core Concepts → SearchRequest — type signature + filter group
- Recipes → Multi-collection Search — group key + per-collection statuses + observability patterns
- Recipes → Hybrid + Rerank — when each tier helps + latency cost
- Recipes → Multimodal Search: s3_key query end-to-end
- VBase: raw Milvus features K3 doesn’t expose
- grpcurl reference: full flag set + reflection-disabled fallbacks