Search — API Reference

Package: dodil.k3.vector.v1 · Service: VectorService

Search is the single RPC for vector retrieval: one method with three query shapes, three search modes, optional rerank, a metadata pre-filter, and multi-collection fan-out. It is the richest surface in the Vector service, so it is worth reading end to end before building on it.

RPC	HTTP
`Search`	`POST /:bucket/vector/search`

gRPC setup — grpcurl, endpoints, reflection, and field-name casing — is covered once in Conventions → Using gRPC.

Request / response

HTTP


curl -sS -X POST "https://k3.dev.dodil.io/kb-prod/vector/search" \
  -H "Authorization: Bearer $DODIL_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "bucket": "kb-prod",
    "text": "what is multi-head attention",
    "collectionName": "docs",
    "topK": 10,
    "searchMode": "SEARCH_MODE_AUTO",
    "rerank": true,
    "includeContent": true
  }'
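The same call can be assembled from Python. The sketch below only builds the request object and does not send it; `build_search_request` is a hypothetical helper name, the token is a placeholder, and mirroring the path bucket into the body is an assumption taken from the curl example rather than a documented requirement.

```python
import json
import urllib.request


def build_search_request(base_url: str, bucket: str, token: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /:bucket/vector/search request."""
    # Assumption: the body's "bucket" mirrors the bucket in the path, as in the curl example.
    payload = dict(body, bucket=bucket)
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{bucket}/vector/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_search_request(
    "https://k3.dev.dodil.io",
    "kb-prod",
    "example-token",  # placeholder; read the real token from your environment
    {"text": "what is multi-head attention", "collectionName": "docs", "topK": 10},
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it once a real token is in place.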

gRPC


rpc Search(SearchRequest) returns (SearchResponse);
 
message SearchRequest {
  string bucket = 1;
  // ── Query (exactly one variant) ──
  oneof query {
    string text = 2;                          // server-side embed via collection's embed_model
    VectorInput vector = 3;                   // pre-embedded fast lane (bypasses Scriptum)
    string s3_key = 4;                        // multimodal file query — image / audio / video
  }
  optional string content_type = 5;           // hint for file queries (e.g. "image/png")
  // ── Filters ──
  optional FilterGroup pre_filter = 6;
  repeated string source_ids = 7;
  // ── Result shaping ──
  int32 top_k = 8;                            // default 10
  float min_score = 9;
  // ── Targeting ──
  optional string collection_name = 10;       // empty = ALL matching collections in the bucket
  SearchMode search_mode = 11;
  // ── Rerank ──
  bool rerank = 12;                           // run Jina reranker on top-K
  optional string rerank_text = 13;           // override for binary queries
  // ── Output ──
  bool include_content = 14;                  // include chunk text in results
  bool include_highlights = 15;
}
 
message SearchResponse {
  repeated SearchResult results = 1;
  dodil.k3.common.v1.PaginationResponse pagination = 2;
  int64 took_ms = 3;
  string search_mode_used = 4;                // "vector" | "hybrid"
  repeated string warnings = 5;
  repeated CollectionSearchStatus collection_statuses = 6;
}
 
message SearchResult {
  dodil.k3.common.v1.ObjectRef object = 1;
  float score = 2;
  map<string, string> metadata = 3;
  SearchResultSource source = 4;              // VECTOR | FULLTEXT | HYBRID
  optional string chunk_id = 5;
  optional int32 chunk_index = 6;
  optional string content = 7;                // when include_content=true
  optional string highlight = 8;              // when include_highlights=true
  // Batch vector queries: which input-query this result belongs to.
  // Omitted / 0 for single-query search.
  optional int32 query_index = 9;
}
 
message CollectionSearchStatus {
  string collection = 1;
  bool embedding_completed = 2;
  bool search_completed = 3;
  string fail_reason = 4;                     // empty = success
}
 
enum SearchMode {
  SEARCH_MODE_UNSPECIFIED = 0;                // → VECTOR
  SEARCH_MODE_VECTOR = 1;                     // dense only
  SEARCH_MODE_HYBRID = 2;                     // dense + BM25, RRF k=60
  SEARCH_MODE_AUTO = 3;                       // HYBRID where collection has sparse, else VECTOR
}
 
enum SearchResultSource {
  SEARCH_SOURCE_UNSPECIFIED = 0;
  SEARCH_SOURCE_VECTOR = 1;
  SEARCH_SOURCE_FULLTEXT = 2;
  SEARCH_SOURCE_HYBRID = 3;
}

Basic text-query search via grpcurl:


grpcurl \
  -H "Authorization: Bearer $DODIL_TOKEN" \
  -d '{
    "bucket": "kb-prod",
    "text": "what is multi-head attention",
    "collection_name": "docs",
    "top_k": 10,
    "search_mode": "SEARCH_MODE_AUTO",
    "rerank": true,
    "include_content": true
  }' \
  $K3_GRPC \
  dodil.k3.vector.v1.VectorService/Search

Three query shapes (`oneof query`)

Exactly one of text / vector / s3_key must be set per request.
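A client can check the exactly-one rule before sending anything. This is a minimal sketch of such a check on the JSON body; `query_shape` is a hypothetical helper, and treating empty strings and empty objects as "not set" is an assumption about how you want to validate, not documented server behavior.

```python
QUERY_FIELDS = ("text", "vector", "s3Key")  # JSON names of the oneof variants


def query_shape(body: dict) -> str:
    """Return the single query variant present in the body, or raise ValueError."""
    # Assumption: None, "" and {} all count as "not set" for this client-side check.
    present = [f for f in QUERY_FIELDS if body.get(f) not in (None, "", {})]
    if len(present) != 1:
        raise ValueError(f"expected exactly one of {QUERY_FIELDS}, got {present or 'none'}")
    return present[0]


print(query_shape({"bucket": "kb-prod", "text": "what is multi-head attention"}))
```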

Shape 1 — text query (server-side embed)

K3 routes the string through the collection’s embed_model (via Ignite’s embedding service), then searches.


{
  "bucket": "kb-prod",
  "text": "what is multi-head attention",
  "collectionName": "docs",
  "topK": 10
}

The collection must have embed_model set. For PIPELINE-mode collections this comes from the template’s ScriptContract. For EXTERNAL-mode collections, set it explicitly at AddVectorCollection time; otherwise text queries return FAILED_PRECONDITION, and you have to pre-embed and use the vector shape instead.

Shape 2 — vector query (pre-embedded fast lane)

You supply the embedding(s); K3 goes straight to Milvus. Bypasses Scriptum entirely.

HTTP

Single dense vector:


{
  "bucket": "kb-prod",
  "collectionName": "docs",
  "vector": {
    "values": [0.12, -0.04, 0.91, /* ... 1021 more floats ... */]
  },
  "topK": 10
}

Batch — three queries in one Milvus round-trip:


{
  "bucket": "kb-prod",
  "collectionName": "docs",
  "vector": {
    "vectors": [
      { "values": [/* dense vector 1 */] },
      { "values": [/* dense vector 2 */] },
      { "values": [/* dense vector 3 */] }
    ],
    "outputFields": ["file_id", "text", "metadata"]
  },
  "topK": 5
}

Each result carries queryIndex: 0|1|2 so the caller can regroup by input query.
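Regrouping a batch response client-side could look like the sketch below. `group_by_query` is a hypothetical helper; it assumes results arrive as JSON objects with `queryIndex` and `score`, and treats a missing `queryIndex` as 0, matching the "omitted / 0" note in the message definition.

```python
from collections import defaultdict


def group_by_query(results: list[dict], num_queries: int) -> list[list[dict]]:
    """Split a flat result list into one score-sorted list per input query."""
    buckets: dict[int, list[dict]] = defaultdict(list)
    for result in results:
        buckets[int(result.get("queryIndex", 0))].append(result)
    # Queries with no hits still get an (empty) slot so positions line up with inputs.
    return [
        sorted(buckets[i], key=lambda r: r["score"], reverse=True)
        for i in range(num_queries)
    ]


flat = [
    {"queryIndex": 1, "score": 0.71, "chunkId": "b"},
    {"queryIndex": 0, "score": 0.64, "chunkId": "a"},
    {"queryIndex": 1, "score": 0.93, "chunkId": "c"},
]
grouped = group_by_query(flat, num_queries=3)
```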

Caller-supplied sparse (collection’s sparseMode must be EXTERNAL):


{
  "bucket": "kb-prod",
  "collectionName": "splade-vectors",
  "vector": {
    "values": [/* dense 768 floats */],
    "sparseVectors": [
      {
        "indices": [42, 137, 1024],
        "values":  [0.9, 0.4, 0.1]
      }
    ]
  },
  "searchMode": "SEARCH_MODE_HYBRID",
  "topK": 20
}
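If your sparse encoder produces a token-id to weight mapping, it has to be flattened into the parallel `indices` / `values` arrays shown above. A minimal sketch, with `to_sparse_vector` as a hypothetical helper; sorting by index and dropping zero weights are assumptions made for tidiness, not documented K3 requirements.

```python
def to_sparse_vector(weights: dict[int, float]) -> dict:
    """Flatten {token_id: weight} into parallel indices/values arrays."""
    # Assumption: zero weights carry no signal and can be dropped; indices are sorted.
    items = sorted((i, w) for i, w in weights.items() if w != 0.0)
    return {
        "indices": [i for i, _ in items],
        "values": [w for _, w in items],
    }


sparse = to_sparse_vector({137: 0.4, 42: 0.9, 1024: 0.1, 7: 0.0})
```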

With Milvus-native tuning (HNSW ef higher than the K3 default):


{
  "bucket": "kb-prod",
  "collectionName": "docs",
  "vector": {
    "values": [/* ... */],
    "searchParams": {
      "ef": "256",
      "metric_type": "COSINE"
    },
    "partitionNames": ["2026"],
    "outputFields": ["file_id", "metadata"]
  },
  "topK": 50
}

When to use the vector shape:

- You have your own embedding pipeline (third-party or custom-tuned model): pre-compute embeddings and search directly.
- You’re doing batch retrieval: N queries in one Milvus round-trip is materially faster than N single calls.
- You need Milvus-native tuning (e.g. raising ef for HNSW collections under recall-sensitive workloads).
- You want to restrict to specific Milvus partitions for sharded retrieval.

Shape 3 — file query (multimodal)

For collections built from a visual_embedding_index / face_embedding_index / object_embedding_index template, you can search by an object in your bucket — K3 fetches it, embeds it server-side, then searches.


{
  "bucket": "kb-prod",
  "collectionName": "product-images",
  "s3Key": "queries/example-bag.jpg",
  "contentType": "image/jpeg",
  "topK": 20
}

The contentType hint helps K3 pick the right embedding pipeline when the object’s stored content-type is ambiguous or wrong. It is optional: without it, K3 falls back to the object’s stored content-type or extension.

Multimodal queries support all the same knobs as text queries — pre-filter, rerank, multi-collection, search modes.

Three search modes (`SearchMode`)


enum SearchMode {
  SEARCH_MODE_UNSPECIFIED = 0;                // → VECTOR
  SEARCH_MODE_VECTOR = 1;                     // dense only
  SEARCH_MODE_HYBRID = 2;                     // dense + BM25, RRF k=60
  SEARCH_MODE_AUTO = 3;                       // HYBRID where collection has sparse, else VECTOR
}

How modes interact with the collection’s sparse_mode:

Request `search_mode`	Collection `sparse_mode = NONE`	Collection `sparse_mode = BM25`	Collection `sparse_mode = EXTERNAL`
`VECTOR` (default)	dense	dense	dense
`HYBRID`	warning + dense	dense + BM25 (RRF k=60)	dense + caller sparse (RRF k=60)
`AUTO`	dense	dense + BM25	dense + caller sparse

AUTO is the most caller-friendly: it picks the best mode per collection without you having to know each one’s sparse config. Use VECTOR when you specifically want to skip BM25 (e.g. pure-semantic recall comparisons). Use HYBRID when you want to request hybrid explicitly; on a dense-only collection (sparse_mode = NONE) it falls back to dense results and adds a warning.
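The interaction table can be restated as a small function, which is handy for predicting `search_mode_used` in tests. This is a sketch of the table only, not K3's implementation; `resolve_mode` is a hypothetical name and the warning wording is invented for illustration.

```python
def resolve_mode(search_mode: str, sparse_mode: str) -> tuple[str, list[str]]:
    """Return (search_mode_used, warnings) for one collection, per the table above."""
    has_sparse = sparse_mode in ("BM25", "EXTERNAL")
    if search_mode in ("SEARCH_MODE_UNSPECIFIED", "SEARCH_MODE_VECTOR"):
        return "vector", []
    if search_mode == "SEARCH_MODE_HYBRID":
        if has_sparse:
            return "hybrid", []
        # Hypothetical warning text; the real message is not given in this reference.
        return "vector", ["hybrid requested but collection has no sparse vectors"]
    if search_mode == "SEARCH_MODE_AUTO":
        return ("hybrid" if has_sparse else "vector"), []
    raise ValueError(f"unknown search mode: {search_mode}")


print(resolve_mode("SEARCH_MODE_AUTO", "BM25"))
```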

The response’s search_mode_used reports what actually ran: "vector" or "hybrid".
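For intuition about the "RRF k=60" fusion named in the table, here is the textbook reciprocal rank fusion formula, where each list contributes 1 / (k + rank) for every document it contains. This is the generic algorithm under stated assumptions (1-based ranks, ties broken by id); how K3 and Milvus apply it internally is not specified in this reference.

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked id lists with reciprocal rank fusion; returns (id, score) best-first."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by id so the output is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


dense_ranking = ["a", "b", "c"]   # hypothetical dense (vector) ranking
bm25_ranking = ["b", "d", "a"]    # hypothetical BM25 ranking
fused = rrf_fuse([dense_ranking, bm25_ranking])
print([doc_id for doc_id, _ in fused])
```

A document that appears in both lists ("a", "b") outranks one that appears in only one, even if that one ranked well in its own list.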

Multi-collection search

Leave collection_name empty to search all matching collections in the bucket.


{
  "bucket": "kb-prod",
  "text": "transformer architecture",
  "topK": 20
}

Compatibility group key

When collection_name is empty, K3 groups collections by (dimensions, embedding_type, embed_model). The query vector (caller-supplied or text-embedded) must be compatible with the group. Collections that share dimensions but differ in embed_model are never commingled, because their embedding spaces are not comparable.

Per-collection compatibility:

Match	Comparable?
Same `embed_model` + same `dimensions` + same `embedding_type`	✅ co-search together
Same `dimensions` + same `embedding_type`, different `embed_model`	❌ separate groups — never fused
Different `dimensions`	❌ never
Different `embedding_type` (e.g. FLOAT vs INT8)	❌ never

For a text query, K3 picks the group whose embedding model matches the query, using the embed_model of the first compatible collection. If you have collections from multiple model families (e.g. jina-embeddings-v4 and openai/text-embedding-3-large) and want results from both, run two queries.
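To see which of your collections would be co-searched, you can reproduce the grouping key locally. The sketch below groups by the (dimensions, embedding_type, embed_model) tuple described above; the `Collection` class, the collection list, and the model names are all made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Collection:
    name: str
    dimensions: int
    embedding_type: str
    embed_model: str


def compatibility_groups(collections: list[Collection]) -> dict[tuple, list[str]]:
    """Map each (dimensions, embedding_type, embed_model) key to its collection names."""
    groups: dict[tuple, list[str]] = defaultdict(list)
    for c in collections:
        groups[(c.dimensions, c.embedding_type, c.embed_model)].append(c.name)
    return dict(groups)


# Hypothetical bucket contents; model names are placeholders, not real models.
bucket = [
    Collection("docs", 1024, "FLOAT", "model-a"),
    Collection("papers", 1024, "FLOAT", "model-a"),
    Collection("code-repo", 1024, "FLOAT", "model-b"),  # same dims, different model
    Collection("thumbnails", 1024, "INT8", "model-a"),  # same model, different type
]
groups = compatibility_groups(bucket)
```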

Per-collection observability

The response carries collection_statuses[] so you can debug partial failures:


{
  "results": [/* merged across collections, sorted by score */],
  "tookMs": "247",
  "searchModeUsed": "hybrid",
  "warnings": [],
  "collectionStatuses": [
    {
      "collection": "docs",
      "embeddingCompleted": true,
      "searchCompleted": true,
      "failReason": ""
    },
    {
      "collection": "code-repo",
      "embeddingCompleted": false,
      "searchCompleted": false,
      "failReason": "incompatible embed_model: docs uses jina-embeddings-v4, code-repo uses sentence-transformers/all-MiniLM"
    }
  ]
}

fail_reason per collection is the place to look when a multi-collection search returns fewer results than expected.
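A small helper can pull the failing collections out of a response. This sketch assumes the camelCase JSON shape shown above; `failed_collections` is a hypothetical name, and treating `searchCompleted: false` with an empty reason as a failure is a defensive assumption.

```python
def failed_collections(response: dict) -> dict[str, str]:
    """Return {collection: failReason} for every collection that did not succeed."""
    return {
        status["collection"]: status.get("failReason", "")
        for status in response.get("collectionStatuses", [])
        # Assumption: a non-empty failReason or an incomplete search both count as failure.
        if status.get("failReason") or not status.get("searchCompleted", False)
    }


response = {
    "results": [],
    "collectionStatuses": [
        {"collection": "docs", "embeddingCompleted": True,
         "searchCompleted": True, "failReason": ""},
        {"collection": "code-repo", "embeddingCompleted": False,
         "searchCompleted": False, "failReason": "incompatible embed_model"},
    ],
}
for name, reason in failed_collections(response).items():
    print(f"{name}: {reason}")
```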

Narrowing without losing fan-out

To search a subset of collections without naming each one, lean on metadata in pre_filter:


{
  "bucket": "kb-prod",
  "text": "...",
  /* no collectionName: fan out to all matching */
  "preFilter": {
    "op": "LOGICAL_OP_AND",
    "filters": [
      { "field": "tags", "op": "FILTER_OP_IN", "value": "docs,internal" }
    ]
  }
}

K3 pushes the filter to each collection in the group; records that lack the metadata field are excluded.

For the full walkthrough — multi-collection setup, group keys, observability patterns — see Recipes → Multi-collection Search.

Pre-filter (`FilterGroup`)

Filters records on their metadata fields before vector retrieval, which reduces the candidate set Milvus has to score.


message MetadataFilter {
  string field = 1;
  FilterOp op = 2;
  string value = 3;
}
 
message FilterGroup {
  LogicalOp op = 1;                           // AND (default) | OR
  repeated MetadataFilter filters = 2;
  repeated FilterGroup groups = 3;            // nested for arbitrary boolean expressions
}

Operators

`FilterOp`	Meaning	Value shape
`FILTER_OP_EQ`	`field == value`	any scalar
`FILTER_OP_NEQ`	`field != value`	any scalar
`FILTER_OP_GT` / `_GTE` / `_LT` / `_LTE`	comparison	numeric / string
`FILTER_OP_IN`	`field ∈ {values}`	comma-separated string: `"a,b,c"`
`FILTER_OP_CONTAINS`	substring on string fields	string
`FILTER_OP_EXISTS`	field is present (any value)	ignored
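To reason about what a nested FilterGroup expresses, it can help to evaluate it locally against a sample metadata map. The evaluator below is an illustrative sketch, not K3's or Milvus's implementation: it treats every field as a scalar, compares numerically when both sides parse as numbers (an assumption), and excludes records that lack the field. `match_filter` and `match_group` are hypothetical names.

```python
def _comparable(actual, expected):
    """Compare numerically when both sides parse as numbers, else as strings."""
    try:
        return float(actual), float(expected)
    except (TypeError, ValueError):
        return str(actual), str(expected)


def match_filter(meta: dict, flt: dict) -> bool:
    field, op, value = flt["field"], flt["op"], flt.get("value", "")
    if field not in meta:
        return False  # records lacking the field never match
    if op == "FILTER_OP_EXISTS":
        return True   # value is ignored for EXISTS
    actual = meta[field]
    if op == "FILTER_OP_IN":
        return str(actual) in value.split(",")
    if op == "FILTER_OP_CONTAINS":
        return value in str(actual)
    a, b = _comparable(actual, value)
    comparisons = {
        "FILTER_OP_EQ": a == b,
        "FILTER_OP_NEQ": a != b,
        "FILTER_OP_GT": a > b,
        "FILTER_OP_GTE": a >= b,
        "FILTER_OP_LT": a < b,
        "FILTER_OP_LTE": a <= b,
    }
    return comparisons[op]


def match_group(meta: dict, group: dict) -> bool:
    """Combine direct filters and nested groups under the group's logical op."""
    outcomes = [match_filter(meta, f) for f in group.get("filters", [])]
    outcomes += [match_group(meta, g) for g in group.get("groups", [])]
    combine = any if group.get("op") == "LOGICAL_OP_OR" else all  # AND is the default
    return combine(outcomes)


nested = {
    "op": "LOGICAL_OP_OR",
    "groups": [
        {"op": "LOGICAL_OP_AND", "filters": [
            {"field": "source_key", "op": "FILTER_OP_CONTAINS", "value": "papers/"},
            {"field": "page", "op": "FILTER_OP_GTE", "value": "2"},
        ]},
        {"op": "LOGICAL_OP_OR", "filters": [
            {"field": "tags", "op": "FILTER_OP_IN", "value": "transformer,attention"},
        ]},
    ],
}
print(match_group({"source_key": "papers/attention.pdf", "page": "10"}, nested))
```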

Examples

Simple AND:


{
  "preFilter": {
    "op": "LOGICAL_OP_AND",
    "filters": [
      { "field": "source_key", "op": "FILTER_OP_CONTAINS", "value": "papers/" },
      { "field": "page",       "op": "FILTER_OP_GTE",      "value": "2" }
    ]
  }
}

Nested — (source contains papers/ AND page ≥ 2) OR tags has any of {transformer, attention}:


{
  "preFilter": {
    "op": "LOGICAL_OP_OR",
    "groups": [
      {
        "op": "LOGICAL_OP_AND",
        "filters": [
          { "field": "source_key", "op": "FILTER_OP_CONTAINS", "value": "papers/" },
          { "field": "page",       "op": "FILTER_OP_GTE",      "value": "2" }
        ]
      },
      {
        "op": "LOGICAL_OP_OR",
        "filters": [
          { "field": "tags", "op": "FILTER_OP_IN", "value": "transformer,attention" }
        ]
      }
    ]
  }
}

FILTER_OP_IN takes a single comma-separated string, not a JSON array: "value": "a,b,c".
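Hand-writing nested filter JSON is error-prone, so a few builder functions may help. This is a sketch with hypothetical helper names (`flt`, `in_filter`, `group`). Rejecting values that contain a comma is an assumption: this reference does not describe any escaping for FILTER_OP_IN.

```python
def flt(field: str, op: str, value="") -> dict:
    """Build one MetadataFilter; op is the suffix after FILTER_OP_ (e.g. "GTE")."""
    return {"field": field, "op": f"FILTER_OP_{op}", "value": str(value)}


def in_filter(field: str, values) -> dict:
    """Build a FILTER_OP_IN filter from a list, joined into one comma-separated string."""
    values = [str(v) for v in values]
    if any("," in v for v in values):
        # Assumption: no escaping is documented, so refuse ambiguous values.
        raise ValueError("FILTER_OP_IN values must not contain commas")
    return {"field": field, "op": "FILTER_OP_IN", "value": ",".join(values)}


def group(op: str, filters=(), groups=()) -> dict:
    """Build a FilterGroup; op is "AND" or "OR"."""
    out: dict = {"op": f"LOGICAL_OP_{op}"}
    if filters:
        out["filters"] = list(filters)
    if groups:
        out["groups"] = list(groups)
    return out


pre_filter = group("OR", groups=[
    group("AND", filters=[
        flt("source_key", "CONTAINS", "papers/"),
        flt("page", "GTE", 2),
    ]),
    group("OR", filters=[in_filter("tags", ["transformer", "attention"])]),
])
```

`pre_filter` can then be placed under the `preFilter` key of a request body.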

What metadata fields exist

Depends on where the vectors came from:

- EXTERNAL collections: whatever you set in VectorRecord.metadata (free-form JSON Struct).
- PIPELINE-mode collections: whatever the Scriptum index template emits. For text_embedding_index, typically source_key, chunk_index, chunk_total, mime_type, extracted_at, plus any caller-supplied template inputs.

Describe-style introspection of a collection’s metadata schema isn’t exposed today; inspect a search result’s metadata map to discover the fields.

Rerank (Jina via Ignite)

Set rerank: true to run a cross-encoder reranker over the top-K candidates before returning. Improves precision on the top-3–5 results — important for RAG / agent grounding where the “best” result matters more than recall.


{
  "bucket": "kb-prod",
  "text": "what is the difference between BERT and GPT?",
  "collectionName": "papers",
  "topK": 20,
  "rerank": true
}

Cost: ~50–200 ms added latency per query (depends on top_k and corpus chunk length). Skip rerank for:

- Bulk retrieval where you’ll filter / aggregate downstream anyway
- Latency-sensitive paths (interactive search-as-you-type)
- Collections where vector recall is already strong (e.g. domain-tuned embedders)

`rerank_text` for binary queries

For image / audio / video queries (no inherent text query), supply rerank_text so the reranker has text to score against:


{
  "bucket": "kb-prod",
  "s3Key": "queries/example-bag.jpg",
  "collectionName": "product-images",
  "topK": 50,
  "rerank": true,
  "rerankText": "leather handbag with gold hardware"
}

K3 reranks the image-search results against the supplied text — useful when you want “images that look like X AND match this description Y.”

For the full hybrid + rerank walkthrough, see Recipes → Hybrid + Rerank.

Result shape


message SearchResult {
  dodil.k3.common.v1.ObjectRef object = 1;   // bucket + key
  float score = 2;                           // normalized similarity
  map<string, string> metadata = 3;          // Milvus output fields, stringified
  SearchResultSource source = 4;             // VECTOR | FULLTEXT | HYBRID
  optional string chunk_id = 5;
  optional int32 chunk_index = 6;
  optional string content = 7;               // include_content=true
  optional string highlight = 8;             // include_highlights=true
  optional int32 query_index = 9;            // batch vector queries
}

Key fields:

- object: bucket + key the result points back to (the source S3 object)
- score: higher = more relevant; normalized by Milvus per metric
- source: which signal produced the result. VECTOR for dense-only, FULLTEXT for BM25-only (unusual on its own), HYBRID for fused. Useful when debugging hybrid scoring.
- chunk_id / chunk_index: when the collection holds chunked content (pipeline-mode), points to the specific chunk
- content: the chunk text itself, opt-in via include_content: true. Default off to keep responses small.
- highlight: query-aware snippet (when supported by the collection), opt-in via include_highlights: true
- metadata: Milvus output fields as a string map (numbers/booleans stringified). For typed metadata access, use VectorInput.output_fields to control which fields come back.
- query_index: only populated when the query was VectorInput.vectors[] (batch); regroup results by this field client-side
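Because metadata values come back stringified, callers often want to coerce them to native types. The heuristic below is a sketch, not part of the API: `coerce` and `typed_metadata` are hypothetical helpers, and the assumption that booleans arrive as "true" / "false" is not confirmed by this reference.

```python
def coerce(value: str):
    """Best-effort conversion of a stringified metadata value back to a native type."""
    lowered = value.strip().lower()
    if lowered in ("true", "false"):  # assumption: booleans are stringified this way
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value  # leave anything non-numeric as a string


def typed_metadata(metadata: dict[str, str]) -> dict:
    return {key: coerce(val) for key, val in metadata.items()}


typed = typed_metadata({
    "source_key": "papers/attention.pdf",
    "chunk_index": "3",
    "confidence": "0.75",
    "is_public": "true",
})
```

A field that merely looks numeric (for example a zip code) will also be converted, so apply this only to fields you know to be numbers or booleans.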

Worked examples

Simple RAG retrieval


curl -sS -X POST "https://k3.dev.dodil.io/kb-prod/vector/search" \
  -H "Authorization: Bearer $DODIL_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "bucket": "kb-prod",
    "text": "explain the attention mechanism",
    "collectionName": "docs",
    "topK": 5,
    "searchMode": "SEARCH_MODE_AUTO",
    "includeContent": true,
    "rerank": true
  }'

Top-5 reranked chunks with text, hybrid mode if the collection has BM25.

High-recall analytic retrieval (no rerank)


curl -sS -X POST "https://k3.dev.dodil.io/kb-prod/vector/search" \
  -H "Authorization: Bearer $DODIL_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "bucket": "kb-prod",
    "text": "attention",
    "collectionName": "docs",
    "topK": 500,
    "searchMode": "SEARCH_MODE_AUTO",
    "preFilter": {
      "op": "LOGICAL_OP_AND",
      "filters": [
        { "field": "year", "op": "FILTER_OP_GTE", "value": "2017" }
      ]
    }
  }'

Up to 500 candidates, hybrid where the collection has sparse vectors (AUTO), restricted to year ≥ 2017. No rerank, since you’ll aggregate downstream.

Multi-collection search


curl -sS -X POST "https://k3.dev.dodil.io/kb-prod/vector/search" \
  -H "Authorization: Bearer $DODIL_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "bucket": "kb-prod",
    "text": "transformer architecture",
    "topK": 20,
    "searchMode": "SEARCH_MODE_AUTO",
    "rerank": true,
    "includeContent": true
  }'

No collectionName → searches all matching collections. Inspect collection_statuses[] in the response to confirm which collections participated.

Batch retrieval (5 queries in one round-trip)


curl -sS -X POST "https://k3.dev.dodil.io/kb-prod/vector/search" \
  -H "Authorization: Bearer $DODIL_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "bucket": "kb-prod",
    "collectionName": "docs",
    "vector": {
      "vectors": [
        { "values": [/* query 1 dense vector */] },
        { "values": [/* query 2 */] },
        { "values": [/* query 3 */] },
        { "values": [/* query 4 */] },
        { "values": [/* query 5 */] }
      ],
      "outputFields": ["file_id", "metadata"]
    },
    "topK": 5
  }'

Returns up to 25 results in total (5 queries × up to 5 results each); regroup by query_index client-side.

Batching and hybrid do not combine. Batching is for the dense fast path: a multi-vector batch with caller-supplied sparse_vectors (i.e. multiple dense+sparse pairs in one call) returns UNIMPLEMENTED, so send those as separate requests. Single-query hybrid is fully supported.
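When you do have several dense+sparse pairs, one way to follow this rule is to expand them into one request body per pair. A minimal sketch: `hybrid_requests` is a hypothetical helper, and the body shape copies the single-query caller-supplied sparse example from this page.

```python
def hybrid_requests(base: dict, pairs: list[tuple[list[float], dict]]) -> list[dict]:
    """Expand (dense, sparse) pairs into one single-query hybrid request body each."""
    return [
        {
            **base,
            "searchMode": "SEARCH_MODE_HYBRID",
            "vector": {"values": dense, "sparseVectors": [sparse]},
        }
        for dense, sparse in pairs
    ]


base = {"bucket": "kb-prod", "collectionName": "splade-vectors", "topK": 20}
pairs = [
    ([0.1, 0.2], {"indices": [42], "values": [0.9]}),     # toy-sized vectors
    ([0.3, 0.4], {"indices": [137], "values": [0.4]}),
]
bodies = hybrid_requests(base, pairs)
print(len(bodies), "requests to send")
```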

Tuned HNSW search


curl -sS -X POST "https://k3.dev.dodil.io/kb-prod/vector/search" \
  -H "Authorization: Bearer $DODIL_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "bucket": "kb-prod",
    "collectionName": "docs",
    "vector": {
      "values": [/* ... */],
      "searchParams": {
        "ef": "256"
      },
      "partitionNames": ["2026"]
    },
    "topK": 100
  }'

HNSW ef raised from K3’s default (top_k * 2) for higher recall; restricted to the 2026 Milvus partition.
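A helper can make the default explicit and catch an `ef` that is too small before the request goes out. This is a sketch: `hnsw_search_params` is a hypothetical name, the default follows the top_k * 2 rule stated above, and the check that ef is at least top_k reflects our understanding of Milvus's HNSW parameter range, which you should verify against your Milvus version.

```python
from typing import Optional


def hnsw_search_params(top_k: int, ef: Optional[int] = None) -> dict[str, str]:
    """Build a searchParams map for an HNSW collection, with values as strings."""
    chosen = top_k * 2 if ef is None else ef  # K3 default per the text above
    if chosen < top_k:
        # Assumption: Milvus requires ef >= top_k for HNSW searches.
        raise ValueError(f"ef ({chosen}) must be at least top_k ({top_k})")
    return {"ef": str(chosen)}  # searchParams values are strings in the examples


print(hnsw_search_params(100, ef=256))
```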

Multimodal image-by-key


curl -sS -X POST "https://k3.dev.dodil.io/kb-prod/vector/search" \
  -H "Authorization: Bearer $DODIL_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "bucket": "kb-prod",
    "s3Key": "queries/example-bag.jpg",
    "contentType": "image/jpeg",
    "collectionName": "product-images",
    "topK": 20,
    "rerank": true,
    "rerankText": "leather handbag with gold hardware",
    "includeContent": false
  }'

K3 fetches the image, embeds it server-side via the collection’s embed_model, searches, then reranks against the supplied text.

For the full walkthrough, see Recipes → Multimodal Search.

When the request fails

Common errors:

Symptom	Cause	Fix
`FAILED_PRECONDITION` on text query	Collection has no `embed_model` set (EXTERNAL-mode without `embedModel`)	Set `embed_model` at `AddVectorCollection` time, or pre-embed and use `vector` shape
`INVALID_ARGUMENT` “vector dimension mismatch”	`VectorInput.values` length ≠ collection’s `dimensions`	Check `dodil k3 vector collection get <id>`
`INVALID_ARGUMENT` “sparse_vectors not supported on this collection”	Supplied `sparse_vectors` but collection’s `sparse_mode` ≠ EXTERNAL	Either drop sparse from the request or change the collection’s sparse mode (requires recreate)
Empty results, `collection_statuses` shows `fail_reason: "incompatible embed_model"`	Multi-collection search across model families	Run separate queries per model family OR set `collection_name` to one specific collection
Hybrid mode returns dense-only results + warning	Collection’s `sparse_mode = NONE`	Add `SPARSE_MODE_BM25` or `_EXTERNAL` to the collection (requires recreate), or use `SEARCH_MODE_VECTOR` to suppress the warning

VBase escape hatch

K3 doesn’t surface every Milvus feature. For the following, use VBase directly against the same VBase endpoint your engine is configured for:

- Custom index types (DiskANN, IVF_PQ variants, …)
- Partition lifecycle (create_partition, drop_partition, release_partition)
- Altering the collection schema
- Raw collection.search() calls that bypass K3’s pre-filter normalization

If you need those features regularly, configure the bucket’s engine in external mode pointing at your own VBase cluster — K3 doesn’t lock concurrent VBase writes.
