External Collection — BYO embeddings

Goal: stand up a vector collection where you control the embedding pipeline. K3 stores + indexes the vectors; you pick the model, the dimensions, the metric, and push vectors with InsertVectors / UpsertVectors.

When to use:

  • You have a model K3 doesn’t host (third-party SaaS — OpenAI, Cohere, Voyage; or a custom fine-tuned model)
  • You need to bulk-load pre-computed embeddings from another system
  • You want learned-sparse models (SPLADE, BGE-M3 sparse) — supply both dense + sparse per record
  • You want compact embeddings (binary / int8 / float16) for storage / latency optimization

Shape:

Your code → embed model (OpenAI, etc.) → dense (+ optional sparse) vectors → InsertVectors / UpsertVectors → Milvus collection (EXTERNAL) → Search RPC

Prerequisites

  • dodil CLI + dodil login done
  • A bucket — kb-prod — with the vector engine configured (auto mode):
    dodil k3 bucket create kb-prod
    dodil k3 vector store create -b kb-prod -m auto
  • A way to compute embeddings on your side. Examples below use OpenAI ada-002 (1536 dims) and a custom SPLADE-style sparse model.

1. Create the collection — manual mode

The CLI exposes a minimal slice of AddVectorCollection (FLOAT + cosine + dense-only). For the full range — binary / int8 / float16 / bfloat16, BM25, EXTERNAL sparse — use the API.

CLI path — OpenAI-shaped (1536 dims, FLOAT, dense-only)

    dodil k3 vector collection add-manual ada -b kb-prod \
      --description "Pre-computed OpenAI ada-002 embeddings" \
      --dimensions 1536 \
      --metric cosine \
      --embed-model openai/text-embedding-ada-002

    export COLLECTION_ID=$(dodil k3 vector collection get ada -b kb-prod -o json | jq -r '.collectionId')

The --embed-model flag is important: when set, K3 records it on the collection so future text queries can route through the matching embedding service (if K3 has it). If you leave it empty, only the pre-embedded vector query path works.

API path — hybrid BM25 (Milvus computes sparse from text)

    curl -sS -X POST "https://k3.dev.dodil.io/kb-prod/vector/collections" \
      -H "Authorization: Bearer $DODIL_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "bucket": "kb-prod",
        "name": "hybrid-text",
        "description": "Dense + BM25 (Milvus computes sparse from each record text)",
        "dimensions": 1024,
        "distanceMetric": "DISTANCE_METRIC_COSINE",
        "embeddingType": "EMBEDDING_TYPE_FLOAT",
        "sparseMode": "SPARSE_MODE_BM25",
        "embedModel": "your-text-model-v1"
      }'

API path — EXTERNAL sparse (caller supplies both dense + sparse)

For SPLADE / BGE-M3 sparse where you compute the sparse vector yourself:

    curl -sS -X POST "https://k3.dev.dodil.io/kb-prod/vector/collections" \
      -H "Authorization: Bearer $DODIL_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "bucket": "kb-prod",
        "name": "splade-docs",
        "description": "Dense + caller-supplied learned-sparse vectors",
        "dimensions": 768,
        "distanceMetric": "DISTANCE_METRIC_COSINE",
        "embeddingType": "EMBEDDING_TYPE_FLOAT",
        "sparseMode": "SPARSE_MODE_EXTERNAL"
      }'

API path — binary embeddings (compact)

For perceptual hashes / quantized embeddings at dimensions / 8 bytes per vector:

    curl -sS -X POST "https://k3.dev.dodil.io/kb-prod/vector/collections" \
      -H "Authorization: Bearer $DODIL_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "bucket": "kb-prod",
        "name": "binary-hashes",
        "description": "256-bit perceptual hashes (32 bytes per vector)",
        "dimensions": 256,
        "distanceMetric": "DISTANCE_METRIC_HAMMING",
        "embeddingType": "EMBEDDING_TYPE_BINARY"
      }'
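The dimensions / 8 sizing follows from storing one bit per dimension. The sketch below illustrates that arithmetic and the Hamming metric, assuming sign-threshold binarization and most-significant-bit-first packing. Both are illustrative choices, not something this page specifies, and the wire encoding K3 expects for a binary dense field is not shown here. pack_bits, binarize, and hamming are hypothetical helper names.

```python
from typing import List


def binarize(values: List[float]) -> List[int]:
    """Sign-threshold binarization: 1 where the component is positive."""
    return [1 if v > 0 else 0 for v in values]


def pack_bits(bits: List[int]) -> bytes:
    """Pack 0/1 values into bytes, most significant bit first (assumed order)."""
    if len(bits) % 8 != 0:
        raise ValueError("bit count must be a multiple of 8")
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | (1 if bit else 0)
        out.append(byte)
    return bytes(out)


def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length packed vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# A 256-dimension vector packs into 256 / 8 = 32 bytes.
dense = [(-1.0) ** i for i in range(256)]            # alternating +1, -1
packed = pack_bits(binarize(dense))
flipped = pack_bits(binarize([-v for v in dense]))   # every bit inverted

print(len(packed))               # 32
print(hamming(packed, packed))   # 0
print(hamming(packed, flipped))  # 256
```

Identical vectors are at distance 0 and fully inverted vectors are at distance equal to the dimension count, which is the range a HAMMING collection works in.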

2. Insert vectors

Sketch — your code embeds, you call InsertVectors. Pseudocode in Python:

    import os

    import openai
    import requests

    K3 = "https://k3.dev.dodil.io"
    # Collection id exported in step 1 (export COLLECTION_ID=...)
    COLLECTION_ID = os.environ["COLLECTION_ID"]
    HEADERS = {
        "Authorization": f"Bearer {os.environ['DODIL_TOKEN']}",
        "Content-Type": "application/json",
    }

    docs = [
        {"id": "doc-1", "text": "Multi-head attention lets the model jointly attend to information from different representation subspaces."},
        {"id": "doc-2", "text": "BERT uses bidirectional self-attention over masked tokens."},
        {"id": "doc-3", "text": "GPT uses causal self-attention with left-to-right context."},
    ]

    # 1. Compute embeddings on your side
    client = openai.OpenAI()
    resp = client.embeddings.create(
        model="text-embedding-ada-002",
        input=[d["text"] for d in docs],
    )

    # 2. Build VectorRecord payloads
    vectors = []
    for doc, emb in zip(docs, resp.data):
        vectors.append({
            "id": doc["id"],
            "denseFloat": {"values": emb.embedding},  # 1536 floats
            "metadata": {
                # JSON Struct: typed values, not just strings
                "source": "papers/attention.pdf",
                "section": doc["id"],
                "year": 2017,
            },
        })

    # 3. Insert into K3
    r = requests.post(
        f"{K3}/kb-prod/vector/collections/{COLLECTION_ID}/vectors",
        headers=HEADERS,
        json={"bucket": "kb-prod", "collectionId": COLLECTION_ID, "vectors": vectors},
    )
    r.raise_for_status()
    print(r.json())  # {"inserted": "3"}

For an idempotent bulk-load (re-runs replace rows by id), use UpsertVectors — same shape, PUT instead of POST:

    curl -sS -X PUT "https://k3.dev.dodil.io/kb-prod/vector/collections/$COLLECTION_ID/vectors" \
      -H "Authorization: Bearer $DODIL_TOKEN" \
      -H "Content-Type: application/json" \
      -d @vectors.json

Response: {"inserted": 3, "deleted": 2}, where deleted is the count of pre-existing rows replaced.

Hybrid (BM25) — include text per record

For a SPARSE_MODE_BM25 collection, include the raw text so Milvus’s BM25 function can derive the sparse vector:

    {
      "id": "chunk-1",
      "denseFloat": { "values": [/* dense floats */] },
      "text": "Multi-head attention lets the model jointly attend to information from different representation subspaces.",
      "metadata": { "source": "papers/attention.pdf", "section": "3.2" }
    }

EXTERNAL sparse — supply both dense + sparse

For a SPARSE_MODE_EXTERNAL collection (SPLADE / BGE-M3):

    {
      "id": "splade-1",
      "denseFloat": { "values": [/* dense floats */] },
      "sparse": {
        "indices": [42, 137, 1024, 8888],
        "values": [0.92, 0.41, 0.18, 0.07]
      },
      "metadata": { "doc_id": "abc-123" }
    }

The sparse vector represents non-zero positions in a high-dimensional vocab space — typical sparsity is a few hundred non-zero indices out of 30K+ vocab.
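As a rough illustration of how a vocab-sized weight vector collapses into the indices / values pair used in the record above, here is a minimal sketch. It assumes your sparse model returns one weight per vocabulary entry. to_sparse is a hypothetical helper, the 30,522-entry vocabulary is the BERT vocabulary size used only for illustration, and the ascending index order is an assumption rather than a documented K3 requirement.

```python
from typing import Dict, List, Optional, Sequence


def to_sparse(weights: Sequence[float],
              top_k: Optional[int] = None,
              min_weight: float = 0.0) -> Dict[str, List]:
    """Convert a vocab-sized weight vector into parallel indices/values lists."""
    nonzero = [(i, float(w)) for i, w in enumerate(weights) if w > min_weight]
    if top_k is not None:
        # Keep only the top_k heaviest terms.
        nonzero = sorted(nonzero, key=lambda p: p[1], reverse=True)[:top_k]
    nonzero.sort(key=lambda p: p[0])  # ascending index order (assumed)
    return {
        "indices": [i for i, _ in nonzero],
        "values": [w for _, w in nonzero],
    }


# Mostly-zero weights over a BERT-sized vocabulary (illustrative values).
vocab_weights = [0.0] * 30522
vocab_weights[42] = 0.92
vocab_weights[137] = 0.41
vocab_weights[1024] = 0.18
vocab_weights[8888] = 0.07

sparse = to_sparse(vocab_weights)
print(sparse)
# {'indices': [42, 137, 1024, 8888], 'values': [0.92, 0.41, 0.18, 0.07]}
```

Passing top_k caps the number of non-zero terms per record, which trades a little recall for smaller payloads.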

3. Search

If you set --embed-model at create time AND K3 hosts the matching embedding service, text queries just work:

    dodil k3 search "what is multi-head attention" -b kb-prod -c ada -o json

If you didn’t set embed_model (or K3 doesn’t host the model), you must pre-embed query-side and use the vector query path:

    import os

    import openai
    import requests

    K3 = "https://k3.dev.dodil.io"
    HEADERS = {
        "Authorization": f"Bearer {os.environ['DODIL_TOKEN']}",
        "Content-Type": "application/json",
    }

    # 1. Embed query on your side
    client = openai.OpenAI()
    qresp = client.embeddings.create(model="text-embedding-ada-002", input="multi-head attention")
    qvec = qresp.data[0].embedding  # 1536 floats

    # 2. Search
    r = requests.post(
        f"{K3}/kb-prod/vector/search",
        headers=HEADERS,
        json={
            "bucket": "kb-prod",
            "collectionName": "ada",
            "vector": {"values": qvec},
            "topK": 5,
            "searchMode": "SEARCH_MODE_VECTOR",
        },
    )
    print(r.json())

Batch retrieval — N queries in one round-trip

    # Reuses client, K3, and HEADERS from the previous example
    queries = ["multi-head attention", "BERT", "GPT"]
    qresp = client.embeddings.create(model="text-embedding-ada-002", input=queries)

    r = requests.post(
        f"{K3}/kb-prod/vector/search",
        headers=HEADERS,
        json={
            "bucket": "kb-prod",
            "collectionName": "ada",
            "vector": {
                "vectors": [{"values": e.embedding} for e in qresp.data]
            },
            "topK": 3,
        },
    )

    # Regroup hits by queryIndex
    results_by_query = {}
    for hit in r.json()["results"]:
        results_by_query.setdefault(hit.get("queryIndex", 0), []).append(hit)

One Milvus round-trip serves all three queries — materially faster than three serial calls.
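The single-query and batch requests differ only in the shape of the vector field, so one helper can build both. This is a minimal sketch, assuming the field names shown in the examples above. build_search_body and group_by_query are hypothetical helpers, and the sample results list is made up to show the regrouping, not an actual K3 response.

```python
from typing import Any, Dict, List, Optional, Sequence


def build_search_body(bucket: str,
                      collection_name: str,
                      query_vectors: Sequence[Sequence[float]],
                      top_k: int,
                      search_mode: Optional[str] = None) -> Dict[str, Any]:
    """Build a search request body for one query vector or a batch."""
    if not query_vectors:
        raise ValueError("at least one query vector is required")
    if len(query_vectors) == 1:
        vector: Dict[str, Any] = {"values": list(query_vectors[0])}
    else:
        vector = {"vectors": [{"values": list(v)} for v in query_vectors]}
    body: Dict[str, Any] = {
        "bucket": bucket,
        "collectionName": collection_name,
        "vector": vector,
        "topK": top_k,
    }
    if search_mode is not None:
        body["searchMode"] = search_mode
    return body


def group_by_query(results: List[Dict[str, Any]]) -> Dict[int, List[Dict[str, Any]]]:
    """Group hits by queryIndex; hits without the field fall under query 0."""
    grouped: Dict[int, List[Dict[str, Any]]] = {}
    for hit in results:
        grouped.setdefault(hit.get("queryIndex", 0), []).append(hit)
    return grouped


single = build_search_body("kb-prod", "ada", [[0.1, 0.2]], top_k=5,
                           search_mode="SEARCH_MODE_VECTOR")
batch = build_search_body("kb-prod", "ada", [[0.1, 0.2], [0.3, 0.4]], top_k=3)

# Made-up hits, only to show the regrouping step.
sample_results = [
    {"id": "doc-1"},
    {"id": "doc-2", "queryIndex": 1},
    {"id": "doc-3", "queryIndex": 1},
]
grouped = group_by_query(sample_results)
print(sorted(grouped))  # [0, 1]
```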

4. Upsert + delete

Re-running InsertVectors with the same id creates duplicates in Milvus (Milvus inserts are not unique-key-checked). Use UpsertVectors for idempotent runs:

    # Re-push the same record; the pre-existing row is replaced
    curl -sS -X PUT "https://k3.dev.dodil.io/kb-prod/vector/collections/$COLLECTION_ID/vectors" \
      -H "Authorization: Bearer $DODIL_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "bucket": "kb-prod",
        "collectionId": "'"$COLLECTION_ID"'",
        "vectors": [
          {
            "id": "doc-1",
            "denseFloat": {"values": [/* updated embedding */]},
            "metadata": {"source": "papers/attention.pdf", "section": "3.2", "version": 2}
          }
        ]
      }'
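To make the difference between the two write paths concrete, the following toy model mimics the behavior described above: a plain insert appends without checking ids, while an upsert replaces rows that share an id and reports how many it removed. ToyCollection is a hypothetical in-memory stand-in written for this illustration. It is not how K3 or Milvus implement either call.

```python
from typing import Dict, List


class ToyCollection:
    """In-memory stand-in for a collection; NOT the K3 or Milvus implementation."""

    def __init__(self) -> None:
        self.rows: List[Dict] = []

    def insert(self, vectors: List[Dict]) -> Dict[str, int]:
        # Plain insert: append without checking whether the id already exists.
        self.rows.extend(vectors)
        return {"inserted": len(vectors)}

    def upsert(self, vectors: List[Dict]) -> Dict[str, int]:
        # Upsert: drop pre-existing rows that share an id, then append.
        incoming = {v["id"] for v in vectors}
        kept = [row for row in self.rows if row["id"] not in incoming]
        deleted = len(self.rows) - len(kept)
        self.rows = kept + list(vectors)
        return {"inserted": len(vectors), "deleted": deleted}


batch = [{"id": "doc-1"}, {"id": "doc-2"}, {"id": "doc-3"}]

via_insert = ToyCollection()
via_insert.insert(batch)
via_insert.insert(batch)            # re-run: every id is now stored twice
print(len(via_insert.rows))         # 6

via_upsert = ToyCollection()
first = via_upsert.upsert(batch)    # {'inserted': 3, 'deleted': 0}
second = via_upsert.upsert(batch)   # {'inserted': 3, 'deleted': 3}
print(len(via_upsert.rows), first, second)
```

A re-run through the upsert path leaves the row count unchanged, which is what makes it safe for repeated bulk loads.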

Delete by id:

    curl -sS -X POST "https://k3.dev.dodil.io/kb-prod/vector/collections/$COLLECTION_ID/vectors:delete" \
      -H "Authorization: Bearer $DODIL_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "bucket": "kb-prod",
        "collectionId": "'"$COLLECTION_ID"'",
        "ids": ["doc-1", "doc-2"]
      }'

Common gotchas

| Symptom | Cause | Fix |
| --- | --- | --- |
| INVALID_ARGUMENT: dimension mismatch | denseFloat.values.length ≠ collection’s dimensions | Confirm with dodil k3 vector collection get; recreate the collection if the embedding model’s output dim is different |
| INVALID_ARGUMENT: dense type mismatch | Sent denseFloat for a non-FLOAT collection (e.g. BINARY) | Use denseBinary / denseFloat16 / denseBfloat16 / denseInt8 per the collection’s embedding_type |
| FAILED_PRECONDITION on text search | Collection has no embed_model set (you left it empty at create time) | Pre-embed the query and use the vector query path |
| INVALID_ARGUMENT: sparse_vectors not supported | Sent sparse on a SPARSE_MODE_NONE or BM25 collection | Drop the sparse field from records; only SPARSE_MODE_EXTERNAL accepts caller sparse |
| text field silently ignored | Collection’s sparse_mode ≠ BM25 | Not an error: text is only consumed in BM25 mode |
| Re-running Insert creates duplicates | Milvus inserts don’t check uniqueness on id | Use UpsertVectors (PUT) instead: same shape, idempotent |
| Insert rejects on a pipeline-mode collection | EXTERNAL writes only work on embedding_source = EXTERNAL collections | Create with AddVectorCollection, not AddVectorPipeline |
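Several of these errors can be caught before the request leaves your process. The sketch below is a hypothetical client-side check, assuming you hold the collection settings in a dict keyed like the create payloads above (dimensions, embeddingType, sparseMode). preflight is not part of the K3 API, the defaults it falls back to are assumptions, and it only covers the FLOAT dense path.

```python
from typing import Dict, List


def preflight(record: Dict, collection: Dict) -> List[str]:
    """Return a list of likely problems with a record; empty means none found."""
    problems: List[str] = []
    # Defaults below are assumptions for collections created without these fields.
    emb_type = collection.get("embeddingType", "EMBEDDING_TYPE_FLOAT")
    sparse_mode = collection.get("sparseMode", "SPARSE_MODE_NONE")

    if "denseFloat" in record:
        if emb_type != "EMBEDDING_TYPE_FLOAT":
            problems.append("dense type mismatch: denseFloat sent to a non-FLOAT collection")
        else:
            got = len(record["denseFloat"]["values"])
            want = collection["dimensions"]
            if got != want:
                problems.append(f"dimension mismatch: got {got}, collection expects {want}")

    if "sparse" in record and sparse_mode != "SPARSE_MODE_EXTERNAL":
        problems.append("sparse not supported: only SPARSE_MODE_EXTERNAL accepts caller sparse")

    if "text" in record and sparse_mode != "SPARSE_MODE_BM25":
        problems.append("note: text is ignored outside SPARSE_MODE_BM25")

    return problems


ada = {"dimensions": 1536, "embeddingType": "EMBEDDING_TYPE_FLOAT"}

good = {"id": "doc-1", "denseFloat": {"values": [0.0] * 1536}}
bad = {
    "id": "doc-2",
    "denseFloat": {"values": [0.0, 0.0, 0.0]},
    "sparse": {"indices": [1], "values": [0.5]},
}

print(preflight(good, ada))  # []
for problem in preflight(bad, ada):
    print(problem)
```

Running the check over a batch before calling InsertVectors or UpsertVectors turns a server-side INVALID_ARGUMENT into a local error that points at the offending record.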

When to pick external over pipeline-mode

| Choose external | Choose pipeline |
| --- | --- |
| You have your own embedding pipeline | You want K3 to handle embedding |
| Third-party SaaS model (OpenAI / Cohere) | Want one of K3’s built-in templates |
| Custom fine-tuned model | Standard text / code / visual workflows |
| Pre-computed batches | Real-time on-upload ingest |
| Learned-sparse models (SPLADE / BGE-M3) | BM25 is enough for sparse |
| Need binary / int8 / float16 dense | FLOAT is fine |
| Want exact control over the embedding for each row | Template-driven chunking + embedding is acceptable |

See also