External Collection — BYO embeddings
Goal: stand up a vector collection where you control the embedding pipeline. K3 stores + indexes the vectors; you pick the model, the dimensions, the metric, and push vectors with InsertVectors / UpsertVectors.
When to use:
- You have a model K3 doesn’t host (third-party SaaS — OpenAI, Cohere, Voyage; or a custom fine-tuned model)
- You need to bulk-load pre-computed embeddings from another system
- You want learned-sparse models (SPLADE, BGE-M3 sparse) — supply both dense + sparse per record
- You want compact embeddings (binary / int8 / float16) for storage / latency optimization
Shape:
Your code → embed model (OpenAI, etc.) → dense (+ optional sparse) vectors
│
▼
InsertVectors / UpsertVectors
│
▼
Milvus collection (EXTERNAL)
│
▼
Search RPC
Prerequisites
- dodil CLI + dodil login done
- A bucket, kb-prod, with the vector engine configured (auto mode):
dodil k3 bucket create kb-prod
dodil k3 vector store create -b kb-prod -m auto
- A way to compute embeddings on your side. Examples below use OpenAI ada-002 (1536 dims) and a custom SPLADE-style sparse model.
1. Create the collection — manual mode
The CLI exposes a minimal slice of AddVectorCollection (FLOAT + cosine + dense-only). For the full range — binary / int8 / float16 / bfloat16, BM25, EXTERNAL sparse — use the API.
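The API requests later in this section all post the same JSON shape with different field values. A minimal Python sketch of a builder for that body, using only the field names and enum strings that appear in the requests on this page. The helper names collection_body and binary_bytes_per_vector are hypothetical, treating sparseMode and embedModel as optional follows the examples here rather than a confirmed schema, and the multiple-of-8 check for binary dimensions is an assumption.

```python
import json

def collection_body(bucket, name, dimensions, metric, embedding_type,
                    description="", sparse_mode=None, embed_model=None):
    """Build the JSON body for a collection-create request (hypothetical helper)."""
    body = {
        "bucket": bucket,
        "name": name,
        "description": description,
        "dimensions": dimensions,
        "distanceMetric": metric,
        "embeddingType": embedding_type,
    }
    # Optional fields are left out when unset, as in the examples on this page.
    if sparse_mode is not None:
        body["sparseMode"] = sparse_mode
    if embed_model is not None:
        body["embedModel"] = embed_model
    return body

def binary_bytes_per_vector(dimensions):
    """Storage size of one binary vector: one bit per dimension."""
    if dimensions % 8 != 0:  # assumption: binary dimensions are a multiple of 8
        raise ValueError("binary dimensions should be a multiple of 8")
    return dimensions // 8

binary = collection_body(
    "kb-prod", "binary-hashes", 256,
    "DISTANCE_METRIC_HAMMING", "EMBEDDING_TYPE_BINARY",
    description="256-bit perceptual hashes (32 bytes per vector)",
)
print(json.dumps(binary, indent=2))
print(binary_bytes_per_vector(binary["dimensions"]), "bytes per vector")
```

The resulting dict can be passed as the json= argument of requests.post, or dumped to a file for curl -d @file.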
CLI path — OpenAI-shaped (1536 dims, FLOAT, dense-only)
dodil k3 vector collection add-manual ada -b kb-prod \
--description "Pre-computed OpenAI ada-002 embeddings" \
--dimensions 1536 \
--metric cosine \
--embed-model openai/text-embedding-ada-002
export COLLECTION_ID=$(dodil k3 vector collection get ada -b kb-prod -o json | jq -r '.collectionId')
The --embed-model flag is important: when set, K3 records it on the collection so future text queries can route through the matching embedding service (if K3 has it). If you leave it empty, only the pre-embedded vector query path works.
API path — hybrid BM25 (Milvus computes sparse from text)
curl -sS -X POST "https://k3.dev.dodil.io/kb-prod/vector/collections" \
-H "Authorization: Bearer $DODIL_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"bucket": "kb-prod",
"name": "hybrid-text",
"description": "Dense + BM25 (Milvus computes sparse from each record text)",
"dimensions": 1024,
"distanceMetric": "DISTANCE_METRIC_COSINE",
"embeddingType": "EMBEDDING_TYPE_FLOAT",
"sparseMode": "SPARSE_MODE_BM25",
"embedModel": "your-text-model-v1"
}'
API path — EXTERNAL sparse (caller supplies both dense + sparse)
For SPLADE / BGE-M3 sparse where you compute the sparse vector yourself:
curl -sS -X POST "https://k3.dev.dodil.io/kb-prod/vector/collections" \
-H "Authorization: Bearer $DODIL_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"bucket": "kb-prod",
"name": "splade-docs",
"description": "Dense + caller-supplied learned-sparse vectors",
"dimensions": 768,
"distanceMetric": "DISTANCE_METRIC_COSINE",
"embeddingType": "EMBEDDING_TYPE_FLOAT",
"sparseMode": "SPARSE_MODE_EXTERNAL"
}'
API path — binary embeddings (compact)
For perceptual hashes or quantized embeddings, stored at dimensions / 8 bytes per vector:
curl -sS -X POST "https://k3.dev.dodil.io/kb-prod/vector/collections" \
-H "Authorization: Bearer $DODIL_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"bucket": "kb-prod",
"name": "binary-hashes",
"description": "256-bit perceptual hashes (32 bytes per vector)",
"dimensions": 256,
"distanceMetric": "DISTANCE_METRIC_HAMMING",
"embeddingType": "EMBEDDING_TYPE_BINARY"
}'
2. Insert vectors
Sketch: your code embeds, then you call InsertVectors. In Python:
import os
import openai
import requests
K3 = "https://k3.dev.dodil.io"
COLLECTION_ID = os.environ["COLLECTION_ID"]  # exported in step 1
HEADERS = {
    "Authorization": f"Bearer {os.environ['DODIL_TOKEN']}",
    "Content-Type": "application/json",
}
docs = [
    {"id": "doc-1", "text": "Multi-head attention lets the model jointly attend to information from different representation subspaces."},
    {"id": "doc-2", "text": "BERT uses bidirectional self-attention over masked tokens."},
    {"id": "doc-3", "text": "GPT uses causal self-attention with left-to-right context."},
]
# 1. Compute embeddings on your side
client = openai.OpenAI()
resp = client.embeddings.create(
    model="text-embedding-ada-002",
    input=[d["text"] for d in docs],
)
# 2. Build VectorRecord payloads
vectors = []
for doc, emb in zip(docs, resp.data):
    vectors.append({
        "id": doc["id"],
        "denseFloat": {"values": emb.embedding},  # 1536 floats
        "metadata": {
            "source": "papers/attention.pdf",  # JSON Struct: typed values, not just strings
            "section": doc["id"],
            "year": 2017,
        },
    })
# 3. Insert into K3
r = requests.post(
    f"{K3}/kb-prod/vector/collections/{COLLECTION_ID}/vectors",
    headers=HEADERS,
    json={"bucket": "kb-prod", "collectionId": COLLECTION_ID, "vectors": vectors},
)
print(r.json())  # {"inserted": "3"}
For an idempotent bulk-load (re-runs replace rows by id), use UpsertVectors, which takes the same shape with PUT instead of POST:
curl -sS -X PUT "https://k3.dev.dodil.io/kb-prod/vector/collections/$COLLECTION_ID/vectors" \
-H "Authorization: Bearer $DODIL_TOKEN" \
-H "Content-Type: application/json" \
-d @vectors.json
Response: {"inserted": 3, "deleted": 2}, where deleted is the count of pre-existing rows replaced.
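The curl call above reads its body from vectors.json. A minimal sketch of producing that file in Python, assuming the body has the same bucket / collectionId / vectors shape as the InsertVectors request. The helper name write_upsert_body, the placeholder collection id, and the tiny 3-float embeddings are illustrative only.

```python
import json
import os
import tempfile

def write_upsert_body(path, bucket, collection_id, records):
    """Write an UpsertVectors request body to `path` (hypothetical helper)."""
    body = {"bucket": bucket, "collectionId": collection_id, "vectors": records}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(body, f)
    return body

# Placeholder records; real ones carry full-length embeddings.
records = [
    {"id": "doc-1", "denseFloat": {"values": [0.1, 0.2, 0.3]}, "metadata": {"year": 2017}},
    {"id": "doc-2", "denseFloat": {"values": [0.4, 0.5, 0.6]}, "metadata": {"year": 2018}},
]

# Written to a temp directory here; use ./vectors.json for the curl call.
out_path = os.path.join(tempfile.mkdtemp(), "vectors.json")
write_upsert_body(out_path, "kb-prod", "example-collection-id", records)

# Read the file back to confirm it is valid JSON with the expected shape.
with open(out_path, encoding="utf-8") as f:
    loaded = json.load(f)
print(len(loaded["vectors"]), "records written to", out_path)
```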
Hybrid (BM25) — include text per record
For a SPARSE_MODE_BM25 collection, include the raw text so Milvus’s BM25 function can derive the sparse vector:
{
"id": "chunk-1",
"denseFloat": { "values": [/* dense floats */] },
"text": "Multi-head attention lets the model jointly attend to information from different representation subspaces.",
"metadata": { "source": "papers/attention.pdf", "section": "3.2" }
}
EXTERNAL sparse — supply both dense + sparse
For a SPARSE_MODE_EXTERNAL collection (SPLADE / BGE-M3):
{
"id": "splade-1",
"denseFloat": { "values": [/* dense floats */] },
"sparse": {
"indices": [42, 137, 1024, 8888],
"values": [0.92, 0.41, 0.18, 0.07]
},
"metadata": { "doc_id": "abc-123" }
}
The sparse vector lists the non-zero positions in a high-dimensional vocabulary space. Typical sparsity is a few hundred non-zero indices out of a 30K+ vocabulary.
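A learned-sparse model typically emits a weight per vocabulary id, most of them zero. A minimal Python sketch of turning such a mapping into the indices / values pair used in the record above. The function name to_sparse is hypothetical, and sorting indices in ascending order is an assumption made for determinism, not a documented K3 requirement.

```python
def to_sparse(weights, min_weight=0.0):
    """Convert {vocab_id: weight} into parallel indices/values lists.

    Entries with weight <= min_weight are dropped, so only non-zero
    positions are sent. Indices are sorted ascending (an assumption).
    """
    kept = sorted((i, w) for i, w in weights.items() if w > min_weight)
    return {
        "indices": [i for i, _ in kept],
        "values": [w for _, w in kept],
    }

# Toy model output: vocab id -> weight, including one zero entry.
token_weights = {1024: 0.18, 42: 0.92, 8888: 0.07, 137: 0.41, 5: 0.0}

record = {
    "id": "splade-1",
    "denseFloat": {"values": [0.1, 0.2, 0.3]},  # placeholder dense vector
    "sparse": to_sparse(token_weights),
    "metadata": {"doc_id": "abc-123"},
}
print(record["sparse"])
```

Raising min_weight prunes low-weight terms, which trades a little recall for smaller payloads.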
3. Search
If you set --embed-model at create time AND K3 hosts the matching embedding service, text queries just work:
dodil k3 search "what is multi-head attention" -b kb-prod -c ada -o json
If you didn’t set embed_model (or K3 doesn’t host the model), you must pre-embed query-side and use the vector query path:
# 1. Embed query on your side
client = openai.OpenAI()
qresp = client.embeddings.create(model="text-embedding-ada-002", input="multi-head attention")
qvec = qresp.data[0].embedding # 1536 floats
# 2. Search
import requests
r = requests.post(
    f"{K3}/kb-prod/vector/search",
    headers=HEADERS,
    json={
        "bucket": "kb-prod",
        "collectionName": "ada",
        "vector": {"values": qvec},
        "topK": 5,
        "searchMode": "SEARCH_MODE_VECTOR",
    },
)
print(r.json())
Batch retrieval — N queries in one round-trip
queries = ["multi-head attention", "BERT", "GPT"]
qresp = client.embeddings.create(model="text-embedding-ada-002", input=queries)
r = requests.post(
    f"{K3}/kb-prod/vector/search",
    headers=HEADERS,
    json={
        "bucket": "kb-prod",
        "collectionName": "ada",
        "vector": {
            "vectors": [{"values": e.embedding} for e in qresp.data]
        },
        "topK": 3,
    },
)
# Regroup by queryIndex (loop variable named hit so it does not shadow the response r)
results_by_query = {}
for hit in r.json()["results"]:
    results_by_query.setdefault(hit.get("queryIndex", 0), []).append(hit)
One Milvus round-trip serves all three queries, which saves two network round-trips compared with three serial calls.
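A sketch of the regrouping step on a hand-written response, so it can be run offline. The id and score fields in the sample hits are assumptions about the response shape; only results and queryIndex appear in the code above. The default of 0 mirrors that code and covers hits where queryIndex is omitted.

```python
def group_by_query(queries, results):
    """Map each query string to its hits, using queryIndex (default 0)."""
    grouped = {q: [] for q in queries}
    for hit in results:
        grouped[queries[hit.get("queryIndex", 0)]].append(hit)
    return grouped

queries = ["multi-head attention", "BERT", "GPT"]

# Hand-written stand-in for r.json()["results"]; field names are assumed.
sample_results = [
    {"id": "doc-1", "score": 0.91},  # queryIndex omitted -> treated as 0
    {"queryIndex": 1, "id": "doc-2", "score": 0.88},
    {"queryIndex": 2, "id": "doc-3", "score": 0.86},
    {"queryIndex": 1, "id": "doc-1", "score": 0.52},
]

for query, hits in group_by_query(queries, sample_results).items():
    print(query, [h["id"] for h in hits])
```

Keying by query text instead of index keeps the result readable when the query list is built dynamically.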
4. Upsert + delete
Re-running InsertVectors with the same id creates duplicates in Milvus (Milvus inserts are not unique-key-checked). Use UpsertVectors for idempotent runs:
# Re-push the same record — pre-existing row is replaced
curl -sS -X PUT "https://k3.dev.dodil.io/kb-prod/vector/collections/$COLLECTION_ID/vectors" \
-H "Authorization: Bearer $DODIL_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"bucket": "kb-prod",
"collectionId": "'"$COLLECTION_ID"'",
"vectors": [
{
"id": "doc-1",
"denseFloat": {"values": [/* updated embedding */]},
"metadata": {"source": "papers/attention.pdf", "section": "3.2", "version": 2}
}
]
}'
Delete by id:
curl -sS -X POST "https://k3.dev.dodil.io/kb-prod/vector/collections/$COLLECTION_ID/vectors:delete" \
-H "Authorization: Bearer $DODIL_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"bucket": "kb-prod",
"collectionId": "'"$COLLECTION_ID"'",
"ids": ["doc-1", "doc-2"]
}'
Common gotchas
| Symptom | Cause | Fix |
|---|---|---|
| INVALID_ARGUMENT: dimension mismatch | denseFloat.values.length ≠ collection’s dimensions | Confirm with dodil k3 vector collection get; recreate the collection if the embedding model’s output dimension differs |
| INVALID_ARGUMENT: dense type mismatch | Sent denseFloat for a non-FLOAT collection (e.g. BINARY) | Use denseBinary / denseFloat16 / denseBfloat16 / denseInt8 per the collection’s embedding_type |
| FAILED_PRECONDITION on text search | Collection has no embed_model set (you left it empty at create time) | Pre-embed the query and use the vector query path |
| INVALID_ARGUMENT: sparse_vectors not supported | Sent sparse on a SPARSE_MODE_NONE or BM25 collection | Drop the sparse field from records; only SPARSE_MODE_EXTERNAL accepts caller-supplied sparse vectors |
| text field silently ignored | Collection’s sparse_mode ≠ BM25 | Not an error: text is only consumed in BM25 mode |
| Re-running Insert creates duplicates | Milvus inserts don’t check uniqueness on id | Use UpsertVectors (PUT) instead: same shape, idempotent |
| Insert rejects on a pipeline-mode collection | EXTERNAL writes only work on embedding_source = EXTERNAL collections | Create with AddVectorCollection, not AddVectorPipeline |
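Several of the rows above can be caught client-side before the request is sent. A minimal pre-flight check in Python, assuming the collection's settings are available as a dict with the same field names used at create time. check_record is a hypothetical helper, and the enum names for the FLOAT16 / BFLOAT16 / INT8 types are guesses patterned on EMBEDDING_TYPE_FLOAT and EMBEDDING_TYPE_BINARY.

```python
# Dense field expected per embedding type. Field names come from the table
# above; enum names other than FLOAT and BINARY are assumptions.
DENSE_FIELD = {
    "EMBEDDING_TYPE_FLOAT": "denseFloat",
    "EMBEDDING_TYPE_BINARY": "denseBinary",
    "EMBEDDING_TYPE_FLOAT16": "denseFloat16",
    "EMBEDDING_TYPE_BFLOAT16": "denseBfloat16",
    "EMBEDDING_TYPE_INT8": "denseInt8",
}

def check_record(record, collection):
    """Return a list of problems found in one record (empty list = OK)."""
    problems = []
    expected = DENSE_FIELD[collection["embeddingType"]]
    sent = [f for f in DENSE_FIELD.values() if f in record]
    if sent != [expected]:
        problems.append(f"dense type mismatch: expected {expected}, got {sent}")
    elif expected == "denseFloat":
        # Length check only for float vectors; other encodings are not modeled here.
        n = len(record["denseFloat"]["values"])
        if n != collection["dimensions"]:
            problems.append(f"dimension mismatch: {n} != {collection['dimensions']}")
    mode = collection.get("sparseMode", "SPARSE_MODE_NONE")
    if "sparse" in record and mode != "SPARSE_MODE_EXTERNAL":
        problems.append(f"sparse not accepted in {mode}")
    return problems

ada = {"dimensions": 1536, "embeddingType": "EMBEDDING_TYPE_FLOAT"}
bad = {"id": "x", "denseFloat": {"values": [0.0] * 768},
       "sparse": {"indices": [1], "values": [0.5]}}
print(check_record(bad, ada))
```

Running the check over a batch before InsertVectors or UpsertVectors surfaces these mistakes without a round-trip; the server remains the source of truth.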
When to pick external over pipeline-mode
| Choose external | Choose pipeline |
|---|---|
| You have your own embedding pipeline | You want K3 to handle embedding |
| Third-party SaaS model (OpenAI / Cohere) | Want one of K3’s built-in templates |
| Custom fine-tuned model | Standard text / code / visual workflows |
| Pre-computed batches | Real-time on-upload ingest |
| Learned-sparse models (SPLADE / BGE-M3) | BM25 is enough for sparse |
| Need binary / int8 / float16 dense | FLOAT is fine |
| Want exact control over the embedding for each row | Template-driven chunking + embedding is acceptable |
See also
- Pipeline Collection — opposite shape: K3 handles embedding via a Scriptum template
- Hybrid + Rerank — get the most out of a hybrid (BM25 or EXTERNAL sparse) collection
- Multi-collection Search — search across multiple external collections sharing a model
- Vectors — API Reference — full record + RPC spec
- Collections — API Reference — all five dense types + three sparse modes