External Collection — BYO embeddings

Goal: stand up a vector collection where you control the embedding pipeline. K3 stores + indexes the vectors; you pick the model, the dimensions, the metric, and push vectors with InsertVectors / UpsertVectors.

When to use:

You have a model K3 doesn’t host (third-party SaaS — OpenAI, Cohere, Voyage; or a custom fine-tuned model)
You need to bulk-load pre-computed embeddings from another system
You want learned-sparse models (SPLADE, BGE-M3 sparse) — supply both dense + sparse per record
You want compact embeddings (binary / int8 / float16) for storage / latency optimization

Shape:


   Your code  →  embed model (OpenAI, etc.)  →  dense (+ optional sparse) vectors
                                                          │
                                                          ▼
                                                  InsertVectors / UpsertVectors
                                                          │
                                                          ▼
                                                Milvus collection (EXTERNAL)
                                                          │
                                                          ▼
                                                      Search RPC

Prerequisites

The dodil CLI installed, with dodil login completed

A bucket — kb-prod — with the vector engine configured (auto mode):


dodil k3 bucket create kb-prod
dodil k3 vector store create -b kb-prod -m auto

A way to compute embeddings on your side. Examples below use OpenAI ada-002 (1536 dims) and a custom SPLADE-style sparse model.

1. Create the collection — manual mode

The CLI exposes a minimal slice of AddVectorCollection (FLOAT + cosine + dense-only). For the full range — binary / int8 / float16 / bfloat16, BM25, EXTERNAL sparse — use the API.

CLI path — OpenAI-shaped (1536 dims, FLOAT, dense-only)


dodil k3 vector collection add-manual ada -b kb-prod \
  --description "Pre-computed OpenAI ada-002 embeddings" \
  --dimensions 1536 \
  --metric cosine \
  --embed-model openai/text-embedding-ada-002
 
export COLLECTION_ID=$(dodil k3 vector collection get ada -b kb-prod -o json | jq -r '.collectionId')

The --embed-model flag matters: when set, K3 records the model name on the collection so later text queries can be routed through the matching embedding service, provided K3 hosts that model. If you leave it empty, only the pre-embedded vector query path works.

API path — hybrid BM25 (Milvus computes sparse from text)


curl -sS -X POST "https://k3.dev.dodil.io/kb-prod/vector/collections" \
  -H "Authorization: Bearer $DODIL_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "bucket": "kb-prod",
    "name": "hybrid-text",
    "description": "Dense + BM25 (Milvus computes sparse from each record text)",
    "dimensions": 1024,
    "distanceMetric": "DISTANCE_METRIC_COSINE",
    "embeddingType": "EMBEDDING_TYPE_FLOAT",
    "sparseMode": "SPARSE_MODE_BM25",
    "embedModel": "your-text-model-v1"
  }'
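If you create collections from Python rather than curl, a small builder keeps the request body consistent across the variants in this section. This is a sketch that assumes the JSON field names and enum strings are exactly the ones in the curl examples here; `collection_body` is a hypothetical helper, not part of any K3 SDK. Optional fields are omitted when unset, as in the EXTERNAL-sparse and binary examples.

```python
import json
from typing import Optional


def collection_body(
    bucket: str,
    name: str,
    dimensions: int,
    distance_metric: str = "DISTANCE_METRIC_COSINE",
    embedding_type: str = "EMBEDDING_TYPE_FLOAT",
    sparse_mode: Optional[str] = None,
    embed_model: Optional[str] = None,
    description: str = "",
) -> dict:
    """Build an AddVectorCollection JSON body using the field names from the curl examples."""
    if dimensions <= 0:
        raise ValueError("dimensions must be positive")
    body = {
        "bucket": bucket,
        "name": name,
        "description": description,
        "dimensions": dimensions,
        "distanceMetric": distance_metric,
        "embeddingType": embedding_type,
    }
    # Leave optional fields out entirely instead of sending nulls.
    if sparse_mode is not None:
        body["sparseMode"] = sparse_mode
    if embed_model is not None:
        body["embedModel"] = embed_model
    return body


if __name__ == "__main__":
    body = collection_body(
        "kb-prod",
        "hybrid-text",
        1024,
        sparse_mode="SPARSE_MODE_BM25",
        embed_model="your-text-model-v1",
        description="Dense + BM25 (Milvus computes sparse from each record text)",
    )
    print(json.dumps(body, indent=2))
```

Post the resulting dict as the JSON body to the same /kb-prod/vector/collections endpoint the curl examples use.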

API path — EXTERNAL sparse (caller supplies both dense + sparse)

For SPLADE / BGE-M3 sparse where you compute the sparse vector yourself:


curl -sS -X POST "https://k3.dev.dodil.io/kb-prod/vector/collections" \
  -H "Authorization: Bearer $DODIL_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "bucket": "kb-prod",
    "name": "splade-docs",
    "description": "Dense + caller-supplied learned-sparse vectors",
    "dimensions": 768,
    "distanceMetric": "DISTANCE_METRIC_COSINE",
    "embeddingType": "EMBEDDING_TYPE_FLOAT",
    "sparseMode": "SPARSE_MODE_EXTERNAL"
  }'

API path — binary embeddings (compact)

For perceptual hashes or binary-quantized embeddings, stored at one bit per dimension (dimensions / 8 bytes per vector, e.g. 256 dims = 32 bytes):


curl -sS -X POST "https://k3.dev.dodil.io/kb-prod/vector/collections" \
  -H "Authorization: Bearer $DODIL_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "bucket": "kb-prod",
    "name": "binary-hashes",
    "description": "256-bit perceptual hashes (32 bytes per vector)",
    "dimensions": 256,
    "distanceMetric": "DISTANCE_METRIC_HAMMING",
    "embeddingType": "EMBEDDING_TYPE_BINARY"
  }'
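A binary collection stores one bit per dimension, so a 256-dimension hash occupies 256 / 8 = 32 bytes. The sketch below packs 0/1 values into bytes (most significant bit first, the same order as NumPy's packbits default) and computes the Hamming distance this collection ranks by. How denseBinary carries the bytes over JSON is not shown in this guide; the base64 step assumes the standard protobuf JSON mapping for bytes fields. pack_bits and hamming are hypothetical helper names.

```python
import base64
from typing import List


def pack_bits(bits: List[int]) -> bytes:
    """Pack 0/1 values into bytes, most significant bit first.

    The bit count must be a multiple of 8 so the bits fill whole bytes.
    """
    if len(bits) % 8 != 0:
        raise ValueError("binary dimensions must be a multiple of 8")
    out = bytearray()
    for start in range(0, len(bits), 8):
        byte = 0
        for bit in bits[start:start + 8]:
            if bit not in (0, 1):
                raise ValueError("bits must be 0 or 1")
            byte = (byte << 1) | bit  # shift left, then append the next bit
        out.append(byte)
    return bytes(out)


def hamming(a: bytes, b: bytes) -> int:
    """Count differing bits between two packed vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


if __name__ == "__main__":
    hash_bits = [1, 0] * 128  # a 256-bit example hash
    packed = pack_bits(hash_bits)
    print(len(packed))  # 32
    # Assumption: bytes travel as base64 in the JSON body (protobuf JSON mapping).
    print(base64.b64encode(packed).decode("ascii"))
```

Whatever bit order you choose, use the same one for stored vectors and query vectors, or Hamming distances become meaningless.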

2. Insert vectors

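Your code computes the embeddings, then calls InsertVectors. A sketch in Python, assuming DODIL_TOKEN and COLLECTION_ID are exported (see step 1) and the OpenAI client can find OPENAI_API_KEY in the environment: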


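import os

import openai
import requests

K3 = "https://k3.dev.dodil.io"
COLLECTION_ID = os.environ["COLLECTION_ID"]   # exported after creating the collection in step 1
HEADERS = {
    "Authorization": f"Bearer {os.environ['DODIL_TOKEN']}",
    "Content-Type": "application/json",
}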
 
docs = [
    {"id": "doc-1", "text": "Multi-head attention lets the model jointly attend to information from different representation subspaces."},
    {"id": "doc-2", "text": "BERT uses bidirectional self-attention over masked tokens."},
    {"id": "doc-3", "text": "GPT uses causal self-attention with left-to-right context."},
]
 
# 1. Compute embeddings on your side
client = openai.OpenAI()
resp = client.embeddings.create(
    model="text-embedding-ada-002",
    input=[d["text"] for d in docs],
)
 
# 2. Build VectorRecord payloads
vectors = []
for doc, emb in zip(docs, resp.data):
    vectors.append({
        "id": doc["id"],
        "denseFloat": {"values": emb.embedding},   # 1536 floats
        "metadata": {
            "source": "papers/attention.pdf",      # JSON Struct — typed values, not just strings
            "section": doc["id"],
            "year": 2017
        }
    })
 
# 3. Insert into K3
r = requests.post(
    f"{K3}/kb-prod/vector/collections/{COLLECTION_ID}/vectors",
    headers=HEADERS,
    json={"bucket": "kb-prod", "collectionId": COLLECTION_ID, "vectors": vectors},
)
r.raise_for_status()  # fail loudly on HTTP errors before parsing the body
print(r.json())  # {"inserted": "3"}

For an idempotent bulk-load (re-runs replace rows by id), use UpsertVectors — same shape, PUT instead of POST:


curl -sS -X PUT "https://k3.dev.dodil.io/kb-prod/vector/collections/$COLLECTION_ID/vectors" \
  -H "Authorization: Bearer $DODIL_TOKEN" \
  -H "Content-Type: application/json" \
  -d @vectors.json

Response: {"inserted": 3, "deleted": 2} — deleted is the count of pre-existing rows replaced.
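To produce the vectors.json file that the curl command reads, write the same body shape as InsertVectors (bucket, collectionId, vectors). A minimal sketch, assuming you already have records like the ones built in the Python example above; write_upsert_file is a hypothetical helper. It keeps only the last record for each id so the file describes one final row per id; that is a precaution of this sketch, not a documented K3 requirement.

```python
import json
from typing import Dict, List


def write_upsert_file(path: str, bucket: str, collection_id: str, vectors: List[dict]) -> int:
    """Write an UpsertVectors body to `path` for use with `curl -d @path`.

    Returns the number of records written after collapsing repeated ids.
    """
    latest: Dict[str, dict] = {}
    for record in vectors:
        latest[record["id"]] = record  # a later record for the same id wins
    body = {"bucket": bucket, "collectionId": collection_id, "vectors": list(latest.values())}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(body, f)
    return len(body["vectors"])


if __name__ == "__main__":
    records = [
        {"id": "doc-1", "denseFloat": {"values": [0.1, 0.2]}, "metadata": {"version": 1}},
        {"id": "doc-1", "denseFloat": {"values": [0.3, 0.4]}, "metadata": {"version": 2}},
    ]
    # "my-collection-id" is a placeholder; use the real $COLLECTION_ID value.
    print(write_upsert_file("vectors.json", "kb-prod", "my-collection-id", records))
```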

Hybrid (BM25) — include `text` per record

For a SPARSE_MODE_BM25 collection, include the raw text so Milvus’s BM25 function can derive the sparse vector:


{
  "id": "chunk-1",
  "denseFloat": { "values": [/* dense floats */] },
  "text": "Multi-head attention lets the model jointly attend to information from different representation subspaces.",
  "metadata": { "source": "papers/attention.pdf", "section": "3.2" }
}
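When you already have chunked text and one dense vector per chunk, pairing them into BM25-ready records is mechanical. A sketch, assuming chunks arrive as (id, text) tuples; bm25_records is a hypothetical helper and the source metadata key is only an example. It rejects blank text because a record with no terms contributes nothing to the keyword (BM25) side of a hybrid search.

```python
from typing import List, Sequence, Tuple


def bm25_records(
    chunks: Sequence[Tuple[str, str]],
    dense_vectors: Sequence[Sequence[float]],
    source: str,
) -> List[dict]:
    """Build records for a SPARSE_MODE_BM25 collection: dense vector plus raw text."""
    if len(chunks) != len(dense_vectors):
        raise ValueError("need exactly one dense vector per chunk")
    records = []
    for (chunk_id, text), dense in zip(chunks, dense_vectors):
        if not text.strip():
            raise ValueError(f"{chunk_id}: empty text gives BM25 nothing to index")
        records.append({
            "id": chunk_id,
            "denseFloat": {"values": [float(x) for x in dense]},
            "text": text,  # Milvus derives the sparse vector from this field
            "metadata": {"source": source},
        })
    return records
```

No sparse field is set: in BM25 mode the sparse side comes from text, and caller-supplied sparse is rejected.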

EXTERNAL sparse — supply both dense + sparse

For a SPARSE_MODE_EXTERNAL collection (SPLADE / BGE-M3):


{
  "id": "splade-1",
  "denseFloat": { "values": [/* dense floats */] },
  "sparse": {
    "indices": [42, 137, 1024, 8888],
    "values":  [0.92, 0.41, 0.18, 0.07]
  },
  "metadata": { "doc_id": "abc-123" }
}

The sparse vector lists only the non-zero positions in a high-dimensional vocabulary space: `indices` are vocabulary positions and `values` are their weights. Typically a few hundred indices are non-zero out of a vocabulary of 30K+ terms.
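Learned-sparse models usually give you one weight per vocabulary entry; converting that into the indices / values pair is a filter over the non-zero entries. A minimal sketch: to_sparse is a hypothetical helper, the optional top_k cap is a common way to bound payload size rather than anything K3 requires, and emitting indices in ascending order is a conventional choice for sparse formats, not a documented K3 rule.

```python
from typing import Dict, List, Optional, Sequence


def to_sparse(weights: Sequence[float], top_k: Optional[int] = None) -> Dict[str, List]:
    """Turn a vocab-sized weight list into {"indices": [...], "values": [...]}.

    Keeps strictly positive weights (SPLADE-style outputs are non-negative),
    optionally only the top_k largest, returned in ascending index order.
    """
    nonzero = [(i, float(w)) for i, w in enumerate(weights) if w > 0]
    if top_k is not None:
        nonzero = sorted(nonzero, key=lambda pair: pair[1], reverse=True)[:top_k]
    nonzero.sort(key=lambda pair: pair[0])
    return {
        "indices": [i for i, _ in nonzero],
        "values": [w for _, w in nonzero],
    }


if __name__ == "__main__":
    vocab_size = 30522  # e.g. a BERT-style vocabulary
    weights = [0.0] * vocab_size
    for idx, w in [(42, 0.92), (137, 0.41), (1024, 0.18), (8888, 0.07)]:
        weights[idx] = w
    print(to_sparse(weights))  # same indices/values as the JSON example above
```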

3. Search

If you set --embed-model at create time AND K3 hosts the matching embedding service, text queries just work:


dodil k3 search "what is multi-head attention" -b kb-prod -c ada -o json

If you didn’t set embed_model (or K3 doesn’t host the model), you must embed the query yourself and use the vector query path:


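import os

import openai
import requests

K3 = "https://k3.dev.dodil.io"
HEADERS = {
    "Authorization": f"Bearer {os.environ['DODIL_TOKEN']}",
    "Content-Type": "application/json",
}

# 1. Embed the query on your side
client = openai.OpenAI()
qresp = client.embeddings.create(model="text-embedding-ada-002", input="multi-head attention")
qvec = qresp.data[0].embedding   # 1536 floats

# 2. Search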
r = requests.post(
    f"{K3}/kb-prod/vector/search",
    headers=HEADERS,
    json={
        "bucket": "kb-prod",
        "collectionName": "ada",
        "vector": {"values": qvec},
        "topK": 5,
        "searchMode": "SEARCH_MODE_VECTOR"
    },
)
r.raise_for_status()
print(r.json())

Batch retrieval — N queries in one round-trip


# Reuses client, K3 and HEADERS from the previous snippet
queries = ["multi-head attention", "BERT", "GPT"]
qresp = client.embeddings.create(model="text-embedding-ada-002", input=queries)
 
r = requests.post(
    f"{K3}/kb-prod/vector/search",
    headers=HEADERS,
    json={
        "bucket": "kb-prod",
        "collectionName": "ada",
        "vector": {
            "vectors": [{"values": e.embedding} for e in qresp.data]
        },
        "topK": 3,
        "searchMode": "SEARCH_MODE_VECTOR"
    },
)
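r.raise_for_status()

# Regroup hits by queryIndex. Protobuf's JSON encoding usually omits
# zero-valued fields, so hits for the first query may lack the key.
results_by_query = {}
for hit in r.json()["results"]:
    results_by_query.setdefault(hit.get("queryIndex", 0), []).append(hit)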

One Milvus round-trip serves all three queries, which avoids the per-request network and HTTP overhead of three serial calls.

4. Upsert + delete

Re-running InsertVectors with the same id creates duplicates in Milvus (Milvus inserts are not unique-key-checked). Use UpsertVectors for idempotent runs:


# Re-push the same record — pre-existing row is replaced
curl -sS -X PUT "https://k3.dev.dodil.io/kb-prod/vector/collections/$COLLECTION_ID/vectors" \
  -H "Authorization: Bearer $DODIL_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "bucket": "kb-prod",
    "collectionId": "'"$COLLECTION_ID"'",
    "vectors": [
      {
        "id": "doc-1",
        "denseFloat": {"values": [/* updated embedding */]},
        "metadata": {"source": "papers/attention.pdf", "section": "3.2", "version": 2}
      }
    ]
  }'
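The difference between the two write paths can be modeled with an in-memory table. This is a toy illustration of the semantics described in this guide (insert does not check ids; upsert replaces rows by id and reports the replaced rows as deleted), not K3's or Milvus's implementation; ToyCollection is hypothetical.

```python
from typing import Dict, List


class ToyCollection:
    """In-memory model of insert vs. upsert semantics (illustration only)."""

    def __init__(self) -> None:
        self.rows: List[dict] = []  # ids are NOT unique-checked, like Milvus inserts

    def insert(self, records: List[dict]) -> Dict[str, int]:
        self.rows.extend(records)
        return {"inserted": len(records)}

    def upsert(self, records: List[dict]) -> Dict[str, int]:
        incoming = {r["id"] for r in records}
        kept = [row for row in self.rows if row["id"] not in incoming]
        deleted = len(self.rows) - len(kept)  # pre-existing rows being replaced
        self.rows = kept + list(records)
        return {"inserted": len(records), "deleted": deleted}


if __name__ == "__main__":
    c = ToyCollection()
    c.insert([{"id": "doc-1"}, {"id": "doc-2"}, {"id": "doc-3"}])
    print(c.upsert([{"id": "doc-1"}, {"id": "doc-2"}, {"id": "doc-4"}]))  # {'inserted': 3, 'deleted': 2}
    c.insert([{"id": "doc-4"}])
    print(sum(1 for row in c.rows if row["id"] == "doc-4"))  # 2: insert created a duplicate
```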

Delete by id:


curl -sS -X POST "https://k3.dev.dodil.io/kb-prod/vector/collections/$COLLECTION_ID/vectors:delete" \
  -H "Authorization: Bearer $DODIL_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "bucket": "kb-prod",
    "collectionId": "'"$COLLECTION_ID"'",
    "ids": ["doc-1", "doc-2"]
  }'
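To delete a long list of ids from Python, you can split it across several vectors:delete calls. A sketch that only builds request bodies in the shape shown above; delete_bodies is hypothetical, and the default batch size of 1000 is an arbitrary illustration, not a documented K3 limit.

```python
from typing import Dict, Iterable, Iterator, List


def delete_bodies(
    bucket: str, collection_id: str, ids: Iterable[str], batch_size: int = 1000
) -> Iterator[Dict[str, object]]:
    """Yield vectors:delete request bodies with at most batch_size ids each."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    unique_ids: List[str] = list(dict.fromkeys(ids))  # drop repeats, keep first-seen order
    for start in range(0, len(unique_ids), batch_size):
        yield {
            "bucket": bucket,
            "collectionId": collection_id,
            "ids": unique_ids[start:start + batch_size],
        }
```

Each body is POSTed to .../vector/collections/{COLLECTION_ID}/vectors:delete, for example with requests.post(url, headers=HEADERS, json=body) as in the insert sketch.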

Common gotchas

Symptom	Cause	Fix
`INVALID_ARGUMENT: dimension mismatch`	`denseFloat.values.length` ≠ collection’s `dimensions`	Confirm with `dodil k3 vector collection get` — recreate the collection if the embedding model’s output dim is different
`INVALID_ARGUMENT: dense type mismatch`	Sent `denseFloat` for a non-FLOAT collection (e.g. BINARY)	Use `denseBinary` / `denseFloat16` / `denseBfloat16` / `denseInt8` per the collection’s `embedding_type`
`FAILED_PRECONDITION` on text search	Collection has no `embed_model` set (you left it empty at create time)	Pre-embed the query and use the `vector` query path
`INVALID_ARGUMENT: sparse_vectors not supported`	Sent sparse on a `SPARSE_MODE_NONE` or `BM25` collection	Drop the `sparse` field from records — only `SPARSE_MODE_EXTERNAL` accepts caller sparse
`text` field silently ignored	Collection’s `sparse_mode` ≠ BM25	Not an error — `text` is only consumed in BM25 mode
Re-running Insert creates duplicates	Milvus inserts don’t check uniqueness on `id`	Use `UpsertVectors` (PUT) instead — same shape, idempotent
`InsertVectors` rejected on a pipeline-mode collection	EXTERNAL writes only work on `embedding_source = EXTERNAL` collections	Create with `AddVectorCollection`, not `AddVectorPipeline`
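Several of these errors can be caught before a request leaves your machine. A sketch of a client-side check for FLOAT collections, assuming you know the collection's dimensions and sparse mode (for example from dodil k3 vector collection get); preflight is a hypothetical helper that only mirrors the table rows above, so the server remains the authority.

```python
from typing import List


def preflight(record: dict, dimensions: int, sparse_mode: str = "SPARSE_MODE_NONE") -> List[str]:
    """Return human-readable problems for one record bound for a FLOAT collection."""
    problems = []
    dense = record.get("denseFloat", {}).get("values")
    if dense is None:
        problems.append("no denseFloat values (non-FLOAT collections need another dense field)")
    elif len(dense) != dimensions:
        problems.append(f"dimension mismatch: {len(dense)} != {dimensions}")
    sparse = record.get("sparse")
    if sparse is not None:
        if sparse_mode != "SPARSE_MODE_EXTERNAL":
            problems.append("sparse supplied, but only SPARSE_MODE_EXTERNAL accepts caller sparse")
        elif len(sparse.get("indices", [])) != len(sparse.get("values", [])):
            problems.append("sparse indices and values differ in length")
    if "text" in record and sparse_mode != "SPARSE_MODE_BM25":
        problems.append("text will be ignored: it is only consumed in BM25 mode")
    return problems
```

Run it over each batch before InsertVectors / UpsertVectors and stop on any non-empty result.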

When to pick external over pipeline-mode

Choose external	Choose pipeline
You have your own embedding pipeline	You want K3 to handle embedding
Third-party SaaS model (OpenAI / Cohere)	Want one of K3’s built-in templates
Custom fine-tuned model	Standard text / code / visual workflows
Pre-computed batches	Real-time on-upload ingest
Learned-sparse models (SPLADE / BGE-M3)	BM25 is enough for sparse
Need binary / int8 / float16 dense	FLOAT is fine
Want exact control over the embedding for each row	Template-driven chunking + embedding is acceptable