Bulk Ingest & Index
Goal: load a large set of vectors efficiently, pick the right index and metric for your workload, and organize data with partitions. This recipe is the one you reach for when “insert a handful of rows” turns into “ingest a few million.”
Everything is plain Milvus 2.6 against your VBase database. For the exhaustive index and insert reference, see the Milvus documentation.
Before you start
You need an allocated database in RUNNING state and an IAM token — see the Quickstart and Connecting with the Milvus SDK.
pip install "pymilvus>=2.6,<2.7"
Connect
from pymilvus import MilvusClient, DataType
client = MilvusClient(
    uri="https://<endpoint>:443",  # endpoint + port from GetServiceAccess / `dodil vbase db use`
    token="<IAM access token>",  # your IAM service-account token IS your Milvus token
    db_name="<db_name>",  # the allocated database
)
Create the collection (defer the index)
For a large load it is often fastest to insert first and build the index afterward, so the index is built once over the full dataset rather than incrementally. Create the collection with a schema but no index yet.
schema = client.create_schema(auto_id=False, enable_dynamic_field=True)
schema.add_field("id", DataType.INT64, is_primary=True)
schema.add_field("embedding", DataType.FLOAT_VECTOR, dim=768)
schema.add_field("tenant", DataType.VARCHAR, max_length=64)
client.create_collection(collection_name="catalog", schema=schema)
Insert in batches
Stream your rows in fixed-size batches rather than one giant call — this keeps request payloads bounded and gives you natural checkpoints. A few thousand rows per batch is a good starting point; tune to your vector dimension and row size.
def embed(text: str) -> list[float]:
    """Replace with your embedding model; it must return a 768-dim vector."""
    ...

def source_rows():
    """Replace with your data source (a file, a queue, a DB cursor, ...)."""
    for i in range(1_000_000):
        yield {"id": i, "embedding": embed(f"item {i}"), "tenant": "acme"}

BATCH = 2_000
batch = []
for row in source_rows():
    batch.append(row)
    if len(batch) >= BATCH:
        client.insert(collection_name="catalog", data=batch)
        batch.clear()
if batch:  # flush the final partial batch
    client.insert(collection_name="catalog", data=batch)
Keep batches uniform and retry a failed batch rather than the whole load. Each `insert` is durable on success, so you can checkpoint on the last id you sent.
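The retry-and-checkpoint advice can be sketched as a small wrapper around the insert call. This is a minimal sketch, not part of the SDK: `batched`, `insert_with_retry`, `insert_fn`, and `on_checkpoint` are hypothetical names introduced here, and the retry count and backoff values are arbitrary assumptions to tune for your workload.

```python
import time
from itertools import islice
from typing import Callable, Iterable, Iterator


def batched(rows: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Yield lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk


def insert_with_retry(
    insert_fn: Callable[[list[dict]], object],
    rows: Iterable[dict],
    batch_size: int = 2_000,
    max_attempts: int = 3,      # assumption: tune for your workload
    backoff_s: float = 1.0,     # assumption: base delay, doubled per retry
    on_checkpoint: Callable[[int], None] = lambda last_id: None,
) -> int:
    """Insert rows batch by batch, retrying only the batch that failed.

    Returns the number of rows inserted. `on_checkpoint` receives the last
    id of every batch that succeeded, so a restart can resume after it.
    """
    total = 0
    for batch in batched(rows, batch_size):
        for attempt in range(1, max_attempts + 1):
            try:
                insert_fn(batch)
                break
            except Exception:  # broad on purpose; narrow to your SDK's errors
                if attempt == max_attempts:
                    raise
                time.sleep(backoff_s * 2 ** (attempt - 1))
        on_checkpoint(batch[-1]["id"])
        total += len(batch)
    return total
```

With the client from this recipe, `insert_fn` could be `lambda b: client.insert(collection_name="catalog", data=b)`. A retry after an ambiguous failure (the write succeeded but the response was lost) can write the same batch twice; if that matters for your data, `client.upsert` is one way to make retries idempotent.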
Choose an index
Build the index once after the bulk load. Pick the type by your scale and recall needs.
| Index | Search | Strength | Trade-off |
|---|---|---|---|
| FLAT | Exact | Maximum recall; the correctness baseline | Latency grows fastest as data grows |
| HNSW | Approximate (graph) | Strong latency/recall default | Higher memory use |
| IVF_FLAT | Approximate (inverted file) | Scales well with tunable quality | Needs nprobe tuning for best recall |
| IVF_SQ8 | IVF + scalar quantization | Much lower memory at large scale | Some accuracy loss from quantization |
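The memory trade-off in the table can be put in rough numbers. The sketch below assumes float32 vectors (4 bytes per dimension) and 8-bit scalar quantization (1 byte per dimension), and counts vector data only. It deliberately ignores index overhead such as IVF centroids or HNSW graph links, so treat the results as a lower bound. The function names are illustrative.

```python
GIB = 1024 ** 3


def float_vector_bytes(num_vectors: int, dim: int) -> int:
    """Raw float32 storage: 4 bytes per dimension."""
    return num_vectors * dim * 4


def sq8_vector_bytes(num_vectors: int, dim: int) -> int:
    """8-bit scalar quantization: 1 byte per dimension."""
    return num_vectors * dim


# The collection from this recipe: one million 768-dim vectors.
n, dim = 1_000_000, 768
raw = float_vector_bytes(n, dim)
sq8 = sq8_vector_bytes(n, dim)
print(f"float32 vectors: {raw / GIB:.2f} GiB")
print(f"SQ8 vectors:     {sq8 / GIB:.2f} GiB")
```

Under these assumptions the quantized vectors take a quarter of the raw size, which is why IVF_SQ8 becomes attractive once the raw vectors no longer fit comfortably in memory.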
Rules of thumb:
- Start with HNSW for low-latency, high-recall search when the dataset fits comfortably in memory.
- Move to IVF_FLAT when you want a tunable quality/cost knob (`nlist` at build, `nprobe` at query) at larger scale.
- Choose IVF_SQ8 when memory is the binding constraint and a small recall hit is acceptable.
- Keep FLAT for small collections or as a recall baseline to measure the others against.
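Two of these rules can be made measurable with a few lines of plain Python. The sketch below is illustrative only: `ivf_scan_fraction` assumes roughly balanced clusters, which real data may not give you, and `recall_at_k` simply compares the ids returned by an approximate index with the ids an exact (FLAT) search returned for the same query. Neither function is part of pymilvus.

```python
def ivf_scan_fraction(nlist: int, nprobe: int) -> float:
    """Approximate share of vectors compared per query on an IVF index.

    Assumes clusters are roughly balanced; skewed data will deviate.
    """
    if nlist <= 0 or nprobe <= 0:
        raise ValueError("nlist and nprobe must be positive")
    return min(nprobe, nlist) / nlist


def recall_at_k(approx_ids: list[int], exact_ids: list[int]) -> float:
    """Fraction of the exact top-k ids that the approximate search found."""
    exact = set(exact_ids)
    if not exact:
        raise ValueError("exact_ids must not be empty")
    return len(exact & set(approx_ids)) / len(exact)


# nlist=1024 at build time and nprobe=16 at query time, as used in this recipe.
print(ivf_scan_fraction(1024, 16))

# Hypothetical ids: the approximate search missed one of four exact hits.
print(recall_at_k([1, 2, 3, 9], [1, 2, 3, 4]))
```

A practical loop is to run the same sample queries against a FLAT baseline and the candidate index, then raise `nprobe` until the average recall meets your target.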
Choose a metric
| Metric | Better score | Use when |
|---|---|---|
| L2 | Smaller | You care about absolute (Euclidean) distance. |
| IP | Larger | Both magnitude and direction matter (inner product). |
| COSINE | Larger | Angular similarity matters; common for normalized text embeddings. |
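To make the three metrics concrete, here is a small pure-Python sketch using the textbook definitions. It is not Milvus code, and the helper names are illustrative. It shows that two vectors pointing the same way but with different magnitudes are identical under COSINE and far apart under L2, and that IP and COSINE agree once vectors are normalized to unit length.

```python
import math


def ip(a: list[float], b: list[float]) -> float:
    """Inner product: sensitive to both direction and magnitude."""
    return sum(x * y for x, y in zip(a, b))


def l2(a: list[float], b: list[float]) -> float:
    """Euclidean distance: smaller means closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: direction only, magnitude divided out."""
    return ip(a, b) / (math.sqrt(ip(a, a)) * math.sqrt(ip(b, b)))


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(ip(v, v))
    return [x / norm for x in v]


a = [3.0, 4.0]
b = [6.0, 8.0]  # same direction as a, twice the magnitude

print(l2(a, b), ip(a, b), cosine(a, b))

# After normalization, inner product and cosine similarity coincide.
na, nb = normalize(a), normalize(b)
print(ip(na, nb), cosine(na, nb))
```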
The metric must match how you intend to query — set it at index time and use the same metric in search. For most normalized text embeddings, COSINE is the right default.
index_params = client.prepare_index_params()
index_params.add_index(
    field_name="embedding",
    index_type="IVF_FLAT",
    metric_type="COSINE",
    params={"nlist": 1024},  # more lists = finer partitioning; tune with nprobe at query time
)
client.create_index(collection_name="catalog", index_params=index_params)
Load the collection
Load once the index is built. Search requests fail until the collection is loaded.
client.load_collection(collection_name="catalog")
At query time, IVF_* indexes take an `nprobe` parameter that trades recall for latency (a higher value probes more clusters, raising both):
results = client.search(
    collection_name="catalog",
    data=[embed("a query")],
    anns_field="embedding",
    limit=10,
    search_params={"metric_type": "COSINE", "params": {"nprobe": 16}},
    output_fields=["tenant"],
)
Organize with partitions
Partitions split a collection into named segments. When your data has a natural grouping — a tenant, a date range, a region — partitioning lets you restrict a search to the relevant slice, which cuts both latency and read cost.
client.create_partition(collection_name="catalog", partition_name="acme")
client.create_partition(collection_name="catalog", partition_name="globex")

# Insert directly into a partition.
# The id is chosen outside the 0..999_999 range used by the bulk load above,
# so it does not duplicate an existing primary key.
client.insert(
    collection_name="catalog",
    data=[{"id": 1_000_001, "embedding": embed("acme item"), "tenant": "acme"}],
    partition_name="acme",
)

# Search only that partition: fewer vectors scanned, lower read cost.
results = client.search(
    collection_name="catalog",
    data=[embed("a query")],
    anns_field="embedding",
    limit=10,
    search_params={"metric_type": "COSINE", "params": {"nprobe": 16}},
    partition_names=["acme"],
)
Partitions are an organizational tool, not a security boundary. Tenant isolation comes from your IAM scopes and separate VBase databases, not from partition names.
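When a bulk load targets partitions, each `insert` call takes a single `partition_name`, so rows have to be routed before they are sent. The sketch below keeps one buffer per partition and emits a batch whenever a buffer fills. It assumes the partition name equals the row's `tenant` value, as in the example above; `partition_batches` is a hypothetical helper, not an SDK function.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def partition_batches(
    rows: Iterable[dict],
    batch_size: int,
    key: str = "tenant",  # assumption: partition name == value of this field
) -> Iterator[tuple[str, list[dict]]]:
    """Yield (partition_name, batch) pairs from a mixed stream of rows."""
    buffers: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        name = row[key]
        buffers[name].append(row)
        if len(buffers[name]) >= batch_size:
            yield name, buffers[name]
            buffers[name] = []  # start a fresh buffer for this partition
    # Flush whatever is left in each partition's buffer.
    for name, buf in buffers.items():
        if buf:
            yield name, buf


rows = [
    {"id": 0, "tenant": "acme"},
    {"id": 1, "tenant": "globex"},
    {"id": 2, "tenant": "acme"},
    {"id": 3, "tenant": "acme"},
]
for name, batch in partition_batches(rows, batch_size=2):
    # With a live client this would be:
    # client.insert(collection_name="catalog", data=batch, partition_name=name)
    print(name, [r["id"] for r in batch])
```

Memory stays bounded by `batch_size` times the number of distinct partitions, which suits a streaming source better than grouping the whole dataset up front.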
Notes
- Metering. Every batched `insert` is metered as VectorWrite and the stored vectors count toward VectorStorage; each `search` is metered as VectorRead. Searching a single partition reads fewer vectors than scanning the whole collection. All usage is scoped to your organization’s quota.
- Build the index after the bulk load so it is built once over the full dataset. For incremental, always-on ingestion, build the index up front instead and accept incremental index maintenance.
- What VBase manages. The Milvus service, scaling, users, and RBAC. You operate at the collection/partition/index/search level with your IAM token.
- Reference. Full index parameters, segment/flush behavior, and bulk-import options are in the Milvus documentation.
See also
- Recipes — the full set of end-to-end workflows.
- Semantic Search — the single-collection search baseline.
- Hybrid Search (Dense + BM25) — add keyword matching.
- Connecting with the Milvus SDK — obtain your `endpoint`, `port`, and `db_name`.
- Databases API — allocate a database and resolve its access.
- Quickstart — allocate, connect, and search end to end.
- Milvus documentation — full SDK and API reference.