Bulk Ingest & Index

Goal: load a large set of vectors efficiently, pick the right index and metric for your workload, and organize data with partitions. This recipe is the one you reach for when “insert a handful of rows” turns into “ingest a few million.”

Everything is plain Milvus 2.6 against your VBase database. For the exhaustive index and insert reference, see the Milvus documentation.

Before you start

You need an allocated database in RUNNING state and an IAM token — see the Quickstart and Connecting with the Milvus SDK.


pip install "pymilvus>=2.6,<2.7"

Connect


from pymilvus import MilvusClient, DataType
 
client = MilvusClient(
    uri="https://<endpoint>:443",   # endpoint + port from GetServiceAccess / `dodil vbase db use`
    token="<IAM access token>",      # your IAM service-account token IS your Milvus token
    db_name="<db_name>",             # the allocated database
)

Create the collection (defer the index)

For a large load it is often fastest to insert first and build the index afterward, so the index is built once over the full dataset rather than incrementally. Create the collection with a schema but no index yet.


schema = client.create_schema(auto_id=False, enable_dynamic_field=True)
schema.add_field("id", DataType.INT64, is_primary=True)
schema.add_field("embedding", DataType.FLOAT_VECTOR, dim=768)
schema.add_field("tenant", DataType.VARCHAR, max_length=64)
 
client.create_collection(collection_name="catalog", schema=schema)

Insert in batches

Stream your rows in fixed-size batches rather than one giant call — this keeps request payloads bounded and gives you natural checkpoints. A few thousand rows per batch is a good starting point; tune to your vector dimension and row size.
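To make "tune to your vector dimension and row size" concrete, here is a back-of-envelope sketch of the raw payload per batch. It assumes FLOAT_VECTOR values are float32 (4 bytes each) and ignores serialization overhead, so treat the numbers as a lower bound. The helper names and the 8 MiB budget are hypothetical and chosen only for illustration. They are not a Milvus or VBase limit.

```python
# Back-of-envelope request size for one insert batch.
# Assumes float32 vectors (4 bytes per dimension) and ignores protocol
# overhead, so the result is a lower bound on the real request size.

FLOAT32_BYTES = 4

def approx_batch_bytes(dim: int, rows: int, scalar_bytes_per_row: int = 0) -> int:
    """Raw payload bytes for `rows` vectors of `dim` float32s plus scalar fields."""
    per_row = dim * FLOAT32_BYTES + scalar_bytes_per_row
    return per_row * rows

def rows_per_budget(dim: int, budget_bytes: int, scalar_bytes_per_row: int = 0) -> int:
    """Largest batch whose raw payload fits in `budget_bytes` (at least 1 row)."""
    per_row = dim * FLOAT32_BYTES + scalar_bytes_per_row
    return max(1, budget_bytes // per_row)

# A 768-dim vector is 3,072 bytes, so 2,000 rows carry about 6.1 MB of vectors.
print(approx_batch_bytes(768, 2_000))  # 6144000

# Hypothetical 8 MiB budget, assuming ~72 bytes of scalars per row
# (an int64 id plus a short tenant string):
print(rows_per_budget(768, 8 * 1024 * 1024, scalar_bytes_per_row=72))  # 2668
```

Doubling the dimension roughly halves the rows that fit in the same budget, which is why the batch size should track the vector size rather than stay fixed.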


def embed(text: str) -> list[float]:
    """Replace with your embedding model — must return a 768-dim vector."""
    ...
 
def source_rows():
    """Replace with your data source (a file, a queue, a DB cursor, ...)."""
    for i in range(1_000_000):
        yield {"id": i, "embedding": embed(f"item {i}"), "tenant": "acme"}
 
BATCH = 2_000
batch = []
for row in source_rows():
    batch.append(row)
    if len(batch) >= BATCH:
        client.insert(collection_name="catalog", data=batch)
        batch.clear()
 
if batch:                       # flush the final partial batch
    client.insert(collection_name="catalog", data=batch)
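The manual append/clear/flush bookkeeping in the loop above can be folded into a small chunking helper. This is a sketch using only the standard library; `chunked` is a hypothetical name. On Python 3.12+ `itertools.batched` does the same job but yields tuples rather than lists.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

def chunked(rows: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield lists of up to `size` items; only the final list may be shorter."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))  # pull the next `size` items lazily
        if not chunk:
            return                      # source exhausted
        yield chunk

# Equivalent to the loop above, including the final partial batch:
# for batch in chunked(source_rows(), BATCH):
#     client.insert(collection_name="catalog", data=batch)

print([len(c) for c in chunked(range(5), 2)])  # [2, 2, 1]
```

Because `islice` consumes the generator lazily, only one batch of rows is held in memory at a time, just like the original loop.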

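Keep batch sizes uniform, and on failure retry only the failed batch rather than restarting the whole load. Each insert is durable once the call returns successfully, so checkpoint on the highest id of the last batch that succeeded (not merely the last id you sent) and resume from there.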
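Per-batch retry with a checkpoint can be sketched as follows. `insert_with_retry` is a hypothetical helper, the exponential-backoff schedule is an assumption rather than VBase guidance, and it assumes ids increase monotonically across batches, as they do in the example source. It catches `Exception` broadly so the sketch stays self-contained; in real code you would likely narrow this to pymilvus's `MilvusException`. If a timed-out attempt may actually have landed, retrying with `client.upsert(collection_name=..., data=...)` instead of `insert` avoids duplicate primary keys, at some extra cost.

```python
import time
from typing import Callable

def insert_with_retry(
    client,
    collection_name: str,
    batch: list[dict],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Insert one batch, retrying just this batch with exponential backoff.

    Returns the highest id in the batch so the caller can record it as the
    checkpoint after the call succeeds. Assumes ids increase across batches.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    attempt = 1
    while True:
        try:
            client.insert(collection_name=collection_name, data=batch)
            return max(row["id"] for row in batch)
        except Exception:  # consider narrowing to pymilvus's MilvusException
            if attempt >= attempts:
                raise                               # give up: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5 s, 1 s, 2 s, ...
            attempt += 1
```

In the batch loop you would call `last_id = insert_with_retry(client, "catalog", batch)` in place of `client.insert(...)` and persist `last_id` somewhere durable. On restart, skip source rows with `id <= last_id`.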

Choose an index

Build the index once after the bulk load. Pick the type by your scale and recall needs.

Index	Search	Strength	Trade-off
`FLAT`	Exact (brute force)	Maximum recall; the correctness baseline	Latency grows linearly with collection size
`HNSW`	Approximate (graph)	Strong latency/recall default	Higher memory use
`IVF_FLAT`	Approximate (inverted file)	Scales well with tunable quality	Needs `nprobe` tuning for best recall
`IVF_SQ8`	IVF + scalar quantization	Much lower memory at large scale	Some accuracy loss from quantization
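Each index type exposes different knobs at build time and at query time. The lookup sketch below makes that split explicit. The parameter names (`M`, `efConstruction`, and `ef` for HNSW; `nlist` and `nprobe` for the IVF family) are standard Milvus index parameters. The numeric values are illustrative starting points assumed for this sketch, not VBase recommendations, and `INDEX_KNOBS` / `knobs_for` are hypothetical helpers.

```python
# Illustrative starting values only; tune against your own data and recall target.
INDEX_KNOBS = {
    "FLAT":     {"build": {},                               "search": {}},
    "HNSW":     {"build": {"M": 16, "efConstruction": 200}, "search": {"ef": 64}},
    "IVF_FLAT": {"build": {"nlist": 1024},                  "search": {"nprobe": 16}},
    "IVF_SQ8":  {"build": {"nlist": 1024},                  "search": {"nprobe": 16}},
}

def knobs_for(index_type: str, metric_type: str = "COSINE", limit: int = 10):
    """Return (kwargs for index_params.add_index, search_params for client.search)."""
    knobs = INDEX_KNOBS[index_type]
    search = dict(knobs["search"])  # copy so the table above is never mutated
    if index_type == "HNSW":
        # Milvus documents ef's lower bound as the requested top-k (limit).
        search["ef"] = max(search["ef"], limit)
    add_index_kwargs = {
        "field_name": "embedding",
        "index_type": index_type,
        "metric_type": metric_type,
        "params": dict(knobs["build"]),
    }
    return add_index_kwargs, {"metric_type": metric_type, "params": search}
```

Usage: `add_kwargs, search_params = knobs_for("HNSW")`, then `index_params.add_index(**add_kwargs)` at build time and `client.search(..., search_params=search_params)` at query time. Keeping both halves in one table helps prevent the metric at search time from drifting away from the one used to build the index.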

Rules of thumb:

Start with HNSW for low-latency, high-recall search when the dataset fits comfortably in memory.
Move to IVF_FLAT when you want a tunable quality/cost knob (nlist at build, nprobe at query) at larger scale.
Choose IVF_SQ8 when memory is the binding constraint and a small recall hit is acceptable.
Keep FLAT for small collections or as a recall baseline to measure the others against.
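Using FLAT as a baseline means comparing each approximate index's top-k against the exact top-k for the same query. A minimal recall@k sketch follows; `recall_at_k` is a hypothetical helper name.

```python
def recall_at_k(approx_ids: list[int], exact_ids: list[int], k: int) -> float:
    """Fraction of the exact top-k ids that the approximate top-k also returned."""
    if k < 1:
        raise ValueError("k must be >= 1")
    truth = set(exact_ids[:k])
    if not truth:
        raise ValueError("exact_ids is empty")
    return len(truth.intersection(approx_ids[:k])) / len(truth)

print(recall_at_k([1, 2, 3, 4], [1, 2, 5, 6], k=4))  # 0.5
```

In recent pymilvus versions, `client.search` returns one list of hits per query vector, and each hit exposes its primary key under `"id"`. So `[hit["id"] for hit in results[0]]` gives the id list for the first query. Averaging recall over a few hundred representative queries gives a steadier number than any single query.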

Choose a metric

Metric	Better score	Use when
`L2`	Smaller	You care about absolute (Euclidean) distance.
`IP`	Larger	Both magnitude and direction matter (inner product).
`COSINE`	Larger	Angular similarity matters; common for normalized text embeddings.
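The three metrics can be compared directly in plain Python. The sketch below shows that IP rewards magnitude while COSINE looks only at the angle. It also shows that after L2-normalization, IP equals COSINE and squared L2 equals 2 − 2·cos, so all three rank neighbours identically. It uses squared L2 (taking the square root would not change the ranking). The vectors are made up for illustration.

```python
import math

def l2_squared(a, b):
    """Squared Euclidean distance: smaller means more similar."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ip(a, b):
    """Inner product: larger means more similar; sensitive to vector length."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity: larger means more similar; ignores vector length."""
    return ip(a, b) / math.sqrt(ip(a, a) * ip(b, b))

def normalize(v):
    norm = math.sqrt(ip(v, v))
    return [x / norm for x in v]

query = [1.0, 0.0]
long_diagonal = [10.0, 10.0]  # large magnitude, 45 degrees away from the query
short_aligned = [1.0, 0.1]    # small magnitude, nearly the same direction

# IP prefers the long vector; COSINE prefers the well-aligned one.
print(ip(query, long_diagonal) > ip(query, short_aligned))          # True
print(cosine(query, short_aligned) > cosine(query, long_diagonal))  # True

# On unit vectors the metrics agree: IP == COSINE and L2^2 == 2 - 2*cos.
a, b = normalize(query), normalize(long_diagonal)
print(math.isclose(ip(a, b), cosine(a, b)))                  # True
print(math.isclose(l2_squared(a, b), 2 - 2 * cosine(a, b)))  # True
```

This is why COSINE is a safe default for normalized text embeddings: on those vectors it behaves the same as IP and L2, and it stays correct if some vectors turn out not to be normalized.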

The metric must match how you intend to query — set it at index time and use the same metric in search. For most normalized text embeddings, COSINE is the right default.


index_params = client.prepare_index_params()
index_params.add_index(
    field_name="embedding",
    index_type="IVF_FLAT",
    metric_type="COSINE",
    params={"nlist": 1024},      # more lists = finer partitioning; tune with nprobe at query time
)
 
client.create_index(collection_name="catalog", index_params=index_params)
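The `nlist=1024` above is a starting value. A widely cited rule of thumb, from Faiss's index-selection guidance, puts `nlist` near 4·√N. For the 1,000,000-row load above that is about 4,000, or roughly 250 vectors per list instead of about 977 at `nlist=1024`. `suggest_nlist` is a hypothetical helper. The clamp to 65,536 assumes Milvus's documented upper bound for `nlist`, which you should confirm for your version.

```python
import math

def suggest_nlist(num_vectors: int, factor: float = 4.0) -> int:
    """Rule-of-thumb nlist of about factor * sqrt(N), clamped to [1, 65536]."""
    if num_vectors < 1:
        raise ValueError("num_vectors must be >= 1")
    return max(1, min(65_536, round(factor * math.sqrt(num_vectors))))

def vectors_per_list(num_vectors: int, nlist: int) -> float:
    """Average list size, assuming vectors spread evenly across lists."""
    return num_vectors / nlist

print(suggest_nlist(1_000_000))                     # 4000
print(round(vectors_per_list(1_000_000, 1024)))     # 977
```

More lists make each probe cheaper but usually require a larger `nprobe` to keep the same recall, so treat the rule as a first guess to validate against a FLAT baseline.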

Load the collection

Load once the index is built. Search requests fail until the collection is loaded.


client.load_collection(collection_name="catalog")

At query time, IVF_* indexes take an nprobe that trades recall for latency:


results = client.search(
    collection_name="catalog",
    data=[embed("a query")],
    anns_field="embedding",
    limit=10,
    search_params={"metric_type": "COSINE", "params": {"nprobe": 16}},
    output_fields=["tenant"],
)
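With `nlist=1024` and `nprobe=16`, each query scans 16/1024 = 1/64 of the lists. If the lists were evenly sized, that would be about 1,000,000 / 64 ≈ 15,625 vectors compared per query instead of all 1,000,000. The pure-Python toy below illustrates the trade-off. It is a simplified IVF, not Milvus's implementation: it picks random vectors as centroids rather than training them with k-means, and it uses L2 rather than COSINE. All names in it are hypothetical.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_ivf(vectors, nlist, rng):
    """Toy IVF: random vectors as centroids (real IVF trains them with k-means)."""
    centroids = rng.sample(vectors, nlist)
    lists = [[] for _ in range(nlist)]
    for i, v in enumerate(vectors):
        nearest = min(range(nlist), key=lambda c: sq_dist(v, centroids[c]))
        lists[nearest].append(i)  # each vector lives in its nearest centroid's list
    return centroids, lists

def ivf_search(query, vectors, centroids, lists, nprobe, k):
    """Scan only the nprobe lists whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))
    candidates = [i for c in order[:nprobe] for i in lists[c]]
    top = sorted(candidates, key=lambda i: sq_dist(query, vectors[i]))[:k]
    return top, len(candidates)

def exact_search(query, vectors, k):
    """Brute force, i.e. what FLAT does."""
    return sorted(range(len(vectors)), key=lambda i: sq_dist(query, vectors[i]))[:k]

rng = random.Random(0)
vectors = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(2_000)]
centroids, lists = build_ivf(vectors, nlist=32, rng=rng)
query = [rng.gauss(0, 1) for _ in range(8)]
truth = set(exact_search(query, vectors, k=10))

for nprobe in (1, 4, 16, 32):
    found, scanned = ivf_search(query, vectors, centroids, lists, nprobe, k=10)
    recall = len(truth.intersection(found)) / len(truth)
    print(f"nprobe={nprobe:2d} scanned={scanned:4d}/{len(vectors)} recall@10={recall:.2f}")
```

Raising `nprobe` scans more candidates and raises recall. At `nprobe == nlist` the search degenerates to a full scan and matches the exact result.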

Organize with partitions

Partitions split a collection into named subsets. When your data has a natural grouping, such as a tenant, a date range, or a region, partitioning lets you restrict a search to the relevant slice, which cuts both latency and read cost.


client.create_partition(collection_name="catalog", partition_name="acme")
client.create_partition(collection_name="catalog", partition_name="globex")
 
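# Insert directly into a partition. The bulk load above already used ids
# 0..999,999, and with auto_id=False Milvus does not deduplicate primary keys,
# so pick an id that is not in use.
client.insert(
    collection_name="catalog",
    data=[{"id": 1_000_000, "embedding": embed("acme item"), "tenant": "acme"}],
    partition_name="acme",
)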
 
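# Search only that partition: fewer vectors scanned, lower read cost.
# The bulk load above inserted without partition_name, so those rows live in
# the default partition and are not searched here.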
results = client.search(
    collection_name="catalog",
    data=[embed("a query")],
    anns_field="embedding",
    limit=10,
    search_params={"metric_type": "COSINE", "params": {"nprobe": 16}},
    partition_names=["acme"],
)
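When rows arrive with the tenant already in a field, you can route each group to its partition at ingest time instead of sending everything to the default partition. The sketch below assumes partition names equal tenant values, as in the example above, and that those values are valid partition names. `insert_by_tenant` is a hypothetical helper. It uses `has_partition`, `create_partition`, and `insert` with `partition_name`, all of which exist on `MilvusClient`.

```python
from collections import defaultdict

def insert_by_tenant(client, collection_name: str, rows: list[dict]) -> dict[str, int]:
    """Group rows by their "tenant" field and insert each group into the
    partition of the same name, creating the partition if it is missing."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        groups[row["tenant"]].append(row)
    counts: dict[str, int] = {}
    for tenant, group in groups.items():
        if not client.has_partition(collection_name=collection_name, partition_name=tenant):
            client.create_partition(collection_name=collection_name, partition_name=tenant)
        client.insert(collection_name=collection_name, data=group, partition_name=tenant)
        counts[tenant] = len(group)
    return counts
```

For a bulk load, call it once per batch so each request stays bounded. Remember that partitions remain an organizational tool here, not an isolation mechanism.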

Partitions are an organizational tool, not a security boundary. Tenant isolation comes from your IAM scopes and separate VBase databases, not from partition names.

Notes

Metering. Every batched insert is metered as VectorWrite and the stored vectors count toward VectorStorage; each search is metered as VectorRead. Searching a single partition reads fewer vectors than scanning the whole collection. All usage is scoped to your organization’s quota.
Build the index after the bulk load so it is built once over the full dataset. For incremental, always-on ingestion, build the index up front instead and accept incremental index maintenance.
What VBase manages. The Milvus service, scaling, users, and RBAC. You operate at the collection/partition/index/search level with your IAM token.
Reference. Full index parameters, segment/flush behavior, and bulk-import options are in the Milvus documentation.