
Bulk Ingest & Index

Goal: load a large set of vectors efficiently, pick the right index and metric for your workload, and organize data with partitions. This recipe is the one you reach for when “insert a handful of rows” turns into “ingest a few million.”

Everything is plain Milvus 2.6 against your VBase database. For the exhaustive index and insert reference, see the Milvus documentation.

Before you start

You need an allocated database in RUNNING state and an IAM token — see the Quickstart and Connecting with the Milvus SDK.

pip install "pymilvus>=2.6,<2.7"

Connect

from pymilvus import MilvusClient, DataType

client = MilvusClient(
    uri="https://<endpoint>:443",  # endpoint + port from GetServiceAccess / `dodil vbase db use`
    token="<IAM access token>",    # your IAM service-account token IS your Milvus token
    db_name="<db_name>",           # the allocated database
)
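For a long-running load job you may prefer to keep the endpoint and token out of the script. The sketch below builds the three connection arguments from environment variables. The variable names (`VBASE_URI`, `VBASE_TOKEN`, `VBASE_DB`) and the helper itself are hypothetical, not something VBase or pymilvus defines.

```python
import os


def client_kwargs_from_env(env=None):
    """Build keyword arguments for MilvusClient from environment variables.

    The variable names are illustrative assumptions; use whatever names
    your deployment tooling defines.
    """
    env = os.environ if env is None else env
    required = ("VBASE_URI", "VBASE_TOKEN", "VBASE_DB")
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Fail early, before any rows are read or embedded.
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return {
        "uri": env["VBASE_URI"],        # e.g. https://<endpoint>:443
        "token": env["VBASE_TOKEN"],    # the IAM access token
        "db_name": env["VBASE_DB"],     # the allocated database
    }
```

With pymilvus installed, the result can be passed straight through as `MilvusClient(**client_kwargs_from_env())`.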

Create the collection (defer the index)

For a large load it is often fastest to insert first and build the index afterward, so the index is built once over the full dataset rather than incrementally. Create the collection with a schema but no index yet.

schema = client.create_schema(auto_id=False, enable_dynamic_field=True)
schema.add_field("id", DataType.INT64, is_primary=True)
schema.add_field("embedding", DataType.FLOAT_VECTOR, dim=768)
schema.add_field("tenant", DataType.VARCHAR, max_length=64)

# No index_params here: the index is built after the bulk load.
client.create_collection(collection_name="catalog", schema=schema)
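Because a single malformed row can fail a whole batch, it can help to check rows against the schema on the client side before sending them. The following is a minimal sketch, not part of pymilvus; `row_problems` is a hypothetical helper, it assumes embeddings arrive as plain lists or tuples, and it assumes the `tenant` limit is checked against the UTF-8 byte length (the stricter interpretation).

```python
DIM = 768         # matches dim=768 on the "embedding" field
TENANT_MAX = 64   # matches max_length=64 on the "tenant" field


def row_problems(row):
    """Return a list of problems found in one row; an empty list means it looks valid."""
    problems = []
    row_id = row.get("id")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(row_id, int) or isinstance(row_id, bool):
        problems.append("id must be an int")
    emb = row.get("embedding")
    if not isinstance(emb, (list, tuple)) or len(emb) != DIM:
        problems.append(f"embedding must have {DIM} values")
    tenant = row.get("tenant")
    if not isinstance(tenant, str):
        problems.append("tenant must be a string")
    elif len(tenant.encode("utf-8")) > TENANT_MAX:
        # Assumption: the limit applies to encoded bytes, not characters.
        problems.append(f"tenant exceeds {TENANT_MAX} bytes")
    return problems
```

Rows that return a non-empty list can be logged and skipped so that one bad record does not stall the load.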

Insert in batches

Stream your rows in fixed-size batches rather than one giant call — this keeps request payloads bounded and gives you natural checkpoints. A few thousand rows per batch is a good starting point; tune to your vector dimension and row size.
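To make "tune to your vector dimension" concrete, the arithmetic below estimates the raw vector payload of a batch. It is a rough sketch: it counts only the float32 vector data (4 bytes per value) and ignores scalar fields and protocol encoding, so real requests are somewhat larger. The helper names and the 4 MiB budget in the usage line are illustrative assumptions, not a documented limit.

```python
def raw_vector_bytes(rows, dim, bytes_per_value=4):
    """Raw size of the float32 vector data in one batch, ignoring other fields and overhead."""
    return rows * dim * bytes_per_value


def rows_for_budget(budget_bytes, dim, bytes_per_value=4):
    """Largest batch size whose raw vector data fits in budget_bytes (at least 1)."""
    return max(1, budget_bytes // (dim * bytes_per_value))


# A 2,000-row batch of 768-dim vectors carries 2000 * 768 * 4 bytes of vector data.
print(raw_vector_bytes(2_000, 768))            # 6144000 (about 6.1 MB)
# Rows that fit a hypothetical 4 MiB raw-vector budget.
print(rows_for_budget(4 * 1024 * 1024, 768))   # 1365
```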

def embed(text: str) -> list[float]:
    """Replace with your embedding model; must return a 768-dim vector."""
    ...


def source_rows():
    """Replace with your data source (a file, a queue, a DB cursor, ...)."""
    for i in range(1_000_000):
        yield {"id": i, "embedding": embed(f"item {i}"), "tenant": "acme"}


BATCH = 2_000
batch = []
for row in source_rows():
    batch.append(row)
    if len(batch) >= BATCH:
        client.insert(collection_name="catalog", data=batch)
        batch.clear()

if batch:  # flush the final partial batch
    client.insert(collection_name="catalog", data=batch)

Keep batches uniform and retry a failed batch rather than the whole load. Each insert is durable on success, so you can checkpoint on the last id you sent.
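One way to retry a single batch and checkpoint on the last id is sketched below. `insert_with_retry` is a hypothetical helper, not a pymilvus function; it takes the insert call as a parameter (for example a small wrapper around `client.insert`), retries on any exception with exponential backoff, and returns the last id in the batch. It assumes rows are sent in a stable order so that "last id sent" is a meaningful resume point.

```python
import time


def insert_with_retry(insert_fn, batch, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call insert_fn(batch), retrying with exponential backoff on failure.

    Returns the id of the last row in the batch so the caller can persist
    it as a checkpoint. Re-raises the last error once attempts run out.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            insert_fn(batch)
            return batch[-1]["id"]
        except Exception:
            if attempt == max_attempts:
                raise
            # Wait 0.5s, 1s, 2s, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

A caller would write the returned id to durable storage after each batch and skip rows up to that id on restart. If a failed request may have been applied before the error surfaced, a plain retry can write the same rows twice; using `client.upsert` for retried batches is one option worth considering in that case.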

Choose an index

Build the index once after the bulk load. Pick the type by your scale and recall needs.

| Index | Search | Strength | Trade-off |
| --- | --- | --- | --- |
| FLAT | Exact | Maximum recall; the correctness baseline | Latency grows fastest as data grows |
| HNSW | Approximate (graph) | Strong latency/recall default | Higher memory use |
| IVF_FLAT | Approximate (inverted file) | Scales well with tunable quality | Needs nprobe tuning for best recall |
| IVF_SQ8 | IVF + scalar quantization | Much lower memory at large scale | Some accuracy loss from quantization |

Rules of thumb:

  • Start with HNSW for low-latency, high-recall search when the dataset fits comfortably in memory.
  • Move to IVF_FLAT when you want a tunable quality/cost knob (nlist at build, nprobe at query) at larger scale.
  • Choose IVF_SQ8 when memory is the binding constraint and a small recall hit is acceptable.
  • Keep FLAT for small collections or as a recall baseline to measure the others against.
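The rules of thumb above can be written down as a small decision function. This is only a sketch: `suggest_index` is a hypothetical helper, and the thresholds (100,000 rows as "small", 2x headroom as "fits comfortably") are illustrative assumptions rather than Milvus or VBase limits. The only hard fact it relies on is that a float32 vector takes 4 bytes per dimension.

```python
def suggest_index(num_vectors, dim, memory_budget_bytes, small_collection=100_000):
    """Map dataset size and a memory budget to one of the four index types.

    All thresholds are illustrative assumptions; measure recall and latency
    on your own data before committing to an index.
    """
    raw_bytes = num_vectors * dim * 4  # float32 vectors, excluding index overhead
    if num_vectors <= small_collection:
        return "FLAT"       # small collection: exact search is affordable
    if raw_bytes * 2 <= memory_budget_bytes:
        return "HNSW"       # fits comfortably, with room for the graph
    if raw_bytes <= memory_budget_bytes:
        return "IVF_FLAT"   # fits, but tighter: tune nlist/nprobe
    return "IVF_SQ8"        # memory-bound: accept some quantization loss
```

For example, one million 768-dim vectors are about 3.07 GB of raw data, so with an 8 GB budget this sketch returns `"HNSW"`, and with a 2 GB budget it returns `"IVF_SQ8"`.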

Choose a metric

| Metric | Better score | Use when |
| --- | --- | --- |
| L2 | Smaller | You care about absolute (Euclidean) distance. |
| IP | Larger | Both magnitude and direction matter (inner product). |
| COSINE | Larger | Angular similarity matters; common for normalized text embeddings. |

The metric must match how you intend to query — set it at index time and use the same metric in search. For most normalized text embeddings, COSINE is the right default.
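A small worked example may make the difference between the three metrics easier to see. The functions below are textbook definitions written in plain Python for illustration; they are not how Milvus computes scores internally. The example shows that cosine ignores vector length while inner product and Euclidean distance do not, and that inner product equals cosine once vectors are normalized to unit length.

```python
import math


def ip(a, b):
    """Inner product: larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def l2(a, b):
    """Euclidean distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine(a, b):
    """Cosine similarity: inner product of the two directions."""
    return ip(a, b) / (math.sqrt(ip(a, a)) * math.sqrt(ip(b, b)))


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(ip(v, v))
    return [x / norm for x in v]


q = [1.0, 0.0]
near = [2.0, 0.0]    # same direction as q
far = [10.0, 0.0]    # same direction, larger magnitude

print(cosine(q, near), cosine(q, far))   # 1.0 1.0  (direction only)
print(ip(q, near), ip(q, far))           # 2.0 10.0 (magnitude changes the score)
print(l2(q, near), l2(q, far))           # 1.0 9.0  (magnitude changes the distance)
```

This is why COSINE and IP rank results identically for embeddings that are already normalized.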

index_params = client.prepare_index_params()
index_params.add_index(
    field_name="embedding",
    index_type="IVF_FLAT",
    metric_type="COSINE",
    params={"nlist": 1024},  # more lists = finer partitioning; tune with nprobe at query time
)
client.create_index(collection_name="catalog", index_params=index_params)

Load the collection

Load once the index is built. Search requests fail until the collection is loaded.

client.load_collection(collection_name="catalog")

At query time, IVF_* indexes take an nprobe that trades recall for latency:

results = client.search(
    collection_name="catalog",
    data=[embed("a query")],
    anns_field="embedding",
    limit=10,
    search_params={"metric_type": "COSINE", "params": {"nprobe": 16}},
    output_fields=["tenant"],
)
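To tune nprobe you need a number to tune against. A common approach, sketched here under the assumption that you can obtain exact top-k ids for a sample of queries (for instance from a FLAT baseline), is to compute recall@k: the fraction of exact results that the approximate search also returned. `recall_at_k` and `mean_recall` are hypothetical helpers that operate on plain lists of ids.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k ids that also appear in the approximate top-k."""
    exact = set(exact_ids)
    if not exact:
        return 0.0
    return len(exact & set(approx_ids)) / len(exact)


def mean_recall(approx_batches, exact_batches):
    """Average recall@k over a set of queries (one id list per query)."""
    pairs = list(zip(approx_batches, exact_batches))
    return sum(recall_at_k(a, e) for a, e in pairs) / len(pairs)


# One query: 3 of the 4 exact neighbours were found.
print(recall_at_k([1, 2, 3, 9], [1, 2, 3, 4]))   # 0.75
```

Running the same sample queries at several nprobe values and recording mean recall alongside latency gives you the trade-off curve for your own data.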

Organize with partitions

Partitions split a collection into named subsets. When your data has a natural grouping (a tenant, a date range, a region), partitioning lets you restrict a search to the relevant slice, which cuts both latency and read cost.

client.create_partition(collection_name="catalog", partition_name="acme")
client.create_partition(collection_name="catalog", partition_name="globex")

# Insert directly into a partition.
# The id is outside the 0..999_999 range used in the bulk load above,
# so it does not duplicate an existing primary key.
client.insert(
    collection_name="catalog",
    data=[{"id": 1_000_001, "embedding": embed("acme item"), "tenant": "acme"}],
    partition_name="acme",
)

# Search only that partition: fewer vectors scanned, lower read cost.
results = client.search(
    collection_name="catalog",
    data=[embed("a query")],
    anns_field="embedding",
    limit=10,
    search_params={"metric_type": "COSINE", "params": {"nprobe": 16}},
    partition_names=["acme"],
)

Partitions are an organizational tool, not a security boundary. Tenant isolation comes from your IAM scopes and separate VBase databases, not from partition names.
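When a bulk load targets several partitions, rows have to be routed before they are inserted, since each insert call names at most one partition. The sketch below groups a chunk of rows by a field value; `group_by_partition` is a hypothetical helper, and it assumes the partition name equals the row's `tenant` value and that the partitions already exist.

```python
from collections import defaultdict


def group_by_partition(rows, key="tenant"):
    """Group rows by the value of `key`, preserving the order within each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


rows = [
    {"id": 1, "tenant": "acme"},
    {"id": 2, "tenant": "globex"},
    {"id": 3, "tenant": "acme"},
]
for partition, group in group_by_partition(rows).items():
    # In a real load this is where each group would be inserted
    # with partition_name=partition.
    print(partition, [r["id"] for r in group])
```

For a stream of millions of rows, apply this to one bounded chunk at a time rather than to the whole dataset, so memory use stays flat.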

Notes

  • Metering. Every batched insert is metered as VectorWrite and the stored vectors count toward VectorStorage; each search is metered as VectorRead. Searching a single partition reads fewer vectors than scanning the whole collection. All usage is scoped to your organization’s quota.
  • Index timing. Build the index after the bulk load so it is built once over the full dataset. For incremental, always-on ingestion, build the index up front instead and accept incremental index maintenance.
  • What VBase manages. The Milvus service, scaling, users, and RBAC. You operate at the collection/partition/index/search level with your IAM token.
  • Reference. Full index parameters, segment/flush behavior, and bulk-import options are in the Milvus documentation.
