IVF_FLAT

IVF_FLAT (Inverted File + Flat) is an index for dense, floating‑point vectors. It improves search latency on large datasets by clustering vectors into partitions and only scanning the most relevant partitions at query time.

If you’re already happy with brute‑force accuracy but need faster queries at scale, IVF_FLAT is a great first “approximate” index to try.

When to use it

Use IVF_FLAT when:

  • Your collection is large (hundreds of thousands to billions of vectors).
  • You want faster search while keeping high accuracy.
  • You can afford a bit more memory than compressed indexes (because vectors are stored as-is inside each partition).
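To make the memory point concrete, here is a rough lower-bound estimate. It is a sketch under stated assumptions: vectors are float32 (4 bytes per value), and ids, metadata, and any engine overhead are ignored. The function name is hypothetical and not part of any SDK.

```python
def ivf_flat_memory_bytes(num_vectors, dim, nlist, bytes_per_value=4):
    """Rough lower bound: raw vectors plus one centroid per partition.

    Assumes float32 values (4 bytes each) and ignores ids, metadata,
    and any engine-specific overhead.
    """
    raw_vectors = num_vectors * dim * bytes_per_value
    centroids = nlist * dim * bytes_per_value
    return raw_vectors + centroids


# Example: one million 768-dimensional embeddings with nlist=1024.
estimate = ivf_flat_memory_bytes(1_000_000, 768, 1024)
print(f"{estimate / 2**30:.2f} GiB")
```

The raw vectors dominate the total; the centroids add very little, which is why IVF_FLAT memory use is close to that of a flat index.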

Avoid IVF_FLAT when:

  • Your dataset is small (a flat scan is often simpler and fast enough).
  • You need exact results (100% recall) with no tuning; use a FLAT index instead.

How it works

IVF_FLAT has two layers:

  1. IVF (Inverted File): Milvus clusters your vectors using k‑means and creates nlist partitions. Each partition has a centroid.
  2. FLAT inside each partition: Vectors inside a partition are stored in their original form (no compression), so distance calculations remain precise for the candidates you scan.

At search time, the query vector is compared to partition centroids, then Milvus searches only the top nprobe partitions for candidates.

  • Higher nprobe → better recall (more partitions scanned) but slower queries.
  • Lower nprobe → faster queries but potentially lower recall.
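The two-step search described above can be sketched in plain Python. This is a toy illustration, not how Milvus or vbase implements the index: the helper names (kmeans, build_ivf, ivf_search) are made up for this example, it uses squared L2 distance, and it runs a small fixed number of k-means iterations.

```python
import random


def l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, nlist, iters=10, seed=0):
    """Tiny Lloyd's k-means that returns nlist centroids."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, nlist)]
    for _ in range(iters):
        buckets = [[] for _ in range(nlist)]
        for v in vectors:
            nearest = min(range(nlist), key=lambda c: l2(v, centroids[c]))
            buckets[nearest].append(v)
        for c, members in enumerate(buckets):
            if members:  # keep the old centroid if a cluster went empty
                dim = len(members[0])
                centroids[c] = [
                    sum(m[d] for m in members) / len(members) for d in range(dim)
                ]
    return centroids


def build_ivf(vectors, nlist):
    """Cluster the vectors, then store each vector's index in its nearest partition."""
    centroids = kmeans(vectors, nlist)
    partitions = [[] for _ in range(nlist)]
    for idx, v in enumerate(vectors):
        nearest = min(range(nlist), key=lambda c: l2(v, centroids[c]))
        partitions[nearest].append(idx)
    return centroids, partitions


def ivf_search(query, vectors, centroids, partitions, nprobe, limit):
    """Rank partitions by centroid distance, then scan only the top nprobe."""
    order = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))
    candidates = [i for c in order[:nprobe] for i in partitions[c]]
    candidates.sort(key=lambda i: l2(query, vectors[i]))  # exact distances
    return candidates[:limit]


rng = random.Random(42)
vectors = [[rng.random() for _ in range(8)] for _ in range(500)]
centroids, partitions = build_ivf(vectors, nlist=16)

query = vectors[0]
approx = ivf_search(query, vectors, centroids, partitions, nprobe=4, limit=5)
exact = ivf_search(query, vectors, centroids, partitions, nprobe=16, limit=5)
print("nprobe=4 :", approx)
print("nprobe=16:", exact)
```

With nprobe equal to nlist every partition is scanned, so the result matches a brute-force scan. Smaller nprobe values may miss neighbors that fall in partitions that were not probed.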

Key parameters

Build-time: nlist

nlist controls how many partitions you create.

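  • Bigger nlist → more partitions → fewer vectors per partition → faster scans per partition, but a slower index build, more centroid comparisons per query, and a smaller share of the data covered by any fixed nprobe (so nprobe usually has to grow with nlist).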
  • Smaller nlist → fewer partitions → more vectors per partition → closer to a flat scan.

Query-time: nprobe

nprobe controls how many partitions you scan during search.

  • Bigger nprobe → higher recall, higher latency.
  • Smaller nprobe → lower latency, lower recall.
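A back-of-the-envelope cost model shows how the two parameters interact. It is a simplification that assumes partitions are perfectly balanced (real k-means partitions are not) and counts only distance computations; the function name is hypothetical.

```python
def estimated_distance_computations(num_vectors, nlist, nprobe):
    """Approximate distance computations per query for an IVF index.

    Assumes evenly sized partitions: the query is compared to every
    centroid, then to every vector in the nprobe selected partitions.
    """
    if not 1 <= nprobe <= nlist:
        raise ValueError("nprobe must be between 1 and nlist")
    centroid_comparisons = nlist
    scanned_vectors = nprobe * num_vectors / nlist
    return centroid_comparisons + scanned_vectors


num_vectors = 1_000_000
for nprobe in (1, 8, 32, 128):
    cost = estimated_distance_computations(num_vectors, nlist=128, nprobe=nprobe)
    print(f"nprobe={nprobe:>3}: ~{cost:,.0f} comparisons "
          f"({cost / num_vectors:.1%} of a flat scan)")
```

Under these assumptions the scanned share of the collection is roughly nprobe / nlist, which is why the two values are best tuned together rather than in isolation.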

Create an IVF_FLAT index

Assuming you already have a vbase connection (see the Connect section in the docs), you can create an index on a vector field.

    from dodil import Client
    from dodil.vbase import VBaseConfig

    c = Client(
        service_account_id="...",
        service_account_secret="...",
    )

    vbase = c.vbase.connect(
        VBaseConfig(
            host="vbase-db-<id>.infra.dodil.cloud",
            port=443,
            scheme="https",
            db_name="db_<id>",
        )
    )

    # Build an IVF_FLAT index on your dense vector field
    vbase.create_index(
        collection_name="my_collection",
        field_name="embedding",
        index_name="embedding_ivf_flat",
        index_type="IVF_FLAT",
        metric_type="COSINE",  # or "L2", "IP"
        params={
            "nlist": 128,
        },
    )

Notes:

  • metric_type must match how you plan to measure similarity.
    • COSINE is common for normalized embeddings.
    • L2 (Euclidean) is common for raw float vectors.
    • IP (inner product) is useful for some ranking setups.
  • If you change the index type or metric, you’ll typically rebuild the index.
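The relationship between the three metrics can be checked with a few lines of plain Python. This sketch is independent of any SDK and its helper names are illustrative. It shows that for unit-length vectors inner product equals cosine similarity and squared L2 distance equals 2 − 2·cos, so all three rank neighbors the same way. Note that L2 is a distance (smaller is closer), while IP and cosine are similarities (larger is closer).

```python
import math


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    norms = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    return inner_product(a, b) / norms


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(inner_product(v, v))
    return [x / norm for x in v]


raw_a, raw_b = [3.0, 4.0], [4.0, 3.0]
a, b = normalize(raw_a), normalize(raw_b)

cos = cosine_similarity(a, b)
ip = inner_product(a, b)
l2_sq = l2_distance(a, b) ** 2

print("cosine:", cos)
print("inner product (normalized):", ip)
print("squared L2 (normalized):", l2_sq, "vs 2 - 2*cos:", 2 - 2 * cos)
# On raw vectors the inner product also reflects magnitude:
print("inner product (raw):", inner_product(raw_a, raw_b))
```

If your embeddings are not normalized, IP and COSINE can rank results differently, so pick the metric that matches how the embedding model was trained.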

Search with IVF_FLAT

When searching, tune nprobe to balance latency vs recall.

    results = vbase.search(
        collection_name="my_collection",
        vector_field="embedding",
        data=[[0.12, 0.98, 0.33, 0.01]],
        limit=10,
        search_params={
            "params": {
                "nprobe": 8,
            }
        },
    )

    for hit in results[0]:
        print(hit["id"], hit["distance"], hit.get("payload"))

Tuning recommendations

These are practical starting points (you’ll still want to benchmark with your data):

  • nlist: typically between 32 and 4096, depending on dataset size.
    • If your collection is in the low millions, start around nlist=128 or 256.
    • If your collection is very large, increase nlist gradually and re-benchmark.
  • nprobe: start around 8 and adjust.
    • If recall is low, increase nprobe.
    • If latency is high, reduce nprobe or revisit nlist.

A simple way to tune:

  1. Fix nlist.
  2. Sweep nprobe (e.g., 1, 4, 8, 16, 32 …) until recall is acceptable.
  3. If you can’t reach good recall without high latency, try a different nlist.
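The sweep in step 2 can be automated with a small harness. This is a minimal sketch: recall_at_k and sweep_nprobe are hypothetical helpers, and search_fn is any callable you supply that takes (query, nprobe) and returns a list of ids, for example a thin wrapper around the search call from the previous section. The ground truth is assumed to come from an exact (flat) search over the same queries.

```python
import time


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k ids that the approximate search returned."""
    exact = set(exact_ids)
    if not exact:
        return 1.0
    return len(exact & set(approx_ids)) / len(exact)


def sweep_nprobe(search_fn, queries, ground_truth, nprobe_values, target_recall=0.95):
    """Try nprobe values in order and stop at the first that meets the target.

    Returns (chosen_nprobe, report), where report holds one
    (nprobe, mean_recall, mean_latency_ms) tuple per value tried.
    chosen_nprobe is None if no value reached the target.
    """
    report = []
    for nprobe in nprobe_values:
        start = time.perf_counter()
        recalls = [
            recall_at_k(search_fn(query, nprobe), truth)
            for query, truth in zip(queries, ground_truth)
        ]
        mean_latency_ms = (time.perf_counter() - start) * 1000 / len(queries)
        mean_recall = sum(recalls) / len(recalls)
        report.append((nprobe, mean_recall, mean_latency_ms))
        if mean_recall >= target_recall:
            return nprobe, report
    return None, report
```

If the function returns None for every nprobe you are willing to pay for, that is the signal from step 3 to rebuild with a different nlist and run the sweep again.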

What to expect

  • IVF_FLAT is usually much faster than a flat scan on large datasets.
  • Accuracy can be very close to FLAT when nprobe is sufficiently high.
  • The best settings depend heavily on:
    • dataset size
    • embedding dimension
    • metric type
    • query distribution

If you want even faster search with a smaller memory footprint, explore compressed indexes (like IVF_PQ / IVF_SQ8) once you’re comfortable with IVF_FLAT.