IVF_FLAT
IVF_FLAT (Inverted File + Flat) is an index for dense, floating‑point vectors. It improves search latency on large datasets by clustering vectors into partitions and only scanning the most relevant partitions at query time.
If you’re already happy with brute‑force accuracy but need faster queries at scale, IVF_FLAT is a great first “approximate” index to try.
When to use it
Use IVF_FLAT when:
- Your collection is large (hundreds of thousands to billions of vectors).
- You want faster search while keeping high accuracy.
- You can afford a bit more memory than compressed indexes (because vectors are stored as-is inside each partition).
Avoid IVF_FLAT when:
- Your dataset is small (a flat scan is often simpler and fast enough).
- You need maximum recall without tuning (use a FLAT index).
How it works
IVF_FLAT has two layers:
- IVF (Inverted File): Milvus clusters your vectors using k‑means and creates nlist partitions. Each partition has a centroid.
- FLAT inside each partition: Vectors inside a partition are stored in their original form (no compression), so distance calculations remain precise for the candidates you scan.
At search time, the query vector is compared to partition centroids, then Milvus searches only the top nprobe partitions for candidates.
- Higher nprobe → better recall (more partitions scanned) but slower queries.
- Lower nprobe → faster queries but potentially lower recall.
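The two layers described above can be sketched in plain Python. This is only an illustrative toy, not how Milvus or vbase implements the index: the helper names (`kmeans`, `build_ivf`, `search_ivf`) are hypothetical, the k-means is deliberately minimal, and squared L2 distance is assumed as the metric.

```python
import random

def l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, nlist, iters=10, seed=0):
    """Minimal k-means (illustrative only): returns nlist centroids."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, nlist)]
    for _ in range(iters):
        buckets = [[] for _ in range(nlist)]
        for v in vectors:
            nearest = min(range(nlist), key=lambda c: l2(v, centroids[c]))
            buckets[nearest].append(v)
        for c, members in enumerate(buckets):
            if members:  # keep the old centroid if a cluster went empty
                dim = len(members[0])
                centroids[c] = [sum(m[d] for m in members) / len(members)
                                for d in range(dim)]
    return centroids

def build_ivf(vectors, nlist):
    """Build time: cluster, then store the raw vectors (with ids) per partition."""
    centroids = kmeans(vectors, nlist)
    partitions = [[] for _ in range(nlist)]
    for idx, v in enumerate(vectors):
        nearest = min(range(nlist), key=lambda c: l2(v, centroids[c]))
        partitions[nearest].append((idx, v))
    return centroids, partitions

def search_ivf(query, centroids, partitions, nprobe, limit):
    """Query time: rank centroids, then scan only the nprobe closest partitions."""
    order = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))
    candidates = [item for c in order[:nprobe] for item in partitions[c]]
    candidates.sort(key=lambda item: l2(query, item[1]))
    return [idx for idx, _ in candidates[:limit]]

rng = random.Random(42)
data = [[rng.random() for _ in range(8)] for _ in range(500)]
centroids, partitions = build_ivf(data, nlist=16)

query = data[0]
approx = search_ivf(query, centroids, partitions, nprobe=4, limit=5)
exact = search_ivf(query, centroids, partitions, nprobe=16, limit=5)  # every partition = flat scan
print("nprobe=4 :", approx)
print("nprobe=16:", exact)
```

Setting nprobe equal to nlist scans every partition, which is why the second call behaves like a flat scan. Smaller nprobe values may miss neighbors that landed in partitions that were not probed.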
Key parameters
Build-time: nlist
nlist controls how many partitions you create.
- Bigger nlist → more partitions → fewer vectors per partition → faster scans per partition, but slower index build and more tuning required.
- Smaller nlist → fewer partitions → more vectors per partition → closer to a flat scan.
Query-time: nprobe
nprobe controls how many partitions you scan during search.
- Bigger nprobe → higher recall, higher latency.
- Smaller nprobe → lower latency, lower recall.
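To get a feel for how the two parameters interact, here is a back-of-the-envelope cost model. It assumes partitions are evenly sized, which real k-means clusters rarely are, so treat the numbers as rough guidance only. The function name is hypothetical.

```python
def estimated_distance_computations(num_vectors, nlist, nprobe):
    """Rough cost model, assuming evenly sized partitions (an idealization).

    A query is compared to all nlist centroids, then to every vector
    in the nprobe partitions that get scanned.
    """
    nprobe = min(nprobe, nlist)  # cannot probe more partitions than exist
    centroid_comparisons = nlist
    vectors_scanned = num_vectors * nprobe / nlist
    return centroid_comparisons + vectors_scanned

n = 1_000_000
for nlist, nprobe in [(128, 8), (1024, 8), (1024, 64)]:
    cost = estimated_distance_computations(n, nlist, nprobe)
    print(f"nlist={nlist:5d} nprobe={nprobe:3d} -> ~{cost:,.0f} distance computations "
          f"({cost / n:.1%} of a flat scan)")
```

Under this model the scanned fraction is roughly nprobe / nlist, so raising nlist without raising nprobe shrinks the work per query but also shrinks the share of the dataset you look at.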
Create an IVF_FLAT index
Assuming you already have a vbase connection (see the Connect section in the docs), you can create an index on a vector field.
from dodil import Client
from dodil.vbase import VBaseConfig
c = Client(
    service_account_id="...",
    service_account_secret="...",
)
vbase = c.vbase.connect(
    VBaseConfig(
        host="vbase-db-<id>.infra.dodil.cloud",
        port=443,
        scheme="https",
        db_name="db_<id>",
    )
)
# Build an IVF_FLAT index on your dense vector field
vbase.create_index(
    collection_name="my_collection",
    field_name="embedding",
    index_name="embedding_ivf_flat",
    index_type="IVF_FLAT",
    metric_type="COSINE",  # or "L2", "IP"
    params={
        "nlist": 128,
    },
)
Notes:
- metric_type must match how you plan to measure similarity.
  - COSINE is common for normalized embeddings.
  - L2 (Euclidean) is common for raw float vectors.
  - IP (inner product) is useful for some ranking setups.
- If you change the index type or metric, you’ll typically rebuild the index.
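As a quick illustration of what the three metrics compute, the sketch below implements them in plain Python, independently of vbase. The helper names and sample vectors are made up for the example. It also shows a standard property: once vectors are normalized to unit length, COSINE, IP, and L2 produce the same ranking.

```python
import math

def dot(a, b):
    """Inner product (IP): larger means more similar."""
    return sum(x * y for x, y in zip(a, b))

def l2_distance(a, b):
    """Euclidean distance (L2): smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b: larger means more similar."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def normalize(v):
    """Scale v to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

# Made-up sample vectors, normalized to unit length.
query = normalize([1.0, 0.2, 0.0])
docs = [normalize(v) for v in [
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.5, 0.5, 0.5],
    [-1.0, 0.0, 0.2],
]]

ids = range(len(docs))
by_ip = sorted(ids, key=lambda i: -dot(query, docs[i]))
by_cosine = sorted(ids, key=lambda i: -cosine_similarity(query, docs[i]))
by_l2 = sorted(ids, key=lambda i: l2_distance(query, docs[i]))
print(by_ip, by_cosine, by_l2)
```

For unnormalized vectors the three rankings can differ, which is why the metric chosen at index time should match how the embeddings were produced.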
Search with IVF_FLAT
When searching, tune nprobe to balance latency vs recall.
results = vbase.search(
    collection_name="my_collection",
    vector_field="embedding",
    data=[[0.12, 0.98, 0.33, 0.01]],
    limit=10,
    search_params={
        "params": {
            "nprobe": 8,
        }
    },
)
for hit in results[0]:
    print(hit["id"], hit["distance"], hit.get("payload"))
Tuning recommendations
These are practical starting points (you’ll still want to benchmark with your data):
- nlist: typically 32 → 4096 depending on dataset size.
  - If your collection is in the low millions, start around nlist=128 or 256.
  - If your collection is very large, increase nlist gradually and re-benchmark.
- nprobe: start around 8 and adjust.
  - If recall is low, increase nprobe.
  - If latency is high, reduce nprobe or revisit nlist.
A simple way to tune:
- Fix nlist.
- Sweep nprobe (e.g., 1, 4, 8, 16, 32 …) until recall is acceptable.
- If you can’t reach good recall without high latency, try a different nlist.
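The sweep can be automated with a small harness like the one below. It is a sketch under assumptions: `search_fn(query, nprobe)` is a hypothetical wrapper you would write around your own search call so that it returns a list of ids, and `ground_truth` holds the exact top-k ids per query (for example from a FLAT index or a brute-force scan). The stand-in `fake_search` only exists to make the example runnable.

```python
import time

def recall_at_k(retrieved_ids, true_ids):
    """Fraction of the true neighbors that appear in the retrieved list."""
    true_set = set(true_ids)
    if not true_set:
        return 1.0
    return len(true_set.intersection(retrieved_ids)) / len(true_set)

def sweep_nprobe(search_fn, queries, ground_truth, nprobe_values, target_recall):
    """Try nprobe values in order; stop at the first one that meets the target."""
    report = []
    for nprobe in nprobe_values:
        start = time.perf_counter()
        results = [search_fn(q, nprobe) for q in queries]
        elapsed = time.perf_counter() - start
        recall = sum(recall_at_k(r, t) for r, t in zip(results, ground_truth)) / len(queries)
        report.append({
            "nprobe": nprobe,
            "recall": recall,
            "avg_latency_s": elapsed / len(queries),
        })
        if recall >= target_recall:
            break
    return report

# Stand-in for a real search call: pretends recall grows with nprobe.
TRUE_IDS = list(range(10))

def fake_search(query, nprobe):
    return TRUE_IDS[:min(len(TRUE_IDS), nprobe)]

queries = [[0.0], [1.0], [2.0]]
ground_truth = [TRUE_IDS for _ in queries]
report = sweep_nprobe(fake_search, queries, ground_truth, [1, 4, 8, 16, 32], target_recall=0.95)
for row in report:
    print(row)
```

If the sweep ends without reaching the target, or only reaches it at an nprobe whose latency is too high, that is the signal to rebuild with a different nlist and repeat.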
What to expect
- IVF_FLAT is usually much faster than a flat scan on large datasets.
- Accuracy can be very close to FLAT when nprobe is sufficiently high.
- The best settings depend heavily on:
  - dataset size
  - embedding dimension
  - metric type
  - query distribution
If you want even faster search with smaller memory footprints, explore compressed indexes (like IVF_PQ / IVF_SQ8) after you’re comfortable with IVF_FLAT.