IVF_PQ index
IVF_PQ is an approximate nearest neighbor (ANN) index that combines two ideas:
- IVF (Inverted File): splits your vector space into clusters so queries only scan a subset of the data.
- PQ (Product Quantization): compresses vectors so the index uses far less memory while staying reasonably accurate.
In Dodil VBase (Milvus-backed), IVF_PQ is a practical choice when you have large collections and want a good balance between cost (RAM) and search speed.
When should I use IVF_PQ?
Use IVF_PQ when:
- Your collection is large (hundreds of thousands to billions of vectors).
- You want faster searches than brute-force (FLAT), with lower memory usage.
- You can accept slightly lower recall compared to exact search.
Avoid IVF_PQ when:
- You need exact results (use FLAT).
- Your collection is small (index overhead may not be worth it).
- Your vectors are already heavily compressed or low-dimensional (the gain may be limited).
How it works (simple mental model)
Think of your dataset as a giant library:
- IVF creates shelves (nlist shelves). When you search, you don’t scan the whole library; you scan a few shelves.
- PQ compresses each book into a short code so you can scan shelves quickly without storing full text.
During search you control how many shelves to open using nprobe.
- Higher nprobe → better recall, slower search.
- Lower nprobe → faster search, lower recall.
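To make the shelves analogy concrete, here is a toy sketch of the IVF half only (no PQ compression) in plain Python. It is not how VBase or Milvus implement the index: centroids are sampled from the data instead of being trained with k-means, and the names build_ivf and ivf_search are made up for this illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_ivf(vectors, nlist, seed=0):
    """Assign every vector to its nearest centroid (one 'shelf' per centroid).

    Toy shortcut: centroids are sampled from the data, not trained with k-means.
    """
    rng = random.Random(seed)
    centroids = rng.sample(vectors, nlist)
    shelves = [[] for _ in range(nlist)]
    for idx, vec in enumerate(vectors):
        nearest = min(range(nlist), key=lambda c: sq_dist(vec, centroids[c]))
        shelves[nearest].append(idx)
    return centroids, shelves

def ivf_search(query, vectors, centroids, shelves, nprobe, k):
    """Scan only the nprobe shelves whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))
    candidates = [i for c in order[:nprobe] for i in shelves[c]]
    candidates.sort(key=lambda i: sq_dist(query, vectors[i]))
    return candidates[:k], len(candidates)

# Random toy data: 2000 vectors with 8 dimensions each.
rng = random.Random(42)
vectors = [[rng.random() for _ in range(8)] for _ in range(2000)]
centroids, shelves = build_ivf(vectors, nlist=32)
query = [rng.random() for _ in range(8)]

few_ids, few_scanned = ivf_search(query, vectors, centroids, shelves, nprobe=2, k=5)
all_ids, all_scanned = ivf_search(query, vectors, centroids, shelves, nprobe=32, k=5)
print("vectors scanned with nprobe=2: ", few_scanned)
print("vectors scanned with nprobe=32:", all_scanned)
```

With nprobe equal to nlist every shelf is opened, so the result matches a brute-force scan. With a small nprobe far fewer vectors are compared, and a true neighbor can be missed if it sits on a shelf that was never opened.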
Build an IVF_PQ index
Below is an example using the Dodil Python SDK.
from dodil import Client
from dodil.vbase import VBaseConfig

c = Client(
    service_account_id="...",
    service_account_secret="...",
)
vbase = c.vbase.connect(
    VBaseConfig(
        host="vbase-db-<id>.infra.dodil.cloud",
        port=443,
        scheme="https",
        db_name="db_<id>",
    )
)
# Example: create an IVF_PQ index on the vector field "embedding"
# (API names may differ slightly depending on your wrapper layer.)
vbase.create_index(
    collection_name="my_collection",
    field_name="embedding",
    index_name="embedding_ivfpq",
    index_type="IVF_PQ",
    metric_type="COSINE",  # or "L2", "IP"
    params={
        "nlist": 1024,
        "m": 16,
        "nbits": 8,
    },
)
Notes on the build params
- nlist (IVF): how many clusters to build.
- m (PQ): how many sub-vectors each vector is split into.
- nbits (PQ): how many bits to store each sub-vector code.
Important constraints:
- m must be a divisor of your vector dimension D (e.g., if D=768, m can be 12, 16, 24, 32, 48, 64, 96, 128, …).
- Larger nlist increases build time but can improve recall (more refined clustering).
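If you want to catch a bad m before starting a long index build, a small client-side check along these lines may help. The helper names valid_m_values and check_pq_params are hypothetical and not part of the Dodil SDK; the check only encodes the divisibility rule stated above.

```python
def valid_m_values(dim):
    """Return every m that splits a dim-dimensional vector into equal sub-vectors."""
    return [m for m in range(1, dim + 1) if dim % m == 0]

def check_pq_params(dim, m):
    """Raise ValueError early if m does not divide dim; return dims per sub-vector."""
    if dim % m != 0:
        raise ValueError(
            f"m={m} does not divide dim={dim}; valid choices: {valid_m_values(dim)}"
        )
    return dim // m

print(valid_m_values(768))
print(check_pq_params(768, 16))  # 768 / 16 = 48 dimensions per sub-vector
```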
Search with IVF_PQ
Once the index is created (and your data is inserted), pass nprobe in the search parameters:
res = vbase.search(
    collection_name="my_collection",
    anns_field="embedding",
    data=[[0.12, 0.07, 0.31, ...]],
    limit=10,
    search_params={
        "params": {
            "nprobe": 16,
        }
    },
)
for hit in res[0]:
    print(hit.id, hit.score)
Rule of thumb:
- nprobe must be in [1, nlist].
- Start small (e.g., 8–32) and increase until recall is good enough.
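As a rough way to reason about the [1, nlist] bound, the sketch below clamps a requested nprobe into range and estimates what share of the collection a query touches. Both helpers are hypothetical, and the estimate assumes evenly sized clusters, which real data rarely gives you exactly.

```python
def clamp_nprobe(nprobe, nlist):
    """Force nprobe into the valid range [1, nlist]."""
    return max(1, min(nprobe, nlist))

def scanned_fraction(nprobe, nlist):
    """Approximate share of vectors scanned per query, assuming equal-sized clusters."""
    return clamp_nprobe(nprobe, nlist) / nlist

print(clamp_nprobe(5000, 1024))    # capped at nlist
print(scanned_fraction(16, 1024))  # 16 / 1024 = 0.015625
```

Whether you clamp silently or reject out-of-range values is a design choice; clamping is used here only to keep the example short.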
Tuning guide
These parameters let you trade off latency, recall, index build time, and RAM usage against one another.
nlist (build)
- What it does: number of clusters.
- Effect:
  - Higher nlist → better candidate pruning, potentially higher recall, higher build cost.
  - Lower nlist → faster build, but queries may need higher nprobe to compensate.
Practical starting points:
- 100K–1M vectors: nlist 256–2048
- 1M–50M vectors: nlist 1024–8192
- 50M+ vectors: nlist 8192–65536 (only if you can afford the build cost)
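If you pick nlist in a provisioning script, the starting ranges listed above can be encoded as a small lookup. This is only a sketch: suggest_nlist_range is a hypothetical helper, the boundary handling (1M and 50M fall into the lower range) is an arbitrary choice, and sizes under 100K return None because the guidance above does not cover them.

```python
def suggest_nlist_range(num_vectors):
    """Map a collection size to the (low, high) starting nlist range listed above."""
    if num_vectors < 100_000:
        return None  # not covered by the starting points; a simpler index may do
    if num_vectors <= 1_000_000:
        return (256, 2048)
    if num_vectors <= 50_000_000:
        return (1024, 8192)
    return (8192, 65536)

for size in (500_000, 10_000_000, 200_000_000):
    print(size, suggest_nlist_range(size))
```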
m (build)
- What it does: how many parts each vector is split into for PQ.
- Effect:
  - Higher m → usually better accuracy, higher compute and memory.
  - Lower m → more compression, lower accuracy.
Practical starting points:
- Common choice: m = D/2 (when it divides cleanly)
- For D=768, try m=16 or m=32 first.
nbits (build)
- What it does: bits per sub-vector codebook index.
- Effect:
  - Higher nbits → more accurate compression, larger codes.
  - Lower nbits → more compression, lower accuracy.
Practical starting points:
- Default: nbits=8
- If you need more compression: try 4–6
- If you need more quality: try 10–12 (watch memory)
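To see how m and nbits drive memory, the arithmetic below follows the textbook product-quantization layout: each vector is stored as m codes of nbits bits, and each of the m codebooks holds 2^nbits centroids of D/m floats. It assumes float32 source vectors and ignores IDs, IVF centroids, and any padding a real engine may add, so treat the numbers as estimates rather than what VBase will report.

```python
def pq_code_bytes(m, nbits):
    """Bytes per compressed vector: m codes of nbits each, rounded up to whole bytes."""
    return (m * nbits + 7) // 8

def compression_ratio(dim, m, nbits, bytes_per_float=4):
    """Raw float32 vector size divided by the PQ code size."""
    return (dim * bytes_per_float) / pq_code_bytes(m, nbits)

def codebook_bytes(dim, nbits, bytes_per_float=4):
    """Total codebook size: m codebooks x 2**nbits centroids x (dim / m) floats."""
    return (2 ** nbits) * dim * bytes_per_float

for m, nbits in [(16, 8), (32, 8), (384, 8)]:
    print(m, nbits, pq_code_bytes(m, nbits), compression_ratio(768, m, nbits))
# m=16  -> 16 bytes per vector, 192x smaller than the 3072-byte float32 vector
# m=32  -> 32 bytes per vector, 96x smaller
# m=384 -> 384 bytes per vector, 8x smaller
print(codebook_bytes(768, 8), codebook_bytes(768, 12))
```

Note that the codebook grows with 2^nbits, which is one reason larger nbits values call for watching memory.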
nprobe (search)
- What it does: how many IVF clusters to scan.
- Effect:
  - Higher nprobe → higher recall, higher latency.
  - Lower nprobe → lower latency, lower recall.
Practical starting points:
- Latency-first: nprobe 8–16
- Balanced: nprobe 16–64
- Recall-first: nprobe 64+ (only if it still meets your SLA)
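One way to choose among these ranges is to measure recall against exact (FLAT) results for a sample of queries and keep the smallest nprobe that meets your target. The sketch below shows only the bookkeeping; recall_at_k and smallest_nprobe_meeting are hypothetical helpers, and the id lists are made-up placeholders, not benchmark results.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k ids that the approximate search also returned."""
    if not exact_ids:
        return 0.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

def smallest_nprobe_meeting(target, recall_by_nprobe):
    """Return the smallest nprobe whose measured recall reaches target, else None."""
    for nprobe in sorted(recall_by_nprobe):
        if recall_by_nprobe[nprobe] >= target:
            return nprobe
    return None

# Placeholder ids standing in for one query's results at each nprobe setting.
exact = [1, 2, 3, 4, 5]
approx_by_nprobe = {
    8: [1, 2, 9, 10, 11],
    16: [1, 2, 3, 4, 12],
    64: [5, 4, 3, 2, 1],
}
recall_by_nprobe = {n: recall_at_k(ids, exact) for n, ids in approx_by_nprobe.items()}
print(recall_by_nprobe)
print(smallest_nprobe_meeting(0.8, recall_by_nprobe))
```

In practice you would average recall over many queries per nprobe value and record latency alongside it before settling on a value.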
Common gotchas
- Low recall after enabling IVF_PQ
  - Increase nprobe first.
  - If that’s still not enough, consider increasing nlist (requires rebuild).
  - Consider a larger m or nbits if PQ compression is too aggressive.
- Index builds slowly
  - Reduce nlist (trade recall), or build during off-peak.
- m error / invalid params
  - Ensure m divides your vector dimension exactly.
Recommended defaults
If you just want a reasonable default to start with:
- nlist: 1024
- m: 16 (or 32 for 768-dim vectors)
- nbits: 8
- nprobe: 16
Then benchmark and tune from there.
Next: explore other index types (e.g., IVF_FLAT, IVF_SQ8, HNSW) to match your latency/recall/memory goals.