IVF_PQ index

IVF_PQ is an approximate nearest neighbor (ANN) index that combines two ideas:

IVF (Inverted File): splits your vector space into clusters so queries only scan a subset of the data.
PQ (Product Quantization): compresses vectors so the index uses far less memory while staying reasonably accurate.

In Dodil VBase (Milvus-backed), IVF_PQ is a practical choice when you have large collections and want a good balance between cost (RAM) and search speed.

When should I use IVF_PQ?

Use IVF_PQ when:

Your collection is large (hundreds of thousands to billions of vectors).
You want faster searches than brute-force (FLAT), with lower memory usage.
You can accept slightly lower recall compared to exact search.

Avoid IVF_PQ when:

You need exact results (use FLAT).
Your collection is small (index overhead may not be worth it).
Your vectors are already heavily compressed or low-dimensional (the gain may be limited).

How it works (simple mental model)

Think of your dataset as a giant library:

IVF sorts the books onto nlist shelves. When you search, you don’t scan the whole library; you scan only a few shelves.
PQ compresses each book into a short code so you can scan shelves quickly without storing full text.

During search you control how many shelves to open using nprobe.

Higher nprobe → better recall, slower search.
Lower nprobe → faster search, lower recall.
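The shelf analogy can be made concrete with a toy, pure-Python IVF. This is only an illustrative sketch, not how VBase or Milvus implement the index: the helper names (build_ivf, ivf_search) are made up for this example, centroids are sampled at random instead of being trained with k-means, and vectors are stored uncompressed (no PQ).

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_ivf(vectors, nlist, seed=0):
    """Toy IVF build: sample nlist centroids, then file each vector id
    under its nearest centroid (real indexes train centroids with k-means)."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, nlist)
    shelves = [[] for _ in range(nlist)]
    for vid, vec in enumerate(vectors):
        nearest = min(range(nlist), key=lambda c: sq_dist(vec, centroids[c]))
        shelves[nearest].append(vid)
    return centroids, shelves

def ivf_search(query, vectors, centroids, shelves, nprobe, k):
    """Scan only the nprobe shelves whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))
    candidates = [vid for c in order[:nprobe] for vid in shelves[c]]
    candidates.sort(key=lambda vid: sq_dist(query, vectors[vid]))
    return candidates[:k]

# Synthetic data: 500 random 8-dimensional vectors and one random query.
rng = random.Random(42)
vectors = [[rng.random() for _ in range(8)] for _ in range(500)]
query = [rng.random() for _ in range(8)]
centroids, shelves = build_ivf(vectors, nlist=16)

# Brute-force ground truth, for comparison.
exact = sorted(range(len(vectors)), key=lambda vid: sq_dist(query, vectors[vid]))[:5]
for nprobe in (1, 4, 16):
    found = ivf_search(query, vectors, centroids, shelves, nprobe, k=5)
    overlap = len(set(found) & set(exact))
    print(f"nprobe={nprobe}: {overlap}/5 of the true neighbors found")
```

With nprobe equal to nlist every shelf is opened, so the result matches the brute-force scan; smaller values scan less data and may miss some neighbors.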

Build an IVF_PQ index

Below is an example using the Dodil Python SDK.


from dodil import Client
from dodil.vbase import VBaseConfig

# Authenticate with your service account credentials (placeholders here).
c = Client(
    service_account_id="...",
    service_account_secret="...",
)

# Connect to your VBase database; replace <id> with your own values.
vbase = c.vbase.connect(
    VBaseConfig(
        host="vbase-db-<id>.infra.dodil.cloud",
        port=443,
        scheme="https",
        db_name="db_<id>",
    )
)

# Example: create an IVF_PQ index on the vector field "embedding"
# (API names may differ slightly depending on your wrapper layer.)
vbase.create_index(
    collection_name="my_collection",
    field_name="embedding",
    index_name="embedding_ivfpq",
    index_type="IVF_PQ",
    metric_type="COSINE",  # or "L2", "IP"
    params={
        "nlist": 1024,  # number of IVF clusters
        "m": 16,        # PQ sub-vectors per vector; must divide the dimension
        "nbits": 8,     # bits per sub-vector code
    },
)

Notes on the build params

nlist (IVF): how many clusters to build.
m (PQ): how many sub-vectors each vector is split into.
nbits (PQ): how many bits to store each sub-vector code.
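To see what m and nbits mean in practice, here is a toy PQ encoder in pure Python. It is a sketch under simplifying assumptions: the function names (pq_encode, pq_decode) are invented for this example, and the codebooks are hand-picked rather than trained. Each of the m codebooks holds 2^nbits centroids, and a vector is stored as one centroid index per sub-vector.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pq_encode(vector, codebooks):
    """Encode a vector as one centroid index per sub-vector.
    codebooks[i] holds the 2**nbits centroids for sub-vector i."""
    m = len(codebooks)
    if len(vector) % m != 0:
        raise ValueError("m must divide the vector dimension")
    sub_dim = len(vector) // m
    codes = []
    for i, book in enumerate(codebooks):
        sub = vector[i * sub_dim:(i + 1) * sub_dim]
        codes.append(min(range(len(book)), key=lambda j: sq_dist(sub, book[j])))
    return codes

def pq_decode(codes, codebooks):
    """Rebuild an approximate vector by concatenating the chosen centroids."""
    approx = []
    for code, book in zip(codes, codebooks):
        approx.extend(book[code])
    return approx

# Toy setup: D=4, m=2, nbits=1, so each codebook has 2**1 = 2 centroids.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],  # centroids for dimensions 0-1
    [[0.0, 1.0], [1.0, 0.0]],  # centroids for dimensions 2-3
]
vector = [0.9, 0.8, 0.1, 0.7]
codes = pq_encode(vector, codebooks)
print("codes:", codes)
print("reconstruction:", pq_decode(codes, codebooks))
```

The index keeps only the short code list, and the reconstruction shows the information lost to compression. More sub-vectors (m) or more centroids per codebook (nbits) shrink that loss at the cost of larger codes.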

Important constraints:

m must be a divisor of your vector dimension D (e.g., if D=768, m can be 12, 16, 24, 32, 48, 64, 96, 128, …).
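Larger nlist increases build time. Finer clusters can improve recall for a given amount of scanned data, but each probe then covers a smaller share of the collection, so nprobe usually needs to grow along with nlist.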
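A small pre-flight check can catch invalid parameters before you start a build. The helpers below are hypothetical and not part of the Dodil SDK; they only encode the divisibility rule above plus basic positivity checks. Any further limits the server enforces are not modeled here.

```python
def valid_m_values(dim):
    """Return every m that divides the vector dimension exactly."""
    return [m for m in range(1, dim + 1) if dim % m == 0]

def check_ivf_pq_params(dim, nlist, m, nbits):
    """Raise ValueError if the build params break the basic rules."""
    if nlist < 1:
        raise ValueError("nlist must be at least 1")
    if nbits < 1:
        raise ValueError("nbits must be at least 1")
    if m < 1 or dim % m != 0:
        raise ValueError(
            f"m={m} does not divide dim={dim}; valid values: {valid_m_values(dim)}"
        )

print(valid_m_values(768))
check_ivf_pq_params(dim=768, nlist=1024, m=16, nbits=8)  # passes silently
```

For example, m=10 fails for 768-dimensional vectors because 768 is not a multiple of 10.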

Search with IVF_PQ

Once the index is created (and your data is inserted), you search by setting nprobe in the search params:


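res = vbase.search(
    collection_name="my_collection",
    anns_field="embedding",
    # One query vector; "..." stands in for the remaining dimensions.
    data=[[0.12, 0.07, 0.31, ...]],
    limit=10,  # number of nearest neighbors to return
    search_params={
        "params": {
            "nprobe": 16,  # number of IVF clusters to scan
        }
    },
)

# res[0] holds the hits for the first (and only) query vector.
for hit in res[0]:
    print(hit.id, hit.score)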

Choosing nprobe:

nprobe must be in the range [1, nlist] (this is a hard limit).
Start small (e.g., 8–32) and increase until recall is good enough.
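One way to make "good enough" measurable is to compare approximate results against exact ones and sweep nprobe upward. The sketch below is illustrative only: recall_at_k and pick_nprobe are hypothetical helpers, and fake_search is a stand-in that simulates improving recall. In practice search_fn would wrap your own search call, and the ground truth would come from an exact (FLAT) search over the same queries.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k ids that the approximate search returned."""
    if not exact_ids:
        return 1.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

def pick_nprobe(search_fn, queries, ground_truth, candidates, target_recall):
    """Return the smallest candidate nprobe whose mean recall meets the
    target, or None if no candidate does."""
    for nprobe in sorted(candidates):
        recalls = [
            recall_at_k(search_fn(q, nprobe), exact)
            for q, exact in zip(queries, ground_truth)
        ]
        if sum(recalls) / len(recalls) >= target_recall:
            return nprobe
    return None

# Stand-in search: pretends that a larger nprobe finds more true neighbors.
true_ids = list(range(10))

def fake_search(query, nprobe):
    hits = min(10, nprobe // 4 + 2)
    return true_ids[:hits] + [-1] * (10 - hits)

queries = [None, None, None]   # placeholders; fake_search ignores the query
ground_truth = [true_ids] * 3  # exact top-10 ids for each query
best = pick_nprobe(fake_search, queries, ground_truth, [8, 16, 32, 64], 0.9)
print("smallest nprobe meeting the target:", best)
```

Run the sweep on a representative sample of real queries, and check latency at the chosen value as well.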

Tuning guide

These parameters let you trade off latency, recall, index build time, and RAM usage.

nlist (build)

What it does: number of clusters.
Effect:
- Higher nlist → smaller clusters and finer candidate pruning, potentially higher recall for the same amount of scanned data, higher build cost.
- Lower nlist → faster build, but each cluster is larger, so every probe scans more vectors and pruning is coarser.

Practical starting points:

100K–1M vectors: nlist 256–2048
1M–50M vectors: nlist 1024–8192
50M+ vectors: nlist 8192–65536 (only if you can afford the build cost)

m (build)

What it does: how many parts each vector is split into for PQ.
Effect:
- Higher m → usually better accuracy, higher compute and memory.
- Lower m → more compression, lower accuracy.

Practical starting points:

A commonly cited choice is m = D/2 (when it divides cleanly), which favors accuracy over compression.
If memory is the priority, start lower: for D=768, try m=16 or m=32 first.

nbits (build)

What it does: number of bits used to store each sub-vector's code, so each sub-codebook has 2^nbits centroids.
Effect:
- Higher nbits → more accurate compression, larger codes.
- Lower nbits → more compression, lower accuracy.

Practical starting points:

Default: nbits=8
If you need more compression: try 4–6
If you need more quality: try 10–12 (watch memory)
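The effect of m and nbits on code size follows from simple arithmetic: each vector is stored as m codes of nbits bits each. The sketch below counts only those codes and compares them with raw float32 storage. It is a rough estimate, since it ignores centroids, codebooks, ids, and any engine-specific overhead, and the function names are made up for this example.

```python
def pq_code_bytes(m, nbits):
    """Bytes per PQ code: m codes of nbits bits, rounded up to whole bytes."""
    return (m * nbits + 7) // 8

def raw_bytes(dim, bytes_per_value=4):
    """Bytes for an uncompressed vector (float32 by default)."""
    return dim * bytes_per_value

dim = 768
for m, nbits in [(16, 8), (32, 8), (384, 8), (32, 4)]:
    code = pq_code_bytes(m, nbits)
    ratio = raw_bytes(dim) / code
    print(f"m={m}, nbits={nbits}: {code} B/vector, about {ratio:.0f}x smaller than float32")
```

Multiply the per-vector figure by your collection size for a first estimate of the space the codes need.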

nprobe (search)

What it does: how many IVF clusters to scan.
Effect:
- Higher nprobe → higher recall, higher latency.
- Lower nprobe → lower latency, lower recall.

Practical starting points:

Latency-first: nprobe 8–16
Balanced: nprobe 16–64
Recall-first: nprobe 64+ (only if it still meets your SLA)
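A quick way to reason about these ranges is the share of the collection each query touches. Assuming clusters are roughly equal in size (real clusters are often unbalanced), that share is about nprobe / nlist. The helper below is a hypothetical illustration of that estimate.

```python
def scanned_fraction(nprobe, nlist):
    """Approximate share of vectors scanned per query, assuming balanced clusters."""
    if not 1 <= nprobe <= nlist:
        raise ValueError("nprobe must be in [1, nlist]")
    return nprobe / nlist

nlist = 1024
for nprobe in (8, 16, 64):
    share = scanned_fraction(nprobe, nlist)
    print(f"nlist={nlist}, nprobe={nprobe}: about {share:.2%} of vectors scanned")
```

This also shows why nprobe and nlist should be tuned together: doubling nlist at a fixed nprobe halves the share of data each query scans.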

Common gotchas

Low recall after enabling IVF_PQ
- Increase nprobe first.
- If that’s still not enough, consider increasing nlist (requires a rebuild) and re-tuning nprobe afterward.
- Consider a larger m or nbits if PQ compression is too aggressive (also requires a rebuild).

Index builds slowly
- Reduce nlist (trading some recall), or build during off-peak hours.

Invalid params error for m
- Ensure m divides your vector dimension exactly.

Recommended defaults

If you just want a reasonable default to start with:

nlist: 1024
m: 16 (or 32 for 768-dim vectors)
nbits: 8
nprobe: 16

Then benchmark and tune from there.

Next: explore other index types (e.g., IVF_FLAT, IVF_SQ8, HNSW) to match your latency/recall/memory goals.