IVF_PQ index

IVF_PQ is an approximate nearest neighbor (ANN) index that combines two ideas:

  • IVF (Inverted File): splits your vector space into clusters so queries only scan a subset of the data.
  • PQ (Product Quantization): compresses vectors so the index uses far less memory while staying reasonably accurate.

In Dodil VBase (Milvus-backed), IVF_PQ is a practical choice when you have large collections and want a good balance between cost (RAM) and search speed.

When should I use IVF_PQ?

Use IVF_PQ when:

  • Your collection is large (hundreds of thousands to billions of vectors).
  • You want faster searches than brute-force (FLAT), with lower memory usage.
  • You can accept slightly lower recall compared to exact search.

Avoid IVF_PQ when:

  • You need exact results (use FLAT).
  • Your collection is small (index overhead may not be worth it).
  • Your vectors are already heavily compressed or low-dimensional (the gain may be limited).

How it works (simple mental model)

Think of your dataset as a giant library:

  1. IVF creates shelves (nlist shelves). When you search, you don’t scan the whole library—you scan a few shelves.
  2. PQ compresses each book into a short code so you can scan shelves quickly without storing full text.

During search you control how many shelves to open using nprobe.

  • Higher nprobe → better recall, slower search.
  • Lower nprobe → faster search, lower recall.
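The shelf analogy can be made concrete with a toy, pure-Python sketch of the IVF half only. It is an illustration under simplifying assumptions, not how VBase or Milvus implement the index: the centroids are hard-coded (a real index learns them with k-means), PQ compression is left out, and the function names are made up for this example.

```python
import math

def nearest_centroids(query, centroids, nprobe):
    """Return the indices of the nprobe centroids closest to the query (L2)."""
    order = sorted(range(len(centroids)), key=lambda i: math.dist(query, centroids[i]))
    return order[:nprobe]

def build_ivf(vectors, centroids):
    """Put every vector id into the bucket ("shelf") of its nearest centroid."""
    buckets = {i: [] for i in range(len(centroids))}
    for vid, vec in enumerate(vectors):
        buckets[nearest_centroids(vec, centroids, 1)[0]].append(vid)
    return buckets

def ivf_search(query, vectors, centroids, buckets, nprobe, k):
    """Scan only the nprobe closest buckets and return the top-k vector ids."""
    candidates = [vid for c in nearest_centroids(query, centroids, nprobe) for vid in buckets[c]]
    candidates.sort(key=lambda vid: math.dist(query, vectors[vid]))
    return candidates[:k]

# Toy data: three hard-coded centroids, so nlist = 3.
centroids = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
vectors = [(1.0, 1.0), (4.0, 0.0), (6.0, 0.0), (9.0, 1.0), (0.0, 9.0)]
buckets = build_ivf(vectors, centroids)

query = (5.2, 0.0)
print(ivf_search(query, vectors, centroids, buckets, nprobe=1, k=2))  # [2, 3]
print(ivf_search(query, vectors, centroids, buckets, nprobe=2, k=2))  # [2, 1]
```

With nprobe=1 the search misses vector 1, which is the second-closest match but sits on a neighboring shelf. Opening a second shelf (nprobe=2) finds it, at the cost of scanning more candidates.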

Build an IVF_PQ index

Below is an example using the Dodil Python SDK.

```python
from dodil import Client
from dodil.vbase import VBaseConfig

c = Client(
    service_account_id="...",
    service_account_secret="...",
)

vbase = c.vbase.connect(
    VBaseConfig(
        host="vbase-db-<id>.infra.dodil.cloud",
        port=443,
        scheme="https",
        db_name="db_<id>",
    )
)

# Example: create an IVF_PQ index on the vector field "embedding"
# (API names may differ slightly depending on your wrapper layer.)
vbase.create_index(
    collection_name="my_collection",
    field_name="embedding",
    index_name="embedding_ivfpq",
    index_type="IVF_PQ",
    metric_type="COSINE",  # or "L2", "IP"
    params={
        "nlist": 1024,
        "m": 16,
        "nbits": 8,
    },
)
```

Notes on the build params

  • nlist (IVF): how many clusters to build.
  • m (PQ): how many sub-vectors each vector is split into.
  • nbits (PQ): how many bits encode each sub-vector. Each sub-vector codebook holds 2^nbits centroids.

Important constraints:

  • m must be a divisor of your vector dimension D (e.g., if D=768, m can be 12, 16, 24, 32, 48, 64, 96, 128, …).
  • Larger nlist increases build time. Finer clusters prune candidates better, but each probe then covers less data, so you may need a higher nprobe to keep recall.
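To see which m values satisfy the divisibility constraint for a given dimension, a small helper like the one below can enumerate them. The function name and the max_m cap are illustrative and not part of the Dodil SDK.

```python
def valid_m_values(dim, max_m=None):
    """List every m that divides dim exactly (candidate values for PQ's m)."""
    limit = dim if max_m is None else min(dim, max_m)
    return [m for m in range(1, limit + 1) if dim % m == 0]

print(valid_m_values(768, max_m=128))
# [1, 2, 3, 4, 6, 8, 12, 16, 24, 32, 48, 64, 96, 128]
```

Checking the chosen m against this list before calling create_index catches an invalid value early instead of at build time.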

Search with IVF_PQ

Once the index is created (and your data is inserted), you search with nprobe:

```python
res = vbase.search(
    collection_name="my_collection",
    anns_field="embedding",
    data=[[0.12, 0.07, 0.31, ...]],
    limit=10,
    search_params={
        "params": {
            "nprobe": 16,
        }
    },
)

for hit in res[0]:
    print(hit.id, hit.score)
```

Rule of thumb:

  • nprobe must be in [1, nlist]. Setting nprobe = nlist scans every cluster.
  • Start small (e.g., 8–32) and increase until recall is good enough.
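"Increase until recall is good enough" can be turned into a small measurement loop. The sketch below assumes you have exact top-k ids for a handful of sample queries (for example from a brute-force search) and a callable search_fn(query, nprobe) that returns ids. Both are assumptions of this example; fake_search is a stand-in so the snippet runs without a server.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k ids that the approximate search returned."""
    if not exact_ids:
        return 0.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

def smallest_nprobe(search_fn, queries, exact_results, candidates, target_recall):
    """Return (nprobe, mean recall) for the first candidate meeting the target, else None."""
    for nprobe in sorted(candidates):
        recalls = [
            recall_at_k(search_fn(q, nprobe), exact)
            for q, exact in zip(queries, exact_results)
        ]
        mean_recall = sum(recalls) / len(recalls)
        if mean_recall >= target_recall:
            return nprobe, mean_recall
    return None

# Stand-in for a real search: finds one more true neighbor per extra probe.
EXACT = [1, 2, 3, 4]

def fake_search(query, nprobe):
    found = EXACT[:min(nprobe, len(EXACT))]
    return found + [99] * (len(EXACT) - len(found))

print(smallest_nprobe(fake_search, ["q1", "q2"], [EXACT, EXACT], [1, 2, 4, 8], 0.9))
# (4, 1.0)
```

In practice search_fn would wrap the vbase.search call above, pass nprobe through search_params, and collect hit.id from the results.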

Tuning guide

These parameters let you trade off latency, recall, index build time, and RAM usage against each other.

nlist (build)

  • What it does: number of clusters.
  • Effect:
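    • Higher nlist → smaller clusters and better candidate pruning, but higher build cost. At a fixed nprobe each query scans less data, so recall can drop unless you raise nprobe too.
    • Lower nlist → faster build, but each probed cluster is larger, so every probe scans more vectors.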

Practical starting points:

  • 100K–1M vectors: nlist 256–2048
  • 1M–50M vectors: nlist 1024–8192
  • 50M+ vectors: nlist 8192–65536 (only if you can afford the build cost)
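A back-of-the-envelope way to compare nlist and nprobe combinations is to estimate how much of the collection a query touches. The sketch below assumes evenly sized clusters, which real data rarely gives you, so treat the numbers as rough guidance only. The function name is illustrative.

```python
def ivf_scan_estimate(num_vectors, nlist, nprobe):
    """Rough per-query scan cost, assuming all clusters are the same size."""
    if not 1 <= nprobe <= nlist:
        raise ValueError("nprobe must be in [1, nlist]")
    per_cluster = num_vectors / nlist
    return {
        "vectors_per_cluster": per_cluster,
        "vectors_scanned": per_cluster * nprobe,
        "fraction_scanned": nprobe / nlist,
    }

print(ivf_scan_estimate(1_000_000, nlist=1024, nprobe=16))
# {'vectors_per_cluster': 976.5625, 'vectors_scanned': 15625.0, 'fraction_scanned': 0.015625}
```

The ratio nprobe / nlist is the useful number here: doubling nlist while keeping nprobe fixed halves the share of data each query scans.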

m (build)

  • What it does: how many parts each vector is split into for PQ.
  • Effect:
    • Higher m → usually better accuracy, higher compute and memory.
    • Lower m → more compression, lower accuracy.

Practical starting points:

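  • Common choice when accuracy matters most: m = D/2 (when it divides cleanly).
  • For D=768, try m=16 or m=32 first if memory matters more. Both compress far more aggressively than D/2 = 384.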

nbits (build)

  • What it does: bits per sub-vector codebook index.
  • Effect:
    • Higher nbits → more accurate compression, larger codes.
    • Lower nbits → more compression, lower accuracy.
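The size of a PQ code follows directly from m and nbits: each vector is stored as m codebook indices of nbits each. The sketch below estimates bytes per vector and the compression ratio against raw float32 storage. It deliberately ignores codebooks, IVF centroids, ids, and any padding a real implementation may add, so it is a lower bound rather than a statement about VBase's actual memory use.

```python
import math

def pq_code_bytes(m, nbits):
    """Bytes per PQ code: m indices of nbits each, rounded up to whole bytes."""
    return math.ceil(m * nbits / 8)

def compression_ratio(dim, m, nbits, bytes_per_float=4):
    """Raw float vector size divided by PQ code size."""
    return (dim * bytes_per_float) / pq_code_bytes(m, nbits)

for m in (16, 32):
    print(m, pq_code_bytes(m, 8), compression_ratio(768, m, 8))
# 16 16 192.0
# 32 32 96.0
```

For 768-dimensional float32 vectors (3072 bytes each), m=16 with nbits=8 stores 16 bytes per vector, and m=32 stores 32 bytes.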

Practical starting points:

  • Default: nbits=8
  • If you need more compression: try 4–6
  • If you need more quality: try 10–12 (watch memory)

nprobe (search)

  • What it does: how many IVF clusters to scan.
  • Effect:
    • Higher nprobe → higher recall, higher latency.
    • Lower nprobe → lower latency, lower recall.

Practical starting points:

  • Latency-first: nprobe 8–16
  • Balanced: nprobe 16–64
  • Recall-first: nprobe 64+ (only if it still meets your SLA)

Common gotchas

  • Low recall after enabling IVF_PQ

    • Increase nprobe first.
    • If that’s still not enough, consider increasing nlist (requires rebuild).
    • Consider a larger m or nbits if PQ compression is too aggressive.
  • Index builds slowly

    • Reduce nlist (trade recall), or build during off-peak.
  • m error / invalid params

    • Ensure m divides your vector dimension exactly.

If you just want a reasonable default to start with:

  • nlist: 1024
  • m: 16 (or 32 for 768-dim vectors)
  • nbits: 8
  • nprobe: 16

Then benchmark and tune from there.


Next: explore other index types (e.g., IVF_FLAT, IVF_SQ8, HNSW) to match your latency/recall/memory goals.
