
IVF_RABITQ

IVF_RABITQ is a vector index that combines IVF (Inverted File) with RaBitQ binary quantization.

  • Why it exists: reduce memory usage dramatically while keeping good recall.
  • Compression: RaBitQ quantizes FP32 vectors into a compact binary form of about 1 bit per dimension (up to 32:1 compression).
  • Optional refinement: you can store a higher-precision “refine” representation to improve recall, at the cost of extra storage.
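
As a rough illustration of the compression claim, the sketch below estimates raw code storage for FP32 vectors, 1-bit codes, and 1-bit codes plus an SQ8 refine copy. It is back-of-the-envelope arithmetic only: it ignores per-vector correction data, IVF centroids, and other index overhead. The function name `estimate_bytes` and the bits-per-dimension table are assumptions made for this example, not part of the Dodil or Milvus API.

```python
from typing import Optional

# Assumed bits per dimension for each representation (illustrative only).
BITS_PER_DIM = {
    "FP32": 32,
    "FP16": 16,
    "BF16": 16,
    "SQ8": 8,
    "SQ6": 6,
    "RABITQ": 1,
}


def estimate_bytes(num_vectors: int, dim: int, refine_type: Optional[str] = None) -> int:
    """Estimate raw storage for 1-bit codes plus an optional refine copy."""
    bits = BITS_PER_DIM["RABITQ"]
    if refine_type is not None:
        bits += BITS_PER_DIM[refine_type]
    # Round up to whole bytes per vector.
    bytes_per_vector = (dim * bits + 7) // 8
    return num_vectors * bytes_per_vector


n, d = 1_000_000, 768
fp32_bytes = n * d * 4  # 4 bytes per FP32 dimension
print(f"FP32 raw vectors:  {fp32_bytes / 2**20:.0f} MiB")
print(f"1-bit codes only:  {estimate_bytes(n, d) / 2**20:.0f} MiB")
print(f"1-bit codes + SQ8: {estimate_bytes(n, d, 'SQ8') / 2**20:.0f} MiB")
```

Under these assumptions, enabling an SQ8 refine copy raises storage from 1 bit to 9 bits per dimension, which is why refinement is described as a recall-for-storage trade.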

This index is a good fit when you want an IVF-style index under tight memory limits and need better recall per byte than classic quantization options such as product or scalar quantization.

How it works

IVF (coarse filtering)

IVF splits the vector space into nlist clusters (via k-means). At query time, Milvus searches only a subset of those clusters (nprobe) instead of scanning everything.
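
The coarse-filtering idea can be sketched in a few lines of plain Python. This is a toy model, not how Milvus implements IVF: the centroids are hand-picked instead of trained with k-means, and the helper names (`l2`, `build_ivf`, `ivf_search`) are made up for this example.

```python
def l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_ivf(vectors, centroids):
    """Assign each vector id to the inverted list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for vec_id, vec in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2(vec, centroids[c]))
        lists[nearest].append(vec_id)
    return lists


def ivf_search(query, vectors, centroids, lists, nprobe, limit):
    """Scan only the nprobe lists whose centroids are closest to the query."""
    probe_order = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))
    candidates = [vid for c in probe_order[:nprobe] for vid in lists[c]]
    candidates.sort(key=lambda vid: l2(query, vectors[vid]))
    return candidates[:limit]


# Toy data: four 2-D vectors and two hand-picked centroids (nlist = 2).
vectors = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
centroids = [[0.0, 0.0], [5.0, 5.0]]
lists = build_ivf(vectors, centroids)

# With nprobe=1 only the cluster nearest to the query is scanned.
top = ivf_search([0.1, 0.1], vectors, centroids, lists, nprobe=1, limit=2)
print(lists, top)
```

Raising `nprobe` scans more lists, which is why it trades latency for recall: a true neighbor that landed in a neighboring cluster is only found if that cluster is probed.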

RaBitQ (binary quantization)

Within each IVF cluster, RaBitQ stores a binary representation of vectors and uses very fast bit operations (popcount) to score candidates efficiently.
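
To show why bit operations make scoring cheap, here is a deliberately simplified sketch: it keeps only the sign of each dimension relative to a cluster center and compares codes with XOR plus popcount. The real RaBitQ algorithm does more than this (it estimates distances rather than using raw Hamming distance), so treat the functions `sign_bits` and `hamming` as illustrative stand-ins, not the actual encoding.

```python
def sign_bits(vec, center):
    """Pack the signs of (vec - center) into an int, one bit per dimension."""
    code = 0
    for i, (x, c) in enumerate(zip(vec, center)):
        if x - c > 0:
            code |= 1 << i
    return code


def hamming(a, b):
    """Count differing bits using XOR followed by a popcount."""
    return bin(a ^ b).count("1")


center = [0.0, 0.0, 0.0, 0.0]
stored = {
    "a": sign_bits([0.9, -0.2, 0.4, -0.7], center),
    "b": sign_bits([-0.5, 0.3, -0.1, 0.8], center),
}
query_code = sign_bits([1.0, -0.1, 0.2, -0.9], center)

# Rank stored codes by how many bits differ from the query code.
ranked = sorted(stored, key=lambda k: hamming(stored[k], query_code))
print(ranked)
```

Each comparison touches one machine word per 64 dimensions instead of 64 floating-point subtractions, which is where the dependence on fast popcount hardware comes from.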

Optional refinement

If refinement is enabled, the search first finds candidates using the compact RaBitQ representation, then re-scores a larger candidate pool using a more accurate refine format.
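
The two-stage flow can be sketched as follows. The pool size of `limit * refine_k` is an assumption made for this example, and `search_with_refine`, `coarse`, and `exact` are hypothetical names; the toy scores use rounding as a stand-in for a lossy quantized representation.

```python
def search_with_refine(query, coarse_score, exact_score, ids, limit, refine_k=1.0):
    """Two-stage search: cheap ranking first, then re-score the top candidates."""
    # Assumed pool size: limit * refine_k, never smaller than limit.
    pool_size = min(len(ids), max(limit, int(limit * refine_k)))
    # Stage 1: rank every id with the cheap, approximate score.
    pool = sorted(ids, key=lambda i: coarse_score(query, i))[:pool_size]
    # Stage 2: re-rank only the pool with the more accurate score.
    return sorted(pool, key=lambda i: exact_score(query, i))[:limit]


# Toy 1-D data: rounding plays the role of the lossy compact code.
data = {0: 0.6, 1: 1.25, 2: 1.6, 3: 3.0}


def coarse(q, i):
    return abs(round(data[i]) - round(q))


def exact(q, i):
    return abs(data[i] - q)


ids = list(data)
no_extra = search_with_refine(1.3, coarse, exact, ids, limit=1, refine_k=1.0)
with_extra = search_with_refine(1.3, coarse, exact, ids, limit=1, refine_k=2.0)
print(no_extra, with_extra)
```

In this toy case the coarse score cannot tell ids 0 and 1 apart, so a pool of one returns the wrong neighbor, while a pool of two lets the accurate score pick the right one. That is the effect a larger `refine_k` is meant to buy.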

When to use

Use IVF_RABITQ when you want:

  • Much smaller memory footprint (large datasets, limited RAM)
  • Good recall at high search speed, especially on modern CPUs with hardware popcount support
  • A tunable accuracy/performance trade-off using nprobe, rbq_query_bits, and optional refinement

Create an IVF_RABITQ index (Dodil)

Below is a typical setup. The exact method names may differ slightly depending on your Dodil SDK version, but the parameters and ideas are the same.

```python
from dodil import Client
from dodil.vbase import VBaseConfig

# Python 3.10+
c = Client(
    service_account_id="...",
    service_account_secret="...",
)

vbase = c.vbase.connect(
    VBaseConfig(
        host="vbase-db-<id>.infra.dodil.cloud",
        port=443,
        scheme="https",
        db_name="db_<id>",
    )
)

# Example: create IVF_RABITQ index for a vector field
vbase.create_index(
    collection_name="my_collection",
    field_name="embedding",
    index_name="embedding_idx",
    index_type="IVF_RABITQ",
    metric_type="L2",  # or COSINE / IP
    params={
        "nlist": 1024,
        "refine": True,
        "refine_type": "SQ8",
    },
)
```

Build parameters

| Parameter | What it does | Range / Type | Practical notes |
| --- | --- | --- | --- |
| nlist | Number of IVF clusters | int [1..65536] (default 128) | Higher = smaller clusters = better recall, slower build. Common range: 32–4096. |
| refine | Enables refinement | bool (default false) | Turn on if you need higher recall (often 0.9+), at the cost of extra storage and build time. |
| refine_type | Precision used for refinement | SQ6, SQ8, FP16, BF16, FP32 | Listed from smaller/faster → larger/slower but higher recall. SQ8 is a good starting point. |

Search with IVF_RABITQ

```python
results = vbase.search(
    collection_name="my_collection",
    anns_field="embedding",
    data=[[0.1, 0.2, 0.3, 0.4]],
    limit=10,
    search_params={
        "params": {
            "nprobe": 128,
            "rbq_query_bits": 0,
            "refine_k": 1,
        }
    },
)
```

Search parameters

| Parameter | What it does | Range / Type | Practical notes |
| --- | --- | --- | --- |
| nprobe | How many IVF clusters to search | int [1..nlist] (default 8) | Higher usually improves recall but increases latency. Scale it with nlist. |
| rbq_query_bits | Extra query-side scalar quantization | int [0..8] (default 0) | 0 = best recall, slowest. Try 0, 8, and 6; 6 is often fastest with similar recall. |
| refine_k | Candidate pool multiplier for refinement | float [1..∞) (default 1) | Higher = more candidates refined = higher recall, lower QPS. Try 1–5. |

Performance notes

  • IVF_RABITQ benefits heavily from CPUs with fast popcount instructions (e.g., Intel Ice Lake+ or AMD Zen 4+).
  • If you enable refinement, you’ll typically get better recall, but you should expect more storage and slower search depending on refine_type and refine_k.

Quick tuning recipe

  1. Start with nlist=1024, nprobe=64–128.
  2. Keep rbq_query_bits=0 for maximum recall; try 6 or 8 if you need more speed.
  3. If recall isn’t good enough, enable refine=true with refine_type="SQ8".
  4. Increase refine_k gradually (1 → 2 → 3 …) until you hit your recall target.
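
To follow the recipe you need a way to measure recall against exact (brute-force) results. The sketch below shows one way to do that and to step through `refine_k` values until a target is met. The helpers `recall_at_k` and `tune_refine_k` are hypothetical, and `fake_results` stands in for real search calls so the example runs without a database.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k ids that the approximate search returned."""
    if not exact_ids:
        return 0.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


def tune_refine_k(run_search, exact_ids, target, candidates=(1, 2, 3, 4, 5)):
    """Return the first refine_k whose recall meets the target, else None."""
    for refine_k in candidates:
        if recall_at_k(run_search(refine_k), exact_ids) >= target:
            return refine_k
    return None


# Stand-in for real search calls: pretend a higher refine_k finds more true neighbors.
exact = [1, 2, 3, 4]
fake_results = {1: [1, 2, 9, 8], 2: [1, 2, 3, 8], 3: [1, 2, 3, 4]}

chosen = tune_refine_k(lambda k: fake_results.get(k, []), exact, target=0.75)
print(chosen)
```

In practice `run_search` would wrap your own search call with the given `refine_k`, and recall would be averaged over a set of representative queries, not a single one.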