IVF_RABITQ

IVF_RABITQ is a vector index that combines IVF (Inverted File) with RaBitQ binary quantization.

Why it exists: reduce memory usage dramatically while keeping good recall.
Compression: RaBitQ quantizes FP32 vectors into a compact binary form of roughly 1 bit per dimension, for a compression ratio of up to 32:1.
Optional refinement: you can store a higher-precision “refine” representation to improve recall, at the cost of extra storage.
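
To make the storage trade-off concrete, here is a rough back-of-the-envelope sketch. It assumes 1 bit per dimension for the RaBitQ code and the nominal bit widths of each refine format; the helper name `bytes_per_vector` is hypothetical, and the numbers ignore per-vector correction factors, IDs, and IVF centroid overhead, so treat them as estimates only.

```python
from typing import Optional

# Nominal bits per dimension for each refine format (an approximation).
REFINE_BITS = {"SQ6": 6, "SQ8": 8, "FP16": 16, "BF16": 16, "FP32": 32}

def bytes_per_vector(dim: int, refine_type: Optional[str] = None) -> dict:
    """Rough per-vector payload size in bytes (hypothetical helper)."""
    fp32 = dim * 4                      # raw FP32: 4 bytes per dimension
    rabitq = (dim + 7) // 8             # 1 bit per dimension, rounded up to bytes
    refine = (dim * REFINE_BITS[refine_type] + 7) // 8 if refine_type else 0
    return {"fp32": fp32, "rabitq": rabitq, "refine": refine,
            "total": rabitq + refine}

sizes = bytes_per_vector(768, "SQ8")
print(sizes)
print("compression without refine:", sizes["fp32"] / sizes["rabitq"])
```

For a 768-dimensional vector, the binary code alone is 32 times smaller than FP32, while adding an SQ8 refine copy brings the total closer to a 3.5:1 ratio. That is the "extra storage" cost of refinement.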

IVF_RABITQ is a good fit when you want an IVF-style index, memory is tight, and classic quantization options either do not compress enough or cost too much recall.

How it works

IVF (coarse filtering)

IVF splits the vector space into nlist clusters (via k-means). At query time, Milvus searches only the nprobe clusters whose centroids are closest to the query instead of scanning everything.
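
The coarse-filtering idea can be sketched in a few lines of plain Python. This is a toy illustration, not the engine's implementation: the centroids are hard-coded rather than learned with k-means, and the function names `build_ivf` and `ivf_search` are hypothetical.

```python
import math

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_ivf(vectors, centroids):
    """Assign each vector id to its nearest centroid (one inverted list per cluster)."""
    lists = {i: [] for i in range(len(centroids))}
    for vid, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2(v, centroids[c]))
        lists[nearest].append(vid)
    return lists

def ivf_search(query, vectors, centroids, lists, nprobe, limit):
    """Scan only the nprobe clusters whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))[:nprobe]
    candidates = [vid for c in probe for vid in lists[c]]
    return sorted(candidates, key=lambda vid: l2(query, vectors[vid]))[:limit]

# Toy data: nlist = 3 fixed centroids; a real index learns them with k-means.
centroids = [[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]]
vectors = [[0.1, 0.2], [9.8, 10.1], [0.3, 9.9], [0.3, 0.1], [10.2, 9.7]]
lists = build_ivf(vectors, centroids)
print(ivf_search([0.0, 0.0], vectors, centroids, lists, nprobe=1, limit=2))
```

With nprobe=1 only one of the three inverted lists is scanned. Raising nprobe widens the scan, which is why it trades latency for recall.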

RaBitQ (binary quantization)

Within each IVF cluster, RaBitQ stores a binary representation of vectors and uses very fast bit operations (popcount) to score candidates efficiently.
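
To show why bit operations make scoring cheap, the sketch below uses a deliberately simplified scheme: one sign bit per dimension relative to the cluster center, compared with XOR plus popcount. This is an assumption-laden stand-in and does not reproduce RaBitQ's actual encoding or distance estimator; `to_bits` and `hamming` are hypothetical helpers.

```python
def to_bits(vec, center):
    """Pack one sign bit per dimension (1 if above the cluster center) into an int."""
    code = 0
    for i, (x, c) in enumerate(zip(vec, center)):
        if x > c:
            code |= 1 << i
    return code

def hamming(a: int, b: int) -> int:
    """XOR then popcount: the number of dimensions whose bits differ."""
    return bin(a ^ b).count("1")

center = [0.0, 0.0, 0.0, 0.0]
stored = {
    "a": to_bits([0.9, -0.2, 0.4, -0.7], center),
    "b": to_bits([-0.5, 0.3, -0.1, 0.8], center),
}
query = to_bits([0.8, -0.1, 0.5, -0.6], center)

# Rank stored codes by how many bits differ from the query code.
ranked = sorted(stored, key=lambda k: hamming(stored[k], query))
print(ranked)
```

Each comparison touches a single machine word per 64 dimensions, which is where hardware popcount support pays off.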

Optional refinement

If refinement is enabled, the search first finds candidates using the compact RaBitQ representation, then re-scores a candidate pool larger than the requested result count (its size is controlled by refine_k) using the more accurate refine format.
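
The two-stage flow can be sketched as follows. The sketch assumes the candidate pool holds about refine_k × limit entries, which matches the "multiplier" description of refine_k but is not confirmed engine behavior; the coarse scorer here is a toy (rounding values) standing in for the binary code, and `two_stage_search` is a hypothetical name.

```python
def two_stage_search(query, coarse_score, exact_score, ids, limit, refine_k=1.0):
    """Pick a pool with the cheap score, then re-rank it with the accurate score."""
    pool_size = max(limit, int(limit * refine_k))   # assumed pool sizing
    pool = sorted(ids, key=lambda i: coarse_score(query, i))[:pool_size]
    return sorted(pool, key=lambda i: exact_score(query, i))[:limit]

# Toy 1-D data: id -> value.
values = {0: 1.4, 1: 0.6, 2: 1.1, 3: 3.0}

def coarse(q, i):
    # Lossy score: compares rounded values, so ids 0, 1, 2 all look identical.
    return abs(round(q) - round(values[i]))

def exact(q, i):
    # Accurate score on the original values.
    return abs(q - values[i])

ids = list(values)
print(two_stage_search(1.0, coarse, exact, ids, limit=1, refine_k=1))
print(two_stage_search(1.0, coarse, exact, ids, limit=1, refine_k=3))
```

With refine_k=1 the pool contains a single candidate chosen by the lossy score, so the true nearest neighbour (id 2) can be missed. A larger pool gives the accurate re-scoring step a chance to recover it.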

When to use

Use IVF_RABITQ when you want:

Much smaller memory footprint (large datasets, limited RAM)
Good recall at high query speed (especially on modern CPUs)
A tunable accuracy/performance trade-off using nprobe, rbq_query_bits, and optional refinement

Create an IVF_RABITQ index (Dodil)

Below is a typical setup. The exact method names may differ slightly depending on your Dodil SDK version, but the parameters and ideas are the same.


from dodil import Client
from dodil.vbase import VBaseConfig
 
# Python 3.10+
 
c = Client(
    service_account_id="...",
    service_account_secret="...",
)
 
vbase = c.vbase.connect(
    VBaseConfig(
        host="vbase-db-<id>.infra.dodil.cloud",
        port=443,
        scheme="https",
        db_name="db_<id>",
    )
)
 
# Example: create IVF_RABITQ index for a vector field
vbase.create_index(
    collection_name="my_collection",
    field_name="embedding",
    index_name="embedding_idx",
    index_type="IVF_RABITQ",
    metric_type="L2",  # or COSINE / IP
    params={
        "nlist": 1024,
        "refine": True,
        "refine_type": "SQ8",
    },
)

Build parameters

| Parameter | What it does | Range / Type | Practical notes |
|---|---|---|---|
| `nlist` | Number of IVF clusters | int `[1..65536]` (default `128`) | Higher = smaller clusters = better recall, slower build. Common range: 32–4096. |
| `refine` | Enables refinement | bool (default `false`) | Turn on if you need higher recall (often 0.9+), at the cost of extra storage and build time. |
| `refine_type` | Precision used for refinement | `SQ6`, `SQ8`, `FP16`, `BF16`, `FP32` | Listed from smaller/faster → larger/slower but higher recall. `SQ8` is a good starting point. |

Search with IVF_RABITQ


results = vbase.search(
    collection_name="my_collection",
    anns_field="embedding",
    data=[[0.1, 0.2, 0.3, 0.4]],
    limit=10,
    search_params={
        "params": {
            "nprobe": 128,
            "rbq_query_bits": 0,
            "refine_k": 1,
        }
    },
)

Search parameters

| Parameter | What it does | Range / Type | Practical notes |
|---|---|---|---|
| `nprobe` | How many IVF clusters to search | int `[1..nlist]` (default `8`) | Higher usually improves recall but increases latency. Scale it with `nlist`. |
| `rbq_query_bits` | Extra query-side scalar quantization | int `[0..8]` (default `0`) | `0` = best recall, slowest. Try 0, 8, 6; 6 is often fastest with similar recall. |
| `refine_k` | Candidate pool multiplier for refinement | float `[1..∞)` (default `1`) | Higher = more candidates refined = higher recall, lower QPS. Try 1–5. |
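
If you build search parameters programmatically, a small client-side check against the documented ranges can catch mistakes before a request is sent. The helper below is a hypothetical sketch, not part of the Dodil SDK; it only encodes the ranges and defaults from the table above.

```python
def check_search_params(params: dict, nlist: int) -> list:
    """Return a list of problems; an empty list means the values fit the documented ranges."""
    problems = []
    nprobe = params.get("nprobe", 8)            # documented default: 8
    bits = params.get("rbq_query_bits", 0)      # documented default: 0
    refine_k = params.get("refine_k", 1)        # documented default: 1
    if not 1 <= nprobe <= nlist:
        problems.append(f"nprobe={nprobe} must be in [1..{nlist}]")
    if not 0 <= bits <= 8:
        problems.append(f"rbq_query_bits={bits} must be in [0..8]")
    if refine_k < 1:
        problems.append(f"refine_k={refine_k} must be >= 1")
    return problems

print(check_search_params({"nprobe": 128, "rbq_query_bits": 0, "refine_k": 1}, nlist=1024))
print(check_search_params({"nprobe": 2048, "rbq_query_bits": 9, "refine_k": 0.5}, nlist=1024))
```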

Performance notes

IVF_RABITQ benefits heavily from CPUs with fast hardware popcount instructions, such as AVX-512 VPOPCNTDQ on Intel Ice Lake+ or AMD Zen 4+.
If you enable refinement, you’ll typically get better recall, but expect more storage (determined by refine_type) and slower search (determined by both refine_type and refine_k).

Quick tuning recipe

Start with nlist=1024, nprobe=64–128.
Keep rbq_query_bits=0 for maximum recall; try 6 or 8 if you need more speed.
If recall isn’t good enough, enable refine=true with refine_type="SQ8".
Increase refine_k gradually (1 → 2 → 3 …) until you hit your recall target.
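
The last step of the recipe can be automated with a small sweep. The sketch below is a hedged illustration: `run_search` is a placeholder callable you would implement around your own search call, `truth` is assumed to be ground-truth ids from an exact (brute-force) search, and the fake results only exist to make the example runnable.

```python
def recall_at_k(found, truth):
    """Fraction of the true top-k ids that appear in the returned ids."""
    return len(set(found) & set(truth)) / len(truth)

def tune_refine_k(run_search, truth, target, candidates=(1, 2, 3, 4, 5)):
    """Return (refine_k, recall) for the first value meeting the target, else None."""
    for refine_k in candidates:
        recall = recall_at_k(run_search(refine_k), truth)
        if recall >= target:
            return refine_k, recall
    return None

# Stand-in for a real search call: pretend a larger refine_k recovers more true neighbours.
truth = [1, 2, 3, 4]
fake_results = {1: [1, 2, 9, 8], 2: [1, 2, 3, 8], 3: [1, 2, 3, 4]}

def fake_search(refine_k):
    return fake_results.get(refine_k, truth)

print(tune_refine_k(fake_search, truth, target=0.75))
```

In practice you would average recall over a set of representative queries rather than a single one, and record latency alongside recall so that the smallest acceptable refine_k wins.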