IVF_RABITQ
IVF_RABITQ is a vector index that combines IVF (Inverted File) with RaBitQ binary quantization.
- Why it exists: reduce memory usage dramatically while keeping good recall.
- Compression: RaBitQ quantizes FP32 vectors into a compact binary form, for a compression ratio of up to 32:1.
- Optional refinement: you can store a higher-precision “refine” representation to improve recall, at the cost of extra storage.
This index is a good fit when you want an IVF-style index, you’re memory-constrained, and you need better accuracy than classic quantization options provide.
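To get a feel for the memory trade-off, here is a rough back-of-the-envelope sketch. It assumes the binary form costs about 1 bit per dimension (which is what an FP32-to-binary 32:1 ratio implies) and that an SQ8 refine copy costs 1 byte per dimension. The dataset size and dimension are hypothetical, and the estimate ignores centroids and any per-vector metadata the index stores.

```python
def payload_bytes(num_vectors: int, dim: int, bits_per_dim: int) -> int:
    """Approximate vector payload size in bytes at a given precision."""
    return num_vectors * dim * bits_per_dim // 8

# Hypothetical dataset: 1M vectors, 768 dimensions.
num_vectors, dim = 1_000_000, 768

fp32 = payload_bytes(num_vectors, dim, 32)   # raw FP32 vectors
rabitq = payload_bytes(num_vectors, dim, 1)  # assumed 1 bit per dimension
sq8 = payload_bytes(num_vectors, dim, 8)     # optional SQ8 refine copy

mib = 2 ** 20
print(f"FP32 vectors:        {fp32 / mib:8.1f} MiB")
print(f"Binary codes only:   {rabitq / mib:8.1f} MiB")
print(f"Binary + SQ8 refine: {(rabitq + sq8) / mib:8.1f} MiB")
```

Under these assumptions, enabling SQ8 refinement gives up most of the 32:1 saving in exchange for recall, which is why refinement is optional.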
How it works
IVF (coarse filtering)
IVF splits the vector space into nlist clusters (via k-means). At query time, Milvus searches only a subset of those clusters (nprobe) instead of scanning everything.
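The idea can be illustrated with a minimal, pure-Python sketch. This is not how Milvus implements IVF: the centroids are fixed by hand instead of being learned with k-means, and the function names (`build_ivf`, `ivf_search`) are made up for this example.

```python
import math

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_ivf(vectors, centroids):
    """Assign each vector id to the inverted list of its nearest centroid."""
    lists = {i: [] for i in range(len(centroids))}
    for vid, vec in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2(vec, centroids[c]))
        lists[nearest].append(vid)
    return lists

def ivf_search(query, vectors, centroids, lists, nprobe, limit):
    """Scan only the nprobe clusters whose centroids are closest to the query."""
    probed = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))[:nprobe]
    candidates = [vid for c in probed for vid in lists[c]]
    candidates.sort(key=lambda vid: l2(query, vectors[vid]))
    return candidates[:limit]

# Toy data: 3 clusters (nlist=3), 5 vectors.
centroids = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
vectors = [(0.1, 0.2), (0.3, -0.1), (9.8, 10.1), (10.2, 9.9), (0.2, 9.7)]

lists = build_ivf(vectors, centroids)
hits = ivf_search((0.0, 0.1), vectors, centroids, lists, nprobe=1, limit=2)
print(lists)
print(hits)
```

With `nprobe=1` only one of the three inverted lists is scanned. Raising `nprobe` scans more lists, which costs more time but is less likely to miss a neighbor that sits in an adjacent cluster.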
RaBitQ (binary quantization)
Within each IVF cluster, RaBitQ stores a binary representation of vectors and uses very fast bit operations (popcount) to score candidates efficiently.
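To show why bit operations make scoring cheap, here is a deliberately simplified sketch that keeps only the sign of each component and compares codes with XOR plus a population count. It is an illustration of the popcount idea, not the RaBitQ algorithm itself, which does more work than a plain sign-bit code. The helper names are hypothetical.

```python
def to_bits(vec):
    """Pack the sign of each component into one int (1 bit per dimension)."""
    code = 0
    for i, x in enumerate(vec):
        if x > 0:
            code |= 1 << i
    return code

def hamming(a, b):
    """Number of differing bits: XOR followed by a population count."""
    return bin(a ^ b).count("1")

query = to_bits([0.4, -0.2, 0.9, -0.7])
stored = {
    "a": to_bits([0.5, -0.1, 0.8, -0.6]),   # same signs as the query
    "b": to_bits([-0.5, 0.1, 0.8, -0.6]),   # two signs flipped
    "c": to_bits([-0.4, 0.2, -0.9, 0.7]),   # all four signs flipped
}

# Fewer differing bits means a closer candidate in this toy scoring scheme.
ranked = sorted(stored, key=lambda name: hamming(query, stored[name]))
print(ranked)
```

On real hardware the XOR and popcount run over whole machine words at a time, which is why CPUs with fast popcount instructions help this index.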
Optional refinement
If refinement is enabled, the search first finds candidates using the compact RaBitQ representation, then re-scores a larger candidate pool using a more accurate refine format.
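The two-stage flow can be sketched as follows. This is a toy model under stated assumptions: the candidate pool is taken to be `limit * refine_k` entries, the coarse score is a simple sign-mismatch count standing in for the compact representation, and the exact score is squared L2 standing in for the refine format. None of these function names come from the SDK.

```python
import math

def sign_mismatches(a, b):
    """Coarse score: how many components differ in sign (lower is closer)."""
    return sum((x > 0) != (y > 0) for x, y in zip(a, b))

def squared_l2(a, b):
    """Exact score used for re-ranking (lower is closer)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def refine_search(query, vectors, limit, refine_k):
    # Stage 1: rank everything with the cheap coarse score and keep a pool.
    pool_size = max(limit, math.ceil(limit * refine_k))
    by_coarse = sorted(range(len(vectors)), key=lambda i: sign_mismatches(query, vectors[i]))
    pool = by_coarse[:pool_size]
    # Stage 2: re-score only the pool with the more accurate distance.
    pool.sort(key=lambda i: squared_l2(query, vectors[i]))
    return pool[:limit]

query = [1.0, 1.0]
vectors = [
    [5.0, 5.0],    # id 0: same signs as the query, but far away
    [0.9, 1.1],    # id 1: same signs, true nearest neighbor
    [-3.0, -3.0],  # id 2: opposite signs
]

print(refine_search(query, vectors, limit=1, refine_k=1))
print(refine_search(query, vectors, limit=1, refine_k=2))
```

With `refine_k=1` the pool holds a single candidate, so the coarse tie between ids 0 and 1 is never resolved and the far vector wins. With `refine_k=2` both reach the exact re-scoring step and the true neighbor comes out on top.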
When to use
Use IVF_RABITQ when you want:
- Much smaller memory footprint (large datasets, limited RAM)
- Good recall at high speed, especially on modern CPUs
- A tunable accuracy/performance trade-off using nprobe, rbq_query_bits, and optional refinement
Create an IVF_RABITQ index (Dodil)
Below is a typical setup. The exact method names may differ slightly depending on your Dodil SDK version, but the parameters and ideas are the same.
from dodil import Client
from dodil.vbase import VBaseConfig
# Python 3.10+
c = Client(
    service_account_id="...",
    service_account_secret="...",
)
vbase = c.vbase.connect(
    VBaseConfig(
        host="vbase-db-<id>.infra.dodil.cloud",
        port=443,
        scheme="https",
        db_name="db_<id>",
    )
)
# Example: create IVF_RABITQ index for a vector field
vbase.create_index(
    collection_name="my_collection",
    field_name="embedding",
    index_name="embedding_idx",
    index_type="IVF_RABITQ",
    metric_type="L2",  # or COSINE / IP
    params={
        "nlist": 1024,
        "refine": True,
        "refine_type": "SQ8",
    },
)
Build parameters
| Parameter | What it does | Range / Type | Practical notes |
|---|---|---|---|
| nlist | Number of IVF clusters | int [1..65536] (default 128) | Higher = smaller clusters = better recall, slower build. Common range: 32–4096. |
| refine | Enables refinement | bool (default false) | Turn on if you need higher recall (often 0.9+), at the cost of extra storage and build time. |
| refine_type | Precision used for refinement | SQ6, SQ8, FP16, BF16, FP32 | Listed from smaller/faster → larger/slower but higher recall. SQ8 is a good starting point. |
Search with IVF_RABITQ
results = vbase.search(
    collection_name="my_collection",
    anns_field="embedding",
    data=[[0.1, 0.2, 0.3, 0.4]],
    limit=10,
    search_params={
        "params": {
            "nprobe": 128,
            "rbq_query_bits": 0,
            "refine_k": 1,
        }
    },
)
Search parameters
| Parameter | What it does | Range / Type | Practical notes |
|---|---|---|---|
| nprobe | How many IVF clusters to search | int [1..nlist] (default 8) | Higher usually improves recall but increases latency. Scale it with nlist. |
| rbq_query_bits | Extra query-side scalar quantization | int [0..8] (default 0) | 0 = best recall, slowest. Try 0, 8, 6; 6 is often fastest with similar recall. |
| refine_k | Candidate pool multiplier for refinement | float [1..∞) (default 1) | Higher = more candidates refined = higher recall, lower QPS. Try 1–5. |
Performance notes
- IVF_RABITQ benefits heavily from CPUs with fast popcount instructions (e.g., Intel Ice Lake+ or AMD Zen 4+).
- If you enable refinement, you’ll typically get better recall, but you should expect more storage and slower search depending on refine_type and refine_k.
Quick tuning recipe
- Start with nlist=1024, nprobe=64–128.
- Keep rbq_query_bits=0 for maximum recall; try 6 or 8 if you need more speed.
- If recall isn’t good enough, enable refine=true with refine_type="SQ8".
- Increase refine_k gradually (1 → 2 → 3 …) until you hit your recall target.
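The last step of the recipe can be automated with a small loop like the one below. This is a sketch, not SDK code: `search_fn` is a hypothetical callable that you would write yourself to run a search with the given refine_k and return the matched ids, and `truth` is a ground-truth id list you would obtain from an exact (brute-force) search. A stub stands in for the real search here so the example runs on its own.

```python
def recall_at_k(found, truth):
    """Fraction of ground-truth ids that appear in the returned ids."""
    return len(set(found) & set(truth)) / len(truth)

def tune_refine_k(search_fn, truth, target, candidates=(1, 2, 3, 4, 5)):
    """Return the first refine_k that meets the recall target, plus its recall."""
    recall = 0.0
    for refine_k in candidates:
        recall = recall_at_k(search_fn(refine_k), truth)
        if recall >= target:
            return refine_k, recall
    return None, recall  # target not reached with the values tried

# Stub search: pretends that a larger refine_k recovers more true neighbors.
truth = [1, 2, 3, 4]

def fake_search(refine_k):
    hits = truth[: refine_k + 1]
    fillers = [100, 101, 102, 103]
    return (hits + fillers)[: len(truth)]

best_k, best_recall = tune_refine_k(fake_search, truth, target=0.9)
print(best_k, best_recall)
```

In practice you would average recall over a set of representative queries rather than a single one, and also record latency at each step so you can stop when the recall gain no longer justifies the lower QPS.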