title: HNSW_PRQ
description: An HNSW graph index with Product Residual Quantization (PRQ) compression for faster, lower-memory vector search.
HNSW_PRQ
HNSW_PRQ is a compressed vector index that combines:
- HNSW (Hierarchical Navigable Small World graphs) for fast approximate nearest-neighbor retrieval.
- PRQ (Product Residual Quantization) to shrink vector storage while keeping accuracy reasonable.
If your collections are getting large and memory is a bottleneck, HNSW_PRQ is a strong option: it typically uses far less memory than plain HNSW, while staying much faster than a brute-force scan.
When should I use HNSW_PRQ?
Use HNSW_PRQ when you want:
- Lower RAM / SSD footprint than HNSW, especially for high-dimensional embeddings.
- Good recall with predictable latency.
- The ability to trade accuracy vs memory vs speed using a small set of parameters.
Avoid it when:
- Your dataset is small enough that a simple index (or even FLAT) is already fast.
- You need the absolute highest accuracy and can afford larger memory usage.
How it works
HNSW (graph search)
HNSW builds a multi-layer graph where each vector is a node. During search, the query "walks" the graph to find good candidates quickly instead of comparing against every vector.
Two parameters mostly control HNSW behavior:
- M: how many graph connections each node is allowed to keep.
- efConstruction: how wide the search is while building the graph.
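To make the graph walk concrete, here is a minimal single-layer sketch in Python with NumPy. It is an illustration under simplifying assumptions, not the index's actual implementation: the helper names `build_toy_graph` and `greedy_search` are hypothetical, the graph is built by brute force (each node linked to its M nearest neighbors, plus a ring edge so every node is reachable), and there is only one layer instead of HNSW's hierarchy.

```python
import heapq
import numpy as np

def build_toy_graph(vectors, M):
    """Link each node to its M nearest neighbors, plus one ring edge.

    This brute-force construction only stands in for HNSW's incremental,
    multi-layer build (the part that efConstruction controls).
    """
    n = len(vectors)
    d = ((vectors[:, None, :] - vectors[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)
    nearest = np.argsort(d, axis=1)[:, :M]
    return [set(nearest[i].tolist()) | {(i + 1) % n} for i in range(n)]

def greedy_search(vectors, graph, query, entry, ef):
    """Best-first walk over one layer, keeping the ef closest nodes seen so far."""
    def dist(i):
        return float(((vectors[i] - query) ** 2).sum())

    d0 = dist(entry)
    visited = {entry}
    frontier = [(d0, entry)]   # min-heap: closest unexplored node first
    best = [(-d0, entry)]      # max-heap via negation: worst kept result on top
    while frontier:
        d, node = heapq.heappop(frontier)
        if len(best) >= ef and d > -best[0][0]:
            break  # closest unexplored node is farther than the worst kept result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d_nb = dist(nb)
            if len(best) < ef or d_nb < -best[0][0]:
                heapq.heappush(frontier, (d_nb, nb))
                heapq.heappush(best, (-d_nb, nb))
                if len(best) > ef:
                    heapq.heappop(best)
    results = sorted((-neg_d, i) for neg_d, i in best)
    return results, len(visited)

rng = np.random.default_rng(0)
vectors = rng.normal(size=(300, 8))
query = rng.normal(size=8)
graph = build_toy_graph(vectors, M=8)

results, visited = greedy_search(vectors, graph, query, entry=0, ef=16)
print("nearest found:", results[0][1], "distance computations:", visited)
```

The second return value counts how many vectors were actually compared against the query, which is the quantity a graph index tries to keep well below the collection size.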
PRQ (multi-stage compression)
PRQ compresses vectors in two steps:
- PQ (Product Quantization): splits the vector into m sub-vectors and replaces each sub-vector with the ID of its closest centroid (from a codebook). This gives big compression, but introduces approximation error.
- RQ (Residual Quantization): measures the residual (the difference between the original vector and its PQ approximation), then quantizes that residual using additional codebooks.
The nrq parameter controls how many residual quantization steps are applied.
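The two stages can be sketched with NumPy as follows. This is a toy illustration, not the index's real encoder: the function names are hypothetical, the codebooks come from a tiny hand-written k-means, and each residual stage is assumed to reuse the same per-sub-vector codebook layout as the PQ stage.

```python
import numpy as np

def kmeans(x, k, iters=10, seed=0):
    """Tiny Lloyd's k-means; returns a (k, d) codebook."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned points.
        for j in range(k):
            members = x[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids

def train_stage(x, m, nbits):
    """Train one codebook of 2**nbits centroids per sub-vector."""
    return [kmeans(sub, 2 ** nbits) for sub in np.split(x, m, axis=1)]

def encode(x, codebooks):
    """Replace each sub-vector with the ID of its nearest centroid."""
    subs = np.split(x, len(codebooks), axis=1)
    codes = [((s[:, None, :] - cb[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
             for s, cb in zip(subs, codebooks)]
    return np.stack(codes, axis=1)

def decode(codes, codebooks):
    """Rebuild approximate vectors from centroid IDs."""
    return np.hstack([cb[codes[:, i]] for i, cb in enumerate(codebooks)])

def prq_fit_encode(x, m, nbits, nrq):
    """One PQ stage followed by nrq stages that quantize what is left over."""
    stages = []
    approx = np.zeros_like(x)
    for _ in range(1 + nrq):
        residual = x - approx            # the first pass sees the vectors themselves
        codebooks = train_stage(residual, m, nbits)
        codes = encode(residual, codebooks)
        approx = approx + decode(codes, codebooks)
        stages.append((codebooks, codes))
    return stages, approx

rng = np.random.default_rng(42)
vectors = rng.normal(size=(500, 16)).astype(np.float32)

_, pq_only = prq_fit_encode(vectors, m=4, nbits=4, nrq=0)
stages, pq_plus_rq = prq_fit_encode(vectors, m=4, nbits=4, nrq=2)

err_pq = float(((vectors - pq_only) ** 2).mean())
err_prq = float(((vectors - pq_plus_rq) ** 2).mean())
print(f"PQ only: {err_pq:.4f}   PQ + 2 residual stages: {err_prq:.4f}")
```

Each extra stage stores another set of codes per vector, which is why a larger nrq improves reconstruction at the cost of size and compute.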
HNSW + PRQ together
With HNSW_PRQ:
- Vectors are stored in a compact PRQ representation.
- The HNSW graph is built on those compressed representations.
- Search traverses the graph to retrieve candidates quickly.
- Optional refinement: rerank candidates using higher-precision data for better final accuracy.
Refinement is controlled by:
- refine: enable/disable reranking.
- refine_type: the precision level used for reranking.
- refine_k: how many extra candidates to rerank.
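The effect of refinement can be sketched with NumPy. The example below is a simplified stand-in, not the real search path: `search_with_refine` is a hypothetical helper, the candidate pool is picked by brute force over coarsely rounded vectors (playing the role of the compressed codes and the graph traversal), and the rerank uses the original float vectors.

```python
import numpy as np

def search_with_refine(query, approx_vectors, full_vectors, k, refine_k):
    """Pick about k * refine_k candidates cheaply, then rerank them precisely."""
    pool = min(len(full_vectors), int(np.ceil(k * refine_k)))
    # Stage 1: rank by distance to the low-precision vectors.
    approx_d = ((approx_vectors - query) ** 2).sum(axis=1)
    candidates = np.argsort(approx_d)[:pool]
    # Stage 2: recompute distances with higher-precision data and keep the best k.
    exact_d = ((full_vectors[candidates] - query) ** 2).sum(axis=1)
    return candidates[np.argsort(exact_d)[:k]]

def recall(found, truth):
    return len(set(found.tolist()) & set(truth.tolist())) / len(truth)

rng = np.random.default_rng(7)
full = rng.normal(size=(2000, 32))
coarse = np.round(full * 2) / 2      # crude stand-in for compressed vectors
query = rng.normal(size=32)
k = 10

truth = np.argsort(((full - query) ** 2).sum(axis=1))[:k]
no_refine = search_with_refine(query, coarse, full, k, refine_k=1)
refined = search_with_refine(query, coarse, full, k, refine_k=4)
print("recall without extra candidates:", recall(no_refine, truth))
print("recall with refine_k=4:", recall(refined, truth))
```

With refine_k=1 the pool holds exactly k candidates, so reranking can only reorder them. A larger pool gives the precise pass a chance to recover neighbors that the compressed distances ranked too low.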
Build an HNSW_PRQ index (Dodil)
Assuming you already have a connected vbase client, you can create an index on your vector field.
```python
# Example only: method names may vary slightly by SDK version.
vbase.create_index(
    collection_name="my_collection",
    field_name="embedding",
    index_name="embedding_hnsw_prq",
    index_type="HNSW_PRQ",
    metric_type="COSINE",  # COSINE | L2 | IP
    params={
        # HNSW
        "M": 30,
        "efConstruction": 360,
        # PRQ
        "m": 384,  # must divide the vector dimension (this value assumes D = 768)
        "nbits": 8,
        "nrq": 2,
        # Optional refinement
        "refine": True,
        "refine_type": "SQ8",
        # "refine_k" is used at search-time
    },
)
```
Build-time parameters
| Parameter | What it controls | Value range | Practical guidance |
|---|---|---|---|
| M | Max connections per node in the HNSW graph | 2..2048 | Larger = higher recall, more memory and slower build/search. Common range: 5..100. |
| efConstruction | Candidate pool size during graph construction | 1..int_max | Larger = better index quality, longer build time. Common range: 50..500. |
| m | Number of sub-vectors used in PQ stage | 1..65536 | Must divide the vector dimension D. Higher can improve accuracy but increases compute. Often m ≈ D/2 (and commonly within D/8..D). |
| nbits | Bits per centroid ID in PQ codebooks | 1..24 | Higher = larger codebooks and better accuracy, but less compression. Common range: 1..16 (default is often 8). |
| nrq | Number of residual quantization steps in RQ stage | 1..16 | Higher can improve reconstruction quality but increases size and compute. Start small (e.g., 1..3) and tune. |
| refine | Enable reranking using higher precision | true/false | Turn on when you care about accuracy more than speed. |
| refine_type | Precision used during refinement | SQ6, SQ8, BF16, FP16, FP32 | FP32 is most accurate but highest memory cost. SQ6/SQ8 are cheaper. BF16/FP16 are a good middle ground. |
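Because several of these constraints interact (m must divide D, for instance), a small pre-flight check can catch mistakes before a long index build. The helper below is a hypothetical sketch that only encodes the ranges from the table above. It is not part of any SDK, and the server may enforce additional rules.

```python
REFINE_TYPES = {"SQ6", "SQ8", "BF16", "FP16", "FP32"}

# (min, max) per parameter, taken from the table; None means no upper bound here.
RANGES = {
    "M": (2, 2048),
    "efConstruction": (1, None),
    "m": (1, 65536),
    "nbits": (1, 24),
    "nrq": (1, 16),
}

def check_build_params(dim, params):
    """Return a list of problems; an empty list means the params look consistent."""
    problems = []
    for name, (lo, hi) in RANGES.items():
        if name not in params:
            continue
        value = params[name]
        if value < lo or (hi is not None and value > hi):
            upper = "int_max" if hi is None else hi
            problems.append(f"{name}={value} is outside {lo}..{upper}")
    m = params.get("m")
    if m is not None and m > 0 and dim % m != 0:
        problems.append(f"m={m} does not divide the vector dimension {dim}")
    if params.get("refine") and params.get("refine_type") not in REFINE_TYPES:
        problems.append(f"refine_type={params.get('refine_type')!r} is not supported")
    return problems

good = {"M": 30, "efConstruction": 360, "m": 384, "nbits": 8, "nrq": 2,
        "refine": True, "refine_type": "SQ8"}
bad = dict(good, m=500, nbits=32)

print(check_build_params(768, good))
print(check_build_params(768, bad))
```

The first call reports no problems for a 768-dimensional field. The second flags both the out-of-range nbits and the m value that does not divide 768.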
Search with HNSW_PRQ
At query-time, HNSW_PRQ mainly exposes two tuning knobs:
- ef controls how wide the graph traversal is.
- refine_k controls how many extra candidates are reranked (only matters if refinement is enabled).
```python
results = vbase.search(
    collection_name="my_collection",
    anns_field="embedding",
    data=[query_embedding],
    limit=10,
    search_params={
        "params": {
            "ef": 64,
            "refine_k": 2,
        }
    },
)
```
Search-time parameters
| Parameter | What it controls | Value range | Practical guidance |
|---|---|---|---|
| ef | How many nodes are explored during search | 1..int_max | Larger = higher recall, slower queries. A common starting point is ef ≈ K and tuning upward (often up to 10×K). |
| refine_k | Reranking "magnification" factor | 1..float_max | If K=100 and refine_k=2, rerank ~200 candidates then return the best 100. Higher improves recall but costs more compute. |
Quick tuning recipe
- Start with M=30, efConstruction=360, nbits=8, nrq=2.
- Set m so it divides your dimension D (try m=D/2 first).
- For better recall:
  - Increase ef (query-time) first.
  - If still not enough, increase M and/or efConstruction (build-time).
- If accuracy is still not enough, enable refinement: refine=True, refine_type="BF16" or "FP16", then tune refine_k.
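The "increase ef first" step can be automated with a small sweep that measures recall against exact results. The sketch below is an assumption-laden illustration: `smallest_ef_for_recall` is a hypothetical helper, `search_fn(query, k, ef)` stands for whatever wrapper you write around your client's search call, and `fake_search` is a stand-in that scans more vectors as ef grows so the example runs without a server.

```python
import numpy as np

def recall_at_k(found_ids, true_ids):
    return len(set(found_ids) & set(true_ids)) / len(true_ids)

def smallest_ef_for_recall(search_fn, queries, truth, k, ef_values, target):
    """Try ef values in ascending order; stop at the first that meets the target.

    Returns (ef, mean_recall). If no value reaches the target, returns the
    best one seen so you know build-time parameters need attention.
    """
    best = None
    for ef in sorted(ef_values):
        recalls = [recall_at_k(search_fn(q, k, ef), t)
                   for q, t in zip(queries, truth)]
        mean_recall = sum(recalls) / len(recalls)
        if best is None or mean_recall > best[1]:
            best = (ef, mean_recall)
        if mean_recall >= target:
            return ef, mean_recall
    return best

rng = np.random.default_rng(1)
data = rng.normal(size=(1000, 16))
queries = rng.normal(size=(20, 16))
K = 10

def exact_top_k(q, k, pool):
    d = ((data[:pool] - q) ** 2).sum(axis=1)
    return np.argsort(d)[:k].tolist()

# Ground truth from a full brute-force scan.
truth = [exact_top_k(q, K, len(data)) for q in queries]

def fake_search(q, k, ef):
    # Stand-in only: scans the first ef * 10 vectors, so recall grows with ef.
    return exact_top_k(q, k, min(len(data), ef * 10))

chosen_ef, recall = smallest_ef_for_recall(
    fake_search, queries, truth, k=K, ef_values=[10, 20, 40, 100], target=0.95
)
print("chosen ef:", chosen_ef, "mean recall:", recall)
```

The swept values run from K to 10×K, matching the starting range suggested for ef. If the best recall still falls short of the target, that is the signal to move on to M, efConstruction, or refinement.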