title: HNSW_PRQ
description: An HNSW graph index with Product Residual Quantization (PRQ) compression for faster, lower-memory vector search.

HNSW_PRQ

HNSW_PRQ is a compressed vector index that combines:

- HNSW (Hierarchical Navigable Small World graphs) for fast approximate nearest-neighbor retrieval.
- PRQ (Product Residual Quantization) to shrink vector storage while keeping accuracy reasonable.

If your collections are getting large and memory is a bottleneck, HNSW_PRQ is a strong option: it typically uses far less memory than plain HNSW, while staying much faster than a brute-force scan.

When should I use HNSW_PRQ?

Use HNSW_PRQ when you want:

- Lower RAM / SSD footprint than HNSW, especially for high-dimensional embeddings.
- Good recall with predictable latency.
- The ability to trade off accuracy, memory, and speed using a small set of parameters.

Avoid it when:

- Your dataset is small enough that a simple index (or even FLAT) is already fast.
- You need the absolute highest accuracy and can afford larger memory usage.

How it works

HNSW (graph search)

HNSW builds a multi-layer graph where each vector is a node. During search, the query “walks” the graph to find good candidates quickly instead of comparing against every vector.

Two build-time parameters mostly control HNSW behavior:

- `M`: the maximum number of graph connections each node is allowed to keep.
- `efConstruction`: the size of the candidate pool explored while building the graph.
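
As a rough illustration of the graph walk, the sketch below runs a greedy search on a single-layer toy graph. It is a simplification and not the index's actual implementation: real HNSW uses several layers and keeps a list of candidates rather than a single current node. The names `greedy_search`, `vectors`, and `neighbors` are made up for this example.

```python
import math

def greedy_search(vectors, neighbors, query, entry):
    """Walk the graph from `entry`, always moving to the neighbor closest to `query`.

    Stops when no neighbor is closer than the current node.
    """
    current = entry
    current_dist = math.dist(vectors[current], query)
    while True:
        best, best_dist = current, current_dist
        for n in neighbors[current]:
            d = math.dist(vectors[n], query)
            if d < best_dist:
                best, best_dist = n, d
        if best == current:
            return current
        current, current_dist = best, best_dist

# Toy data: six 1-D points. Node 0 also has a long-range link to node 3,
# loosely imitating the shortcuts that upper HNSW layers provide.
vectors = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 4, 0], 4: [3, 5], 5: [4]}

nearest = greedy_search(vectors, neighbors, query=[3.9], entry=0)
print(nearest)
```

The walk reaches the closest point after visiting only three nodes (0, 3, 4) instead of comparing the query against all six.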

PRQ (multi-stage compression)

PRQ compresses vectors in two steps:

1. PQ (Product Quantization): splits the vector into `m` sub-vectors and replaces each sub-vector with the ID of its closest centroid (from a codebook). This gives big compression, but introduces approximation error.
2. RQ (Residual Quantization): measures the residual (the difference between the original vector and its PQ approximation), then quantizes that residual using additional codebooks.

The `nrq` parameter controls how many residual quantization steps are applied. Each extra step can reduce reconstruction error, at the cost of a larger code and more compute.
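
The two stages can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the index's internal code: the codebooks are hand-written rather than trained, each stage uses one codebook per sub-vector, and the names `encode`, `nearest`, and `stages` are hypothetical.

```python
def nearest(centroids, v):
    """Return the index of the centroid closest to v (squared L2 distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((c - x) ** 2 for c, x in zip(centroids[i], v)),
    )

def encode(vector, stages, m):
    """Encode `vector` stage by stage.

    stages[s][j] is the codebook for sub-vector j at stage s. Stage 0 plays
    the role of PQ; every later stage quantizes the residual left so far.
    Returns the centroid IDs per stage and the final residual.
    """
    sub_dim = len(vector) // m
    residual = list(vector)
    codes = []
    for codebooks in stages:
        stage_codes = []
        for j in range(m):
            lo = j * sub_dim
            idx = nearest(codebooks[j], residual[lo:lo + sub_dim])
            stage_codes.append(idx)
            for k in range(sub_dim):
                residual[lo + k] -= codebooks[j][idx][k]
        codes.append(stage_codes)
    return codes, residual

# Hand-written codebooks with 4 centroids each (i.e. nbits = 2), shared by both sub-vectors.
pq_book = [[1.0, 1.0], [-1.0, 0.0], [0.0, 0.0], [1.0, -1.0]]
rq_book = [[0.0, 0.0], [0.2, -0.1], [-0.2, 0.1], [0.2, 0.1]]
stages = [[pq_book, pq_book], [rq_book, rq_book]]

vector = [1.2, 0.9, -0.8, 0.1]  # D = 4, split into m = 2 sub-vectors
codes_pq, res_pq = encode(vector, stages[:1], m=2)   # PQ only
codes_prq, res_prq = encode(vector, stages, m=2)     # PQ + one residual step

err_pq = sum(x * x for x in res_pq)
err_prq = sum(x * x for x in res_prq)
print(codes_prq, err_pq, err_prq)
```

In this toy case the residual step removes almost all of the error left by PQ, while the stored code grows from two centroid IDs to four.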

HNSW + PRQ together

With HNSW_PRQ:

1. Vectors are stored in a compact PRQ representation.
2. The HNSW graph is built on those compressed representations.
3. Search traverses the graph to retrieve candidates quickly.
4. Optional refinement: rerank candidates using higher-precision data for better final accuracy.

Refinement is controlled by:

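- `refine`: enable/disable reranking.
- `refine_type`: the precision level used for reranking.
- `refine_k`: a search-time multiplier on the requested result count `K`; roughly `K × refine_k` candidates are reranked before the best `K` are returned.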
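
The refinement step can be pictured as follows. This is a minimal sketch, assuming the candidates arrive as a list of IDs ranked by their compressed (approximate) distances; `rerank`, `full_vectors`, and `candidate_ids` are hypothetical names, not part of any SDK.

```python
import math

def rerank(query, candidate_ids, full_vectors, k):
    """Rescore candidates with exact L2 distance on higher-precision vectors
    and keep the best k."""
    ranked = sorted(candidate_ids, key=lambda i: math.dist(query, full_vectors[i]))
    return ranked[:k]

# Higher-precision copies of the stored vectors (toy data).
full_vectors = {
    0: [0.0, 0.0],
    1: [1.0, 0.0],
    2: [0.1, 0.1],
    3: [2.0, 2.0],
    4: [0.9, 0.2],
}
query = [0.0, 0.0]

# With K = 2 and refine_k = 2, the graph search hands over about 4 candidates.
# Their order reflects noisy compressed distances, so it is not fully reliable.
candidate_ids = [1, 0, 4, 2]

top_without_refine = candidate_ids[:2]
top_with_refine = rerank(query, candidate_ids, full_vectors, k=2)
print(top_without_refine, top_with_refine)
```

Reranking can only recover neighbors that are already in the candidate pool, which is why a larger `refine_k` tends to help recall.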

Build an HNSW_PRQ index (Dodil)

Assuming you already have a connected vbase client, you can create an index on your vector field.


```python
# Example only: method names may vary slightly by SDK version.
vbase.create_index(
    collection_name="my_collection",
    field_name="embedding",
    index_name="embedding_hnsw_prq",
    index_type="HNSW_PRQ",
    metric_type="COSINE",  # COSINE | L2 | IP
    params={
        # HNSW
        "M": 30,
        "efConstruction": 360,

        # PRQ
        "m": 384,  # must divide the vector dimension; 384 assumes D = 768 (m = D/2)
        "nbits": 8,
        "nrq": 2,

        # Optional refinement
        "refine": True,
        "refine_type": "SQ8",
        # "refine_k" is used at search-time
    },
)
```

Build-time parameters

| Parameter | What it controls | Value range | Practical guidance |
| --- | --- | --- | --- |
| `M` | Max connections per node in the HNSW graph | `2..2048` | Larger = higher recall, more memory and slower build/search. Common range: `5..100`. |
| `efConstruction` | Candidate pool size during graph construction | `1..int_max` | Larger = better index quality, longer build time. Common range: `50..500`. |
| `m` | Number of sub-vectors used in PQ stage | `1..65536` | Must divide the vector dimension `D`. Higher can improve accuracy but increases compute. Often `m ≈ D/2` (and commonly within `D/8..D`). |
| `nbits` | Bits per centroid ID in PQ codebooks | `1..24` | Higher = larger codebooks and better accuracy, but less compression. Common range: `1..16` (default is often `8`). |
| `nrq` | Number of residual quantization steps in RQ stage | `1..16` | Higher can improve reconstruction quality but increases size and compute. Start small (e.g., `1..3`) and tune. |
| `refine` | Enable reranking using higher precision | `true/false` | Turn on when you care about accuracy more than speed. |
| `refine_type` | Precision used during refinement | `SQ6`, `SQ8`, `BF16`, `FP16`, `FP32` | `FP32` is most accurate but highest memory cost. `SQ6/SQ8` are cheaper. `BF16/FP16` are a good middle ground. |
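
Because several of these constraints interact (most notably `m` having to divide `D`), it can help to check a parameter set before starting a long build. The helper below is a hypothetical client-side check written against the ranges in the table above; `check_build_params` is not an SDK function.

```python
REFINE_TYPES = {"SQ6", "SQ8", "BF16", "FP16", "FP32"}

def check_build_params(dim, params):
    """Return a list of problems; an empty list means the values fit the documented ranges."""
    problems = []
    if not 2 <= params["M"] <= 2048:
        problems.append("M must be in 2..2048")
    if params["efConstruction"] < 1:
        problems.append("efConstruction must be >= 1")
    m = params["m"]
    if not 1 <= m <= 65536:
        problems.append("m must be in 1..65536")
    elif dim % m != 0:
        problems.append(f"m={m} does not divide the vector dimension D={dim}")
    if not 1 <= params["nbits"] <= 24:
        problems.append("nbits must be in 1..24")
    if not 1 <= params["nrq"] <= 16:
        problems.append("nrq must be in 1..16")
    if params.get("refine") and params.get("refine_type") not in REFINE_TYPES:
        problems.append("refine_type must be one of " + ", ".join(sorted(REFINE_TYPES)))
    return problems

params = {
    "M": 30, "efConstruction": 360,
    "m": 384, "nbits": 8, "nrq": 2,
    "refine": True, "refine_type": "SQ8",
}
ok = check_build_params(768, params)                   # D = 768, m = D/2
bad = check_build_params(768, {**params, "m": 500})    # 500 does not divide 768
print(ok, bad)
```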

Search with HNSW_PRQ

At query time, HNSW_PRQ mainly exposes two tuning knobs:

- `ef` controls how wide the graph traversal is.
- `refine_k` controls how many candidates are reranked, as a multiple of the requested result count (only matters if refinement is enabled).


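```python
# query_embedding is your query vector, with the same dimension as the indexed field.
results = vbase.search(
    collection_name="my_collection",
    anns_field="embedding",
    data=[query_embedding],
    limit=10,
    search_params={
        "params": {
            "ef": 64,
            "refine_k": 2,
        }
    },
)
```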

Search-time parameters

| Parameter | What it controls | Value range | Practical guidance |
| --- | --- | --- | --- |
| `ef` | How many nodes are explored during search | `1..int_max` | Larger = higher recall, slower queries. Start at `ef ≈ K` and tune upward (often up to `10×K`). |
| `refine_k` | Reranking “magnification” factor | `1..float_max` | If `K=100` and `refine_k=2`, about 200 candidates are reranked and the best 100 are returned. Higher improves recall but costs more compute. |
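
The arithmetic behind both knobs is simple enough to put in a small helper. The sketch below is illustrative only: `search_plan` and `ef_factor` are hypothetical names, and the returned dictionary merely mirrors the `search_params` shape used in the search example above.

```python
import math

def search_plan(k, ef_factor=1.0, refine_k=1.0):
    """Derive search-time settings from the requested result count k.

    ef_factor scales ef relative to k (the table suggests roughly 1 to 10);
    refine_k is the reranking magnification factor.
    """
    if ef_factor < 1 or refine_k < 1:
        raise ValueError("ef_factor and refine_k should both be >= 1")
    return {
        "search_params": {"params": {"ef": math.ceil(k * ef_factor), "refine_k": refine_k}},
        "rerank_pool": math.ceil(k * refine_k),  # approximate number of candidates reranked
    }

plan = search_plan(k=100, ef_factor=2, refine_k=2)
print(plan)
```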

Quick tuning recipe

1. Start with `M=30`, `efConstruction=360`, `nbits=8`, `nrq=2`.
2. Set `m` so it divides your dimension `D` (try `m=D/2` first).
3. For better recall:
   - Increase `ef` (query-time) first.
   - If that is still not enough, increase `M` and/or `efConstruction` (build-time, so the index must be rebuilt).
4. If accuracy is still not enough, enable refinement:
   - `refine=True`, `refine_type="BF16"` or `"FP16"`, then tune `refine_k`.
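
The "increase `ef` first" step can be automated with a small sweep that measures recall against known ground truth. This is a sketch under assumptions: `search_fn` stands for whatever wrapper you write around your own search call, and `fake_search` is a stand-in used only to make the example runnable.

```python
def recall_at_k(found, truth):
    """Fraction of the true top-K IDs that appear in the returned IDs."""
    return len(set(found) & set(truth)) / len(truth)

def tune_ef(search_fn, queries, ground_truth, k, target_recall, max_factor=10):
    """Double ef starting from K until mean recall reaches the target,
    or until the next value would exceed max_factor * K."""
    ef = k
    while True:
        recalls = [
            recall_at_k(search_fn(q, k, ef), ground_truth[i])
            for i, q in enumerate(queries)
        ]
        mean_recall = sum(recalls) / len(recalls)
        if mean_recall >= target_recall or ef * 2 > max_factor * k:
            return ef, mean_recall
        ef *= 2

def fake_search(query, k, ef):
    """Stand-in for a real search call: returns more correct IDs as ef grows."""
    n_correct = min(k, ef // 2)
    wrong_ids = [-1 - i for i in range(k - n_correct)]
    return list(range(n_correct)) + wrong_ids

queries = ["q0", "q1"]
ground_truth = [list(range(10)), list(range(10))]
best_ef, achieved = tune_ef(fake_search, queries, ground_truth, k=10, target_recall=0.95)
print(best_ef, achieved)
```

If the sweep hits the `max_factor` ceiling without reaching the target, that is the signal to move on to the build-time parameters or to refinement.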