HNSW_PRQ

HNSW_PRQ is a compressed vector index that combines:

  • HNSW (Hierarchical Navigable Small World graphs) for fast approximate nearest-neighbor retrieval.
  • PRQ (Product Residual Quantization) to shrink vector storage while keeping accuracy reasonable.

If your collections are getting large and memory is a bottleneck, HNSW_PRQ is a strong option: it typically uses far less memory than plain HNSW, while staying much faster than a brute-force scan.

When should I use HNSW_PRQ?

Use HNSW_PRQ when you want:

  • Lower RAM / SSD footprint than HNSW, especially for high-dimensional embeddings.
  • Good recall with predictable latency.
  • The ability to trade accuracy vs memory vs speed using a small set of parameters.

Avoid it when:

  • Your dataset is small enough that a simple index (or even FLAT) is already fast.
  • You need the absolute highest accuracy and can afford larger memory usage.

How it works

HNSW builds a multi-layer graph where each vector is a node. During search, the query “walks” the graph to find good candidates quickly instead of comparing against every vector.

Two build-time parameters mostly control the shape of the graph:

  • M: how many graph connections each node is allowed to keep.
  • efConstruction: how wide the search is while building the graph.
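The graph "walk" can be sketched on a single layer as a best-first search. This is a minimal illustration under stated assumptions, not the engine's implementation: the function name greedy_search, the toy chain graph, and the use of squared L2 distance are all hypothetical, and a real HNSW index repeats this across several layers. The ef argument plays the role of the candidate-list width.

```python
import heapq

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def greedy_search(vectors, neighbors, query, entry, ef):
    """Best-first search over one graph layer.

    Returns up to `ef` visited node ids, closest first.
    """
    visited = {entry}
    d0 = sq_dist(vectors[entry], query)
    candidates = [(d0, entry)]   # min-heap: nodes still to expand
    best = [(-d0, entry)]        # max-heap (negated): best `ef` found so far
    while candidates:
        d, node = heapq.heappop(candidates)
        # Stop once the closest unexpanded node is worse than the worst kept result.
        if len(best) >= ef and d > -best[0][0]:
            break
        for nb in neighbors[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = sq_dist(vectors[nb], query)
            if len(best) < ef or dn < -best[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(best, (-dn, nb))
                if len(best) > ef:
                    heapq.heappop(best)  # drop the current worst result
    return [n for _, n in sorted((-d, n) for d, n in best)]

# Toy data: five points on a line, each linked to its immediate neighbors.
vectors = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0), 3: (3.0, 0.0), 4: (4.0, 0.0)}
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(greedy_search(vectors, neighbors, query=(3.2, 0.0), entry=0, ef=2))
```

Starting from node 0, the search only ever computes distances for nodes reachable through graph links, which is why the number of links per node (M) and the search width affect both recall and cost.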

PRQ (multi-stage compression)

PRQ compresses vectors in two steps:

  1. PQ (Product Quantization): splits the vector into m sub-vectors and replaces each sub-vector with the ID of its closest centroid (from a codebook). This gives big compression, but introduces approximation error.
  2. RQ (Residual Quantization): measures the residual (the difference between the original vector and its PQ approximation), then quantizes that residual using additional codebooks.

The nrq parameter controls how many residual quantization steps are applied.
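To make the two steps concrete, here is a toy sketch of the encoding described above. Everything in it is an assumption for illustration: the function prq_encode is hypothetical, the codebooks are hand-written rather than learned, and each residual stage shares one codebook across sub-vectors to keep the example small. The number of residual codebooks passed in corresponds to nrq.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(codebook, v):
    """Index of the centroid in `codebook` closest to `v`."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], v))

def prq_encode(vector, pq_codebooks, rq_codebooks):
    """Return (pq_ids, rq_ids, squared_error) for one vector.

    pq_codebooks: one codebook per sub-vector, so m = len(pq_codebooks).
    rq_codebooks: one codebook per residual stage, so nrq = len(rq_codebooks).
    """
    m = len(pq_codebooks)
    sub_dim = len(vector) // m
    pq_ids = []
    rq_ids = [[] for _ in rq_codebooks]
    error = 0.0
    for s in range(m):
        sub = vector[s * sub_dim:(s + 1) * sub_dim]
        # Step 1 (PQ): replace the sub-vector with its closest centroid ID.
        cid = nearest(pq_codebooks[s], sub)
        pq_ids.append(cid)
        residual = [x - c for x, c in zip(sub, pq_codebooks[s][cid])]
        # Step 2 (RQ): quantize what PQ left over, one stage at a time.
        for stage, codebook in enumerate(rq_codebooks):
            rid = nearest(codebook, residual)
            rq_ids[stage].append(rid)
            residual = [x - c for x, c in zip(residual, codebook[rid])]
        error += sum(x * x for x in residual)
    return pq_ids, rq_ids, error

# Hypothetical hand-written codebooks for a 4-dimensional vector split into m=2 parts.
PQ_CODEBOOKS = [[(0.0, 0.0), (1.0, 1.0)], [(0.0, 0.0), (1.0, 1.0)]]
RQ_STAGE = [(0.0, 0.0), (0.25, 0.0), (-0.25, 0.0), (0.0, 0.25), (0.0, -0.25)]

vec = [0.9, 1.2, 0.1, -0.2]
pq_only = prq_encode(vec, PQ_CODEBOOKS, [])          # no residual stage
with_rq = prq_encode(vec, PQ_CODEBOOKS, [RQ_STAGE])  # one residual stage
print(pq_only[2], with_rq[2])  # reconstruction error before and after the residual stage
```

In this toy case the residual stage lowers the reconstruction error at the cost of storing one extra ID per sub-vector, which is the trade-off that nrq controls.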

HNSW + PRQ together

With HNSW_PRQ:

  1. Vectors are stored in a compact PRQ representation.
  2. The HNSW graph is built on those compressed representations.
  3. Search traverses the graph to retrieve candidates quickly.
  4. Optional refinement: rerank candidates using higher-precision data for better final accuracy.

Refinement is controlled by:

  • refine: enable/disable reranking.
  • refine_type: the precision level used for reranking.
  • refine_k: how many candidates to rerank, expressed as a multiple of the number of results requested.
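The refinement step can be pictured as follows. This is a sketch under assumptions, not the engine's code: candidates_to_fetch and rerank are hypothetical helper names, the distance is squared L2, and the higher-precision vectors are plain tuples in a dict. It treats refine_k as a multiplier on the requested result count k.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def candidates_to_fetch(k, refine_k):
    """How many approximate candidates to collect before reranking."""
    return math.ceil(k * refine_k)

def rerank(query, candidate_ids, precise_vectors, k):
    """Re-score candidates with higher-precision vectors and keep the best k."""
    ordered = sorted(candidate_ids, key=lambda i: sq_dist(precise_vectors[i], query))
    return ordered[:k]

# Candidates arrive in the (possibly wrong) order produced by compressed distances.
precise_vectors = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (0.4, 0.0), 3: (5.0, 5.0)}
approx_order = [1, 0, 2, 3]
print(candidates_to_fetch(k=2, refine_k=2))                       # pool size before reranking
print(rerank((0.6, 0.0), approx_order, precise_vectors, k=2))     # final top-2 after reranking
```

A larger pool gives reranking more chances to recover a true neighbor that the compressed distances ranked too low, at the cost of more higher-precision distance computations.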

Build an HNSW_PRQ index (Dodil)

Assuming you already have a connected vbase client, you can create an index on your vector field.

# Example only; method names may vary slightly by SDK version.
vbase.create_index(
    collection_name="my_collection",
    field_name="embedding",
    index_name="embedding_hnsw_prq",
    index_type="HNSW_PRQ",
    metric_type="COSINE",  # COSINE | L2 | IP
    params={
        # HNSW
        "M": 30,
        "efConstruction": 360,
        # PRQ
        "m": 384,  # must divide the vector dimension D
        "nbits": 8,
        "nrq": 2,
        # Optional refinement
        "refine": True,
        "refine_type": "SQ8",
        # "refine_k" is used at search-time
    },
)

Build-time parameters

| Parameter | What it controls | Value range | Practical guidance |
| --- | --- | --- | --- |
| M | Max connections per node in the HNSW graph | 2..2048 | Larger = higher recall, more memory and slower build/search. Common range: 5..100. |
| efConstruction | Candidate pool size during graph construction | 1..int_max | Larger = better index quality, longer build time. Common range: 50..500. |
| m | Number of sub-vectors used in PQ stage | 1..65536 | Must divide the vector dimension D. Higher can improve accuracy but increases compute. Often m ≈ D/2 (and commonly within D/8..D). |
| nbits | Bits per centroid ID in PQ codebooks | 1..24 | Higher = larger codebooks and better accuracy, but less compression. Common range: 1..16 (default is often 8). |
| nrq | Number of residual quantization steps in RQ stage | 1..16 | Higher can improve reconstruction quality but increases size and compute. Start small (e.g., 1..3) and tune. |
| refine | Enable reranking using higher precision | true/false | Turn on when you care about accuracy more than speed. |
| refine_type | Precision used during refinement | SQ6, SQ8, BF16, FP16, FP32 | FP32 is most accurate but highest memory cost. SQ6/SQ8 are cheaper. BF16/FP16 are a good middle ground. |
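Because a bad m or an out-of-range value only surfaces when the index build fails, it can help to check parameters on the client first. The helper below is a hypothetical sketch (check_build_params is not part of any SDK); it only encodes the ranges listed in the table above and the rule that m must divide D.

```python
# Ranges copied from the build-time parameter table; None means "no documented upper bound".
RANGES = {
    "M": (2, 2048),
    "efConstruction": (1, None),
    "m": (1, 65536),
    "nbits": (1, 24),
    "nrq": (1, 16),
}
REFINE_TYPES = {"SQ6", "SQ8", "BF16", "FP16", "FP32"}

def check_build_params(dim, params):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    for name, (lo, hi) in RANGES.items():
        if name not in params:
            continue
        value = params[name]
        if value < lo or (hi is not None and value > hi):
            problems.append(f"{name}={value} is outside the documented range")
    m = params.get("m")
    if m is not None and m >= 1 and dim % m != 0:
        problems.append(f"m={m} does not divide the vector dimension {dim}")
    refine_type = params.get("refine_type")
    if refine_type is not None and refine_type not in REFINE_TYPES:
        problems.append(f"refine_type={refine_type!r} is not a documented value")
    return problems

params = {"M": 30, "efConstruction": 360, "m": 384, "nbits": 8, "nrq": 2,
          "refine": True, "refine_type": "SQ8"}
print(check_build_params(768, params))  # D=768 is an assumed embedding size
```

The check is deliberately conservative: it skips parameters that are absent rather than guessing at server-side defaults.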

Search with HNSW_PRQ

At query-time, HNSW_PRQ mainly exposes two tuning knobs:

  • ef controls how wide the graph traversal is.
  • refine_k controls how many candidates are reranked, as a multiple of the requested result count (only matters if refinement is enabled).

results = vbase.search(
    collection_name="my_collection",
    anns_field="embedding",
    data=[query_embedding],
    limit=10,
    search_params={
        "params": {
            "ef": 64,
            "refine_k": 2,
        }
    },
)

Search-time parameters

| Parameter | What it controls | Value range | Practical guidance |
| --- | --- | --- | --- |
| ef | How many nodes are explored during search | 1..int_max | Larger = higher recall, slower queries. A common starting point is ef ≈ K and tuning upward (often up to 10×K). |
| refine_k | Reranking “magnification” factor | 1..float_max | If K=100 and refine_k=2, rerank ~200 candidates then return the best 100. Higher improves recall but costs more compute. |
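When tuning ef, it helps to sweep a few values and measure recall against exact (brute-force) results for the same queries. The two helpers below are hypothetical and do not call the SDK: ef_candidates produces a doubling schedule from K up to 10×K, and recall_at_k compares the IDs returned by an approximate search with the true nearest neighbors.

```python
def ef_candidates(k, max_factor=10):
    """ef values to try: k, 2k, 4k, ... capped at max_factor * k."""
    values = []
    ef = k
    while ef < k * max_factor:
        values.append(ef)
        ef *= 2
    values.append(k * max_factor)
    return values

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true nearest neighbors that the approximate search returned."""
    exact = set(exact_ids)
    if not exact:
        return 1.0
    return len(exact.intersection(approx_ids)) / len(exact)

print(ef_candidates(10))
print(recall_at_k(approx_ids=[1, 2, 3, 9], exact_ids=[1, 2, 3, 4]))
```

Picking the smallest ef whose measured recall meets your target keeps query latency as low as the accuracy requirement allows.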

Quick tuning recipe

  1. Start with M=30, efConstruction=360, nbits=8, nrq=2.
  2. Set m so it divides your dimension D (try m=D/2 first).
  3. For better recall:
    • Increase ef (query-time) first.
    • If still not enough, increase M and/or efConstruction (build-time).
  4. If accuracy is still not enough, enable refinement:
    • refine=True, refine_type="BF16" or "FP16", then tune refine_k.
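Steps 1 and 2 of the recipe can be captured in a small helper. This is a sketch with a hypothetical function name (starting_params); the one assumption beyond the recipe is the fallback for dimensions where D/2 is not an integer, where it picks the largest divisor of D that does not exceed D/2.

```python
def largest_divisor_at_most(dim, limit):
    """Largest integer <= limit that divides dim evenly (at least 1)."""
    for m in range(min(limit, dim), 0, -1):
        if dim % m == 0:
            return m
    return 1

def starting_params(dim):
    """Build-time parameters from the quick tuning recipe for a given dimension."""
    return {
        "M": 30,
        "efConstruction": 360,
        "m": largest_divisor_at_most(dim, max(dim // 2, 1)),  # m must divide D
        "nbits": 8,
        "nrq": 2,
    }

print(starting_params(768))
print(starting_params(75))  # odd dimension: falls back to a smaller divisor
```

The returned dict is only a starting point; recall and memory measurements on your own data should drive any further changes.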