HNSW_PQ

HNSW_PQ is an approximate nearest neighbor (ANN) index designed for fast vector search with lower memory usage. It combines:

  • HNSW (Hierarchical Navigable Small World): a multi-layer graph that makes nearest-neighbor navigation fast.
  • PQ (Product Quantization): compresses vectors into compact codes to reduce RAM.

Compared to HNSW_SQ, HNSW_PQ typically achieves higher recall at the same compression level, but it may have slower queries and longer index build time.

How it works

1) Compress vectors with PQ

PQ splits each vector into m sub-vectors, then quantizes each sub-vector using a codebook. Two key knobs:

  • m: number of sub-vectors (must evenly divide your vector dimension)
  • nbits: bits per sub-vector code (higher = better quality, more memory)
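To make the two knobs concrete, here is a minimal pure-Python sketch of PQ encoding. It is an illustration, not the engine's implementation: the function names `pq_encode` and `pq_code_bytes` are hypothetical, and the tiny codebooks are hand-written for the example (real codebooks are learned from the data, with 2^nbits centroids each).

```python
from math import dist


def pq_encode(vector, codebooks):
    """Encode `vector` as one centroid index per sub-vector (hypothetical helper)."""
    m = len(codebooks)
    if len(vector) % m != 0:
        raise ValueError("m must evenly divide the vector dimension")
    sub_dim = len(vector) // m
    codes = []
    for i, codebook in enumerate(codebooks):
        sub = vector[i * sub_dim:(i + 1) * sub_dim]
        # Keep the index of the nearest centroid in this sub-space.
        nearest = min(range(len(codebook)), key=lambda c: dist(sub, codebook[c]))
        codes.append(nearest)
    return codes


def pq_code_bytes(m, nbits):
    """Size of one PQ code in bytes, rounded up to whole bytes."""
    return (m * nbits + 7) // 8


# Toy setup (assumed values): dim=4, m=2, nbits=1, so 2 centroids per codebook.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],  # codebook for the first sub-vector
    [[0.0, 1.0], [1.0, 0.0]],  # codebook for the second sub-vector
]
codes = pq_encode([0.9, 0.8, 0.1, 0.7], codebooks)
print(codes)  # [1, 0]

# Code size for m=16, nbits=8 versus a 1536-dim float32 vector.
print(pq_code_bytes(m=16, nbits=8), "bytes vs", 1536 * 4, "bytes")  # 16 bytes vs 6144 bytes
```

The size comparison covers only the per-vector codes; codebooks and the HNSW graph links add their own overhead.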

2) Build an HNSW graph on the compressed representation

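Milvus/Dodil builds the HNSW graph over the PQ codes rather than the full-precision vectors. Distance evaluation during graph traversal then reads compact codes, which typically lowers memory use; the graph links themselves still take space that grows with M.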

3) (Optional) Refine results for higher accuracy

During search, the engine can optionally re-rank top candidates using a higher-precision representation:

  • refine: enables the refine step
  • refine_type: precision used for re-ranking (for example SQ8, BF16, FP16, FP32)
  • refine_k: a multiplier on top-k that sets how many candidates are re-ranked before the final top-k is returned

Example: if you request top_k=100 and set refine_k=2, the engine can re-rank up to 200 candidates and return the best 100.
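The refine step can be sketched as follows. This is a simplified illustration under stated assumptions, not the engine's code: `candidate_count` and `refine` are hypothetical names, exact L2 distance on full-precision vectors stands in for whatever `refine_type` is configured, and the toy vectors are made up.

```python
from math import ceil, dist


def candidate_count(top_k, refine_k):
    """Number of candidates to re-rank for a given top_k and refine_k multiplier."""
    return ceil(top_k * refine_k)


def refine(query, candidate_ids, full_vectors, top_k):
    """Re-rank candidates by exact L2 distance and keep the best top_k."""
    ranked = sorted(candidate_ids, key=lambda i: dist(query, full_vectors[i]))
    return ranked[:top_k]


print(candidate_count(top_k=100, refine_k=2))  # 200

# Toy data (assumed): ids mapped to higher-precision vectors.
full_vectors = {0: [0.0, 0.0], 1: [1.0, 0.0], 2: [0.2, 0.1], 3: [0.9, 0.9]}
query = [0.0, 0.0]

# Pretend the compressed search returned these 4 candidates in a slightly wrong order.
coarse_candidates = [1, 2, 0, 3]
top = refine(query, coarse_candidates, full_vectors, top_k=2)
print(top)  # [0, 2]
```

The coarse ordering placed id 1 first, but re-ranking with the more precise vectors moves ids 0 and 2 to the top, which is the effect the refine step is meant to have.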

When should you use HNSW_PQ?

Use HNSW_PQ when you need:

  • High recall with limited memory
  • Large collections where brute force / uncompressed HNSW becomes expensive
  • A tunable trade-off between latency, build time, memory, and accuracy

If you want maximum accuracy and memory is not a concern, start with HNSW (no compression). If you mainly want simpler compression and faster queries, consider HNSW_SQ.

Create an HNSW_PQ index

Below is a typical example using the Dodil SDK. (Assumes you already have a connected vbase client and a collection created.)

    # Assume `vbase` is already connected
    col = vbase.collection("my_collection")

    col.create_index(
        field_name="embedding",
        index_type="HNSW_PQ",
        metric_type="COSINE",  # or L2 / IP depending on your use-case
        params={
            # HNSW knobs
            "M": 30,
            "efConstruction": 360,
            # PQ knobs
            "m": 16,
            "nbits": 8,
            # Optional refine step
            "refine": True,
            "refine_type": "SQ8",
        },
    )

Notes:

  • Pick m so that it evenly divides your vector dimension (for example, dim=1536 can work with m=16, 32, 48, 64, 96, 128…)
  • Larger M and efConstruction usually improve recall, but increase build time and memory
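A quick way to check the divisibility rule before building the index is to list the candidate values of m for your dimension. The helper below is a hypothetical convenience function, not part of the Dodil SDK.

```python
def valid_m_values(dim, max_m=None):
    """Return every m in [1, max_m] that evenly divides dim (hypothetical helper)."""
    if dim <= 0:
        raise ValueError("dim must be positive")
    limit = dim if max_m is None else min(dim, max_m)
    return [m for m in range(1, limit + 1) if dim % m == 0]


print(valid_m_values(1536, max_m=128))
# [1, 2, 3, 4, 6, 8, 12, 16, 24, 32, 48, 64, 96, 128]
```

Not every divisor is a sensible choice: very small m compresses aggressively and tends to hurt recall, so the list is only a starting point for tuning.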

Search with HNSW_PQ

    query_vec = [0.1, 0.2, 0.3, 0.4]  # example only

    results = col.search(
        data=[query_vec],
        limit=10,
        search_params={
            "ef": 64,
            "refine_k": 2,  # re-check 2x candidates during refinement
        },
    )

    for hit in results[0]:
        print(hit.id, hit.score)

Parameters

Index build parameters

| Parameter | What it controls | Typical range / notes |
| --- | --- | --- |
| M | Max neighbors per node in the HNSW graph | Integer ≥ 2. Higher improves recall, increases memory/build time. |
| efConstruction | Candidate pool size while building the graph | Larger improves recall but increases build time. |
| m | Number of PQ sub-vectors | Must evenly divide vector dimension. Larger can improve quality but increases compute/memory. |
| nbits | Bits per PQ sub-vector | Common values: 8. Higher can improve accuracy with more memory. |
| refine | Enable refinement (re-ranking) | True / False. If True, you should set refine_type. |
| refine_type | Precision used for refinement | Example values: SQ6, SQ8, BF16, FP16, FP32. Higher precision improves accuracy but costs more memory/compute. |

Search parameters

| Parameter | What it controls | Tuning guidance |
| --- | --- | --- |
| ef | How wide the search explores the HNSW graph | Higher usually improves recall but increases latency. Many setups start in the 32–256 range. |
| refine_k | How many extra candidates to re-check (multiplier) | 1 means no extra candidates; 2–4 is common when you want better accuracy. |

Practical tuning tips

  • Start with: M=16–48, efConstruction=200–500, nbits=8, ef=64–128.
  • If recall is too low:
    • Increase ef first (cheapest knob at query time)
    • Then increase M / efConstruction
    • Consider enabling refinement and raising refine_k
  • If memory is too high:
    • Lower M
    • Lower m or nbits to shrink the PQ codes (each code takes about m × nbits bits), and re-check recall afterwards
  • If indexing is too slow:
    • Reduce efConstruction
    • Reduce M
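The "increase ef first" advice can be turned into a small sweep: measure recall against exact results for a few ef values and keep the smallest one that meets your target. The sketch below is pure Python with hypothetical helper names; `search_fn` stands in for a real search call, and the `fake_results` data is invented purely to make the example runnable, not measured.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k ids that the approximate search returned."""
    if not exact_ids:
        raise ValueError("exact_ids must not be empty")
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


def smallest_ef_for_recall(search_fn, exact_ids, ef_values, target):
    """Return the first ef in ef_values whose recall meets target, else None."""
    for ef in ef_values:
        if recall_at_k(search_fn(ef), exact_ids) >= target:
            return ef
    return None


# Invented stand-in data: ids returned at each ef, and the exact top-4 ids.
exact = [1, 2, 3, 4]
fake_results = {32: [1, 9, 8, 7], 64: [1, 2, 3, 9], 128: [1, 2, 3, 4]}

best = smallest_ef_for_recall(
    lambda ef: fake_results[ef], exact, ef_values=[32, 64, 128], target=0.75
)
print(best)  # 64
```

In practice you would average recall over many queries rather than one, and obtain the exact ids from a brute-force search on a sample of the collection.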

That’s it. HNSW_PQ is a strong default when you want a memory-efficient index that still keeps recall high.
