HNSW_PQ

HNSW_PQ is an approximate nearest neighbor (ANN) index designed for fast vector search with lower memory usage. It combines:

HNSW (Hierarchical Navigable Small World): a multi-layer graph that makes nearest-neighbor navigation fast.
PQ (Product Quantization): compresses vectors into compact codes to reduce RAM.

Compared to HNSW_SQ, HNSW_PQ typically achieves higher recall at the same compression level, but it may have slower queries and longer index build time.

How it works

1) Compress vectors with PQ

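PQ splits each vector into m sub-vectors, then quantizes each sub-vector independently: the sub-vector is replaced by the index of its nearest centroid in a codebook trained for that sub-vector position. Two key knobs: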

m: number of sub-vectors (must evenly divide your vector dimension)
nbits: bits per sub-vector code (higher = better quality, more memory)
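
The encoding step can be illustrated with a toy sketch in plain Python. This is not the engine's implementation: the codebooks here are hand-written (a real index learns them from the data, typically with k-means), and `pq_encode` and `squared_l2` are hypothetical helper names used only for illustration.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pq_encode(vector, codebooks):
    """Encode one vector as a list of centroid indices, one per sub-vector.

    codebooks[i] holds the centroids for sub-vector i. With nbits bits per
    code, a codebook can hold up to 2**nbits centroids.
    """
    m = len(codebooks)
    if len(vector) % m != 0:
        raise ValueError("m must evenly divide the vector dimension")
    sub_dim = len(vector) // m
    codes = []
    for i, centroids in enumerate(codebooks):
        sub = vector[i * sub_dim:(i + 1) * sub_dim]
        # Keep only the index of the nearest centroid, not the raw floats.
        nearest = min(range(len(centroids)),
                      key=lambda c: squared_l2(sub, centroids[c]))
        codes.append(nearest)
    return codes


# Toy setup: dim=4, m=2, nbits=1 (two centroids per codebook).
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],  # codebook for the first sub-vector
    [[0.0, 1.0], [1.0, 0.0]],  # codebook for the second sub-vector
]
codes = pq_encode([0.9, 0.8, 0.1, 0.7], codebooks)
print(codes)  # [1, 0]
```

The four floats are stored as two small integers, which is where the memory saving comes from.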

2) Build an HNSW graph on the compressed representation

Milvus/Dodil builds the HNSW graph over the PQ codes rather than the full-precision vectors. Distances during graph traversal are computed on the compact codes, which usually keeps the index much lighter on memory.

3) (Optional) Refine results for higher accuracy

During search, the engine can optionally re-rank top candidates using a higher-precision representation:

refine: enables the refine step
refine_type: precision used for re-ranking (for example SQ8, BF16, FP16, FP32)
refine_k: a multiplier on top-k that sets how many candidates are re-checked before the final top-k is returned

Example: if you request top_k=100 and set refine_k=2, the engine can re-rank up to 200 candidates and return the best 100.
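
The refine step can be sketched in plain Python as follows. This is a conceptual illustration only, not the engine's code: `refine_results` and `squared_l2` are hypothetical names, and the approximate distances are made up to show how re-ranking can change the winner.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def refine_results(candidates, query, full_vectors, top_k, refine_k):
    """Re-rank the best top_k * refine_k candidates with exact distances.

    candidates: list of (id, approximate_distance) pairs from the PQ search.
    full_vectors: dict mapping id to the higher-precision vector.
    """
    pool_size = top_k * refine_k
    # Keep the pool of candidates that look best under approximate distances.
    pool = sorted(candidates, key=lambda c: c[1])[:pool_size]
    # Re-score the pool against the higher-precision vectors.
    rescored = [(vid, squared_l2(query, full_vectors[vid])) for vid, _ in pool]
    rescored.sort(key=lambda c: c[1])
    return rescored[:top_k]


query = [0.0, 0.0]
full_vectors = {1: [0.1, 0.0], 2: [0.5, 0.5], 3: [0.2, 0.0], 4: [2.0, 2.0]}
# Made-up approximate distances: PQ error ranks id 2 first by mistake.
candidates = [(1, 0.30), (2, 0.20), (3, 0.25), (4, 3.90)]

for refine_k in (1, 2, 3):
    best = refine_results(candidates, query, full_vectors, top_k=1, refine_k=refine_k)
    print(refine_k, [vid for vid, _ in best])
# 1 [2]
# 2 [3]
# 3 [1]
```

A larger refine_k widens the pool that is re-scored, so the true nearest neighbor (id 1 here) is only recovered once the pool is big enough to include it.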

When should you use HNSW_PQ?

Use HNSW_PQ when you need:

High recall with limited memory
Large collections where brute force / uncompressed HNSW becomes expensive
A tunable trade-off between latency, build time, memory, and accuracy

If you want maximum accuracy and memory is not a concern, start with HNSW (no compression). If you mainly want simpler compression and faster queries, consider HNSW_SQ.

Create an HNSW_PQ index

Below is a typical example using the Dodil SDK. (Assumes you already have a connected vbase client and a collection created.)


# Assume `vbase` is already connected
col = vbase.collection("my_collection")
 
col.create_index(
    field_name="embedding",
    index_type="HNSW_PQ",
    metric_type="COSINE",  # or L2 / IP depending on your use-case
    params={
        # HNSW knobs
        "M": 30,
        "efConstruction": 360,
 
        # PQ knobs
        "m": 16,
        "nbits": 8,
 
        # Optional refine step
        "refine": True,
        "refine_type": "SQ8",
    },
)

Notes:

Pick m so that it evenly divides your vector dimension (for example, dim=1536 can work with m=16, 32, 48, 64, 96, 128…)
Larger M and efConstruction usually improve recall, but increase build time and memory
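
To find candidate values of m for a given dimension, a small helper like the one below may be handy. `valid_m_values` is a hypothetical name, and the sketch only checks divisibility; whether the engine accepts every divisor is not covered here.

```python
def valid_m_values(dim, max_m=None):
    """Return every m in [1, max_m] that evenly divides dim."""
    limit = dim if max_m is None else min(dim, max_m)
    return [m for m in range(1, limit + 1) if dim % m == 0]


dim = 1536
choices = valid_m_values(dim, max_m=128)
print(choices)
# [1, 2, 3, 4, 6, 8, 12, 16, 24, 32, 48, 64, 96, 128]

# Each sub-vector covers dim / m of the original dimensions.
for m in (16, 32, 64):
    print(m, dim // m)
# 16 96
# 32 48
# 64 24
```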

Search with HNSW_PQ


query_vec = [0.1, 0.2, 0.3, 0.4]  # example only; a real query must have the same dimension as the indexed field
 
results = col.search(
    data=[query_vec],
    limit=10,
    search_params={
        "ef": 64,
        "refine_k": 2,  # re-check 2x candidates during refinement
    },
)
 
for hit in results[0]:
    print(hit.id, hit.score)

Parameters

Index build parameters

Parameter	What it controls	Typical range / notes
`M`	Max neighbors per node in the HNSW graph	Integer ≥ 2. Higher improves recall, increases memory/build time.
`efConstruction`	Candidate pool size while building the graph	Larger improves recall but increases build time.
`m`	Number of PQ sub-vectors	Must evenly divide vector dimension. Larger can improve quality but increases compute/memory.
`nbits`	Bits per PQ sub-vector code	Common value: 8. Higher can improve accuracy at the cost of more memory.
`refine`	Enable refinement (re-ranking)	`True` / `False`. If `True`, you should set `refine_type`.
`refine_type`	Precision used for refinement	Example values: `SQ6`, `SQ8`, `BF16`, `FP16`, `FP32`. Higher precision improves accuracy but costs more memory/compute.
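
As a rough way to see how the build parameters interact, the sketch below estimates bytes per vector. It is a back-of-the-envelope model under stated assumptions, not a description of the engine's actual layout: it assumes about 2 × M graph links per vector at 4 bytes each (upper layers ignored), a PQ code of m × nbits / 8 bytes, and one extra copy of each vector stored at the refine precision. `estimate_bytes_per_vector` is a hypothetical helper.

```python
# Bytes per dimension for each refine precision (6, 8, 16, 16, 32 bits).
REFINE_BYTES_PER_DIM = {"SQ6": 0.75, "SQ8": 1.0, "BF16": 2.0, "FP16": 2.0, "FP32": 4.0}


def estimate_bytes_per_vector(dim, M, m, nbits, refine_type=None, link_bytes=4):
    """Rough per-vector memory estimate; see the assumptions in the lead-in."""
    pq_bytes = m * nbits / 8
    # Assumption: about 2 * M neighbor links per vector on the bottom layer.
    graph_bytes = 2 * M * link_bytes
    refine_bytes = REFINE_BYTES_PER_DIM[refine_type] * dim if refine_type else 0.0
    return {
        "pq": pq_bytes,
        "graph": graph_bytes,
        "refine": refine_bytes,
        "total": pq_bytes + graph_bytes + refine_bytes,
    }


with_refine = estimate_bytes_per_vector(1536, M=30, m=16, nbits=8, refine_type="SQ8")
without_refine = estimate_bytes_per_vector(1536, M=30, m=16, nbits=8)
raw_fp32 = 1536 * 4

print(with_refine["total"])     # 1792.0
print(without_refine["total"])  # 256.0
print(raw_fp32)                 # 6144
```

Under these assumptions the refine copy dominates the footprint, so the choice of refine_type matters more for memory than small changes to m or nbits.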

Search parameters

Parameter	What it controls	Tuning guidance
`ef`	How wide the search explores the HNSW graph	Higher usually improves recall but increases latency. Many setups start in the 32–256 range.
`refine_k`	How many extra candidates to re-check (multiplier)	`1` means no extra candidates; `2–4` is common when you want better accuracy.

Practical tuning tips

Start with: M=16–48, efConstruction=200–500, nbits=8, ef=64–128.
If recall is too low:
- Increase ef first (cheapest knob at query time)
- Then increase M / efConstruction
- Consider enabling refinement and raising refine_k
If memory is too high:
- Lower M
- Lower m or nbits to shrink the PQ codes (each code takes about m × nbits / 8 bytes per vector), keeping m a divisor of the vector dimension
If indexing is too slow:
- Reduce efConstruction
- Reduce M
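
To judge whether recall is too low, you need a number to compare across settings. A common approach is to compute recall@k on a sample of queries against exact (brute-force) results. The sketch below shows only the metric; `recall_at_k` is a hypothetical helper, and how you obtain the exact neighbor IDs is left to your setup.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k neighbors that the index also returned."""
    truth = set(exact_ids[:k])
    if not truth:
        return 0.0
    found = set(approx_ids[:k]) & truth
    return len(found) / len(truth)


# IDs returned by the index vs. IDs from an exact search for one query.
approx = [1, 2, 3, 4, 5]
exact = [1, 2, 3, 9, 8]
print(recall_at_k(approx, exact, k=5))  # 0.6
```

Averaging this value over a few hundred sample queries while sweeping ef or refine_k gives a recall/latency curve to tune against.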

That’s it. HNSW_PQ is a strong default when you want a memory-efficient index that still keeps recall high.