HNSW_PQ
HNSW_PQ is an approximate nearest neighbor (ANN) index designed for fast vector search with lower memory usage. It combines:
- HNSW (Hierarchical Navigable Small World): a multi-layer graph that makes nearest-neighbor navigation fast.
- PQ (Product Quantization): compresses vectors into compact codes to reduce RAM.
Compared to HNSW_SQ, HNSW_PQ typically achieves higher recall at the same compression level, but it may have slower queries and longer index build time.
How it works
1) Compress vectors with PQ
PQ splits each vector into m sub-vectors, then quantizes each sub-vector using a codebook. Two key knobs:
- m: number of sub-vectors (must evenly divide your vector dimension)
- nbits: bits per sub-vector code (higher = better quality, more memory)
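To make the splitting and quantizing step concrete, here is a minimal pure-Python sketch of PQ encoding. It is an illustration under stated assumptions, not the engine's implementation: the function names and the hand-picked codebooks are hypothetical, and real codebooks are learned from the data (typically with k-means) and hold 2^nbits centroids each.

```python
def split_subvectors(vec, m):
    """Split vec into m equal-length sub-vectors; m must evenly divide len(vec)."""
    if len(vec) % m != 0:
        raise ValueError(f"m={m} does not evenly divide dim={len(vec)}")
    step = len(vec) // m
    return [vec[i * step:(i + 1) * step] for i in range(m)]


def pq_encode(vec, codebooks):
    """Return the PQ code: one centroid index per sub-vector."""
    subs = split_subvectors(vec, len(codebooks))
    code = []
    for sub, centroids in zip(subs, codebooks):
        # Pick the centroid closest to this sub-vector (squared L2 distance).
        dists = [sum((a - b) ** 2 for a, b in zip(sub, c)) for c in centroids]
        code.append(dists.index(min(dists)))
    return code


def pq_decode(code, codebooks):
    """Rebuild an approximate vector by concatenating the chosen centroids."""
    out = []
    for idx, centroids in zip(code, codebooks):
        out.extend(centroids[idx])
    return out


# Toy setup (assumed values): dim=4, m=2, nbits=1, so 2 centroids per codebook.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],  # codebook for sub-vector 0
    [[0.0, 1.0], [1.0, 0.0]],  # codebook for sub-vector 1
]
vec = [0.9, 0.8, 0.1, 0.7]
code = pq_encode(vec, codebooks)      # compact representation that gets stored
approx = pq_decode(code, codebooks)   # lossy reconstruction of vec
```

The stored code is only m small integers per vector, which is where the memory saving comes from. The reconstruction is lossy, which is why an optional refine step can help accuracy.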
2) Build an HNSW graph on the compressed representation
Milvus/Dodil builds the HNSW graph using the compressed codes. This usually makes the graph and neighbor evaluation lighter on memory.
3) (Optional) Refine results for higher accuracy
During search, the engine can optionally re-rank top candidates using a higher-precision representation:
- refine: enables the refine step
- refine_type: precision used for re-ranking (for example SQ8, BF16, FP16, FP32)
- refine_k: how many more candidates to re-check before returning top-k (a multiplier)
Example: if you request top_k=100 and set refine_k=2, the engine can re-rank up to 200 candidates and return the best 100.
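The two-stage idea behind refinement can be sketched in a few lines of Python. This is a simplified model under assumptions, not the engine's code: `search_with_refine` is a hypothetical helper, vectors are held in plain dicts keyed by id, and the first stage scans every vector instead of walking the HNSW graph.

```python
def l2(a, b):
    """Squared L2 distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search_with_refine(query, approx_vectors, full_vectors, top_k, refine_k=1):
    """Rank by approximate distance, then re-rank the best top_k * refine_k
    candidates using the higher-precision vectors."""
    n_candidates = min(len(approx_vectors), int(top_k * refine_k))
    # Stage 1: cheap ranking on the lossy (for example PQ-decoded) vectors.
    coarse = sorted(approx_vectors, key=lambda i: l2(query, approx_vectors[i]))
    coarse = coarse[:n_candidates]
    # Stage 2: re-rank only those candidates with the more precise vectors.
    refined = sorted(coarse, key=lambda i: l2(query, full_vectors[i]))
    return refined[:top_k]


# Toy data (assumed): compression error makes "b" look closer than "a".
query = [0.0, 0.0]
full = {"a": [0.1, 0.0], "b": [0.2, 0.0], "c": [1.0, 1.0]}
approx = {"a": [0.3, 0.0], "b": [0.2, 0.0], "c": [1.0, 1.0]}

without_refine = search_with_refine(query, approx, full, top_k=1, refine_k=1)
with_refine = search_with_refine(query, approx, full, top_k=1, refine_k=2)
```

With refine_k=1 the candidate pool holds only the approximate winner, so the true nearest neighbor is never examined. With refine_k=2 it enters the pool and the re-ranking step can promote it.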
When should you use HNSW_PQ?
Use HNSW_PQ when you need:
- High recall with limited memory
- Large collections where brute force / uncompressed HNSW becomes expensive
- A tunable trade-off between latency, build time, memory, and accuracy
If you want maximum accuracy and memory is not a concern, start with HNSW (no compression). If you mainly want simpler compression and faster queries, consider HNSW_SQ.
Create an HNSW_PQ index
Below is a typical example using the Dodil SDK. (Assumes you already have a connected vbase client and a collection created.)
# Assume `vbase` is already connected
col = vbase.collection("my_collection")
col.create_index(
    field_name="embedding",
    index_type="HNSW_PQ",
    metric_type="COSINE",  # or L2 / IP depending on your use-case
    params={
        # HNSW knobs
        "M": 30,
        "efConstruction": 360,
        # PQ knobs
        "m": 16,
        "nbits": 8,
        # Optional refine step
        "refine": True,
        "refine_type": "SQ8",
    },
)
Notes:
- Pick m so that it evenly divides your vector dimension (for example, dim=1536 can work with m=16, 32, 48, 64, 96, 128…)
- Larger M and efConstruction usually improve recall, but increase build time and memory
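When choosing m and nbits, it can help to check divisibility and estimate the size of the PQ codes up front. The helpers below are hypothetical and not part of any SDK. The estimate covers only the PQ codes themselves (m codes of nbits bits per vector) and leaves out the HNSW graph links, the codebooks, and any data kept for refinement.

```python
def valid_m_values(dim, candidates=(8, 16, 32, 48, 64, 96, 128)):
    """Return the candidate m values that evenly divide the vector dimension."""
    return [m for m in candidates if dim % m == 0]


def pq_code_bytes(m, nbits):
    """Bytes needed for one PQ code: m sub-vector codes of nbits bits each."""
    return (m * nbits + 7) // 8  # round up to whole bytes


def float32_bytes(dim):
    """Bytes needed for one uncompressed float32 vector."""
    return dim * 4


dim = 1536
options = valid_m_values(dim)
code_size = pq_code_bytes(m=16, nbits=8)   # 16 bytes per vector
raw_size = float32_bytes(dim)              # 6144 bytes per vector
ratio = raw_size / code_size               # 384.0
```

A smaller m gives stronger compression but a coarser approximation, so the ratio alone should not drive the choice. Validate recall on your own data.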
Search with HNSW_PQ
# Example only: a real query vector must have the same dimension as the indexed field
query_vec = [0.1, 0.2, 0.3, 0.4]
results = col.search(
    data=[query_vec],
    limit=10,
    search_params={
        "ef": 64,
        "refine_k": 2,  # re-check 2x candidates during refinement
    },
)
for hit in results[0]:
    print(hit.id, hit.score)
Parameters
Index build parameters
| Parameter | What it controls | Typical range / notes |
|---|---|---|
| M | Max neighbors per node in the HNSW graph | Integer ≥ 2. Higher improves recall, increases memory/build time. |
| efConstruction | Candidate pool size while building the graph | Larger improves recall but increases build time. |
| m | Number of PQ sub-vectors | Must evenly divide vector dimension. Larger can improve quality but increases compute/memory. |
| nbits | Bits per PQ sub-vector | Common value: 8. Higher can improve accuracy with more memory. |
| refine | Enable refinement (re-ranking) | True / False. If True, you should set refine_type. |
| refine_type | Precision used for refinement | Example values: SQ6, SQ8, BF16, FP16, FP32. Higher precision improves accuracy but costs more memory/compute. |
Search parameters
| Parameter | What it controls | Tuning guidance |
|---|---|---|
| ef | How wide the search explores the HNSW graph | Higher usually improves recall but increases latency. Many setups start in the 32–256 range. |
| refine_k | How many extra candidates to re-check (multiplier) | 1 means no extra candidates; 2–4 is common when you want better accuracy. |
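To see why a larger ef tends to raise recall, here is a sketch of the best-first search that HNSW runs on a single graph layer, following the published algorithm rather than this engine's source. The tiny graph, the 1-D positions, and the function name are assumptions made for illustration.

```python
import heapq


def search_layer(graph, dist_to_query, entry, ef):
    """Best-first search over one layer, keeping the ef closest nodes seen."""
    visited = {entry}
    d0 = dist_to_query(entry)
    candidates = [(d0, entry)]  # min-heap: closest unexplored node first
    best = [(-d0, entry)]       # max-heap via negation: farthest kept node on top
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -best[0][0]:
            break  # closest remaining candidate is worse than the worst kept result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d_nb = dist_to_query(nb)
            if len(best) < ef or d_nb < -best[0][0]:
                heapq.heappush(candidates, (d_nb, nb))
                heapq.heappush(best, (-d_nb, nb))
                if len(best) > ef:
                    heapq.heappop(best)  # drop the farthest to stay within ef
    return sorted((-neg_d, n) for neg_d, n in best)


# Toy layer (assumed): 1-D positions, query at 0.0. "Z" is the true nearest
# neighbor but is only reachable through "Y", which looks worse than "X".
positions = {"E": 5.0, "X": 4.0, "Y": 4.5, "Z": 0.5}
graph = {"E": ["X", "Y"], "X": ["E"], "Y": ["E", "Z"], "Z": ["Y"]}


def dist(node):
    return abs(positions[node] - 0.0)


narrow = search_layer(graph, dist, "E", ef=1)
wide = search_layer(graph, dist, "E", ef=2)
```

With ef=1 the search keeps only the single best node and stops at "X", a local minimum. With ef=2 it keeps "Y" as a second candidate, explores it, and reaches "Z". The extra distance evaluations are the latency cost of a higher ef.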
Practical tuning tips
- Start with: M=16–48, efConstruction=200–500, nbits=8, ef=64–128.
- If recall is too low:
  - Increase ef first (cheapest knob at query time)
  - Then increase M / efConstruction
  - Consider enabling refinement and raising refine_k
- If memory is too high:
  - Lower M
  - Ensure PQ settings are reasonable (m, nbits)
- If indexing is too slow:
  - Reduce efConstruction
  - Reduce M
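Deciding whether recall is too low requires measuring it. One common approach is to compare the index's results with exact (brute-force) results on a sample of queries. The helpers below are a hypothetical sketch: they assume you have already collected, for each query, the ids returned by the HNSW_PQ index and the ids from an exact search.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k neighbors that the index also returned."""
    truth = set(exact_ids[:k])
    if not truth:
        return 0.0
    return len(truth & set(approx_ids[:k])) / len(truth)


def mean_recall(approx_results, exact_results, k):
    """Average recall@k over a batch of queries."""
    scores = [recall_at_k(a, e, k) for a, e in zip(approx_results, exact_results)]
    return sum(scores) / len(scores)


# Toy ids for two queries (assumed data, not real search output).
approx_results = [[1, 2, 3, 9], [5, 6, 7, 8]]
exact_results = [[1, 2, 3, 4], [5, 6, 7, 8]]
score = mean_recall(approx_results, exact_results, k=4)  # (0.75 + 1.0) / 2
```

Re-running this measurement after each parameter change shows whether raising ef, M, or refine_k is actually paying off for your data.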
That’s it. HNSW_PQ is a strong default when you want a memory-efficient index while keeping high recall.