HNSW_SQ
HNSW_SQ is a vector index that combines HNSW (a graph-based ANN index) with Scalar Quantization (SQ), a vector compression technique. In practice, it gives you HNSW-like query speed with lower memory usage, at the cost of a tunable accuracy trade-off.
VBase is Milvus-backed, so the behavior and parameters of HNSW_SQ follow Milvus semantics.
When should I use it?
Pick HNSW_SQ when:
- You want fast nearest-neighbor search (HNSW-style latency).
- Your dataset is getting large and memory/cost matters.
- You can accept a small accuracy drop (or you’ll use refinement to recover accuracy).
Avoid it when you need exact results (consider FLAT) or when you have plenty of memory and want the simplest tuning (plain HNSW).
How it works
1) HNSW: fast navigation
HNSW builds a multi-layer graph where each vector is a node. During search, the algorithm starts from higher layers to quickly get “close”, then descends to the bottom layer to find nearest neighbors.
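The descent described above can be sketched in plain Python. This is a toy illustration, not the Milvus implementation: `greedy_step`, `search_layer`, `hnsw_search`, and the `layers` structure are hypothetical names, the graph is supplied by hand rather than built, and Euclidean distance is assumed.

```python
import heapq
import math


def greedy_step(graph, vectors, query, start):
    """Upper layers: hop to a closer neighbor until none improves."""
    current = start
    current_d = math.dist(vectors[current], query)
    improved = True
    while improved:
        improved = False
        for nb in graph.get(current, ()):
            d = math.dist(vectors[nb], query)
            if d < current_d:
                current, current_d = nb, d
                improved = True
    return current


def search_layer(graph, vectors, query, entry, ef):
    """Bottom layer: best-first search that keeps the ef closest nodes seen."""
    d0 = math.dist(vectors[entry], query)
    visited = {entry}
    candidates = [(d0, entry)]   # min-heap: closest unexplored node first
    best = [(-d0, entry)]        # max-heap via negation: worst kept node on top
    while candidates:
        d, node = heapq.heappop(candidates)
        if len(best) >= ef and d > -best[0][0]:
            break                # nothing left can improve the kept set
        for nb in graph.get(node, ()):
            if nb in visited:
                continue
            visited.add(nb)
            dn = math.dist(vectors[nb], query)
            if len(best) < ef or dn < -best[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(best, (-dn, nb))
                if len(best) > ef:
                    heapq.heappop(best)
    return sorted((-neg_d, n) for neg_d, n in best)


def hnsw_search(layers, vectors, query, entry, k, ef):
    """layers[0] is the sparse top layer, layers[-1] the dense bottom layer."""
    node = entry
    for graph in layers[:-1]:
        node = greedy_step(graph, vectors, query, node)
    found = search_layer(layers[-1], vectors, query, node, max(ef, k))
    return [n for _, n in found[:k]]
```

The upper layers only move a single entry point closer to the query; the wider, `ef`-bounded search happens on the bottom layer, which is why `ef` is the main query-time knob.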
2) SQ: smaller, faster vectors
Scalar Quantization compresses each vector dimension from full precision (e.g. FP32) into fewer bits:
- SQ8: 8 bits per dimension (256 levels)
- SQ6: 6 bits per dimension (64 levels)
- SQ4U (Milvus 2.6.8+): 4-bit uniform quantization using global parameters (very fast + very small)
Compression reduces memory footprint and often improves CPU cache behavior and throughput.
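To make the bit widths concrete, here is a minimal sketch of uniform scalar quantization plus the resulting storage arithmetic. It is a simplification under stated assumptions: `sq_encode`, `sq_decode`, and `bytes_per_vector` are hypothetical helpers, and the value range is taken from the single vector being encoded, whereas a real index derives its ranges from the indexed data.

```python
import math


def sq_encode(vector, bits=8):
    """Map each float to an integer code in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    lo, hi = min(vector), max(vector)
    scale = (hi - lo) / levels
    if scale == 0.0:
        scale = 1.0  # constant vector: every code becomes 0
    codes = [round((x - lo) / scale) for x in vector]
    return codes, lo, scale


def sq_decode(codes, lo, scale):
    """Reconstruct approximate floats from integer codes."""
    return [lo + c * scale for c in codes]


def bytes_per_vector(dim, bits):
    """Storage for the packed codes only (ignores graph and metadata overhead)."""
    return math.ceil(dim * bits / 8)
```

For a 768-dimensional embedding, `bytes_per_vector` gives 3072 bytes at 32 bits, 768 at 8 bits, 576 at 6 bits, and 384 at 4 bits. The reconstruction error per dimension is at most half a quantization step, which is why fewer bits cost more accuracy.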
3) HNSW + SQ together
With HNSW_SQ:
- Vectors are compressed using sq_type.
- The HNSW graph is built over the compressed representation.
- Searches traverse the graph using compressed distances.
- Optionally, results are refined (re-ranked) using a higher-precision representation.
Refinement is the key “best of both worlds”: you keep compression for speed/memory, and you regain accuracy on the top candidates.
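The two-stage idea can be illustrated with a small sketch. It assumes a brute-force scan in place of the graph traversal and Euclidean distance; `refine_search`, `approx_vectors`, and `full_vectors` are hypothetical names, and `refine_k` is treated as a simple multiplier on `k`.

```python
import math


def refine_search(query, full_vectors, approx_vectors, k, refine_k=2.0):
    """Rank by compressed distance, then re-rank the shortlist at higher precision."""
    # Stage 1: cheap ranking on the lossy (compressed) vectors.
    n_candidates = min(len(approx_vectors), math.ceil(refine_k * k))
    coarse = sorted(
        range(len(approx_vectors)),
        key=lambda i: math.dist(query, approx_vectors[i]),
    )[:n_candidates]
    # Stage 2: re-rank only the shortlist with the higher-precision vectors.
    refined = sorted(coarse, key=lambda i: math.dist(query, full_vectors[i]))
    return refined[:k]
```

Only the shortlist is touched at higher precision, so the extra cost grows with `refine_k * k` rather than with the size of the collection.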
Build an HNSW_SQ index in VBase
Below is a typical configuration. Tune M and efConstruction for the HNSW graph, and pick sq_type for compression.
from dodil import Client
from dodil.vbase import VBaseConfig
# Authorize
c = Client(
    service_account_id="...",
    service_account_secret="...",
)
# Connect
vbase = c.vbase.connect(
    VBaseConfig(
        host="vbase-db-<id>.infra.dodil.cloud",
        port=443,
        scheme="https",
        db_name="db_<id>",
    )
)
# Example: create an HNSW_SQ index on the `embedding` field
# NOTE: method names may differ slightly depending on your SDK version.
vbase.create_index(
    collection_name="your_collection",
    field_name="embedding",
    index_type="HNSW_SQ",
    metric_type="COSINE",
    params={
        "M": 64,
        "efConstruction": 100,
        "sq_type": "SQ6",
        "refine": True,
        "refine_type": "SQ8",
    },
)
What these parameters mean
| Parameter | What it controls | Practical guidance |
|---|---|---|
| M | Max neighbors per node in the HNSW graph | Higher = better recall, more memory. Common range: 5–100. |
| efConstruction | How many candidates are considered when building the graph | Higher = better index quality, slower build. Common range: 50–500. |
| sq_type | Quantization/compression format | SQ8 is the balanced default; SQ6 is smaller/faster; SQ4U is the most aggressive (best on normalized / stable distributions). |
| refine | Whether to re-rank top candidates with higher precision | Enable when you need better accuracy with compressed vectors. |
| refine_type | Precision used during refinement | Must be higher precision than sq_type (e.g., refine SQ6 with SQ8, BF16/FP16, or FP32). |
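Because a refine_type that is not higher precision than sq_type is an easy mistake, a small client-side check before creating the index can help. The sketch below is an assumption, not part of the SDK: `check_refine` and `PRECISION_BITS` are hypothetical, and the bit widths are used only to order the formats named in the table.

```python
# Hypothetical ordering table: bits per dimension for each format named above.
PRECISION_BITS = {"SQ4U": 4, "SQ6": 6, "SQ8": 8, "BF16": 16, "FP16": 16, "FP32": 32}


def check_refine(params):
    """Return a list of problems found in an HNSW_SQ params dict (empty if none)."""
    problems = []
    sq_type = params.get("sq_type")
    if sq_type is not None and sq_type not in PRECISION_BITS:
        problems.append(f"unknown sq_type: {sq_type!r}")
    if not params.get("refine"):
        return problems  # refinement disabled: refine_type is irrelevant
    refine_type = params.get("refine_type")
    if refine_type is None:
        return problems  # omitted: nothing to compare in this sketch
    if refine_type not in PRECISION_BITS:
        problems.append(f"unknown refine_type: {refine_type!r}")
    elif sq_type in PRECISION_BITS and PRECISION_BITS[refine_type] <= PRECISION_BITS[sq_type]:
        problems.append(
            f"refine_type {refine_type} is not higher precision than sq_type {sq_type}"
        )
    return problems
```

Passing the params dict from the example above (SQ6 refined with SQ8) returns an empty list, while swapping the two values reports a problem.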
Search with HNSW_SQ
At query time, you mainly tune ef (how wide the graph search is) and optionally refine_k (how many extra candidates you refine).
results = vbase.search(
    collection_name="your_collection",
    anns_field="embedding",
    data=[[0.1, 0.2, 0.3, 0.4]],
    limit=10,
    search_params={
        "ef": 64,
        "refine_k": 2,
    },
)
Search parameters
| Parameter | What it controls | Practical guidance |
|---|---|---|
| ef | Search breadth on the bottom layer of HNSW | Larger = better recall, slower queries. A good starting point is ef = limit, then increase gradually. |
| refine_k | How many candidates are refined, as a multiple of limit | 1 means only the top K are refined; 2 means the top 2×K are refined and the best K are returned. |
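One way to apply the "start at ef = limit, then increase" advice is to measure recall against exact results and stop at the first ef that meets your target. The helpers below are a sketch under assumptions: `recall_at_k` and `smallest_ef` are hypothetical, `search_fn` stands for any callable you write that runs a search with a given ef and returns result IDs, and `exact_ids` is ground truth you obtained separately (for example from an exact index).

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-K that the approximate search also returned."""
    if not exact_ids:
        return 1.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


def smallest_ef(search_fn, exact_ids, limit, target_recall=0.95, max_ef=512):
    """Double ef from `limit` until recall reaches the target; None if it never does."""
    ef = limit
    while ef <= max_ef:
        if recall_at_k(search_fn(ef), exact_ids) >= target_recall:
            return ef
        ef *= 2
    return None
```

In practice you would average recall over a representative set of queries rather than a single one; the single-query form keeps the sketch short.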
Notes and gotchas
- SQ4U availability depends on the underlying cluster version (Milvus 2.6.8+).
- If you enable refine, make sure refine_type is truly higher precision than sq_type.
- If your data is not normalized and has very uneven value ranges per dimension, aggressive quantization can hurt recall. In that case, prefer SQ8 (or enable refinement).
Recommended starting configs
- Balanced (good default): M=32, efConstruction=200, sq_type=SQ8, refine=false
- Memory-lean + accuracy recovery: M=32, efConstruction=200, sq_type=SQ6, refine=true, refine_type=SQ8, refine_k=2
- Maximum speed / smallest memory (if supported): sq_type=SQ4U, and keep refine=true if you care about recall
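If you keep these starting points in code, it helps to separate build-time values from query-time ones, since refine_k is passed at search time rather than at index creation. The `PRESETS` dict below is a hypothetical layout, not an SDK structure; values the list above leaves unspecified are left out rather than guessed.

```python
# Hypothetical preset layout: "index" values go to index creation,
# "search" values go to the search call.
PRESETS = {
    "balanced": {
        "index": {"M": 32, "efConstruction": 200, "sq_type": "SQ8", "refine": False},
        "search": {},
    },
    "memory_lean": {
        "index": {
            "M": 32,
            "efConstruction": 200,
            "sq_type": "SQ6",
            "refine": True,
            "refine_type": "SQ8",
        },
        "search": {"refine_k": 2},
    },
    "max_speed": {
        # M, efConstruction and refine_type are not specified for this preset;
        # pick a refine_type that is higher precision than SQ4U.
        "index": {"sq_type": "SQ4U", "refine": True},
        "search": {},
    },
}
```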