HNSW_SQ

HNSW_SQ is a vector index that combines HNSW, a graph-based approximate nearest-neighbor (ANN) index, with Scalar Quantization (SQ), which compresses vectors. In practice, it gives you HNSW-like query speed with lower memory usage, at the cost of a tunable accuracy trade-off.

VBase is Milvus-backed, so the behavior and parameters of HNSW_SQ follow Milvus semantics.

When should I use it?

Pick HNSW_SQ when:

  • You want fast nearest-neighbor search (HNSW-style latency).
  • Your dataset is getting large and memory/cost matters.
  • You can accept a small accuracy drop (or you’ll use refinement to recover accuracy).

Avoid it when you need exact results (consider FLAT) or when you have plenty of memory and want the simplest tuning (plain HNSW).

How it works

1) HNSW: fast navigation

HNSW builds a multi-layer graph where each vector is a node. During search, the algorithm starts from higher layers to quickly get “close”, then descends to the bottom layer to find nearest neighbors.
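
To make the layered descent concrete, here is a minimal toy sketch of the search phase only. It assumes a hand-built two-layer graph over points on a line and Euclidean distance; the function names and the data are illustrative, index construction is left out, and real HNSW implementations (including the one behind VBase) are considerably more involved.

```python
import heapq
import math

def greedy_descend(layer, vectors, query, start):
    """On an upper layer, keep hopping to a closer neighbor until none is closer."""
    current = start
    improved = True
    while improved:
        improved = False
        for nb in layer.get(current, []):
            if math.dist(vectors[nb], query) < math.dist(vectors[current], query):
                current = nb
                improved = True
    return current

def search_bottom(layer, vectors, query, start, ef, k):
    """Best-first search on the bottom layer, keeping at most `ef` best nodes."""
    d0 = math.dist(vectors[start], query)
    visited = {start}
    candidates = [(d0, start)]   # min-heap: closest unexplored node first
    best = [(-d0, start)]        # max-heap via negation: worst kept node on top
    while candidates:
        d, node = heapq.heappop(candidates)
        if len(best) >= ef and d > -best[0][0]:
            break                # nothing left that can improve the result set
        for nb in layer.get(node, []):
            if nb in visited:
                continue
            visited.add(nb)
            dn = math.dist(vectors[nb], query)
            if len(best) < ef or dn < -best[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(best, (-dn, nb))
                if len(best) > ef:
                    heapq.heappop(best)
    ranked = sorted((-neg_d, n) for neg_d, n in best)
    return [n for _, n in ranked][:k]

def layered_search(layers, vectors, query, entry, ef, k):
    """`layers` is ordered top to bottom; each maps node id -> neighbor ids."""
    current = entry
    for layer in layers[:-1]:
        current = greedy_descend(layer, vectors, query, current)
    return search_bottom(layers[-1], vectors, query, current, ef, k)

# Toy data: eight points on a line; the sparse upper layer links only 0, 4 and 7.
vectors = {i: (float(i), 0.0) for i in range(8)}
bottom = {i: [j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)}
upper = {0: [4], 4: [0, 7], 7: [4]}

result = layered_search([upper, bottom], vectors, (5.2, 0.0), entry=0, ef=3, k=2)
print(result)
```

The upper layer lets the search jump from node 0 straight to node 4, and the bottom layer then explores locally with a candidate list of size ef.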

2) SQ: smaller, faster vectors

Scalar Quantization compresses each vector dimension from full precision (e.g. FP32) into fewer bits:

  • SQ8: 8 bits per dimension (256 levels)
  • SQ6: 6 bits per dimension (64 levels)
  • SQ4U (Milvus 2.6.8+): 4-bit uniform quantization using global parameters (very fast + very small)

Compression reduces memory footprint and often improves CPU cache behavior and throughput.
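
As a rough illustration of what "fewer bits per dimension" means, the sketch below implements a generic uniform quantizer with a per-dimension min/max range. This is an assumption made for illustration: the exact encoding Milvus uses for SQ8, SQ6, or SQ4U may differ, and the function names are hypothetical.

```python
def sq_train(vectors, bits=8):
    """Learn a [min, max] range per dimension; `bits` sets the number of levels."""
    dims = len(vectors[0])
    lo = [min(v[d] for v in vectors) for d in range(dims)]
    hi = [max(v[d] for v in vectors) for d in range(dims)]
    levels = (1 << bits) - 1          # 255 steps for 8 bits, 63 for 6 bits
    return lo, hi, levels

def sq_encode(vec, lo, hi, levels):
    """Map each float to an integer code in [0, levels]."""
    codes = []
    for x, l, h in zip(vec, lo, hi):
        span = (h - l) or 1.0         # avoid dividing by zero on constant dims
        clipped = min(max(x, l), h)
        codes.append(round((clipped - l) / span * levels))
    return codes

def sq_decode(codes, lo, hi, levels):
    """Map integer codes back to approximate floats."""
    return [l + c / levels * ((h - l) or 1.0) for c, l, h in zip(codes, lo, hi)]

def max_roundtrip_error(vectors, bits):
    lo, hi, levels = sq_train(vectors, bits)
    worst = 0.0
    for v in vectors:
        approx = sq_decode(sq_encode(v, lo, hi, levels), lo, hi, levels)
        worst = max(worst, max(abs(a - b) for a, b in zip(v, approx)))
    return worst

data = [[0.10, -1.0], [0.55, 0.25], [0.90, 1.0]]  # made-up sample vectors
for bits in (8, 6, 4):
    print(bits, "bits -> worst round-trip error", max_roundtrip_error(data, bits))
```

Fewer bits means coarser steps, so the worst-case reconstruction error grows as the code size shrinks. That is the accuracy side of the trade-off.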

3) HNSW + SQ together

With HNSW_SQ:

  1. Vectors are compressed using sq_type.
  2. The HNSW graph is built over the compressed representation.
  3. Searches traverse the graph using compressed distances.
  4. Optionally, results are refined (re-ranked) using a higher-precision representation.

Refinement gives you the best of both worlds: you keep compression for speed and memory, and you regain accuracy on the top candidates.
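
The refinement step can be sketched in a few lines. The example below is a simplification under stated assumptions: a brute-force scan stands in for the graph traversal, the "coarse" vectors are hand-written lossy copies of the full-precision ones, and the function name and data are illustrative rather than part of any SDK.

```python
import math

def coarse_then_refine(query, coarse_vectors, full_vectors, k, refine_k=2.0):
    """Pick candidates by coarse distance, then re-rank them at full precision."""
    n_candidates = min(len(coarse_vectors), math.ceil(refine_k * k))
    # Step 1: rank by distance to the compressed vectors (cheap but lossy).
    by_coarse = sorted(
        range(len(coarse_vectors)),
        key=lambda i: math.dist(query, coarse_vectors[i]),
    )
    candidates = by_coarse[:n_candidates]
    # Step 2: re-rank only those candidates using the higher-precision vectors.
    by_full = sorted(candidates, key=lambda i: math.dist(query, full_vectors[i]))
    return by_full[:k]

# Made-up 1-D data: the coarse copies are the full values snapped to a 0.2 grid.
full = [[0.49], [0.19], [0.95]]
coarse = [[0.4], [0.2], [1.0]]
query = [0.32]

without_refine = coarse_then_refine(query, coarse, full, k=1, refine_k=1.0)
with_refine = coarse_then_refine(query, coarse, full, k=1, refine_k=2.0)
print(without_refine, with_refine)
```

In this toy data, quantization makes vector 0 look closest even though vector 1 is the true nearest neighbor. Refining twice as many candidates as requested lets the full-precision pass correct the order.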

Build an HNSW_SQ index in VBase

Below is a typical configuration. Tune M and efConstruction for the HNSW graph, and pick sq_type for compression.

    from dodil import Client
    from dodil.vbase import VBaseConfig

    # Authorize
    c = Client(
        service_account_id="...",
        service_account_secret="...",
    )

    # Connect
    vbase = c.vbase.connect(
        VBaseConfig(
            host="vbase-db-<id>.infra.dodil.cloud",
            port=443,
            scheme="https",
            db_name="db_<id>",
        )
    )

    # Example: create an HNSW_SQ index on the `embedding` field
    # NOTE: method names may differ slightly depending on your SDK version.
    vbase.create_index(
        collection_name="your_collection",
        field_name="embedding",
        index_type="HNSW_SQ",
        metric_type="COSINE",
        params={
            "M": 64,
            "efConstruction": 100,
            "sq_type": "SQ6",
            "refine": True,
            "refine_type": "SQ8",
        },
    )

What these parameters mean

| Parameter | What it controls | Practical guidance |
| --- | --- | --- |
| M | Max neighbors per node in the HNSW graph | Higher = better recall, more memory. Common range: 5–100. |
| efConstruction | How many candidates are considered when building the graph | Higher = better index quality, slower build. Common range: 50–500. |
| sq_type | Quantization/compression format | SQ8 is the balanced default choice; SQ6 is smaller/faster; SQ4U is the most aggressive (best on normalized / stable distributions). |
| refine | Whether to re-rank top candidates with higher precision | Enable when you need better accuracy with compressed vectors. |
| refine_type | Precision used during refinement | Must be higher precision than sq_type (e.g., refine SQ6 with SQ8, BF16/FP16, or FP32). |
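
To get a feel for how sq_type and refine_type affect memory, the back-of-the-envelope sketch below multiplies vector count, dimension, and bits per dimension. It is an estimate under assumptions: codes are taken to be tightly bit-packed, refinement is assumed to keep a second copy of the vectors at refine_type precision, and the HNSW graph links (which grow with M) and any per-index metadata are ignored. The collection size is hypothetical.

```python
# Bits stored per dimension for each format named in this guide.
BITS_PER_DIM = {"FP32": 32, "FP16": 16, "BF16": 16, "SQ8": 8, "SQ6": 6, "SQ4U": 4}

def vector_bytes(num_vectors, dim, fmt):
    """Bytes for the vector payload only (no graph links, no metadata)."""
    return num_vectors * dim * BITS_PER_DIM[fmt] // 8

def estimate_payload(num_vectors, dim, sq_type, refine=False, refine_type=None):
    """Payload for the compressed vectors plus an optional refinement copy."""
    total = vector_bytes(num_vectors, dim, sq_type)
    if refine and refine_type:
        # Assumption: refinement keeps a second copy at refine_type precision.
        total += vector_bytes(num_vectors, dim, refine_type)
    return total

n, dim = 1_000_000, 768  # hypothetical collection: 1M vectors, 768 dimensions
for fmt in ("FP32", "SQ8", "SQ6", "SQ4U"):
    print(f"{fmt:>5}: {vector_bytes(n, dim, fmt) / 1e6:,.0f} MB")

lean = estimate_payload(n, dim, "SQ6", refine=True, refine_type="SQ8")
print(f"SQ6 + SQ8 refine: {lean / 1e6:,.0f} MB")
```

Under these assumptions, refining SQ6 with SQ8 costs more memory than SQ8 alone, so whether that configuration is "memory-lean" in practice depends on how the underlying engine stores the refinement data. Measure on your own collection before committing.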

Search with HNSW_SQ

At query time, you mainly tune ef (how wide the graph search is) and optionally refine_k (how many extra candidates you refine).

    results = vbase.search(
        collection_name="your_collection",
        anns_field="embedding",
        data=[[0.1, 0.2, 0.3, 0.4]],
        limit=10,
        search_params={
            "ef": 64,
            "refine_k": 2,
        },
    )

Search parameters

| Parameter | What it controls | Practical guidance |
| --- | --- | --- |
| ef | Search breadth on the bottom layer of HNSW | Larger = better recall, slower queries. A good starting point is ef = limit, then increase gradually. |
| refine_k | How many candidates to refine, as a multiple of limit (K) | 1 means only the top K candidates are refined; 2 means the top 2×K are refined and the best K are returned. |
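
One way to "increase gradually" is to sweep ef against a small set of queries with known exact neighbors and stop at the first value that reaches your recall target. The sketch below assumes you supply a `search_fn(query, limit, ef)` callable that returns result IDs (for example, a thin wrapper around your search call) and ground-truth IDs obtained from an exact search. Both are hypothetical helpers, and `fake_search` exists only to make the example runnable.

```python
def recall_at_k(found_ids, true_ids):
    """Fraction of the true neighbors that appear in the returned results."""
    return len(set(found_ids) & set(true_ids)) / len(true_ids)

def pick_ef(search_fn, queries, ground_truth, limit, target_recall=0.95,
            ef_values=None):
    """Return the first (ef, mean recall) that meets the target, else the last tried."""
    ef_values = ef_values or [limit, 2 * limit, 4 * limit, 8 * limit]
    mean = 0.0
    for ef in ef_values:
        recalls = [
            recall_at_k(search_fn(q, limit, ef), truth)
            for q, truth in zip(queries, ground_truth)
        ]
        mean = sum(recalls) / len(recalls)
        if mean >= target_recall:
            return ef, mean
    return ef_values[-1], mean

# Stand-in for a real index: pretends recall improves as ef grows.
truth = [list(range(10))]

def fake_search(query, limit, ef):
    hits = min(limit, ef // 4)
    return truth[0][:hits] + [-1] * (limit - hits)

best_ef, best_recall = pick_ef(fake_search, [[0.1, 0.2]], truth, limit=10)
print(best_ef, best_recall)
```

With a real index, also record latency for each ef value so you can see what each recall gain costs.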

Notes and gotchas

  • SQ4U availability depends on the underlying cluster version (Milvus 2.6.8+).
  • If you enable refine, make sure refine_type is truly higher precision than sq_type.
  • If your data is not normalized and has very uneven value ranges per dimension, aggressive quantization can hurt recall. In that case, prefer SQ8 (or enable refinement).

Starting presets

  • Balanced (good default): M=32, efConstruction=200, sq_type=SQ8, refine=false
  • Memory-lean + accuracy recovery: M=32, efConstruction=200, sq_type=SQ6, refine=true, refine_type=SQ8, refine_k=2
  • Maximum speed / smallest memory (if supported): sq_type=SQ4U, and keep refine=true if you care about recall
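
If you keep presets like these in code, a small guard can catch a refine_type that is not higher precision than sq_type before the index build is submitted. The sketch below is a hypothetical client-side helper, not part of the VBase SDK. It uses bits per dimension as a proxy for precision and only encodes values that appear in the presets above. refine_k is a search-time parameter, so it is left out of the index presets.

```python
# Bits per dimension, used here as a simple proxy for precision.
PRECISION_BITS = {"SQ4U": 4, "SQ6": 6, "SQ8": 8, "BF16": 16, "FP16": 16, "FP32": 32}

# Index-time presets copied from the list above (names are illustrative).
PRESETS = {
    "balanced": {
        "M": 32, "efConstruction": 200, "sq_type": "SQ8", "refine": False,
    },
    "memory_lean": {
        "M": 32, "efConstruction": 200, "sq_type": "SQ6",
        "refine": True, "refine_type": "SQ8",
    },
    # The guide gives no M, efConstruction, or refine_type for this preset.
    "smallest": {"sq_type": "SQ4U", "refine": True},
}

def check_refine(params):
    """Raise ValueError if refine_type is not higher precision than sq_type."""
    if not params.get("refine"):
        return
    refine_type = params.get("refine_type")
    if refine_type is None:
        return  # nothing to compare against
    sq_type = params["sq_type"]
    if PRECISION_BITS[refine_type] <= PRECISION_BITS[sq_type]:
        raise ValueError(
            f"refine_type={refine_type} must be higher precision than sq_type={sq_type}"
        )

for name, params in PRESETS.items():
    check_refine(params)
    print(name, "ok")
```

A call such as `check_refine({"sq_type": "SQ8", "refine": True, "refine_type": "SQ6"})` raises ValueError, because refining with a lower-precision format cannot recover accuracy.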