HNSW_SQ

HNSW_SQ is a vector index that combines HNSW, a graph-based approximate nearest neighbor (ANN) index, with Scalar Quantization (SQ), a vector compression technique. In practice, it gives you HNSW-like query speed with lower memory usage, at the cost of a tunable accuracy trade-off.

VBase is Milvus-backed, so the behavior and parameters of HNSW_SQ follow Milvus semantics.

When should I use it?

Pick HNSW_SQ when:

You want fast nearest-neighbor search (HNSW-style latency).
Your dataset is getting large and memory/cost matters.
You can accept a small accuracy drop (or you’ll use refinement to recover accuracy).

Avoid it when you need exact results (consider FLAT) or when you have plenty of memory and want the simplest tuning (plain HNSW).

How it works

1) HNSW: fast graph search

HNSW builds a multi-layer graph where each vector is a node. During search, the algorithm starts at the sparse upper layers to get close to the query quickly, then descends to the bottom layer to find the nearest neighbors.
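The layered descent can be sketched in plain Python. This is a toy illustration, not the engine's implementation: the graph is hand-built, the names `search_layer` and `hnsw_search` are hypothetical, and distances are plain Euclidean.

```python
import heapq
import math


def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search_layer(vectors, neighbors, query, entry, ef):
    """Best-first search over one layer, keeping the ef closest nodes seen."""
    d0 = l2(vectors[entry], query)
    visited = {entry}
    candidates = [(d0, entry)]   # min-heap: closest unexpanded node first
    best = [(-d0, entry)]        # max-heap via negation: worst kept node first
    while candidates:
        dist, node = heapq.heappop(candidates)
        if len(best) >= ef and dist > -best[0][0]:
            break                # no remaining candidate can improve the results
        for nb in neighbors[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d = l2(vectors[nb], query)
            if len(best) < ef or d < -best[0][0]:
                heapq.heappush(candidates, (d, nb))
                heapq.heappush(best, (-d, nb))
                if len(best) > ef:
                    heapq.heappop(best)   # drop the current worst
    return sorted((-neg_d, n) for neg_d, n in best)


def hnsw_search(vectors, layers, query, entry, ef, k):
    """Greedy descent (ef=1) through upper layers, then a wider bottom-layer search."""
    for neighbors in layers[:-1]:
        entry = search_layer(vectors, neighbors, query, entry, ef=1)[0][1]
    return search_layer(vectors, layers[-1], query, entry, ef)[:k]


# Toy data: 8 points on a line, a sparse top layer and a chained bottom layer.
vectors = [[float(i)] for i in range(8)]
top = {0: [4], 4: [0]}
bottom = {i: [j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)}

hits = hnsw_search(vectors, [top, bottom], query=[5.2], entry=0, ef=3, k=2)
print([node for _, node in hits])
```

The top layer jumps from node 0 to node 4 in one hop, so the bottom-layer search only has to explore the neighborhood around the query.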

2) SQ: smaller, faster vectors

Scalar Quantization compresses each vector dimension from full precision (e.g. FP32) into fewer bits:

SQ8: 8 bits per dimension (256 levels)
SQ6: 6 bits per dimension (64 levels)
SQ4U (Milvus 2.6.8+): 4 bits per dimension (16 levels), uniform quantization using global parameters (very fast and very small)

Compression reduces memory footprint and often improves CPU cache behavior and throughput.
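To make the bit widths concrete, here is a minimal sketch of scalar quantization in plain Python. It assumes a per-dimension min/max range, which is one common scheme; the engine's actual encoding may differ, and the function names are hypothetical.

```python
def sq_train(vectors, bits):
    """Learn per-dimension [min, max] ranges; also return the largest code value."""
    dims = list(zip(*vectors))
    return [min(d) for d in dims], [max(d) for d in dims], (1 << bits) - 1


def sq_encode(vec, lo, hi, top):
    """Map each float to an integer code in [0, top]."""
    codes = []
    for x, low, high in zip(vec, lo, hi):
        span = (high - low) or 1.0        # guard against constant dimensions
        x = min(max(x, low), high)        # clamp values outside the trained range
        codes.append(round((x - low) / span * top))
    return codes


def sq_decode(codes, lo, hi, top):
    """Approximate reconstruction of the original floats."""
    return [low + c / top * (high - low) for c, low, high in zip(codes, lo, hi)]


data = [[0.0, -1.0], [1.0, 1.0], [0.25, 0.5]]
for bits in (8, 6, 4):
    lo, hi, top = sq_train(data, bits)
    codes = sq_encode(data[2], lo, hi, top)
    approx = sq_decode(codes, lo, hi, top)
    print(bits, codes, [round(a, 4) for a in approx])
```

Fewer bits means fewer levels per dimension, so the reconstructed values drift further from the originals. That drift is the accuracy cost that refinement is meant to recover.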

3) HNSW + SQ together

With HNSW_SQ:

Vectors are compressed using sq_type.
The HNSW graph is built over the compressed representation.
Searches traverse the graph using compressed distances.
Optionally, results are refined (re-ranked) using a higher-precision representation.

Refinement gives you the best of both worlds: you keep compression for speed and memory, and you regain accuracy on the top candidates.
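The two-stage idea can be sketched as follows. For brevity this uses a brute-force scan instead of a graph, and a crude rounding grid as a stand-in for SQ; `search_with_refine` is a hypothetical name, not an SDK function.

```python
import math


def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search_with_refine(full, coarse, query, k, refine_k=1.0):
    # Stage 1: shortlist k * refine_k candidates using the compressed vectors.
    n_candidates = min(len(coarse), math.ceil(k * refine_k))
    by_coarse = sorted(range(len(coarse)), key=lambda i: l2(coarse[i], query))
    shortlist = by_coarse[:n_candidates]
    # Stage 2: re-rank only the shortlist with the higher-precision vectors.
    return sorted(shortlist, key=lambda i: l2(full[i], query))[:k]


full = [[0.20], [0.60], [2.00]]
coarse = [[round(x / 0.5) * 0.5 for x in v] for v in full]  # stand-in for SQ codes

print(search_with_refine(full, coarse, [0.3], k=1, refine_k=1))
print(search_with_refine(full, coarse, [0.3], k=1, refine_k=2))
```

In this toy data the compressed distances rank the wrong vector first. Widening the shortlist to 2×K lets the full-precision pass recover the true nearest neighbor.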

Build an HNSW_SQ index in VBase

Below is a typical configuration. Tune M and efConstruction for the HNSW graph, and pick sq_type for compression.


from dodil import Client
from dodil.vbase import VBaseConfig
 
# Authorize
c = Client(
    service_account_id="...",
    service_account_secret="...",
)
 
# Connect
vbase = c.vbase.connect(
    VBaseConfig(
        host="vbase-db-<id>.infra.dodil.cloud",
        port=443,
        scheme="https",
        db_name="db_<id>",
    )
)
 
# Example: create an HNSW_SQ index on the `embedding` field
# NOTE: method names may differ slightly depending on your SDK version.
vbase.create_index(
    collection_name="your_collection",
    field_name="embedding",
    index_type="HNSW_SQ",
    metric_type="COSINE",
    params={
        "M": 64,
        "efConstruction": 100,
        "sq_type": "SQ6",
        "refine": True,
        "refine_type": "SQ8",
    },
)

What these parameters mean

Parameter	What it controls	Practical guidance
`M`	Max neighbors per node in the HNSW graph	Higher = better recall, more memory. Common range: 5–100.
`efConstruction`	How many candidates are considered when building the graph	Higher = better index quality, slower build. Common range: 50–500.
`sq_type`	Quantization/compression format	`SQ8` is a balanced default; `SQ6` is smaller and faster; `SQ4U` is the most aggressive (works best on normalized data or stable value distributions).
`refine`	Whether to re-rank top candidates with higher precision	Enable when you need better accuracy with compressed vectors.
`refine_type`	Precision used during refinement	Must be higher precision than `sq_type` (e.g., refine SQ6 with SQ8, BF16/FP16, or FP32).
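When choosing `sq_type`, a back-of-the-envelope estimate of the vector payload helps. The sketch below assumes tightly packed codes and counts only the vector data; graph links, quantizer parameters, and any alignment padding are excluded, so treat the numbers as a lower bound. The helper name is hypothetical.

```python
import math

# Bits stored per dimension for each format named on this page.
BITS_PER_DIM = {"FP32": 32, "FP16": 16, "BF16": 16, "SQ8": 8, "SQ6": 6, "SQ4U": 4}


def vector_payload_bytes(num_vectors, dim, fmt):
    """Bytes for the vector data alone, assuming tightly packed codes."""
    return num_vectors * math.ceil(dim * BITS_PER_DIM[fmt] / 8)


for fmt in ("FP32", "SQ8", "SQ6", "SQ4U"):
    mib = vector_payload_bytes(1_000_000, 768, fmt) / 2**20
    print(f"{fmt:5s} {mib:8.0f} MiB")
```

If refinement keeps a second representation in `refine_type`, its payload should be added to the total.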

Search with HNSW_SQ

At query time, you mainly tune ef (how wide the graph search is) and optionally refine_k (how many extra candidates you refine).


results = vbase.search(
    collection_name="your_collection",
    anns_field="embedding",
    data=[[0.1, 0.2, 0.3, 0.4]],
    limit=10,
    search_params={
        "ef": 64,
        "refine_k": 2,
    },
)

Search parameters

Parameter	What it controls	Practical guidance
`ef`	Search breadth on the bottom layer of HNSW	Larger = better recall, slower queries. A good starting point is `ef = limit`, then increase gradually.
`refine_k`	Multiplier on `limit` that sets how many candidates are refined	`1` means only the top K are refined; `2` means the top 2×K are refined and the best K are returned.
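To tune `ef` and `refine_k` with data rather than guesswork, measure recall against exact results for a sample of queries. A minimal sketch, assuming you have already collected result IDs from both an approximate search and an exact one (for example, a FLAT index); `recall_at_k` is a hypothetical helper.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k neighbors that the approximate search returned."""
    hits = sum(len(set(a[:k]) & set(e[:k])) for a, e in zip(approx_ids, exact_ids))
    return hits / (k * len(exact_ids))


# One list of result IDs per query.
approx = [[1, 2, 3], [4, 5, 6]]   # e.g. from HNSW_SQ
exact = [[1, 2, 9], [4, 5, 6]]    # e.g. from an exact search
print(round(recall_at_k(approx, exact, k=3), 3))
```

Raise `ef` (or `refine_k`) step by step and stop once recall meets your target, since each increase also costs query latency.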

Notes and gotchas

SQ4U availability depends on the underlying cluster version (Milvus 2.6.8+).
If you enable refine, make sure refine_type is truly higher precision than sq_type.
If your data is not normalized and has very uneven value ranges per dimension, aggressive quantization can hurt recall. In that case, prefer SQ8 (or enable refinement).
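The refinement rule above is easy to check before you submit an index definition. The sketch below is a client-side guard only: it ranks formats by the bit widths listed on this page, and `check_refine` is a hypothetical helper, not part of the SDK.

```python
# Bits per dimension, used here only to compare precision levels.
PRECISION_BITS = {"SQ4U": 4, "SQ6": 6, "SQ8": 8, "BF16": 16, "FP16": 16, "FP32": 32}


def check_refine(params):
    """Raise ValueError if refine is enabled but refine_type is not higher precision."""
    if not params.get("refine"):
        return
    refine_type = params.get("refine_type")
    if refine_type is None:
        raise ValueError("refine is enabled but refine_type is missing")
    if PRECISION_BITS[refine_type] <= PRECISION_BITS[params["sq_type"]]:
        raise ValueError(
            f"refine_type={refine_type} must be higher precision than "
            f"sq_type={params['sq_type']}"
        )


check_refine({"sq_type": "SQ6", "refine": True, "refine_type": "SQ8"})  # passes
try:
    check_refine({"sq_type": "SQ8", "refine": True, "refine_type": "SQ6"})
except ValueError as err:
    print(err)
```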

Recommended starting configs

Balanced (good default): M=32, efConstruction=200, sq_type=SQ8, refine=false
Memory-lean + accuracy recovery: M=32, efConstruction=200, sq_type=SQ6, refine=true, refine_type=SQ8, with refine_k=2 set at search time
Maximum speed / smallest memory (if supported): sq_type=SQ4U, and keep refine=true if you care about recall