
IVF_SQ8

IVF_SQ8 is a quantization-based vector index designed for large-scale similarity search. It improves query speed and reduces memory usage by combining:

  • IVF (Inverted File): groups vectors into clusters so searches scan only the most relevant clusters.
  • SQ8 (Scalar Quantization, 8-bit): compresses each vector dimension from a 32-bit float to an 8-bit integer (4 bytes down to 1 byte, roughly a 4x reduction in vector storage), which also speeds up distance computations.

IVF_SQ8 is a solid default when you want fast search and low memory without the extra tuning complexity of product quantization.


How it works (simple mental model)

IVF: search a few “buckets”, not the whole dataset

Think of IVF like splitting your collection into nlist buckets using k-means. Every vector is assigned to the nearest bucket (cluster centroid).

When you query, the engine:

  1. Compares the query vector to the cluster centroids.
  2. Picks the closest nprobe clusters.
  3. Searches only vectors inside those clusters.

Result: faster search because you avoid scanning everything.
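
The three query steps above can be sketched in plain Python. This is a toy illustration only, not the engine's implementation: the function names (build_ivf, ivf_search), the naive k-means loop, and the squared-L2 distance are all assumptions made for the example.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centroids, v):
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i], v))

def assign(vectors, centroids):
    """Put every vector id into the bucket of its nearest centroid."""
    buckets = [[] for _ in centroids]
    for vid, v in enumerate(vectors):
        buckets[nearest(centroids, v)].append(vid)
    return buckets

def build_ivf(vectors, nlist, iters=10, seed=0):
    """Toy k-means (hypothetical helper): returns (centroids, buckets)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, nlist)]
    dim = len(vectors[0])
    for _ in range(iters):
        buckets = assign(vectors, centroids)
        for i, ids in enumerate(buckets):
            if ids:  # keep the old centroid if a bucket went empty
                centroids[i] = [
                    sum(vectors[j][d] for j in ids) / len(ids) for d in range(dim)
                ]
    # Final assignment so the buckets match the final centroids.
    return centroids, assign(vectors, centroids)

def ivf_search(query, vectors, centroids, buckets, nprobe, k):
    # 1. Compare the query to every centroid.
    order = sorted(range(len(centroids)), key=lambda i: sq_dist(centroids[i], query))
    # 2. Keep the nprobe closest clusters.
    candidates = [vid for i in order[:nprobe] for vid in buckets[i]]
    # 3. Rank only the vectors inside those clusters.
    candidates.sort(key=lambda vid: sq_dist(vectors[vid], query))
    return candidates[:k]

# Small demo on random data.
rng = random.Random(42)
data = [[rng.random() for _ in range(4)] for _ in range(500)]
centroids, buckets = build_ivf(data, nlist=16)
print(ivf_search([0.5, 0.5, 0.5, 0.5], data, centroids, buckets, nprobe=4, k=5))
```

With nprobe equal to nlist the sketch degenerates to an exhaustive scan and returns the exact nearest neighbors; smaller nprobe values trade some of that accuracy for fewer comparisons.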

SQ8: store vectors smaller, compute faster

SQ8 compresses each vector dimension into an 8-bit value (one of 256 levels). That keeps enough precision for most similarity search workloads while making vectors far cheaper to store and scan.

Result: lower RAM usage and faster distance math.
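
To make the compression concrete, here is a minimal sketch of one common scalar quantization scheme: learn a per-dimension minimum and maximum, then map each float onto 256 evenly spaced levels. The helper names (sq8_train, sq8_encode, sq8_decode) are hypothetical, and the engine's actual encoding may differ.

```python
def sq8_train(vectors):
    """Learn per-dimension (min, max) ranges from sample vectors."""
    dim = len(vectors[0])
    lo = [min(v[d] for v in vectors) for d in range(dim)]
    hi = [max(v[d] for v in vectors) for d in range(dim)]
    return lo, hi

def sq8_encode(v, lo, hi):
    """Map each float to an integer code in 0..255 (one byte per dimension)."""
    codes = []
    for x, a, b in zip(v, lo, hi):
        scale = (b - a) or 1.0              # avoid division by zero on constant dims
        c = round((x - a) / scale * 255)
        codes.append(max(0, min(255, c)))   # clamp values outside the trained range
    return bytes(codes)

def sq8_decode(codes, lo, hi):
    """Approximate reconstruction of the original floats."""
    return [a + (c / 255) * ((b - a) or 1.0) for c, a, b in zip(codes, lo, hi)]

sample = [[0.0, -1.0], [1.0, 3.0], [0.5, 1.0]]
lo, hi = sq8_train(sample)
codes = sq8_encode([0.25, 2.0], lo, hi)
print(len(codes), "bytes instead of", 4 * 2)
print(sq8_decode(codes, lo, hi))
```

For values inside the trained range, the reconstruction error per dimension is at most half a quantization step, that is (max - min) / 510. This is the source of the small recall loss discussed in the FAQ.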

IVF + SQ8 together

  • IVF reduces how many vectors you consider.
  • SQ8 reduces how expensive each comparison is.
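
A back-of-the-envelope estimate shows how the two effects multiply. The function below is an illustrative sketch (scan_cost is a hypothetical name) and assumes evenly sized clusters, which real data only approximates.

```python
def scan_cost(num_vectors, dim, nlist, nprobe, bytes_per_dim=1):
    """Rough per-query work for an IVF index, assuming evenly sized clusters."""
    nprobe = min(nprobe, nlist)
    candidates = num_vectors * nprobe / nlist
    return {
        "centroid_comparisons": nlist,
        "candidates_scanned": int(candidates),
        "bytes_scanned": int(candidates * dim * bytes_per_dim),
    }

# 1M vectors of dimension 768, nlist=1024, nprobe=16
sq8 = scan_cost(1_000_000, 768, nlist=1024, nprobe=16, bytes_per_dim=1)
f32 = scan_cost(1_000_000, 768, nlist=1024, nprobe=16, bytes_per_dim=4)
print(sq8["candidates_scanned"])  # 15625 of 1,000,000 vectors
print(sq8["bytes_scanned"])       # 12000000
print(f32["bytes_scanned"])       # 48000000
```

In this example IVF cuts the candidates to about 1.6% of the collection, and SQ8 then cuts the bytes read for those candidates by another factor of four.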

When should I use IVF_SQ8?

Use IVF_SQ8 when:

  • Your dataset is large (hundreds of thousands to billions of vectors).
  • You want lower memory usage than exhaustive search (FLAT).
  • You can tolerate a small accuracy tradeoff for much faster queries.

Avoid IVF_SQ8 when:

  • You need maximum recall and your dataset is small enough for FLAT.
  • Your vectors are extremely sensitive to quantization error (rare; usually manageable).

Key parameters

Build-time: nlist

  • What it is: Number of IVF clusters.
  • Effect:
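    • Larger nlist → more (smaller) clusters → fewer vectors scanned per probed cluster, so queries get faster at a fixed nprobe.
    • But at a fixed nprobe, recall can drop because a smaller share of the data is searched. Larger nlist also increases index build time and memory overhead for cluster centroids.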

Rule of thumb: start with nlist in the range 32–4096, depending on dataset size.
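
One way to pick a starting point inside that range is the heuristic, often quoted for IVF-style indexes, of roughly 4 × sqrt(N) clusters for N vectors. The helper below is a sketch under that assumption (suggest_nlist is a hypothetical name, and the heuristic is not confirmed for this engine); it clamps the result to the 32–4096 range given above.

```python
import math

def suggest_nlist(num_vectors, lo=32, hi=4096):
    """Starting nlist from the 4 * sqrt(N) heuristic, clamped to [lo, hi]."""
    raw = 4 * math.sqrt(num_vectors)
    return int(max(lo, min(hi, round(raw))))

for n in (50, 10_000, 1_000_000, 100_000_000):
    print(n, suggest_nlist(n))
# 50 -> 32, 10000 -> 400, 1000000 -> 4000, 100000000 -> 4096
```

Treat the output as a first guess to benchmark, not as a tuned value.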

Query-time: nprobe

  • What it is: How many IVF clusters are searched for each query.
  • Effect:
    • Larger nprobe → higher recall.
    • But larger nprobe increases latency (more candidates scanned).

Rule of thumb: set nprobe as a fraction of nlist (for example, nprobe = 8 with nlist = 128 searches about 6% of the clusters), then tune based on recall vs. latency.
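
When tuning, it can help to sweep a handful of nprobe values expressed as fractions of nlist. The sketch below generates such a sweep; the function name and the default fractions are illustrative assumptions, not recommended settings.

```python
def nprobe_candidates(nlist, fractions=(0.01, 0.02, 0.05, 0.10, 0.25)):
    """nprobe values to benchmark, as fractions of nlist (deduplicated, sorted)."""
    values = {max(1, min(nlist, round(nlist * f))) for f in fractions}
    return sorted(values)

print(nprobe_candidates(128))   # [1, 3, 6, 13, 32]
print(nprobe_candidates(1024))  # [10, 20, 51, 102, 256]
```

Run the same query set at each value and keep the smallest nprobe that meets your recall target.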


Build an IVF_SQ8 index

The examples below assume a connected vbase client. If you don't have one yet, create it like this:

from dodil import Client
from dodil.vbase import VBaseConfig

# Authorize
c = Client(
    service_account_id="...",
    service_account_secret="...",
)

# Connect
vbase = c.vbase.connect(
    VBaseConfig(
        host="vbase-db-<id>.infra.dodil.cloud",
        port=443,
        scheme="https",
        db_name="db_<id>",
    )
)

Create an index on your vector field (example field name: embedding):

# Example: create IVF_SQ8 index on a vector field
# NOTE: method names may vary depending on your wrapper version; the intent is:
# - choose index_type IVF_SQ8
# - set metric_type
# - set nlist
vbase.create_index(
    collection_name="my_collection",
    field_name="embedding",
    index_type="IVF_SQ8",
    metric_type="COSINE",  # or L2 / IP
    params={
        "nlist": 128,
    },
    index_name="embedding_ivf_sq8",
)

Don’t forget to load

Most production setups require loading a collection (and its index) before search. If the collection isn’t loaded, searches may fail or be much slower.

See the Load & Release guide for how to load a collection properly.


Search using IVF_SQ8

At query time you tune nprobe:

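res = vbase.search(
    collection_name="my_collection",
    vector_field="embedding",
    # One query vector; its length must match the dimension of the indexed field.
    data=[[0.1, 0.2, 0.3, 0.4]],
    limit=10,  # number of nearest neighbors to return
    search_params={
        "params": {
            "nprobe": 8,  # clusters searched per query
        }
    },
)
print(res)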

Practical tuning workflow

  1. Pick a metric

    • COSINE when only vector direction matters (common for text/image embeddings). On unit-normalized vectors, IP gives the same ranking.
    • L2 for Euclidean distance.
    • IP for inner product.
  2. Start with conservative defaults

    • nlist = 128
    • nprobe = 8
  3. Measure recall & latency

    • Increase nprobe first to improve recall.
    • Increase nlist (and rebuild the index) if you need lower latency from tighter candidate sets, then re-tune nprobe to keep recall where you want it.
  4. Watch memory & build time

    • Larger nlist costs more at build time.
    • SQ8 keeps memory down compared to non-quantized IVF variants.
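
For the measurement step, recall is usually computed by comparing the ids returned by the approximate index against exact results for the same queries (for example from a FLAT index or a brute-force scan). The helper below is a minimal sketch; recall_at_k is a hypothetical name, and how you extract ids from search results depends on your client.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k neighbors that the approximate search returned.

    approx_ids and exact_ids are lists of id lists, one per query.
    """
    hits = 0
    total = 0
    for approx, exact in zip(approx_ids, exact_ids):
        truth = set(exact[:k])
        hits += len(truth & set(approx[:k]))
        total += len(truth)
    return hits / total if total else 0.0

# Two queries, k=3: the first misses one true neighbor, the second is perfect.
approx = [[1, 2, 3], [4, 5, 6]]
exact = [[1, 2, 9], [4, 5, 6]]
print(recall_at_k(approx, exact, k=3))  # 5 hits out of 6
```

Record recall and latency side by side for each nprobe value you try, so the tradeoff is visible before you change nlist.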

FAQ

Is IVF_SQ8 “lossy”?

Yes. SQ8 compresses float32 values into 8-bit integers, so there is quantization error. In practice, you usually gain big performance benefits with a small recall drop, especially when you tune nprobe.

What’s the difference vs IVF_FLAT?

  • IVF_FLAT stores full-precision vectors and is typically more accurate.
  • IVF_SQ8 compresses vectors to save memory and speed up search.
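
The memory difference can be estimated with simple arithmetic. The sketch below counts only vector data plus float32 centroids; it is an approximation under those assumptions (ivf_memory_bytes is a hypothetical name) and ignores ids, quantizer ranges, and other engine overhead.

```python
def ivf_memory_bytes(num_vectors, dim, nlist, bytes_per_dim):
    """Approximate index size: stored vectors plus float32 cluster centroids."""
    vectors = num_vectors * dim * bytes_per_dim
    centroids = nlist * dim * 4
    return vectors + centroids

n, dim, nlist = 1_000_000, 768, 1024
ivf_flat = ivf_memory_bytes(n, dim, nlist, bytes_per_dim=4)
ivf_sq8 = ivf_memory_bytes(n, dim, nlist, bytes_per_dim=1)
print(f"IVF_FLAT ~ {ivf_flat / 1e9:.2f} GB")  # ~3.08 GB
print(f"IVF_SQ8  ~ {ivf_sq8 / 1e9:.2f} GB")   # ~0.77 GB
```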

What if I want even smaller memory?

You may consider more aggressive quantization approaches (for example PQ-style indexes), but they usually require more careful tuning and can trade off more accuracy.


Next

  • Learn how to load and release collections to control memory and search readiness.
  • Explore other index types like IVF_FLAT or graph-based indexes for different accuracy/latency tradeoffs.