IVF_SQ8

IVF_SQ8 is a quantization-based vector index designed for large-scale similarity search. It improves query speed and reduces memory usage by combining:

IVF (Inverted File): groups vectors into clusters so searches scan only the most relevant clusters.
SQ8 (Scalar Quantization, 8-bit): compresses each vector dimension from a 32-bit float (4 bytes) to an 8-bit integer (1 byte), cutting raw vector storage by about 4× and making distance computations cheaper.

This is a great default when you want good performance and low memory without the heavier complexity of product quantization.

How it works (simple mental model)

IVF: search a few “buckets”, not the whole dataset

Think of IVF as splitting your collection into nlist buckets using k-means. Every vector is assigned to the bucket whose cluster centroid is nearest.

When you query, the engine:

Compares the query vector to the cluster centroids.
Picks the closest nprobe clusters.
Searches only vectors inside those clusters.

Result: faster search because you avoid scanning everything.
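To make the three query steps concrete, here is a toy, pure-Python sketch of an IVF index. It is an illustration under stated assumptions, not how vbase implements IVF: it uses plain Lloyd's k-means, L2 distance, and in-memory Python lists, and every function name (`kmeans`, `build_ivf`, `ivf_search`) is hypothetical.

```python
import random

def l2_sq(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_centroid(v, centroids):
    """Index of the centroid closest to v (ties go to the lowest index)."""
    return min(range(len(centroids)), key=lambda i: l2_sq(v, centroids[i]))

def kmeans(vectors, nlist, iters=10, seed=0):
    """Plain Lloyd's k-means returning nlist centroids."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, nlist)]
    for _ in range(iters):
        buckets = [[] for _ in range(nlist)]
        for v in vectors:
            buckets[nearest_centroid(v, centroids)].append(v)
        for i, members in enumerate(buckets):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def build_ivf(vectors, nlist):
    """Train centroids, then file every vector id under its nearest centroid."""
    centroids = kmeans(vectors, nlist)
    inverted_lists = [[] for _ in range(nlist)]
    for vid, v in enumerate(vectors):
        inverted_lists[nearest_centroid(v, centroids)].append(vid)
    return centroids, inverted_lists

def ivf_search(query, vectors, centroids, inverted_lists, nprobe, k):
    """Return (top-k ids, number of vectors scanned)."""
    # 1. Compare the query with every centroid and keep the nprobe closest clusters.
    probed = sorted(range(len(centroids)), key=lambda i: l2_sq(query, centroids[i]))[:nprobe]
    # 2. Scan only the vectors stored in those clusters.
    candidates = [vid for c in probed for vid in inverted_lists[c]]
    candidates.sort(key=lambda vid: l2_sq(query, vectors[vid]))
    return candidates[:k], len(candidates)

rng = random.Random(42)
data = [[rng.random() for _ in range(8)] for _ in range(500)]
centroids, inverted_lists = build_ivf(data, nlist=16)
ids, scanned = ivf_search(data[0], data, centroids, inverted_lists, nprobe=2, k=5)
print(ids, f"scanned {scanned} of {len(data)} vectors")
```

With nprobe=2 out of 16 clusters, the query only touches the vectors in two buckets, which is the whole point of IVF; a real engine does the same thing with optimized, batched distance kernels.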

SQ8: store vectors smaller, compute faster

SQ8 compresses each vector dimension into an 8-bit value (one of 256 levels), so each dimension takes 1 byte instead of 4. It keeps enough precision for fast similarity search while making vectors far cheaper to store and scan.

Result: lower RAM usage and faster distance math.
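One common way to do 8-bit scalar quantization is to learn a [min, max] range per dimension and map values linearly onto 0..255. The sketch below assumes that scheme for illustration; the exact quantizer vbase uses is not described here, and the helper names are hypothetical.

```python
import struct

def train_sq8(vectors):
    """Learn a per-dimension [min, max] range from training vectors."""
    lo = [min(col) for col in zip(*vectors)]
    hi = [max(col) for col in zip(*vectors)]
    return lo, hi

def encode_sq8(vec, lo, hi):
    """Map each float dimension linearly onto 0..255: one byte per dimension."""
    codes = bytearray(len(vec))
    for d, x in enumerate(vec):
        span = hi[d] - lo[d]
        t = 0.0 if span == 0 else (x - lo[d]) / span
        t = min(max(t, 0.0), 1.0)  # clamp values outside the trained range
        codes[d] = round(t * 255)
    return bytes(codes)

def decode_sq8(codes, lo, hi):
    """Approximate reconstruction of the original floats from the 8-bit codes."""
    return [lo[d] + (c / 255) * (hi[d] - lo[d]) for d, c in enumerate(codes)]

vecs = [[0.1, -0.5, 2.0, 0.0], [0.3, 0.5, 1.0, 0.25], [-0.2, 0.0, 1.5, 1.0]]
lo, hi = train_sq8(vecs)
code = encode_sq8(vecs[0], lo, hi)
approx = decode_sq8(code, lo, hi)
float32_bytes = struct.calcsize(f"<{len(vecs[0])}f")  # 4 bytes per dimension
print(float32_bytes, len(code))  # 16 4
```

Under this scheme the reconstruction error per dimension is at most half a quantization step, (max - min) / 510, which is why recall usually drops only slightly.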

IVF + SQ8 together

IVF reduces how many vectors you consider.
SQ8 reduces how expensive each comparison is.
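A back-of-envelope estimate shows how the two effects multiply. This sketch assumes vectors are spread evenly across clusters and ignores centroids, ids, quantizer parameters, and other index overhead, so treat the numbers as rough orders of magnitude; the function name is hypothetical.

```python
def ivf_sq8_estimate(n_vectors, dim, nlist, nprobe):
    """Rough storage and per-query scan numbers, assuming evenly sized clusters."""
    scanned = n_vectors * nprobe / nlist  # IVF: only nprobe of nlist clusters are visited
    return {
        "float32_storage_gb": n_vectors * dim * 4 / 1e9,  # 4 bytes per dimension
        "sq8_storage_gb": n_vectors * dim * 1 / 1e9,      # 1 byte per dimension
        "vectors_scanned": scanned,
        "sq8_bytes_scanned": scanned * dim,               # SQ8: 1 byte read per dimension
    }

est = ivf_sq8_estimate(n_vectors=1_000_000, dim=768, nlist=1024, nprobe=16)
print(est)
# 3.072 GB as float32 vs 0.768 GB as SQ8; about 15,625 vectors (12 MB of codes) scanned per query
```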

When should I use IVF_SQ8?

Use IVF_SQ8 when:

Your dataset is large (hundreds of thousands to billions of vectors).
You want lower memory usage than exhaustive search (FLAT).
You can tolerate a small accuracy tradeoff for much faster queries.

Avoid IVF_SQ8 when:

You need maximum recall and your dataset is small enough for FLAT.
Your vectors are extremely sensitive to quantization error (rare; usually manageable).

Key parameters

Build-time: `nlist`

What it is: Number of IVF clusters.
Effect:
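- Larger nlist → more (smaller) clusters → finer partitioning, which can improve recall for a given number of scanned vectors. At a fixed nprobe, though, each query scans fewer vectors, so you may need to raise nprobe as well.
- Larger nlist also increases index build time (k-means has to train more centroids) and the memory used for cluster metadata.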

Rule of thumb: start with nlist in the range 32–4096, depending on dataset size.
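A commonly cited heuristic for IVF indexes sizes nlist at a small multiple of the square root of the collection size. The sketch below assumes a factor of 4 and clamps the result to the 32 to 4096 range above; both choices are illustrative assumptions, not vbase defaults, and `suggest_nlist` is a hypothetical helper.

```python
import math

def suggest_nlist(n_vectors, factor=4, lo=32, hi=4096):
    """Heuristic starting nlist: about factor * sqrt(N), clamped to [lo, hi]."""
    return max(lo, min(hi, round(factor * math.sqrt(n_vectors))))

for n in (10_000, 1_000_000, 100_000_000):
    print(n, suggest_nlist(n))
```

For very large collections the square-root heuristic exceeds 4096 and the clamp takes over; whether larger values make sense depends on your deployment, so treat this only as a first guess before measuring.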

Query-time: `nprobe`

What it is: How many IVF clusters are searched for each query.
Effect:
- Larger nprobe → higher recall.
- But larger nprobe increases latency (more candidates scanned).

Rule of thumb: set nprobe proportionally to nlist (and tune based on recall vs latency).
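Because nprobe is meaningful only relative to nlist, one simple way to tune it is to benchmark a handful of values that scale with nlist. This hypothetical helper returns powers of two up to a fraction of nlist; the 25% cap is an arbitrary assumption for the sweep, not a recommended setting.

```python
def nprobe_sweep(nlist, max_fraction=0.25):
    """Candidate nprobe values: powers of two from 1 up to max_fraction * nlist."""
    cap = max(1, min(nlist, int(nlist * max_fraction)))
    values = []
    p = 1
    while p <= cap:
        values.append(p)
        p *= 2
    if values[-1] != cap:  # always include the cap itself
        values.append(cap)
    return values

print(nprobe_sweep(128))  # [1, 2, 4, 8, 16, 32]
print(nprobe_sweep(100))  # [1, 2, 4, 8, 16, 25]
```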

Build an IVF_SQ8 index

If you don't already have a connected vbase client, create one:


from dodil import Client
from dodil.vbase import VBaseConfig
 
# Authorize
c = Client(
    service_account_id="...",
    service_account_secret="...",
)
 
# Connect
vbase = c.vbase.connect(
    VBaseConfig(
        host="vbase-db-<id>.infra.dodil.cloud",
        port=443,
        scheme="https",
        db_name="db_<id>",
    )
)

Create an index on your vector field (example field name: embedding):


# Example: create IVF_SQ8 index on a vector field
# NOTE: method names may vary depending on your wrapper version; the intent is:
#  - choose index_type IVF_SQ8
#  - set metric_type
#  - set nlist
 
vbase.create_index(
    collection_name="my_collection",
    field_name="embedding",
    index_type="IVF_SQ8",
    metric_type="COSINE",  # or L2 / IP
    params={
        "nlist": 128,
    },
    index_name="embedding_ivf_sq8",
)
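The metric_type argument chooses how similarity is measured. The plain-Python sketch below shows the three options on small vectors; it is only an illustration of the math, not vbase code. For unit-length vectors, ||a - b||² = 2 - 2·(a · b), so COSINE, IP, and L2 all produce the same ranking.

```python
import math

def l2(a, b):
    """Euclidean distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def ip(a, b):
    """Inner product: larger means more similar."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity: inner product of the length-normalized vectors."""
    return ip(a, b) / math.sqrt(ip(a, a) * ip(b, b))

def normalize(v):
    """Scale v to unit length."""
    n = math.sqrt(ip(v, v))
    return [x / n for x in v]

query = normalize([1, 2, 3, 4])
docs = [normalize(d) for d in ([0.1, 0.2, 0.3, 0.4], [4, 3, 2, 1], [1, 0, 0, 1], [0.5, 0.5, 0.5, 0.5])]
by_cosine = sorted(range(len(docs)), key=lambda i: -cosine(query, docs[i]))
by_ip = sorted(range(len(docs)), key=lambda i: -ip(query, docs[i]))
by_l2 = sorted(range(len(docs)), key=lambda i: l2(query, docs[i]))
print(by_cosine, by_ip, by_l2)  # [0, 3, 1, 2] for all three
```

If your embeddings are not normalized, the three metrics can rank results differently, so pick the one your embedding model was trained for.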

Don’t forget to load

Most production setups require loading a collection (and its index) before search. If the collection isn’t loaded, searches may fail or be much slower.

See the Load & Release guide for how to load a collection properly.

Search using IVF_SQ8

At query time you tune nprobe:


res = vbase.search(
    collection_name="my_collection",
    vector_field="embedding",
    data=[[0.1, 0.2, 0.3, 0.4]],  # query vector(s); dimension must match the embedding field
    limit=10,
    search_params={
        "params": {
            "nprobe": 8,
        }
    },
)
 
print(res)

Practical tuning workflow

Pick a metric
- COSINE for normalized embeddings (common for text/image embeddings).
- L2 for Euclidean distance.
- IP for inner product.
Start with conservative defaults
- nlist = 128
- nprobe = 8
Measure recall & latency
- Increase nprobe first to improve recall.
- Increase nlist (this requires rebuilding the index) if you want finer clusters, so each probe scans fewer, more relevant vectors; re-tune nprobe afterwards.
Watch memory & build time
- Larger nlist costs more at build time.
- SQ8 keeps memory down compared to non-quantized IVF variants.
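Measuring recall needs ground-truth neighbors, for example from an exhaustive (FLAT) search over the same queries. The harness below is a sketch under that assumption: `search_fn(query, nprobe, k)` is a hypothetical adapter you would write around your client (for instance, calling vbase.search with `search_params={"params": {"nprobe": nprobe}}` and pulling ids out of the result; the result layout is not covered here). A stand-in search function is used so the example runs on its own.

```python
import time

def recall_at_k(approx_ids, exact_ids, k):
    """Share of the true top-k neighbors that the approximate search returned."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k

def sweep_nprobe(search_fn, queries, ground_truth, nprobes, k=10):
    """Mean recall@k and mean latency (seconds) for each nprobe value."""
    rows = []
    for nprobe in nprobes:
        recalls, latencies = [], []
        for query, truth in zip(queries, ground_truth):
            start = time.perf_counter()
            ids = search_fn(query, nprobe, k)
            latencies.append(time.perf_counter() - start)
            recalls.append(recall_at_k(ids, truth, k))
        rows.append((nprobe, sum(recalls) / len(recalls), sum(latencies) / len(latencies)))
    return rows

# Stand-in search function: finds more true neighbors as nprobe grows.
def fake_search(query, nprobe, k):
    hits = list(range(min(nprobe, k)))
    return hits + [-1] * (k - len(hits))

rows = sweep_nprobe(fake_search, [[0.0]], [[0, 1, 2, 3]], [1, 2, 4], k=4)
for nprobe, recall, latency in rows:
    print(f"nprobe={nprobe:>3}  recall@4={recall:.2f}  latency={latency * 1e6:.1f} us")
```

Pick the smallest nprobe whose recall meets your target, then check that latency at that setting is acceptable.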

FAQ

Is IVF_SQ8 “lossy”?

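Yes. SQ8 compresses float32 values into 8-bit integers, so there is quantization error. In practice the recall drop is usually small compared with the memory and speed gains. Tuning nprobe keeps the IVF stage from skipping clusters that hold true neighbors, although it cannot undo the quantization error itself.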

What’s the difference vs IVF_FLAT?

IVF_FLAT stores full-precision vectors and is typically more accurate.
IVF_SQ8 compresses vectors to save memory and speed up search.

What if I want even smaller memory?

You may consider more aggressive quantization approaches (for example PQ-style indexes), but they usually require more careful tuning and can trade off more accuracy.

Learn how to load and release collections to control memory and search readiness.
Explore other index types like IVF_FLAT or graph-based indexes for different accuracy/latency tradeoffs.