IVF_SQ8
IVF_SQ8 is a quantization-based vector index designed for large-scale similarity search. It improves query speed and reduces memory usage by combining:
- IVF (Inverted File): groups vectors into clusters so searches scan only the most relevant clusters.
- SQ8 (Scalar Quantization, 8-bit): compresses each vector component from float32 (4 bytes) to an 8-bit integer (1 byte), cutting vector storage by about 4x and accelerating distance computations.
This makes IVF_SQ8 a good default when you want fast queries and low memory use without the extra tuning complexity of product quantization (PQ).
How it works (simple mental model)
IVF: search a few “buckets”, not the whole dataset
Think of IVF as splitting your collection into nlist buckets with k-means clustering. Each vector is assigned to the bucket whose centroid is nearest.
When you query, the engine:
- Compares the query vector to the cluster centroids.
- Picks the closest nprobe clusters.
- Searches only the vectors inside those clusters.
Result: faster search because you avoid scanning everything.
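To make the three query steps concrete, here is a toy, pure-Python sketch of IVF using plain k-means and squared L2 distance. It illustrates the idea under stated assumptions and is not vbase's implementation: the function names (kmeans, build_ivf, ivf_search) are hypothetical, and a real engine uses optimized, batched distance kernels.

```python
import random


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(v, centroids):
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda i: squared_l2(v, centroids[i]))


def kmeans(vectors, nlist, iters=10, seed=0):
    """Plain Lloyd's k-means; returns nlist centroids."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, nlist)]
    dim = len(vectors[0])
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(nlist)]
        counts = [0] * nlist
        for v in vectors:
            c = nearest_centroid(v, centroids)
            counts[c] += 1
            for j, x in enumerate(v):
                sums[c][j] += x
        for i in range(nlist):
            if counts[i]:  # an empty cluster keeps its previous centroid
                centroids[i] = [s / counts[i] for s in sums[i]]
    return centroids


def build_ivf(vectors, nlist):
    """Cluster the data, then file each vector id under its nearest centroid."""
    centroids = kmeans(vectors, nlist)
    inverted_lists = [[] for _ in range(nlist)]
    for vid, v in enumerate(vectors):
        inverted_lists[nearest_centroid(v, centroids)].append(vid)
    return centroids, inverted_lists


def ivf_search(query, vectors, centroids, inverted_lists, nprobe, k):
    """Return (top-k ids, number of candidates scanned)."""
    # Steps 1-2: rank centroids by distance to the query and keep the nprobe closest.
    order = sorted(range(len(centroids)), key=lambda i: squared_l2(query, centroids[i]))
    probed = order[:nprobe]
    # Step 3: scan only the vectors filed under the probed clusters.
    candidates = [vid for c in probed for vid in inverted_lists[c]]
    candidates.sort(key=lambda vid: squared_l2(query, vectors[vid]))
    return candidates[:k], len(candidates)


rng = random.Random(42)
data = [[rng.random(), rng.random()] for _ in range(1000)]
centroids, lists = build_ivf(data, nlist=16)
ids, scanned = ivf_search([0.5, 0.5], data, centroids, lists, nprobe=2, k=5)
print(ids, "scanned", scanned, "of", len(data))
```

With nprobe equal to nlist every cluster is scanned and the result matches brute-force search; smaller nprobe values scan fewer candidates at the risk of missing true neighbors that fall in unprobed clusters.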
SQ8: store vectors smaller, compute faster
SQ8 maps each vector component to one of 256 levels and stores it as an 8-bit integer. This keeps enough precision for approximate similarity search while making vectors about 4x cheaper to store and scan than float32.
Result: lower RAM usage and faster distance math.
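A minimal sketch of one common SQ8 scheme, assuming a uniform quantizer with per-dimension min/max ranges learned from training data (vbase's exact training and encoding details are not documented here). Each float is mapped to an integer code in 0..255 and stored as one byte; the helper names are hypothetical.

```python
def train_sq8(vectors):
    """Learn a [min, max] range for every dimension from training vectors."""
    dim = len(vectors[0])
    lo = [min(v[j] for v in vectors) for j in range(dim)]
    hi = [max(v[j] for v in vectors) for j in range(dim)]
    return lo, hi


def _step(a, b):
    """Width of one quantization level; guards against constant dimensions."""
    return (b - a) / 255 if b > a else 1.0


def encode_sq8(vec, lo, hi):
    """Map each float32 component to one byte (an integer code in 0..255)."""
    codes = []
    for x, a, b in zip(vec, lo, hi):
        q = round((x - a) / _step(a, b))
        codes.append(min(255, max(0, q)))  # clamp values outside the trained range
    return bytes(codes)


def decode_sq8(codes, lo, hi):
    """Reconstruct approximate floats from the byte codes."""
    return [a + q * _step(a, b) for q, a, b in zip(codes, lo, hi)]


train = [[0.0, -1.0, 2.0], [1.0, 1.0, 4.0], [0.5, 0.0, 3.0]]
lo, hi = train_sq8(train)
original = [0.25, 0.5, 3.5]
codes = encode_sq8(original, lo, hi)
approx = decode_sq8(codes, lo, hi)
print(len(codes), "bytes instead of", 4 * len(original), "->", approx)
```

Distances can then be computed against the decoded (or directly scaled) codes, which is where the smaller memory traffic pays off.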
IVF + SQ8 together
- IVF reduces how many vectors you consider.
- SQ8 reduces how expensive each comparison is.
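A back-of-the-envelope cost model shows how the two savings multiply. It assumes equally sized clusters and counts only the bytes of vector data touched per query; the function name and the example sizes (1M vectors, 768 dimensions) are hypothetical.

```python
def per_query_scan_cost(n_vectors, dim, nlist, nprobe):
    """Rough bytes of vector data touched by one query (equal-size clusters assumed)."""
    candidates = n_vectors * nprobe / nlist  # IVF: only nprobe of nlist clusters are scanned
    return {
        "centroid_bytes": nlist * dim * 4,       # query vs. every float32 centroid
        "candidates": candidates,
        "flat_bytes": n_vectors * dim * 4,       # exhaustive float32 scan (FLAT)
        "ivf_flat_bytes": candidates * dim * 4,  # IVF with float32 vectors
        "ivf_sq8_bytes": candidates * dim * 1,   # IVF_SQ8: 1 byte per dimension
    }


cost = per_query_scan_cost(n_vectors=1_000_000, dim=768, nlist=1024, nprobe=8)
for name, value in cost.items():
    print(f"{name:>15}: {value:,.0f}")
```

For these sizes, a query scans about 7,800 candidates and 6 MB of SQ8 codes (plus about 3 MB of float32 centroids), versus roughly 3 GB for an exhaustive float32 scan.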
When should I use IVF_SQ8?
Use IVF_SQ8 when:
- Your dataset is large (hundreds of thousands to billions of vectors).
- You want lower memory usage than exhaustive search (FLAT).
- You can tolerate a small accuracy tradeoff for much faster queries.
Avoid IVF_SQ8 when:
- You need maximum recall and your dataset is small enough for FLAT.
- Your vectors are extremely sensitive to quantization error (rare, and usually manageable).
Key parameters
Build-time: nlist
- What it is: Number of IVF clusters.
- Effect:
  - Larger nlist → more, smaller clusters (roughly N / nlist vectors each for N vectors) → potentially higher recall.
  - But larger nlist also increases index build time and memory overhead for cluster metadata.
Rule of thumb: start with nlist in the range 32–4096, using larger values for larger datasets.
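If you want a concrete starting point within that range, one commonly cited heuristic (from the FAISS index-selection guidelines, not from vbase) is nlist on the order of 4·√N to 16·√N for N vectors. The sketch below applies the low end, rounds to a power of two for tidy values (a convention, not a requirement), and clamps to 32–4096; the function and parameter names are hypothetical.

```python
import math


def suggest_nlist(n_vectors, factor=4, lo=32, hi=4096):
    """Starting nlist from the factor * sqrt(N) heuristic, clamped to [lo, hi]."""
    raw = max(factor * math.sqrt(n_vectors), 1.0)
    power_of_two = 2 ** round(math.log2(raw))  # tidy value; not a requirement
    return max(lo, min(hi, power_of_two))


for n in (10_000, 100_000, 1_000_000):
    print(n, suggest_nlist(n))
```

Under these assumptions it suggests 512 for 10,000 vectors, 1,024 for 100,000, and 4,096 for 1,000,000. Treat the output only as a first guess to validate against measured recall.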
Query-time: nprobe
- What it is: How many IVF clusters are searched for each query (from 1 up to nlist).
- Effect:
  - Larger nprobe → higher recall.
  - But larger nprobe increases latency (more candidates scanned).
Rule of thumb: set nprobe in proportion to nlist, then tune it against your recall and latency targets.
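One way to tune nprobe is to sweep it on a small set of queries whose exact top-k neighbors you already know (for example, from a FLAT search). The sketch below doubles nprobe until mean recall@k reaches a target, recording mean latency along the way. search_fn is a hypothetical callable taking (query, nprobe) and returning a list of ids; in practice it might wrap the vbase.search call on this page with search_params={"params": {"nprobe": nprobe}}, but the result format of that call is not documented here, so the wiring is left out.

```python
import time


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k ids that the approximate search returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


def tune_nprobe(search_fn, queries, exact_results, nlist, target_recall=0.95):
    """Double nprobe (1, 2, 4, ...) until mean recall@k meets the target.

    Returns (nprobe, mean_recall, mean_latency_seconds).
    """
    nprobe = 1
    while True:
        recalls, elapsed = [], 0.0
        for query, exact in zip(queries, exact_results):
            start = time.perf_counter()
            ids = search_fn(query, nprobe)
            elapsed += time.perf_counter() - start
            recalls.append(recall_at_k(ids, exact))
        mean_recall = sum(recalls) / len(recalls)
        mean_latency = elapsed / len(queries)
        # Stop at the target, or once every cluster is already being probed.
        if mean_recall >= target_recall or nprobe >= nlist:
            return nprobe, mean_recall, mean_latency
        nprobe = min(nprobe * 2, nlist)


def fake_search(query, nprobe):
    """Stand-in search whose recall grows with nprobe (for illustration only)."""
    return list(range(min(nprobe, 10)))


truth = [list(range(10))]
print(tune_nprobe(fake_search, [[0.0]], truth, nlist=128, target_recall=0.8))
```

Doubling keeps the number of trial runs logarithmic in nlist; once you find the first passing value, you can refine between it and the previous one if latency matters.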
Build an IVF_SQ8 index
The examples below use a connected vbase client, created as follows:
```python
from dodil import Client
from dodil.vbase import VBaseConfig

# Authorize
c = Client(
    service_account_id="...",
    service_account_secret="...",
)

# Connect
vbase = c.vbase.connect(
    VBaseConfig(
        host="vbase-db-<id>.infra.dodil.cloud",
        port=443,
        scheme="https",
        db_name="db_<id>",
    )
)
```

Create an index on your vector field (example field name: embedding):
```python
# Example: create IVF_SQ8 index on a vector field
# NOTE: method names may vary depending on your wrapper version; the intent is:
# - choose index_type IVF_SQ8
# - set metric_type
# - set nlist
vbase.create_index(
    collection_name="my_collection",
    field_name="embedding",
    index_type="IVF_SQ8",
    metric_type="COSINE",  # or L2 / IP
    params={
        "nlist": 128,
    },
    index_name="embedding_ivf_sq8",
)
```

Don’t forget to load
Most production setups require loading a collection (and its index) before search. If the collection isn’t loaded, searches may fail or be much slower.
See the Load & Release guide for how to load a collection properly.
Search using IVF_SQ8
At query time you tune nprobe:
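```python
res = vbase.search(
    collection_name="my_collection",
    vector_field="embedding",
    data=[[0.1, 0.2, 0.3, 0.4]],  # query vector(s); length must match the field's dimension
    limit=10,                      # top-k results per query
    search_params={
        "params": {
            "nprobe": 8,           # clusters scanned per query (1 to nlist)
        }
    },
)
print(res)
```

Practical tuning workflow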
- Pick a metric
  - COSINE for normalized embeddings (common for text and image embeddings).
  - L2 for Euclidean distance.
  - IP for inner product.
- Start with conservative defaults
  - nlist = 128
  - nprobe = 8
- Measure recall & latency
  - Increase nprobe first to improve recall.
  - Increase nlist if you need better recall with tighter candidate sets (changing nlist requires rebuilding the index).
- Watch memory & build time
  - Larger nlist costs more at build time.
  - SQ8 keeps memory down compared to non-quantized IVF variants such as IVF_FLAT.
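The COSINE-for-normalized-embeddings advice has a simple basis: for unit-length vectors, cosine similarity equals the inner product, and squared L2 distance equals 2 − 2·IP, so all three metrics rank neighbors identically. A small pure-Python check (the vectors are arbitrary examples):

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale v to unit length."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def l2_squared(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


query = normalize([1.0, 2.0, 3.0])
docs = [normalize(v) for v in ([1.0, 0.0, 0.0], [1.0, 2.0, 2.0], [3.0, 2.0, 1.0])]
idx = range(len(docs))
by_ip = sorted(idx, key=lambda i: -dot(query, docs[i]))          # higher IP is closer
by_cos = sorted(idx, key=lambda i: -cosine(query, docs[i]))      # higher cosine is closer
by_l2 = sorted(idx, key=lambda i: l2_squared(query, docs[i]))    # smaller L2 is closer
print(by_ip, by_cos, by_l2)  # → [1, 2, 0] [1, 2, 0] [1, 2, 0]
```

If your embeddings are not normalized, the three metrics can disagree, so pick the one your embedding model was trained for.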
FAQ
Is IVF_SQ8 “lossy”?
Yes. SQ8 compresses float32 values into 8-bit integers, so there is quantization error. In practice, you usually gain big performance benefits with a small recall drop, especially when you tune nprobe.
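To quantify the loss, assume SQ8 is a uniform quantizer with a per-dimension range [min_j, max_j] (a common scheme; vbase's exact encoding is not documented here). Rounding then bounds the error on each in-range component by half a quantization step:

```latex
\Delta_j = \frac{\max_j - \min_j}{255}, \qquad
q_j = \operatorname{round}\!\left(\frac{x_j - \min_j}{\Delta_j}\right) \in \{0, \dots, 255\}, \qquad
\hat{x}_j = \min_j + q_j \, \Delta_j, \qquad
\lvert x_j - \hat{x}_j \rvert \le \frac{\Delta_j}{2}
```

For example, a dimension spanning [−1, 1] has a step of 2/255 ≈ 0.0078, so each reconstructed component is within about 0.0039 of the original. Under this scheme, values outside the trained range are clamped and can incur larger error.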
What’s the difference vs IVF_FLAT?
IVF_FLAT stores full-precision float32 vectors and is typically more accurate. IVF_SQ8 compresses each component to 8 bits, using about a quarter of the vector memory and speeding up search.
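A rough storage estimate makes the difference concrete. This sketch assumes float32 centroids, 64-bit ids in the inverted lists, and float32 per-dimension min/max values for SQ8; real engines add their own metadata and allocator overhead, so treat it as an order-of-magnitude guide rather than vbase's actual accounting. The function name is hypothetical.

```python
def estimate_index_bytes(n_vectors, dim, nlist, index_type):
    """Order-of-magnitude storage estimate for an IVF index (assumptions in comments)."""
    bytes_per_dim = {"IVF_FLAT": 4, "IVF_SQ8": 1}[index_type]
    total = n_vectors * dim * bytes_per_dim  # stored vectors (float32 or 8-bit codes)
    total += nlist * dim * 4                 # float32 centroids (assumed)
    total += n_vectors * 8                   # 64-bit ids in the inverted lists (assumed)
    if index_type == "IVF_SQ8":
        total += 2 * dim * 4                 # per-dimension min/max as float32 (assumed)
    return total


for index_type in ("IVF_FLAT", "IVF_SQ8"):
    size = estimate_index_bytes(1_000_000, 768, 1024, index_type)
    print(f"{index_type}: {size / 1e9:.2f} GB")
```

For 1,000,000 768-dimensional vectors and nlist = 1024, this gives about 3.08 GB for IVF_FLAT and about 0.78 GB for IVF_SQ8.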
What if I want even smaller memory?
You may consider more aggressive quantization approaches (for example PQ-style indexes), but they usually require more careful tuning and can trade off more accuracy.
Next
- Learn how to load and release collections to control memory and search readiness.
- Explore other index types like IVF_FLAT or graph-based indexes for different accuracy/latency tradeoffs.