Indexes
An index is an additional data structure built on top of your stored data. In Dodil VBase (Milvus-backed), indexes are how you make searches and filters fast at scale.
Indexes are not free:
- Build time: creating an index requires preprocessing.
- Extra storage + RAM: indexes take disk space and usually require memory during search.
- Accuracy trade-offs: many vector indexes use approximate search to gain speed, which can slightly reduce recall (how often the true nearest neighbors are returned).
The goal is simple: maximize query speed and throughput while keeping recall high and resource usage predictable.
What gets indexed?
Indexes are created per field.
- Vector fields (dense, binary, sparse) use specialized vector indexes.
- Scalar fields (numbers, booleans, strings, JSON, arrays) use scalar indexes to accelerate filtering.
This page focuses on vector indexes, because that’s where the biggest performance and cost differences are.
Why you should care
Without a vector index, search can fall back to a brute-force scan: compute the distance from the query to a large share of the stored vectors, then sort. That can be acceptable for small datasets, but the cost grows with every vector you add.
A good index lets you:
- Keep latency low as collections grow.
- Increase QPS (queries per second).
- Control memory usage and operational cost.
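To make the cost of an unindexed search concrete, here is a minimal sketch of a brute-force scan in plain Python. The function names and toy data are illustrative assumptions, not part of any Dodil or Milvus API. The point is that every query touches every stored vector.

```python
import heapq
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def brute_force_search(vectors, query, top_k):
    """Score every stored vector against the query and keep the top_k closest ids."""
    scored = ((l2_distance(vec, query), idx) for idx, vec in enumerate(vectors))
    return [idx for _, idx in heapq.nsmallest(top_k, scored)]

# Toy data (hypothetical): four 2-dimensional vectors.
vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.2]]
print(brute_force_search(vectors, [1.0, 1.0], top_k=2))  # -> [1, 3]
```

The work per query is proportional to the number of vectors times their dimension, which is the growth an index is meant to avoid.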
The core idea behind most vector indexes
Most approximate nearest neighbor (ANN) indexes follow a similar strategy:
- Narrow down candidates quickly using a coarse structure.
- Score candidates efficiently (sometimes using compressed/quantized vectors).
- Refine the best candidates with higher precision to recover recall.
This is why you will see indexes described as a combination of three building blocks.
Vector index anatomy
Think of a vector index as three layers:
1) Data structure
This is the “candidate generator”: it finds a smaller set of vectors worth checking.
Common families:
- IVF (Inverted File)
  - Clusters vectors into buckets (partitions) based on centroids.
  - At query time, it searches only the closest buckets instead of the whole dataset.
  - Often a strong choice for large datasets and high throughput.
- Graph-based (e.g., HNSW)
  - Builds a navigable graph where each vector links to nearby neighbors.
  - Queries walk the graph to quickly find close matches.
  - Often a great choice for low-latency search in high-dimensional spaces.
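The IVF idea can be sketched in a few lines of plain Python. This is a toy illustration, not how Dodil VBase or Milvus implements it: the centroids are hard-coded rather than learned by clustering, and the `nprobe` name (the number of buckets to scan) is borrowed from common IVF terminology as an assumption.

```python
import math

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_ivf(vectors, centroids):
    """Assign each vector id to the bucket of its nearest centroid."""
    buckets = {c: [] for c in range(len(centroids))}
    for idx, vec in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2(vec, centroids[c]))
        buckets[nearest].append(idx)
    return buckets

def ivf_search(vectors, centroids, buckets, query, top_k, nprobe):
    """Scan only the nprobe buckets whose centroids are closest to the query."""
    probe_order = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))
    candidates = [idx for c in probe_order[:nprobe] for idx in buckets[c]]
    candidates.sort(key=lambda idx: l2(vectors[idx], query))
    return candidates[:top_k]

# Toy data (hypothetical): two fixed centroids and five vectors.
centroids = [[0.0, 0.0], [10.0, 10.0]]
vectors = [[0.5, 0.2], [9.5, 10.1], [1.0, 1.5], [10.2, 9.8], [4.9, 4.9]]
buckets = build_ivf(vectors, centroids)

print(buckets)  # -> {0: [0, 2, 4], 1: [1, 3]}
print(ivf_search(vectors, centroids, buckets, [9.0, 9.0], top_k=2, nprobe=1))  # -> [1, 3]
```

In this toy data, a query at `[5.5, 5.5]` sits near the bucket boundary: with `nprobe=1` the search returns vector 1, while `nprobe=2` finds the true nearest neighbor, vector 4. Scanning more buckets recovers recall at the price of more distance computations.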
2) Quantization (optional)
Quantization compresses vectors to reduce memory and speed up distance computation.
- SQ (Scalar Quantization)
  - Stores each dimension with fewer bits (e.g., 8-bit).
  - Big memory savings with relatively small quality loss.
- PQ (Product Quantization)
  - Splits vectors into chunks and encodes them using codebooks.
  - Higher compression (and often lower memory) than SQ, but can reduce recall more.
Quantization is usually the lever you pull when memory is your bottleneck.
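As a rough illustration of the memory lever, the sketch below implements a simple 8-bit scalar quantizer and compares per-vector storage for three schemes. The value range `[lo, hi]`, the 768-dimension example, and the choice of 96 PQ chunks are assumptions picked for the arithmetic. Real indexes also store codebooks and structural overhead that this ignores.

```python
def sq8_encode(vector, lo, hi):
    """Map each float in [lo, hi] to an integer code in 0..255 (one byte per dimension)."""
    scale = (hi - lo) / 255.0
    return [min(255, max(0, round((x - lo) / scale))) for x in vector]

def sq8_decode(codes, lo, hi):
    """Reconstruct approximate floats from the 8-bit codes."""
    scale = (hi - lo) / 255.0
    return [lo + c * scale for c in codes]

def bytes_per_vector(dim, scheme, pq_subvectors=None):
    """Rough per-vector storage, ignoring codebooks and index overhead."""
    if scheme == "float32":
        return dim * 4
    if scheme == "sq8":
        return dim  # one byte per dimension
    if scheme == "pq8":
        return pq_subvectors  # one 8-bit code per chunk
    raise ValueError(f"unknown scheme: {scheme}")

# Round-trip a small vector and measure the worst per-dimension error.
vec = [-1.0, -0.5, 0.0, 0.25, 1.0]
codes = sq8_encode(vec, lo=-1.0, hi=1.0)
approx = sq8_decode(codes, lo=-1.0, hi=1.0)
max_err = max(abs(a - b) for a, b in zip(vec, approx))
print(codes, max_err)

# Storage for one 768-dimensional vector under each scheme.
for scheme, kwargs in [("float32", {}), ("sq8", {}), ("pq8", {"pq_subvectors": 96})]:
    print(scheme, bytes_per_vector(768, scheme, **kwargs))
```

Under these assumptions, one vector takes 3072 bytes as float32, 768 bytes with 8-bit SQ, and 96 bytes with PQ at 96 one-byte codes. The round-trip error of the scalar quantizer is bounded by half a quantization step per dimension.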
3) Refiner (optional)
Because quantization is lossy, many systems compensate by:
- Retrieving more than topK candidates (an expansion step)
- Recomputing distances for those candidates using higher precision
That last step is the refiner. It helps restore recall without paying the cost of full-precision scoring across the entire dataset.
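The expand-then-refine pattern described above can be sketched as follows. The `coarse` function is a deliberately crude stand-in for real quantization, and `expansion` is a hypothetical parameter name for the over-fetch factor. Neither is taken from the Dodil or Milvus API.

```python
import math

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def coarse(vector, step=1.0):
    """A deliberately lossy stand-in for quantization: snap each value to a grid."""
    return [round(x / step) * step for x in vector]

def search_with_refine(vectors, query, top_k, expansion):
    """Rank by quantized distance, keep top_k * expansion ids, then re-rank exactly."""
    quantized = [coarse(v) for v in vectors]
    by_coarse = sorted(range(len(vectors)), key=lambda i: l2(quantized[i], query))
    shortlist = by_coarse[: top_k * expansion]
    # Refinement: recompute distances for the shortlist with full-precision vectors.
    shortlist.sort(key=lambda i: l2(vectors[i], query))
    return shortlist[:top_k]

# Toy data (hypothetical), chosen so that quantization mis-ranks the two closest vectors.
vectors = [[0.55, 0.55], [1.6, 1.6], [4.0, 4.0]]
query = [1.3, 1.3]

print(search_with_refine(vectors, query, top_k=1, expansion=1))  # -> [0]
print(search_with_refine(vectors, query, top_k=1, expansion=2))  # -> [1]
```

With `expansion=1` the shortlist holds only the quantized winner, so the lossy ranking stands. With `expansion=2` the true nearest neighbor makes it into the shortlist and the exact re-ranking puts it first, at the cost of one extra full-precision distance.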
Performance trade-offs
When choosing an index, you’re balancing three things:
- Build time: how long it takes to create the index.
- Query performance: latency / QPS.
- Recall: how close results are to the true nearest neighbors.
Some practical rules of thumb:
- Graph-based indexes often deliver excellent latency/QPS for typical topK values.
- IVF-based indexes tend to shine when you need very large topK (for example, thousands of results).
- PQ usually gives better recall than SQ at similar compression rates, while SQ is often faster.
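Recall is easiest to reason about once you measure it. A common approach, sketched here with a hypothetical helper, is to run the same query against an exact (flat) search and against the approximate index, then compare the two result lists.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k neighbors that the approximate search returned."""
    truth = set(exact_ids[:k])
    if not truth:
        return 0.0
    return len(truth & set(approx_ids[:k])) / len(truth)

# Hypothetical result ids for one query: exact search vs. an approximate index.
exact = [7, 2, 9, 4, 1]
approx = [7, 9, 3, 2, 8]
print(recall_at_k(approx, exact, k=5))  # -> 0.6
```

Averaging this value over a representative set of queries gives a recall figure you can track alongside latency and QPS while tuning.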
How to pick an index (practical guide)
Use this decision flow as a starting point:
- Small dataset / highest accuracy needed
  - Start with a flat (exact) index if you can afford it.
- Low latency for interactive search
  - Prefer a graph-based index (e.g., HNSW family).
- High throughput on very large datasets
  - Prefer an IVF family index.
- Memory is tight
  - Add quantization (SQ/PQ) and enable refinement, then tune recall with search parameters.
- You’re not sure
  - Start with the default recommended index in Dodil docs for your vector type, measure latency + recall, then tune.
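The decision flow above can be written down as a small helper, which is sometimes handy for documenting a team's defaults. Everything here is a hypothetical sketch: the function name, the `priority` values, and especially the 100,000-vector cutoff for "small" are assumptions, not Dodil recommendations. It returns a description of an index family rather than calling any SDK.

```python
def suggest_index(num_vectors, priority=None, memory_tight=False):
    """Return a starting-point index choice; measure and tune before committing.

    priority: "accuracy", "latency", "throughput", or None if unsure.
    """
    small_cutoff = 100_000  # hypothetical threshold; derive your own from benchmarks

    if priority == "accuracy" and num_vectors <= small_cutoff and not memory_tight:
        structure = "flat (exact)"
    elif priority == "latency":
        structure = "graph-based (HNSW family)"
    elif priority == "throughput":
        structure = "IVF family"
    else:
        structure = "documented default for the vector type"

    # When memory is tight, add quantization and pair it with refinement.
    return {
        "structure": structure,
        "quantization": memory_tight,
        "refine": memory_tight,
    }

print(suggest_index(50_000, priority="accuracy"))
print(suggest_index(20_000_000, priority="throughput", memory_tight=True))
```

Treat the output as a first candidate to benchmark, not a final answer. The right cutoffs depend on vector dimension, hardware, and your latency and recall targets.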
When indexes matter most
Indexes deliver the biggest benefit when:
- Your collection is large (millions of vectors or more).
- You run frequent searches with strict latency goals.
- You combine vector search with filters (where scalar indexes can also help).
If you’re still prototyping on small data, you can start simple and add indexes as you scale.
Next steps
- Learn about available vector index types (dense, binary, sparse).
- Check the “Search” docs to understand parameters like topK, candidate expansion, and how they influence recall/latency.
- Use the SDK reference to see how to create or modify indexes for your vector fields.