BIN_FLAT
BIN_FLAT is the simplest index you can use for binary vector fields. It performs an exhaustive scan: each query vector is compared against every stored vector, and the exact nearest neighbors are returned.
That means:
- ✅ Perfect recall (exact results)
- ✅ Great as a baseline / benchmark to compare other indexes
- ❌ The slowest option as your dataset grows (work scales linearly with the number of vectors)
If you care about absolute accuracy and your dataset is still relatively small (or you’re validating quality), BIN_FLAT is a solid choice.
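To make the exhaustive scan concrete, here is a minimal pure-Python sketch of what a flat binary search does conceptually. It is not Dodil's implementation: the function names `hamming_distance` and `flat_search` are hypothetical, and the vectors are assumed to be bit-packed `bytes` values of equal length.

```python
from typing import List, Tuple


def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the bit positions where two equal-length packed vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def flat_search(query: bytes, stored: List[bytes], k: int) -> List[Tuple[int, int]]:
    """Compare the query against every stored vector and keep the k closest.

    Returns (position, distance) pairs sorted by ascending distance.
    """
    # One distance computation per stored vector: work grows linearly with the data.
    scored = [(i, hamming_distance(query, v)) for i, v in enumerate(stored)]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Toy data: four 8-bit vectors (hypothetical values, for illustration only).
stored_vectors = [
    bytes([0b11110000]),
    bytes([0b11111111]),
    bytes([0b00001111]),
    bytes([0b11110001]),
]
query_vector = bytes([0b11110000])

top2 = flat_search(query_vector, stored_vectors, k=2)
print(top2)
```

Because every stored vector is scored, no true neighbor can be missed. That is where the perfect recall comes from, and also why the cost grows with the dataset.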
When should I use it?
Use BIN_FLAT when you want:
- A simple and reliable starting point for binary search
- A correctness baseline before switching to faster approximate indexes
- Small-to-medium binary vector datasets where latency is still acceptable
If you’re operating at large scale and need low latency, you’ll typically move to an approximate binary index (for example BIN_IVF_FLAT).
Supported distance metrics
For binary vectors, BIN_FLAT commonly supports:
- HAMMING (default in many setups)
- JACCARD
Pick the metric that matches how your binary embeddings are defined.
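As a rough illustration of how the two metrics differ, the sketch below computes both on the same pair of bit-packed vectors using their standard definitions. The helper names are hypothetical, and returning 0.0 when both vectors are all zeros is an assumed convention; Dodil's handling of that edge case is not stated here.

```python
def popcount(data: bytes) -> int:
    """Number of bits set to 1 across all bytes."""
    return sum(bin(byte).count("1") for byte in data)


def hamming(a: bytes, b: bytes) -> int:
    """Hamming distance: how many bit positions differ."""
    return popcount(bytes(x ^ y for x, y in zip(a, b)))


def jaccard(a: bytes, b: bytes) -> float:
    """Jaccard distance: 1 - |A and B| / |A or B|, treating set bits as set members."""
    intersection = popcount(bytes(x & y for x, y in zip(a, b)))
    union = popcount(bytes(x | y for x, y in zip(a, b)))
    if union == 0:
        return 0.0  # assumed convention: two empty sets are treated as identical
    return 1.0 - intersection / union


a = bytes([0b11001100])
b = bytes([0b10101010])

print("hamming:", hamming(a, b))
print("jaccard:", round(jaccard(a, b), 4))
```

Hamming counts every differing position, so shared zeros count as agreement. Jaccard only looks at positions where at least one vector has a 1, which tends to suit sparse, set-like encodings.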
Create the index in Dodil
Below is a typical flow using the Dodil SDK. (Assumes you already have a connected vbase client.)
# Create a BIN_FLAT index on a binary vector field
vbase.create_index(
    collection_name="my_collection",
    field_name="binary_embedding",
    index_name="binary_index",
    index_type="BIN_FLAT",
    metric_type="HAMMING",
    params={},
)
What do these fields mean?
- index_type: set to "BIN_FLAT"
- metric_type: how distance is measured (HAMMING or JACCARD)
- params: empty for BIN_FLAT (no extra tuning required)
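The snippets on this page pass a `query_binary_vector` without showing how one is built. Many vector databases expect binary vectors as bit-packed bytes, eight dimensions per byte. Whether the Dodil SDK expects exactly this layout is an assumption, so check the SDK reference before relying on it. The `pack_bits` helper below is hypothetical and only sketches the idea.

```python
from typing import Sequence


def pack_bits(bits: Sequence[int]) -> bytes:
    """Pack a sequence of 0/1 values into bytes, most significant bit first."""
    if len(bits) % 8 != 0:
        raise ValueError("number of bits must be a multiple of 8")
    packed = bytearray()
    for start in range(0, len(bits), 8):
        byte = 0
        for bit in bits[start:start + 8]:
            if bit not in (0, 1):
                raise ValueError("bits must be 0 or 1")
            byte = (byte << 1) | bit
        packed.append(byte)
    return bytes(packed)


# A 16-dimensional binary embedding (hypothetical values) packed into 2 bytes.
query_binary_vector = pack_bits([1, 0, 1, 1, 0, 0, 0, 1,
                                 0, 0, 0, 0, 1, 1, 1, 1])
print(query_binary_vector.hex())
```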
Search using BIN_FLAT
Once the index exists and your data is inserted, search normally:
results = vbase.search(
    collection_name="my_collection",
    anns_field="binary_embedding",
    data=[query_binary_vector],
    limit=10,
    search_params={"params": {}},
)
for hit in results[0]:
    print(hit.id, hit.distance)
Index parameters
BIN_FLAT has no parameters to tune, either during:
- index creation (params={})
- search (search_params={"params": {}})
That simplicity is the main benefit—at the cost of speed for large datasets.
Practical notes
- If you see poor performance at scale, consider switching to an approximate binary index.
- If you’re evaluating other index types, keep one environment with BIN_FLAT as your “ground truth” baseline.
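One common way to use such a baseline is to measure recall@k: the fraction of the exact top-k IDs that an approximate index also returns. The sketch below assumes you have already collected the hit IDs from both searches into plain lists. The `recall_at_k` helper and the sample IDs are hypothetical.

```python
from typing import Sequence


def recall_at_k(exact_ids: Sequence[int], approx_ids: Sequence[int], k: int) -> float:
    """Fraction of the exact top-k IDs that also appear in the approximate top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(exact_ids[:k])
    if not truth:
        return 0.0  # nothing to compare against
    return len(truth & set(approx_ids[:k])) / len(truth)


# IDs from the BIN_FLAT baseline vs. an approximate index (made-up values).
exact_ids = [17, 4, 42, 8, 23]
approx_ids = [17, 42, 4, 99, 8]

recall = recall_at_k(exact_ids, approx_ids, k=5)
print(f"recall@5 = {recall:.2f}")
```

Averaging this value over a representative set of queries gives a single quality number to track while tuning the faster index.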