BIN_FLAT

BIN_FLAT is the simplest index you can use for binary vector fields. It performs an exhaustive scan: for every query vector, it compares against every stored vector and returns the exact nearest neighbors.

That means:

✅ Perfect recall (exact results)
✅ Great as a baseline or benchmark for evaluating other indexes
❌ The slowest option as your dataset grows (work scales linearly with the number of vectors)

If you care about absolute accuracy and your dataset is still relatively small (or you’re validating quality), BIN_FLAT is a solid choice.
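To make the cost model concrete, here is a minimal pure-Python sketch of what an exhaustive scan does conceptually. It is not Dodil's implementation. It assumes vectors are packed into `bytes` and compared with Hamming distance, and the helper names `hamming` and `flat_search` are hypothetical.

```python
import heapq


def hamming(a: bytes, b: bytes) -> int:
    """Count the bits that differ between two equal-length packed vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def flat_search(stored: list[bytes], query: bytes, k: int) -> list[tuple[int, int]]:
    """Exact top-k search: compare the query against every stored vector.

    Returns (id, distance) pairs, closest first; ties are broken by lower id.
    """
    # One distance computation per stored vector, so work grows linearly with N.
    scored = ((hamming(query, vec), vec_id) for vec_id, vec in enumerate(stored))
    best = heapq.nsmallest(k, scored)
    return [(vec_id, dist) for dist, vec_id in best]


# Four hypothetical 8-dimensional binary vectors (one byte each); the id is the list index.
stored = [bytes([0b10110000]), bytes([0b10110001]), bytes([0b01001111]), bytes([0b10100000])]
query = bytes([0b10110000])
print(flat_search(stored, query, k=2))  # [(0, 0), (1, 1)]
```

Because every query touches every stored vector, the result is exact, and the cost grows with the size of the collection.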

When should I use it?

Use BIN_FLAT when you want:

A simple, reliable starting point for binary vector search
A correctness baseline before switching to faster approximate indexes
Small-to-medium binary vector datasets where latency is still acceptable

If you’re operating at large scale and need low latency, you’ll typically move to an approximate binary index (for example, BIN_IVF_FLAT).

Supported distance metrics

For binary vectors, BIN_FLAT commonly supports:

HAMMING (default in many setups)
JACCARD

Pick the metric that matches how your binary embeddings are defined.
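As a rough reference, the sketch below shows how the two metrics are usually defined on packed bit vectors: HAMMING counts the bits that differ, and JACCARD distance is 1 minus the ratio of shared set bits to all set bits. The helper names are hypothetical, and returning 0.0 for two all-zero vectors is a convention chosen here; the engine may handle that edge case differently.

```python
def _check_same_length(a: bytes, b: bytes) -> None:
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")


def popcount(data: bytes) -> int:
    """Number of set bits across all bytes."""
    return sum(bin(byte).count("1") for byte in data)


def hamming_distance(a: bytes, b: bytes) -> int:
    # XOR marks the positions where the bits differ; count them.
    _check_same_length(a, b)
    return popcount(bytes(x ^ y for x, y in zip(a, b)))


def jaccard_distance(a: bytes, b: bytes) -> float:
    # 1 - |A AND B| / |A OR B|, treating each set bit as a member of a set.
    _check_same_length(a, b)
    intersection = popcount(bytes(x & y for x, y in zip(a, b)))
    union = popcount(bytes(x | y for x, y in zip(a, b)))
    if union == 0:
        return 0.0  # both vectors are all zeros: treated as identical (our convention)
    return 1.0 - intersection / union


a = bytes([0b11110000])
b = bytes([0b11000011])
print(hamming_distance(a, b))  # 4
print(jaccard_distance(a, b))  # 1 - 2/6, about 0.667
```

HAMMING depends only on how many bits differ, while JACCARD ignores positions where both vectors are 0, which suits embeddings where a set bit means "feature present".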

Create the index in Dodil

The flow below is a typical example using the Dodil SDK. It assumes you already have a connected vbase client.


# Create a BIN_FLAT index on a binary vector field
vbase.create_index(
    collection_name="my_collection",
    field_name="binary_embedding",
    index_name="binary_index",
    index_type="BIN_FLAT",
    metric_type="HAMMING",
    params={},
)

What do these fields mean?

index_type: set to "BIN_FLAT"
metric_type: how distance is measured (HAMMING or JACCARD)
params: empty for BIN_FLAT (no extra tuning required)
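Both the data you insert and `query_binary_vector` must be binary vectors in the format the SDK expects. A common convention in vector databases is to pack bits into `bytes`, 8 dimensions per byte, most significant bit first, with the dimension a multiple of 8. The sketch below assumes Dodil follows that convention; check the SDK reference before relying on it. `pack_bits` is a hypothetical helper.

```python
def pack_bits(bits: list[int]) -> bytes:
    """Pack 0/1 values into bytes, most significant bit first (assumed format).

    The dimension must be a multiple of 8 so the vector fills whole bytes.
    """
    if len(bits) % 8 != 0:
        raise ValueError("binary vector dimension must be a multiple of 8")
    out = bytearray()
    for start in range(0, len(bits), 8):
        byte = 0
        for bit in bits[start:start + 8]:
            if bit not in (0, 1):
                raise ValueError("bits must be 0 or 1")
            byte = (byte << 1) | bit  # shift earlier bits left, append this one
        out.append(byte)
    return bytes(out)


# A hypothetical 16-dimensional binary embedding.
query_binary_vector = pack_bits([1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
print(query_binary_vector)  # b'\xb0\x0f'
```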

Search using BIN_FLAT

Once the index exists and your data has been inserted, search as you would with any other index:


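# query_binary_vector must have the same dimension as the binary_embedding field
results = vbase.search(
    collection_name="my_collection",
    anns_field="binary_embedding",
    data=[query_binary_vector],    # one entry per query vector
    limit=10,                      # return the 10 nearest neighbors
    search_params={"params": {}},  # BIN_FLAT takes no search parameters
)

# results[0] holds the hits for the first (and here, only) query vector
for hit in results[0]:
    print(hit.id, hit.distance)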

Index parameters

BIN_FLAT has no parameters to tune at either stage:

index creation (params={})
search (search_params={"params": {}})

That simplicity is its main benefit, at the cost of speed on large datasets.

Practical notes

If search latency becomes unacceptable as your dataset grows, consider switching to an approximate binary index such as BIN_IVF_FLAT.
If you’re evaluating other index types, keep one environment with BIN_FLAT as your “ground truth” baseline.
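When you compare an approximate index against a BIN_FLAT baseline, recall@k is the usual yardstick: for each query, the fraction of BIN_FLAT's top-k IDs that the other index also returned, averaged over all queries. A minimal sketch follows. `recall_at_k` is a hypothetical helper, and it assumes you have already collected the result IDs for each query (for example, `[hit.id for hit in results[0]]` from the search call above). The ID lists in the usage line are illustrative, not real results.

```python
def recall_at_k(ground_truth: list[list[int]], approx: list[list[int]], k: int) -> float:
    """Average fraction of the exact top-k IDs that the approximate index also returned."""
    if len(ground_truth) != len(approx):
        raise ValueError("need one result list per query in both inputs")
    if not ground_truth:
        raise ValueError("need at least one query")
    total = 0.0
    for exact_ids, approx_ids in zip(ground_truth, approx):
        exact_top = set(exact_ids[:k])  # BIN_FLAT result treated as ground truth
        if not exact_top:
            raise ValueError("ground truth lists must be non-empty")
        found = len(exact_top & set(approx_ids[:k]))
        total += found / len(exact_top)
    return total / len(ground_truth)


exact = [[1, 2, 3], [4, 5, 6]]   # illustrative IDs from BIN_FLAT
approx = [[1, 2, 9], [4, 5, 6]]  # illustrative IDs from an approximate index
print(recall_at_k(exact, approx, k=3))  # (2/3 + 3/3) / 2, about 0.833
```

A recall of 1.0 means the approximate index matched BIN_FLAT exactly on every query; tune the approximate index until recall meets your accuracy target.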