Range search

Range search returns all matches whose distance (or similarity score) falls inside a range around your query vector.

This is different from the common TopK search (“give me the closest 10 results”). With range search, you’re saying:

Outer boundary: “Only return vectors up to this far away” (radius)
Optional inner boundary: “Also exclude anything too close” (range_filter)

That makes range search useful when you care about thresholds, not ranking.

When to use range search

Range search is a good fit for:

Near-duplicate detection: return items within a tight similarity band.
“Good enough” matches: return everything above a similarity threshold, then apply custom logic.
Novelty filtering: exclude “too similar” items by setting an inner boundary.
Quality gates: reject results outside an acceptable score window.

Tip: Range search can return a lot of results. Always set a sensible limit (or paging strategy) to cap output.

How range search works

You provide a query vector, and VBase searches the collection’s vector index. Instead of returning the best top_k matches, it returns matches whose distance/score is within your configured bounds.

radius defines the outer boundary of the search space.
range_filter (optional) defines the inner boundary.

So you can express:

Only close matches: distance < radius (or score > radius, depending on metric)
A similarity band: range_filter <= distance < radius (or radius < score <= range_filter)
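To make the two bounds concrete, here is a minimal brute-force sketch of the distance-style case in plain Python. It only illustrates the inclusion rule; it is not how VBase executes the search (VBase uses the collection's vector index), and the names `l2_distance` and `range_search_l2` are hypothetical helpers, not part of any SDK.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def range_search_l2(query, vectors, radius, range_filter=None, limit=None):
    """Return (index, distance) pairs with range_filter <= distance < radius."""
    hits = []
    for i, vec in enumerate(vectors):
        d = l2_distance(query, vec)
        if d >= radius:
            continue  # outside the outer boundary
        if range_filter is not None and d < range_filter:
            continue  # inside the excluded inner zone
        hits.append((i, d))
    hits.sort(key=lambda h: h[1])  # closest first
    return hits if limit is None else hits[:limit]

vectors = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0], [0.0, 1.0]]
query = [0.0, 0.0]

# Only close matches: distance < 6.0
close_only = range_search_l2(query, vectors, radius=6.0)

# A band: 1.0 <= distance < 6.0 (drops the exact match at distance 0)
band = range_search_l2(query, vectors, radius=6.0, range_filter=1.0)
```

With these sample vectors, `close_only` keeps the vectors at distances 0, 1, and 5, while `band` drops the exact match and keeps only the vectors at distances 1 and 5.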

Parameters

Most SDKs accept these inside search_params, under the params key.

radius: required. The outer boundary.
range_filter: optional. The inner boundary (to exclude the closest results).

Metric-specific rules

Metric types do not all behave the same:

For distance-based metrics (smaller is better), “close” means small distance.
For similarity-based metrics (larger is better), “close” means large score.

Use the following rules when defining radius and range_filter:

Metric type	What “better” means	Include results that satisfy
`L2`	smaller distance is better	`range_filter <= distance < radius`
`JACCARD`	smaller distance is better	`range_filter <= distance < radius`
`HAMMING`	smaller distance is better	`range_filter <= distance < radius`
`IP`	larger score is better	`radius < score <= range_filter`
`COSINE`	larger score is better	`radius < score <= range_filter`

If you only want a single threshold (no band), you can omit range_filter and treat radius as your threshold.
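The table can be read as a single membership test. The sketch below encodes it in plain Python so you can check a candidate distance or score against your bounds before sending a query; `in_range` and the two metric sets are hypothetical names introduced for illustration, not SDK functions.

```python
DISTANCE_METRICS = {"L2", "JACCARD", "HAMMING"}  # smaller is better
SIMILARITY_METRICS = {"IP", "COSINE"}            # larger is better

def in_range(metric_type, value, radius, range_filter=None):
    """Apply the inclusion rule from the table for the given metric type."""
    if metric_type in DISTANCE_METRICS:
        # range_filter <= distance < radius
        if value >= radius:
            return False
        return range_filter is None or value >= range_filter
    if metric_type in SIMILARITY_METRICS:
        # radius < score <= range_filter
        if value <= radius:
            return False
        return range_filter is None or value <= range_filter
    raise ValueError(f"unknown metric type: {metric_type!r}")

# Distance metric: 0.3 lies in [0.2, 0.5)
l2_hit = in_range("L2", 0.3, radius=0.5, range_filter=0.2)

# Similarity metric: 0.80 lies in (0.65, 0.80]; 0.65 itself does not
cosine_upper = in_range("COSINE", 0.80, radius=0.65, range_filter=0.80)
cosine_lower = in_range("COSINE", 0.65, radius=0.65, range_filter=0.80)
```

Note that radius is always the exclusive bound and range_filter the inclusive one; only the direction of the comparison changes between distance and similarity metrics.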

Example: range search in Dodil VBase

Below is a simple example that searches a collection using COSINE similarity and returns only matches within a similarity band.


```python
from dodil import Client
from dodil.vbase import VBaseConfig

# Authorize
c = Client(
    service_account_id="...",
    service_account_secret="...",
)

# Connect
vbase = c.vbase.connect(
    VBaseConfig(
        host="vbase-db-<id>.infra.dodil.cloud",
        port=443,
        scheme="https",
        db_name="db_<id>",
    )
)

query_vector = [0.12, 0.98, 0.03, ...]  # same dimension as your collection

results = vbase.search(
    collection_name="movies",
    data=[query_vector],
    limit=50,
    search_params={
        "metric_type": "COSINE",
        "params": {
            # Keep results with similarity score within (0.65, 0.80]
            "radius": 0.65,
            "range_filter": 0.80,
        },
    },
    # Optional: apply a scalar filter at the same time
    filter='language == "en"',
    output_fields=["title", "year", "language"],
)

for hit in results[0]:
    print(hit["id"], hit["score"], hit.get("title"))
```

Notes

Even with range search, you should still pass a limit to cap the number of returned matches.
If you set bounds incorrectly for your metric (for example, setting radius above range_filter for COSINE, as you would for a distance metric), the band can be empty and you may get zero results.
For large collections, range search is more predictable when you have a well-built index and a sensible threshold band.
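One way to catch a mis-ordered band early is a small client-side check before calling search. The sketch below is an assumption about how you might guard your own code, based only on the inequalities in this page; `check_bounds` is a hypothetical helper, and this page does not say whether VBase itself rejects such bounds or just returns nothing.

```python
def check_bounds(metric_type, radius, range_filter=None):
    """Raise ValueError if radius and range_filter describe an empty band."""
    if range_filter is None:
        return  # a single threshold can never be mis-ordered
    if metric_type in {"L2", "JACCARD", "HAMMING"}:
        # Needs range_filter <= distance < radius, so range_filter < radius
        if not range_filter < radius:
            raise ValueError(
                f"{metric_type}: range_filter ({range_filter}) "
                f"must be smaller than radius ({radius})"
            )
    elif metric_type in {"IP", "COSINE"}:
        # Needs radius < score <= range_filter, so radius < range_filter
        if not radius < range_filter:
            raise ValueError(
                f"{metric_type}: radius ({radius}) "
                f"must be smaller than range_filter ({range_filter})"
            )
    else:
        raise ValueError(f"unknown metric type: {metric_type!r}")

# Matches the COSINE example on this page: passes silently
check_bounds("COSINE", radius=0.65, range_filter=0.80)
```

Swapping the two values for COSINE (radius=0.80, range_filter=0.65) raises ValueError, which is easier to debug than an unexpectedly empty result set.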

Range search vs filtered search

Range search controls vector similarity bounds (radius / range_filter).

Filtered search controls metadata constraints (e.g. tenant_id, category, created_at).

You can combine both:

Use range search to keep only “close enough” vectors
Use filters to keep only “valid” candidates (tenant, permissions, time window, etc.)

This is a common pattern for production workloads.
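The combination can be pictured with a small in-memory sketch: a metadata predicate narrows the candidates, and a COSINE band keeps only scores in (radius, range_filter]. This is a plain-Python illustration under assumed sample data; `cosine_similarity` and `search_band_with_filter` are hypothetical helpers, and the order in which VBase applies the filter and the range internally is not specified here.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two non-zero, equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search_band_with_filter(query, records, radius, range_filter, predicate, limit):
    """Keep records that pass the predicate and score in (radius, range_filter]."""
    hits = []
    for rec in records:
        if not predicate(rec):
            continue  # metadata constraint (the "filtered search" part)
        score = cosine_similarity(query, rec["vector"])
        if radius < score <= range_filter:  # the "range search" part
            hits.append((rec["id"], score))
    hits.sort(key=lambda h: h[1], reverse=True)  # most similar first
    return hits[:limit]

# Hypothetical sample data
records = [
    {"id": 1, "language": "en", "vector": [1.0, 0.0]},  # identical to query
    {"id": 2, "language": "en", "vector": [3.0, 4.0]},  # moderately similar
    {"id": 3, "language": "fr", "vector": [3.0, 4.0]},  # wrong language
    {"id": 4, "language": "en", "vector": [0.0, 1.0]},  # orthogonal to query
]

hits = search_band_with_filter(
    query=[1.0, 0.0],
    records=records,
    radius=0.5,
    range_filter=0.9,
    predicate=lambda rec: rec["language"] == "en",
    limit=10,
)
```

Here only record 2 survives: record 1 is too similar (above range_filter), record 3 fails the language filter, and record 4 scores below radius.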