Single-vector search

A single-vector search is the most common way to use a vector database: you provide one embedding (your query), and Dodil VBase returns the top-K most similar vectors from a collection.

This is an Approximate Nearest Neighbor (ANN) search by default: instead of comparing the query against every stored vector, VBase relies on your collection’s index to return results quickly at scale. The exact speed/recall tradeoff depends on the index type and its parameters.
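To make the speed/recall tradeoff concrete, the sketch below computes an exact top-K by brute force and measures how much of it an approximate result recovers (recall@K). It is plain Python and does not call VBase; exact_top_k, recall_at_k, and the toy data are hypothetical names introduced only for illustration.

```python
import heapq


def inner_product(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def exact_top_k(query, vectors, k):
    """Brute-force search: score every vector, keep the k highest scores.

    `vectors` maps an id to its embedding. The cost grows linearly with the
    collection size, which is the work an ANN index is built to avoid.
    """
    scored = ((vec_id, inner_product(query, vec)) for vec_id, vec in vectors.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-K ids that the approximate result also returned."""
    exact = set(exact_ids)
    return len(exact & set(approx_ids)) / len(exact)


# Toy data (hypothetical): four 2-dimensional embeddings keyed by id.
vectors = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [0.5, 0.5], 4: [-1.0, 0.0]}
query = [1.0, 0.2]

exact_ids = [vec_id for vec_id, _ in exact_top_k(query, vectors, k=2)]

# Pretend an ANN index returned ids 1 and 2: one of the two true neighbors.
approx_ids = [1, 2]
recall = recall_at_k(approx_ids, exact_ids)
print(exact_ids, recall)
```

A recall below 1.0 means the approximate result missed some true neighbors; index parameters usually let you trade query time against that number.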

What you need before searching

Python 3.10+
pip install dodil
A VBase database and an existing collection with a vector field (for example: vector)

If you haven’t connected to VBase yet, follow the connection guide first.

Run a single-vector search

In this example, we search the quick_setup collection using one query embedding. We ask for the top 3 results and use Inner Product (IP) as the similarity metric.


from dodil import Client
from dodil.vbase import VBaseConfig
 
# Authorize with your Dodil service account
c = Client(
    service_account_id="...",
    service_account_secret="...",
)
 
# Connect to your VBase database
vbase = c.vbase.connect(
    VBaseConfig(
        host="vbase-db-<id>.infra.dodil.cloud",
        port=443,
        scheme="https",
        db_name="db_<id>",
    )
)
 
# One query embedding (same dimension as your collection's vector field)
query_vector = [
    0.35803764,
    -0.6023496,
    0.18414013,
    -0.26286206,
    0.90294385,
]
 
# Single-vector search (top-K)
results = vbase.search(
    collection_name="quick_setup",
    anns_field="vector",
    data=[query_vector],
    limit=3,
    search_params={"metric_type": "IP"},
)
 
# results is typically a list (one entry per input vector)
for hits in results:
    for hit in hits:
        # Common fields you will see:
        # - id: primary key of the matched entity
        # - distance/score: similarity value (meaning depends on metric)
        # - entity: selected output fields (if requested)
        print(hit)

Understanding the response

The response typically contains one list of hits per input vector. Because this example passes a single query vector (data=[query_vector]), results holds a single “hits” list.

Each hit contains:

id: the primary key of the matched entity
distance / score: the similarity value
- With IP and COSINE, higher typically means “more similar”.
- With L2, lower means “closer / more similar”.
entity: the extra fields you requested via output_fields (optional; the example above does not request any)
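The direction of the score is easy to get wrong when switching metrics. The following sketch, independent of VBase, ranks the same three toy vectors under IP, COSINE, and L2. The helper names (score, rank) and the HIGHER_IS_BETTER table are illustrative assumptions, not part of the dodil package, and the L2 value here is plain Euclidean distance, which may differ in scale from what a given engine reports.

```python
import math

# Sort direction per metric: similarity metrics rank high-to-low,
# distance metrics rank low-to-high.
HIGHER_IS_BETTER = {"IP": True, "COSINE": True, "L2": False}


def score(metric, a, b):
    """Compute the score of vector b against query a under the given metric."""
    dot = sum(x * y for x, y in zip(a, b))
    if metric == "IP":
        return dot
    if metric == "COSINE":
        return dot / (math.hypot(*a) * math.hypot(*b))
    if metric == "L2":
        return math.dist(a, b)
    raise ValueError(f"unknown metric: {metric}")


def rank(metric, query, candidates):
    """Return candidate names ordered from most to least similar."""
    scored = [(name, score(metric, query, vec)) for name, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=HIGHER_IS_BETTER[metric])
    return [name for name, _ in scored]


# Toy data (hypothetical): "b" is identical to the query, "a" is longer.
candidates = {"a": [2.0, 1.0], "b": [1.0, 0.0], "c": [0.0, 2.0]}
query = [1.0, 0.0]

rankings = {metric: rank(metric, query, candidates) for metric in HIGHER_IS_BETTER}
for metric, order in rankings.items():
    print(metric, order)
```

In this toy data IP puts the longer vector a first, while COSINE and L2 put the exact match b first, which is one reason to read scores with the metric in mind.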

Practical tips

Metric must match your index and data. If your collection was indexed with a specific metric, search using the same metric.
Top-K controls cost. A larger limit means more work per query. Start small (e.g., 5–20) and increase only if needed.
Use filters when you can. If your schema has scalar fields (like tenant_id, category, created_at), a filter restricts the search to matching entities, which can improve both relevance and performance.
Index and load matter. For large collections, make sure an index is built and the collection is loaded (see Load and Release). Without a suitable index, searches can be slow.
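One way data and metric can mismatch: IP is sensitive to vector length, so it only ranks like cosine similarity when the embeddings are unit length. If your embedding model does not normalize its output, a small helper such as the hypothetical normalize below could be applied to both stored and query vectors before using IP. This is a general property of the metrics, sketched in plain Python, and not a statement about what VBase does internally.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (L2 norm of 1)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    )


# The query vector from the example above, plus a made-up stored vector.
query_vector = [0.35803764, -0.6023496, 0.18414013, -0.26286206, 0.90294385]
stored_vector = [0.5, -0.1, 0.3, 0.2, 0.7]

ip_raw = inner_product(query_vector, stored_vector)
ip_normalized = inner_product(normalize(query_vector), normalize(stored_vector))
cos = cosine(query_vector, stored_vector)

# After normalization, IP and cosine similarity agree up to rounding error.
print(ip_raw, ip_normalized, cos)
```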