ANN Search

A single-vector search is the most common way to use a vector database: you provide one embedding (your query), and Dodil VBase returns the top-K most similar vectors from a collection.

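This is an Approximate Nearest Neighbor (ANN) search by default: instead of comparing the query against every stored vector, VBase relies on your collection’s index to return results quickly at scale, at the cost of occasionally missing a true nearest neighbor. The exact speed/recall tradeoff depends on the index type and its parameters.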
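To make the speed/recall tradeoff concrete, here is a minimal sketch in plain Python that does not use VBase at all. It computes the exact top-K by brute force and measures recall@K for an approximate result. The vectors, IDs, and the `approx_ids` list are made-up illustration data, and `exact_top_k` and `recall_at_k` are hypothetical helper names.

```python
import heapq


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def exact_top_k(query, vectors, k):
    """Brute-force top-k by inner product: scores every stored vector."""
    scored = ((inner_product(query, vec), vec_id) for vec_id, vec in vectors.items())
    return [vec_id for _, vec_id in heapq.nlargest(k, scored)]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k that the approximate search also returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


# Made-up data: id -> vector
vectors = {
    1: [0.9, 0.1],
    2: [0.8, 0.3],
    3: [-0.5, 0.7],
    4: [0.1, 0.9],
}
query = [1.0, 0.2]

exact_ids = exact_top_k(query, vectors, k=2)
approx_ids = [1, 4]  # pretend output of an ANN index that missed one neighbor

print(exact_ids)                           # [1, 2]
print(recall_at_k(approx_ids, exact_ids))  # 0.5
```

Brute force scores every vector, so its cost grows with collection size. An index avoids most of that work, and recall@K is the usual way to measure how much accuracy was given up in exchange.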

What you need before searching

  • Python 3.10+
  • pip install dodil
  • A VBase database and an existing collection with a vector field (for example: vector)

If you haven’t connected to VBase yet, follow the connection guide first.

In this example, we search the quick_setup collection using one query embedding. We ask for the top 3 results and use Inner Product (IP) as the similarity metric.

    from dodil import Client
    from dodil.vbase import VBaseConfig

    # Authorize with your Dodil service account
    c = Client(
        service_account_id="...",
        service_account_secret="...",
    )

    # Connect to your VBase database
    vbase = c.vbase.connect(
        VBaseConfig(
            host="vbase-db-<id>.infra.dodil.cloud",
            port=443,
            scheme="https",
            db_name="db_<id>",
        )
    )

    # One query embedding (same dimension as your collection's vector field)
    query_vector = [
        0.35803764,
        -0.6023496,
        0.18414013,
        -0.26286206,
        0.90294385,
    ]

    # Single-vector search (top-K)
    results = vbase.search(
        collection_name="quick_setup",
        anns_field="vector",
        data=[query_vector],
        limit=3,
        search_params={"metric_type": "IP"},
    )

    # results is typically a list (one entry per input vector)
    for hits in results:
        for hit in hits:
            # Common fields you will see:
            # - id: primary key of the matched entity
            # - distance/score: similarity value (meaning depends on metric)
            # - entity: selected output fields (if requested)
            print(hit)

Understanding the response

You’ll usually get one list of hits per input vector. Because this is a single-vector search (data was a list containing one vector), the result is an outer list that holds exactly one “hits” list.
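The nesting can be sketched with stand-in data. The example below does not call VBase: `results` is a hand-written placeholder, and the dict-shaped hits with `id`, `distance`, and `entity` keys are an assumption about the shape. Print a real hit to check how your client version exposes these fields.

```python
# Stand-in for the value returned by a single-vector search.
# Assumption: each hit behaves like a dict with "id", "distance", "entity".
results = [
    [
        {"id": 12, "distance": 0.93, "entity": {}},
        {"id": 7, "distance": 0.88, "entity": {}},
        {"id": 31, "distance": 0.61, "entity": {}},
    ]
]

# One input vector -> one hits list, so index 0 is the list to read.
hits = results[0]

matched_ids = [hit["id"] for hit in hits]
best_hit = hits[0]  # hits come back ordered from most to least similar

print(matched_ids)  # [12, 7, 31]
print(best_hit["id"], best_hit["distance"])  # 12 0.93
```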

Each hit contains:

  • id: the primary key of the matched entity
  • distance / score: the similarity value
    • With IP and COSINE, higher typically means “more similar”.
    • With L2, lower means “closer / more similar”.
  • entity: the extra fields you requested via output_fields (optional)
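The direction of each metric is easy to check by hand. The following sketch computes IP, COSINE, and L2 in plain Python for a few made-up vectors (it does not use VBase) and shows that the same candidates can rank differently depending on the metric.

```python
import math


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)


def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query = [1.0, 0.0]
candidates = {
    "a": [0.9, 0.1],  # close to the query
    "b": [0.0, 1.0],  # orthogonal to the query
    "c": [3.0, 0.0],  # same direction as the query, but much longer
}

# IP and COSINE: higher is more similar, so sort descending.
by_ip = sorted(candidates, key=lambda k: inner_product(query, candidates[k]), reverse=True)
by_cosine = sorted(candidates, key=lambda k: cosine(query, candidates[k]), reverse=True)

# L2: lower is closer, so sort ascending.
by_l2 = sorted(candidates, key=lambda k: l2(query, candidates[k]))

print(by_ip)      # ['c', 'a', 'b']
print(by_cosine)  # ['c', 'a', 'b']
print(by_l2)      # ['a', 'b', 'c']
```

Vector "c" ranks first under IP and COSINE but last under L2, because IP rewards magnitude and COSINE ignores it, while L2 penalizes the gap in length.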

Practical tips

  • Metric must match your index and data. If your collection was indexed with a specific metric, search using the same metric.
  • Top-K controls cost. A larger limit means more work per query. Start small (e.g., 5–20) and increase only if needed.
  • Use filters when you can. If your schema has scalar fields (like tenant_id, category, created_at), adding a filter can improve both relevance and performance.
  • Index + load matters. For large collections, make sure you have an index built and the collection is loaded (see Load and Release). Without a suitable index, searches can be slow.
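As a conceptual illustration of the filtering tip, the sketch below restricts candidates by a scalar field before ranking them by inner product. It is plain Python over made-up entities and does not show the VBase filter syntax. `filtered_top_k` is a hypothetical helper, and `tenant_id` is borrowed from the example field names above.

```python
def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def filtered_top_k(query, entities, predicate, k):
    """Keep entities that satisfy the scalar predicate, then rank by inner product."""
    candidates = [e for e in entities if predicate(e)]
    candidates.sort(key=lambda e: inner_product(query, e["vector"]), reverse=True)
    return [e["id"] for e in candidates[:k]]


# Made-up entities with one scalar field and one vector field
entities = [
    {"id": 1, "tenant_id": "acme", "vector": [0.9, 0.1]},
    {"id": 2, "tenant_id": "globex", "vector": [1.0, 0.2]},
    {"id": 3, "tenant_id": "acme", "vector": [0.2, 0.8]},
    {"id": 4, "tenant_id": "acme", "vector": [0.7, 0.4]},
]
query = [1.0, 0.0]

top = filtered_top_k(query, entities, lambda e: e["tenant_id"] == "acme", k=2)
print(top)  # [1, 4]
```

Without the filter, entity 2 would score highest even though it belongs to another tenant. With the filter, it is never scored at all, which is where both the relevance and the performance gains come from.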