Get & Scalar Query

Overview

In VBase, there are two common ways to retrieve entities without doing a vector similarity search:

Get: fetch entities by their primary key values (fast lookup when you already know the IDs).
Scalar Query: fetch entities by filtering on non-vector fields (numbers, strings, booleans, timestamps, etc.).

Both methods return the matching entities, and you can choose which fields to return via output_fields.

Get entities by primary key

Use Get when you already have the primary keys and just want to fetch the corresponding entities. As with Query, output_fields controls which fields come back.

Typical use cases:

hydrate an API response after you stored only IDs
fetch a batch of entities for evaluation/debugging
re-check the stored metadata for a known set of IDs


res = vbase.get(
    collection_name="my_collection",
    ids=[10, 11, 12],
    output_fields=["color", "vector"],
)
 
for row in res:
    print(row)
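
When hydrating an API response, it often helps to put the returned rows back in the order the IDs were requested and to detect IDs that were not found. The sketch below is plain Python, not part of VBase. It assumes each returned row behaves like a dict that includes the primary key under "id", and it does not rely on any particular result order. The helper name order_by_requested_ids is hypothetical.

```python
def order_by_requested_ids(rows, ids, key="id"):
    """Return (ordered_rows, missing_ids) for a batch lookup result."""
    by_id = {row[key]: row for row in rows}
    ordered = [by_id[i] for i in ids if i in by_id]
    missing = [i for i in ids if i not in by_id]
    return ordered, missing


# Stand-in for the rows returned by a Get with ids=[10, 11, 12];
# in this made-up result, ID 11 does not exist in the collection.
rows = [{"id": 12, "color": "red"}, {"id": 10, "color": "blue"}]

ordered, missing = order_by_requested_ids(rows, [10, 11, 12])
print(ordered)
print(missing)
```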

Get from a specific partition

If you’re using partitions, you can restrict the lookup to specific ones with partition_names:


res = vbase.get(
    collection_name="my_collection",
    partition_names=["partitionA"],
    ids=[10, 11, 12],
    output_fields=["color"],
)

Scalar Query (filter by non-vector fields)

Use Query when you want to select entities by conditions on scalar fields.

For example, assume your collection has fields like:

id (primary key)
vector (embedding)
color (string)
price (number)
in_stock (boolean)

Basic query

The example below returns up to 3 entities whose color value starts with "red". In the filter, % is a wildcard, so "red%" is a prefix match.


res = vbase.query(
    collection_name="my_collection",
    filter='color like "red%"',
    output_fields=["id", "color", "vector"],
    limit=3,
)
 
for row in res:
    print(row)
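
To make the semantics concrete, here is a plain-Python emulation of that query over an in-memory list. It is only an illustration, not VBase code: it assumes "red%" behaves like a prefix test, and it takes the first matches in list order, whereas which rows a real query returns when more than limit rows match is not specified here. The sample rows and the prefix_query helper are made up.

```python
from itertools import islice


def prefix_query(rows, field, prefix, limit):
    """Emulate `field like "<prefix>%"` with a row limit on in-memory rows."""
    matches = (row for row in rows if str(row.get(field, "")).startswith(prefix))
    return list(islice(matches, limit))


rows = [
    {"id": 1, "color": "red_7025"},
    {"id": 2, "color": "blue_1"},
    {"id": 3, "color": "red_4794"},
    {"id": 4, "color": "redwood"},
    {"id": 5, "color": "red_9392"},
    {"id": 6, "color": "dark_red"},  # contains "red" but does not start with it
]

res = prefix_query(rows, "color", "red", limit=3)
for row in res:
    print(row)
```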

Common filter patterns

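Filter expressions are passed as strings. String literals inside an expression use double quotes, so wrapping the whole expression in single quotes avoids escaping.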


# Equality
filter='color == "blue"'
 
# Boolean
filter='in_stock == true'
 
# Ranges
filter='price >= 10 and price < 50'
 
# Set membership
filter='color in ["red", "green", "blue"]'
 
# Prefix match
filter='color like "red%"'
 
# Combine conditions
filter='in_stock == true and price <= 100'
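
When filter values come from user input or variables, building the string by hand is error-prone (unescaped quotes, Python's True versus the lowercase true used above). The helpers below are a minimal sketch, not a VBase API: the names literal, eq, is_in, and all_of are hypothetical, and the escaping assumes that the filter syntax accepts backslash-escaped double quotes inside string literals.

```python
import json


def literal(value):
    """Render a Python value as a filter literal (assumed syntax)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        # json.dumps gives a double-quoted string with quotes and backslashes escaped
        return json.dumps(value, ensure_ascii=False)
    raise TypeError(f"unsupported filter value: {value!r}")


def eq(field, value):
    return f"{field} == {literal(value)}"


def is_in(field, values):
    return f"{field} in [{', '.join(literal(v) for v in values)}]"


def all_of(*clauses):
    # Joins clauses with "and"; a clause that itself contains "or"
    # should be parenthesized by the caller.
    return " and ".join(clauses)


print(eq("color", "blue"))
print(is_in("color", ["red", "green", "blue"]))
print(all_of(eq("in_stock", True), "price <= 100"))
```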

Query within partitions

As with Get, partition_names restricts the query to the listed partitions:


res = vbase.query(
    collection_name="my_collection",
    partition_names=["partitionA"],
    filter='color like "red%"',
    output_fields=["id", "color"],
    limit=100,
)

Paging large result sets (Query Iterator)

If your query may return many rows, fetch them in batches rather than requesting everything at once, so the client never has to hold the full result set in memory.

A query iterator pattern:


it = vbase.query_iterator(
    collection_name="my_collection",
    filter='color like "red%"',
    output_fields=["id", "color"],
    batch_size=1000,
)
 
try:
    for batch in it:
        for row in batch:
            # process row
            pass
finally:
    it.close()

Tip: Use an iterator for backfills, exports, audits, or any workflow where you don’t know the result size upfront.
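
The try/finally pattern above can also be written with contextlib.closing from the Python standard library, which calls close() when the block exits, even on error. The sketch below runs without a database: FakeBatchIterator is a hypothetical stand-in that assumes, as in the snippet above, that the iterator yields lists of dict-like rows and exposes close().

```python
from contextlib import closing


class FakeBatchIterator:
    """Hypothetical stand-in for the object returned by query_iterator."""

    def __init__(self, rows, batch_size):
        self._rows = rows
        self._batch_size = batch_size
        self.closed = False

    def __iter__(self):
        for start in range(0, len(self._rows), self._batch_size):
            yield self._rows[start:start + self._batch_size]

    def close(self):
        self.closed = True


def count_by_color(batch_iterator):
    """Stream all batches, keeping only a small running aggregate in memory."""
    counts = {}
    with closing(batch_iterator) as batches:
        for batch in batches:
            for row in batch:
                counts[row["color"]] = counts.get(row["color"], 0) + 1
    return counts


rows = [{"id": i, "color": "red" if i % 2 else "redwood"} for i in range(5)]
it = FakeBatchIterator(rows, batch_size=2)
counts = count_by_color(it)
print(counts)
```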

Random sampling with Query

To fetch a representative subset of a collection (for exploration, evaluation, or quick tests), you can sample during a query:


# Sample ~1% of the collection
res = vbase.query(
    collection_name="my_collection",
    filter="RANDOM_SAMPLE(0.01)",
    output_fields=["id", "color"],
)
 
print("sample size:", len(res))

You can also combine sampling with other filters:


res = vbase.query(
    collection_name="my_collection",
    filter='color like "red%" and RANDOM_SAMPLE(0.005)',
    output_fields=["id", "color"],
    limit=10,
)
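
If you think in terms of a target row count rather than a fraction, the fraction is simply target / population (for example, 5,000 out of 1,000,000 entities is 0.005). The helper below is a hypothetical sketch that does this arithmetic and builds the filter string in the same form as the examples above. It assumes the fraction must lie strictly between 0 and 1 and should be written in plain decimal notation; population is whatever count you expect the sample to be drawn from.

```python
def sample_filter(population, target_rows, extra_filter=None):
    """Build a RANDOM_SAMPLE filter aiming for roughly target_rows rows."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Fixed-point formatting avoids scientific notation such as 1e-05.
    text = f"{target_rows / population:.6f}".rstrip("0")
    if not 0.0 < float(text) < 1.0:
        raise ValueError("fraction must fall strictly between 0 and 1 at 6 decimals")
    sample = f"RANDOM_SAMPLE({text})"
    return f"{extra_filter} and {sample}" if extra_filter else sample


print(sample_filter(1_000_000, 5_000))
print(sample_filter(2_000, 10, extra_filter='color like "red%"'))
```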

Get vs Query

Use Get when you already know the IDs.
Use Query when you need filtering rules.
Use Query Iterator when the result size can be large.

If you need nearest-neighbor similarity search, use vector search instead (covered in the Search docs).