A schema is the blueprint for a collection in VBase. It answers two practical questions:
- What do you want to store? (vectors + metadata)
- How do you want to query it? (similarity search + filters)
In DODIL, you define a schema by choosing a set of fields.
- Primary key field: uniquely identifies each record.
- Vector field(s): the embedding(s) you search by similarity.
- Scalar fields: metadata you filter, sort, or return (strings, numbers, booleans, JSON, arrays, etc.).
Tip: Think of a collection like a database table. A schema is the table definition.
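To make the three field roles concrete, here is a minimal, engine-agnostic sketch in plain Python. FieldKind, FieldSpec, CollectionSchema and the type labels are hypothetical names for illustration, not part of the DODIL SDK. The checks mirror the roles listed above (exactly one primary key, at least one vector field); VBase may enforce further rules of its own.

```python
from dataclasses import dataclass, field
from enum import Enum


class FieldKind(Enum):
    PRIMARY_KEY = "primary_key"
    VECTOR = "vector"
    SCALAR = "scalar"


@dataclass
class FieldSpec:
    name: str
    kind: FieldKind
    dtype: str  # illustrative type label, e.g. "int64", "varchar", "float_vector"
    params: dict = field(default_factory=dict)  # e.g. {"dim": 768} or {"max_length": 64}


@dataclass
class CollectionSchema:
    fields: list[FieldSpec]

    def validate(self) -> None:
        """Raise ValueError if the schema breaks the basic rules described above."""
        names = [f.name for f in self.fields]
        if len(names) != len(set(names)):
            raise ValueError("field names must be unique")
        primary_keys = [f for f in self.fields if f.kind is FieldKind.PRIMARY_KEY]
        if len(primary_keys) != 1:
            raise ValueError("exactly one primary key field is required")
        if not any(f.kind is FieldKind.VECTOR for f in self.fields):
            raise ValueError("at least one vector field is required")


schema = CollectionSchema(fields=[
    FieldSpec("id", FieldKind.PRIMARY_KEY, "int64"),
    FieldSpec("vector", FieldKind.VECTOR, "float_vector", {"dim": 768}),
    FieldSpec("lang", FieldKind.SCALAR, "varchar", {"max_length": 8}),
])
schema.validate()  # passes: one primary key and one vector field
```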
A quick mental model
When you search, VBase typically does two things:
- Vector similarity to find the closest embeddings.
- Metadata filtering to narrow results (for example, only documents from workspace="acme" or lang="en").
That's why the schema matters: good field choices make your search faster, cheaper, and easier to reason about.
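The toy search below mimics those two steps with a brute-force scan. It keeps only the records whose metadata matches every filter, then ranks the survivors by cosine similarity. The records and the search helper are hypothetical. A real engine uses vector indexes and decides for itself how to combine filtering with the similarity search, so treat this only as a mental model.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def search(records: list[dict], query: list[float], top_k: int, **filters) -> list[dict]:
    """Step 1: metadata filter. Step 2: rank the remaining records by similarity."""
    candidates = [r for r in records if all(r.get(k) == v for k, v in filters.items())]
    candidates.sort(key=lambda r: cosine(r["vector"], query), reverse=True)
    return candidates[:top_k]


records = [
    {"id": 1, "workspace": "acme", "lang": "en", "vector": [1.0, 0.0]},
    {"id": 2, "workspace": "acme", "lang": "de", "vector": [0.9, 0.1]},
    {"id": 3, "workspace": "other", "lang": "en", "vector": [1.0, 0.0]},
    {"id": 4, "workspace": "acme", "lang": "en", "vector": [0.0, 1.0]},
]
hits = search(records, query=[1.0, 0.0], top_k=2, workspace="acme", lang="en")
print([h["id"] for h in hits])  # [1, 4]
```

Record 3 is as similar as record 1, and record 2 is nearly as close. Neither can appear in the results, because the metadata filter removes them before ranking.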
Field types you'll use most
Primary key
A primary key is required. It's the stable identifier of every row.
Common choices:
- Integer IDs if your data is generated internally.
- String IDs if you already have natural identifiers (document ID, UUID, etc.).
Milvus-style schemas support configuring the primary field as auto-generated (auto ID) or user-supplied.
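The following is a minimal sketch of the two primary key modes. It is written as a hypothetical client-side helper; PrimaryKeyAssigner is not a DODIL or Milvus API. With auto ID, the helper generates sequential integers. Without it, every record must bring its own key. The duplicate check is only a client-side guard, so it is worth verifying for your VBase version whether the engine itself rejects duplicate keys on insert.

```python
import itertools


class PrimaryKeyAssigner:
    """Hypothetical client-side helper that assigns or checks primary keys before insert."""

    def __init__(self, auto_id: bool) -> None:
        self.auto_id = auto_id
        self._next_id = itertools.count(1)  # source of auto-generated integer IDs
        self._seen: set = set()             # keys handed out or accepted so far

    def assign(self, record: dict) -> dict:
        if self.auto_id:
            if "id" in record:
                raise ValueError("auto ID is enabled; do not supply 'id'")
            record = {**record, "id": next(self._next_id)}
        elif "id" not in record:
            raise ValueError("auto ID is disabled; every record needs an 'id'")
        if record["id"] in self._seen:
            raise ValueError(f"duplicate primary key: {record['id']!r}")
        self._seen.add(record["id"])
        return record


auto = PrimaryKeyAssigner(auto_id=True)
print(auto.assign({"doc_id": "a"}))       # {'doc_id': 'a', 'id': 1}
manual = PrimaryKeyAssigner(auto_id=False)
print(manual.assign({"id": "doc-42#0"}))  # {'id': 'doc-42#0'}
```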
Vector fields
A vector field stores the embedding used for similarity search.
You'll typically define:
- vector dimension (e.g. 768, 1024, 1536)
- metric (COSINE / IP / L2 depending on your embedding model)
Milvus-compatible schemas support multiple vector fields in one collection (useful for hybrid or multi-embedding designs).
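To see why the metric has to match your embedding model, here are COSINE, IP and L2 written as plain Python functions. These are textbook definitions for illustration, not the engine's implementation. The example ranks the same two candidates under each metric. Inner product rewards vector magnitude, cosine looks only at direction, and L2 is a distance, so smaller means closer. The check_dim helper and the tiny dimension are assumptions made for the example.

```python
import math


def inner_product(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    # Inner product of the normalized vectors; higher means more similar.
    return inner_product(a, b) / math.sqrt(inner_product(a, a) * inner_product(b, b))


def l2(a: list[float], b: list[float]) -> float:
    # Euclidean distance; lower means closer. Some engines report the squared value instead.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


DIM = 3  # real embedding models use e.g. 768, 1024, or 1536


def check_dim(vector: list[float]) -> list[float]:
    # zip() silently truncates, so reject vectors of the wrong dimension up front.
    if len(vector) != DIM:
        raise ValueError(f"expected {DIM} dimensions, got {len(vector)}")
    return vector


query = check_dim([1.0, 0.0, 0.0])
docs = {"a": check_dim([2.0, 0.0, 0.0]), "b": check_dim([0.6, 0.8, 0.0])}

best_ip = max(docs, key=lambda k: inner_product(query, docs[k]))  # "a": IP rewards magnitude
best_cos = max(docs, key=lambda k: cosine(query, docs[k]))        # "a": same direction as query
best_l2 = min(docs, key=lambda k: l2(query, docs[k]))             # "b": smaller distance wins
print(best_ip, best_cos, best_l2)  # a a b
```

For unit-length vectors, IP and COSINE produce the same ranking. That is why many embedding models that output normalized vectors work with either one.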
Scalar fields (metadata)
Scalar fields hold the metadata you'll filter by or return in results.
Common examples:
- source (string)
- workspace_id (string)
- created_at (timestamp or integer)
- lang (string)
- is_public (boolean)
Milvus schemas treat these as scalar fields and support common categories like string, number, boolean, and composite types.
Strings, numbers, booleans
String fields
Strings are great for identifiers and small categorical metadata.
Most engines require a max length for string fields (for example, 512). Keep it reasonable to avoid wasting memory.
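The sketch below catches oversized strings in your own ingestion code, before they reach the database. check_varchar is a hypothetical helper, and counting UTF-8 bytes is an assumption. Check whether your engine counts bytes or characters, because the two differ for non-ASCII text.

```python
def check_varchar(name: str, value: str, max_length: int) -> str:
    """Reject strings longer than max_length before they reach the database."""
    # Assumption: length is counted in UTF-8 bytes, not characters.
    size = len(value.encode("utf-8"))
    if size > max_length:
        raise ValueError(f"{name}: {size} bytes exceeds max_length={max_length}")
    return value


check_varchar("lang", "en", max_length=8)      # 2 bytes: fits
check_varchar("source", "café", max_length=5)  # 4 characters but 5 bytes: fits exactly
```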
Number fields
Numbers are ideal for:
- timestamps (unix seconds/ms)
- counters
- ranking features
Typical numeric field types include integers and floats.
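For timestamps, pick one unit (seconds or milliseconds) and use it everywhere. Mixing the two silently breaks range filters by a factor of 1000. Below is a minimal sketch, assuming milliseconds stored in an int64 field. The to_unix_ms helper and the sample records are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix_ms(dt: datetime) -> int:
    """Convert a timezone-aware datetime to integer milliseconds since the Unix epoch."""
    if dt.tzinfo is None:
        raise ValueError("use timezone-aware datetimes to avoid local-time ambiguity")
    # Integer division of timedeltas avoids float rounding from dt.timestamp() * 1000.
    return (dt - EPOCH) // timedelta(milliseconds=1)


records = [
    {"id": 1, "created_at": to_unix_ms(datetime(2023, 12, 31, tzinfo=timezone.utc))},
    {"id": 2, "created_at": to_unix_ms(datetime(2024, 3, 1, tzinfo=timezone.utc))},
]
start = to_unix_ms(datetime(2024, 1, 1, tzinfo=timezone.utc))
print(start)  # 1704067200000
print([r["id"] for r in records if r["created_at"] >= start])  # [2]
```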
Boolean fields
Use booleans for simple on/off flags like is_public, is_archived, or has_images.
Composite fields: JSON and arrays
JSON fields
Use JSON when your metadata is flexible and not known ahead of time (for example, attributes that differ per document type).
JSON is great for storage and returning metadata, but be intentional with how you filter on it (it's usually not as fast as filtering on dedicated scalar fields).
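This toy example does not measure speed. It shows where the extra work comes from. A JSON filter has to resolve a path inside every record's attributes, and that path may be missing because the shapes differ per document type. A dedicated scalar field is a single typed value that an engine can index directly. The records and json_get are hypothetical.

```python
records = [
    {"id": 1, "doc_type": "invoice", "attributes": {"invoice": {"total": 120.0, "currency": "EUR"}}},
    {"id": 2, "doc_type": "ticket", "attributes": {"priority": "high"}},
    {"id": 3, "doc_type": "invoice", "attributes": {"invoice": {"total": 80.0, "currency": "USD"}}},
]


def json_get(doc: dict, path: str, default=None):
    """Follow a dotted path like 'invoice.currency'; missing keys yield the default."""
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# JSON filter: walk each record's attributes; some records lack the path entirely.
eur = [r["id"] for r in records if json_get(r["attributes"], "invoice.currency") == "EUR"]
# Dedicated scalar field: one direct comparison on a typed value.
invoices = [r["id"] for r in records if r["doc_type"] == "invoice"]
print(eur, invoices)  # [1] [1, 3]
```

If you find yourself filtering on the same JSON key in most queries, that key is a good candidate for its own scalar field.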
Array fields
Use arrays for small lists such as tags:
tags: ["finance", "invoice", "q4"]
Array fields require you to define:
- the element type (e.g. string or int)
- the max capacity (maximum elements)
- element constraints such as max string length when elements are strings
Nullable and defaults
If a field won't exist on every record, configure it as nullable or provide a default value. This keeps ingestion resilient when some inputs are missing optional metadata.
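Here is a minimal sketch of how nullable and default settings keep ingestion going, using hypothetical per-field rules. In this sketch, a missing field takes its default if it has one, becomes null if it is nullable, and otherwise rejects the record. How your engine combines the two settings when both are set is worth checking.

```python
# Hypothetical per-field rules: a default, nullable, or neither (required).
FIELD_RULES = {
    "workspace_id": {"nullable": False},                  # required
    "source": {"nullable": False, "default": "unknown"},  # falls back to a default
    "is_public": {"nullable": False, "default": False},
    "lang": {"nullable": True},                           # may be stored as null
}


def fill_optional(record: dict) -> dict:
    """Apply defaults and nulls for missing fields; reject missing required fields."""
    filled = dict(record)
    for name, rule in FIELD_RULES.items():
        if filled.get(name) is not None:
            continue  # value present, keep it
        if "default" in rule:
            filled[name] = rule["default"]
        elif rule["nullable"]:
            filled[name] = None
        else:
            raise ValueError(f"missing required field: {name}")
    return filled


print(fill_optional({"workspace_id": "acme"}))
# {'workspace_id': 'acme', 'source': 'unknown', 'is_public': False, 'lang': None}
```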
A practical schema example
Here's a schema that works well for most RAG-style apps (docs, web pages, tickets, PDFs):
- id (primary key)
- vector (embedding)
- doc_id (string)
- chunk_id (string)
- source (string)
- workspace_id (string)
- created_at (int64)
- is_public (bool)
- attributes (json)
- tags (array of strings)
This gives you:
- fast similarity search (vector)
- clean filtering (workspace, source, access flags)
- flexibility (attributes JSON)
- lightweight tagging (tags array)
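Below is one chunk record shaped for this schema, with a hypothetical record-level check. The Python types stand in for the engine types (int64 as int, json as dict, array of strings as a list of str). The tiny vector dimension and the sample values are made up for illustration.

```python
# Hypothetical record-level check; Python types stand in for engine types.
SCHEMA = {
    "id": int,            # primary key
    "vector": list,       # embedding
    "doc_id": str,
    "chunk_id": str,
    "source": str,
    "workspace_id": str,
    "created_at": int,    # unix milliseconds
    "is_public": bool,
    "attributes": dict,   # JSON
    "tags": list,         # array of strings
}
DIM = 4  # kept tiny for the example; real models use e.g. 768, 1024, or 1536


def validate(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record matches the schema."""
    problems = []
    for name, expected in SCHEMA.items():
        if name not in record:
            problems.append(f"missing {name}")
        elif type(record[name]) is not expected:
            problems.append(f"{name} should be {expected.__name__}")
    vector = record.get("vector")
    if isinstance(vector, list) and len(vector) != DIM:
        problems.append(f"vector should have {DIM} dimensions")
    tags = record.get("tags")
    if isinstance(tags, list) and not all(isinstance(t, str) for t in tags):
        problems.append("tags should contain only strings")
    return problems


chunk = {
    "id": 1,
    "vector": [0.12, -0.03, 0.88, 0.41],
    "doc_id": "doc-42",
    "chunk_id": "doc-42#0",
    "source": "pdf",
    "workspace_id": "acme",
    "created_at": 1704067200000,
    "is_public": False,
    "attributes": {"title": "Q4 invoice policy", "page": 3},
    "tags": ["finance", "invoice", "q4"],
}
print(validate(chunk))  # []
```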
How this looks in the DODIL Python SDK
Below is the standard way to connect to VBase (Python 3.10+):
```python
from dodil import Client
from dodil.vbase import VBaseConfig

# Authenticate with a service account (replace the placeholders with your credentials)
c = Client(
    service_account_id="...",
    service_account_secret="...",
)

# Connect to your VBase database over HTTPS
vbase = c.vbase.connect(
    VBaseConfig(
        host="vbase-db-<id>.infra.dodil.cloud",
        port=443,
        scheme="https",
        db_name="db_<id>",
    )
)

# List existing collections to confirm the connection works
print(vbase.list_collections())
```

Next: once you have a schema, you can create a collection and start inserting vectors.