Collection, Data, and Index Commands
Last validated: 2026-05-11
These commands operate against the selected tenant endpoint (vbase.host:vbase.port) after db use.
For a fast one-page chooser, see 06-metric-and-index-selection-cheat-sheet.md.
Prerequisites
- Authenticate via Dodil CLI login.
- Select service context via `dodil vbase db use <service_id>`.
- Confirm current `db_name` (or pass `--db` explicitly).
Collection Commands
collection create <collection_name>
Use cases:
- quick default schema creation
- explicit custom schema creation for production fields
Flags (default-schema mode):
| Flag | Default | Description |
|---|---|---|
| `--db` | active config db | Target database name. |
| `--dynamic-fields` | `true` | Enable dynamic fields on collection schema. |
| `--id-field` | `id` | Primary key field name. |
| `--id-type` | `varchar` | Primary key type (`varchar` or `int64`). |
| `--id-max-length` | `64` | Required for varchar primary key. |
| `--vector-field` | `vector` | Vector field name. |
| `--vector-type` | `float_vector` | Vector type (`float_vector` or `binary_vector`). |
| `--dim` | `8` | Vector dimension. |
Flags (custom-schema mode):
| Flag | Description |
|---|---|
| `--field` | Repeatable custom field spec: `name:type[:pk][:key=value,...]`. |
Custom field examples:
- `doc_id:varchar:pk:max_length=64`
- `embedding:float_vector:dim=768`
- `category:varchar:max_length=32`
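The `name:type[:pk][:key=value,...]` shape can be easier to reason about when split by hand. The following Python sketch is not the CLI's own parser: `parse_field_spec` is a hypothetical helper, it assumes names and values contain no `:` or `,` characters, and it keeps every parameter value as a string.

```python
def parse_field_spec(spec: str) -> dict:
    """Split 'name:type[:pk][:key=value,...]' into a plain dict (illustrative only)."""
    parts = spec.split(":")
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError(f"expected at least name:type, got {spec!r}")
    field = {"name": parts[0], "type": parts[1], "pk": False, "params": {}}
    for part in parts[2:]:
        if part == "pk":
            field["pk"] = True
            continue
        # Any other segment is treated as a comma-separated key=value list.
        for pair in part.split(","):
            key, sep, value = pair.partition("=")
            if not sep or not key:
                raise ValueError(f"expected key=value, got {pair!r}")
            field["params"][key] = value
    return field


print(parse_field_spec("doc_id:varchar:pk:max_length=64"))
```

Under these assumptions, the three example specs above parse into a primary-key `varchar` field, a `float_vector` field with `dim` set to `"768"`, and a plain `varchar` field.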
Examples:
Default-schema mode:
dodil vbase collection create docs \
--id-field doc_id \
--id-type varchar \
--id-max-length 128 \
--vector-field embedding \
--vector-type float_vector \
--dim 768 \
--db <db_name>
Custom-schema mode:
dodil vbase collection create docs_custom \
--field "doc_id:varchar:pk:max_length=64" \
--field "embedding:float_vector:dim=768" \
--field "category:varchar:max_length=32" \
--db <db_name>
Validation behavior:
- exactly one custom field must be marked `pk`
- vector custom fields require `dim`
- default schema requires `--dim > 0`
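The two custom-schema rules can be checked locally before calling the CLI. This is a minimal sketch under stated assumptions, not the CLI's validation code: `validate_custom_fields` and the dict layout (`name`, `type`, `pk`, `params`) are hypothetical, and the set of vector types is taken from the `--vector-type` values listed in this document.

```python
VECTOR_TYPES = {"float_vector", "binary_vector"}  # vector types named in this document


def validate_custom_fields(fields: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means both checks passed."""
    problems = []
    pk_count = sum(1 for f in fields if f.get("pk"))
    if pk_count != 1:
        problems.append(f"expected exactly one pk field, found {pk_count}")
    for f in fields:
        if f["type"] in VECTOR_TYPES and "dim" not in f.get("params", {}):
            problems.append(f"vector field {f['name']!r} is missing dim")
    return problems


fields = [
    {"name": "doc_id", "type": "varchar", "pk": True, "params": {"max_length": "64"}},
    {"name": "embedding", "type": "float_vector", "pk": False, "params": {}},
]
print(validate_custom_fields(fields))
```

Here the second field has no `dim`, so the sketch reports one problem. The real CLI may word its errors differently or enforce additional rules.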
Other collection commands
| Command | Syntax | Use case |
|---|---|---|
| List | dodil vbase collection list [--db <db_name>] | View collection names. |
| Show | dodil vbase collection show <name> [--db <db_name>] | Inspect collection and indexes. |
| Drop | dodil vbase collection drop <name> [--db <db_name>] | Delete collection. |
| Load | dodil vbase collection load <name> [--db <db_name>] | Load into memory. |
| Release | dodil vbase collection release <name> [--db <db_name>] | Release from memory. |
Data Commands
data insert <collection_name>
Flags:
| Flag | Required | Description |
|---|---|---|
| `--db` | No | Target DB; defaults to active context. |
| `--id` | Yes | Record ID value. |
| `--id-field` | No | ID field name (default `id`). |
| `--vector` | Yes | Comma-separated float values. |
| `--vector-field` | No | Vector field name (default `vector`). |
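Example (8 values shown for brevity; the vector length must match the vector dimension of the target collection):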
dodil vbase data insert docs_custom \
--id doc-1 \
--id-field doc_id \
--vector-field embedding \
--vector "0.11,0.12,0.13,0.14,0.15,0.16,0.17,0.18" \
--db <db_name>
data search <collection_name>
Flags:
| Flag | Required | Description |
|---|---|---|
| `--db` | No | Target DB; defaults to active context. |
| `--vector` | Yes | Query vector (comma-separated floats). |
| `--vector-field` | No | Vector field to search (`anns_field` is set from this). |
| `--id-field` | No | ID field returned in output. |
| `--topk` | No | Result count (default `10`). |
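Example (8 values shown for brevity; the query vector length must match the vector dimension of the collection):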
dodil vbase data search docs_custom \
--vector "0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8" \
--vector-field embedding \
--id-field doc_id \
--topk 5 \
--db <db_name>
Current behavior notes:
- search metric is fixed to `L2` in current command implementation
- text-to-embedding is not part of this command
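Both `data insert` and `data search` take `--vector` as comma-separated floats, so a script that produces embeddings has to serialize them first. The sketch below shows one way to do that in Python. `format_vector` is a hypothetical helper, and the dimension and finiteness checks are local safeguards chosen for illustration, not documented CLI behavior.

```python
import math


def format_vector(values, expected_dim):
    """Render floats as the comma-separated string passed to --vector."""
    values = [float(v) for v in values]
    if len(values) != expected_dim:
        raise ValueError(f"expected {expected_dim} values, got {len(values)}")
    if not all(math.isfinite(v) for v in values):
        raise ValueError("vector contains NaN or infinity")
    return ",".join(repr(v) for v in values)


print(format_vector([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8], expected_dim=8))
```

Failing early on a length mismatch keeps the dimensionality error on the client side instead of surfacing it from the service.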
Similarity and Metric Fundamentals
Let query vector be $q \in \mathbb{R}^d$ and candidate vector be $x \in \mathbb{R}^d$.
L2 (Euclidean distance)
$$ d_{L2}(q, x) = \lVert q - x \rVert_2 = \sqrt{\sum_{i=1}^{d}(q_i - x_i)^2} $$
- Lower is better.
- Sensitive to both direction and magnitude.
IP (Inner Product)
$$ s_{IP}(q, x) = q \cdot x = \sum_{i=1}^{d} q_i x_i $$
- Higher is better.
- Mixes angle and vector length effects.
COSINE (Angular similarity)
$$ s_{cos}(q, x) = \frac{q \cdot x}{\lVert q \rVert_2 \lVert x \rVert_2} $$
- Higher is better.
- Focuses on direction, less on magnitude.
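To make the three definitions concrete, the following plain-Python sketch evaluates them on two hand-picked candidates. The vectors and function names are illustrative only and have nothing to do with how the service computes scores internally.

```python
import math


def l2_distance(q, x):
    return math.sqrt(sum((qi - xi) ** 2 for qi, xi in zip(q, x)))


def inner_product(q, x):
    return sum(qi * xi for qi, xi in zip(q, x))


def cosine_similarity(q, x):
    norms = math.sqrt(inner_product(q, q)) * math.sqrt(inner_product(x, x))
    return inner_product(q, x) / norms


q = [1.0, 0.0]
a = [2.0, 0.0]  # same direction as q, twice the length
b = [0.0, 1.0]  # orthogonal to q, same length as q

for name, x in (("a", a), ("b", b)):
    print(name, l2_distance(q, x), inner_product(q, x), cosine_similarity(q, x))
```

Candidate `a` points the same way as `q`, so its cosine similarity is 1 even though its L2 distance is nonzero: lengthening a vector changes L2 and IP but leaves COSINE untouched.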
Why normalization matters
If vectors are unit-normalized ($\lVert q \rVert_2 = \lVert x \rVert_2 = 1$), then:
$$ \lVert q - x \rVert_2^2 = 2 - 2\,s_{cos}(q, x) $$
This means ranking by smallest L2 distance is equivalent to ranking by largest cosine similarity on normalized vectors.
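The identity follows from expanding the squared norm: $\lVert q - x \rVert_2^2 = \lVert q \rVert_2^2 + \lVert x \rVert_2^2 - 2\,q \cdot x$, which equals $2 - 2\,q \cdot x$ for unit vectors, and $q \cdot x = s_{cos}(q, x)$ when both norms are 1. A small numeric check in plain Python, using made-up vectors, illustrates both the identity and the matching rankings:

```python
import math


def normalize(v):
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [c / norm for c in v]


def l2_squared(q, x):
    return sum((qi - xi) ** 2 for qi, xi in zip(q, x))


def dot(q, x):
    return sum(qi * xi for qi, xi in zip(q, x))


q = normalize([0.3, -1.2, 0.5])
candidates = {
    "a": normalize([0.2, -1.0, 0.4]),
    "b": normalize([-0.7, 0.1, 0.9]),
    "c": normalize([1.5, 0.3, -0.2]),
}

# On unit vectors the dot product equals cosine similarity.
for x in candidates.values():
    assert math.isclose(l2_squared(q, x), 2 - 2 * dot(q, x), abs_tol=1e-12)

by_l2 = sorted(candidates, key=lambda k: l2_squared(q, candidates[k]))
by_cos = sorted(candidates, key=lambda k: dot(q, candidates[k]), reverse=True)
print(by_l2, by_cos)
```

Both orderings come out the same, which is why normalizing embeddings before insert is a common way to get cosine-like ranking from an L2-only search path.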
Practical note for current CLI:
- `dodil vbase data search` currently sends `metric_type=L2`.
- If you need native `COSINE` or `IP` search behavior, use RunCommand fallback and set `search_params.metric_type` explicitly.
- RunCommand workflow: ../06-workflows/04-run-command-fallback-grpc-http.md
Index Commands
index create <collection_name> <field_name>
Flags:
| Flag | Default | Description |
|---|---|---|
| `--db` | active config db | Target DB name. |
| `--type` | `FLAT` | Index type (`FLAT`, `HNSW`, `IVF_FLAT`, `IVF_SQ8`, etc.). |
| `--metric` | `L2` | Metric type (`L2`, `IP`, `COSINE`). |
Index Type Primer
| Index type | Exact vs Approx | Good default use case | Trade-off |
|---|---|---|---|
| `FLAT` | Exact | Small/medium datasets, quality baselines | Highest latency at scale |
| `HNSW` | Approximate | General low-latency ANN workloads | More memory than IVF families |
| `IVF_FLAT` | Approximate | Large datasets with balanced recall/latency | Requires IVF tuning for best quality |
| `IVF_SQ8` | Approximate + quantized | Very large datasets where memory pressure matters | Lower memory, more approximation loss |
Quick mental model:
- `FLAT`: compare the query against every stored vector.
- `HNSW`: graph shortcuts to neighbors.
- `IVF_*`: coarse partition first, then search likely partitions.
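The exact-versus-approximate trade-off behind this mental model can be seen in a toy Python sketch. It is not how the service implements its indexes: the centroids are fixed by hand rather than learned, `nprobe` is just this toy's name for "how many partitions to scan", and the data is invented.

```python
import heapq


def l2_squared(q, x):
    return sum((qi - xi) ** 2 for qi, xi in zip(q, x))


def flat_search(query, vectors, topk):
    """Exact search: score every stored vector, keep the topk closest."""
    return heapq.nsmallest(topk, vectors, key=lambda item: l2_squared(query, item[1]))


def ivf_search(query, vectors, centroids, topk, nprobe=1):
    """Toy IVF: bucket vectors by nearest centroid, then scan only nprobe buckets."""
    buckets = {i: [] for i in range(len(centroids))}
    for item in vectors:
        nearest = min(buckets, key=lambda i: l2_squared(item[1], centroids[i]))
        buckets[nearest].append(item)
    probed = heapq.nsmallest(nprobe, buckets, key=lambda i: l2_squared(query, centroids[i]))
    shortlist = [item for i in probed for item in buckets[i]]
    return flat_search(query, shortlist, topk)


vectors = [("a", [0.0, 0.0]), ("b", [0.1, 0.2]), ("c", [5.0, 5.0]), ("d", [2.6, 2.6])]
centroids = [[0.0, 0.0], [5.0, 5.0]]  # fixed by hand; a real index learns these
query = [2.0, 2.0]

print([k for k, _ in flat_search(query, vectors, topk=2)])
print([k for k, _ in ivf_search(query, vectors, centroids, topk=2, nprobe=1)])
print([k for k, _ in ivf_search(query, vectors, centroids, topk=2, nprobe=2)])
```

With one probed partition the toy IVF search misses `d`, the true nearest neighbor, because `d` sits in the partition whose centroid is farther from the query. Probing both partitions recovers the exact result at the cost of scanning everything.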
CLI tuning limitation:
- Current CLI exposes `--type` and `--metric` only.
- Advanced knobs (for example IVF probe/count settings or HNSW depth knobs) are not exposed in first-class CLI flags.
- Use RunCommand for advanced index/search tuning.
Example:
dodil vbase index create docs_custom embedding --type HNSW --metric L2 --db <db_name>
index drop <collection_name> <field_name>
Example:
dodil vbase index drop docs_custom embedding --db <db_name>
Alignment Rules That Prevent Failures
- Use matching field names across collection schema, index creation, insert, and search.
- If your schema uses custom names, always pass `--id-field` and `--vector-field` on data commands.
- Keep vector dimensionality consistent between collection schema and insert/search payloads.
- Keep index/search metric semantics aligned. For first-class CLI search, prefer `--metric L2` on index creation.
- If your workload needs `COSINE` or `IP` at search time, use RunCommand fallback for explicit metric control.
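These rules lend themselves to a pre-flight check in whatever script drives the CLI. The following Python sketch assumes you track the schema, index, and request settings yourself as plain dicts. `check_alignment` and every dict key are hypothetical, and nothing here queries the service.

```python
def check_alignment(schema, index, request):
    """Compare locally tracked settings and list any mismatches."""
    problems = []
    if request["vector_field"] != schema["vector_field"]:
        problems.append("request vector field does not match the schema")
    if request["id_field"] != schema["id_field"]:
        problems.append("request id field does not match the schema")
    if index["field"] != schema["vector_field"]:
        problems.append("index was created on a different field")
    if len(request["vector"]) != schema["dim"]:
        problems.append(
            f"vector has {len(request['vector'])} values, schema dim is {schema['dim']}"
        )
    if index["metric"] != request["metric"]:
        problems.append("index metric and search metric differ")
    return problems


schema = {"id_field": "doc_id", "vector_field": "embedding", "dim": 768}
index = {"field": "embedding", "metric": "COSINE"}
request = {
    "id_field": "doc_id",
    "vector_field": "embedding",
    "vector": [0.1] * 8,
    "metric": "L2",  # the first-class search command sends L2
}
for problem in check_alignment(schema, index, request):
    print(problem)
```

In this example the check flags the 8-value vector against a 768-dimension schema and the `COSINE` index paired with an `L2` search, the two mismatches the rules above warn about.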