Collection, Data, and Index Commands

Last validated: 2026-05-11

These commands run against the tenant endpoint (`vbase.host:vbase.port`) selected by `dodil vbase db use`.

For a one-page guide to choosing a metric and index type, see `06-metric-and-index-selection-cheat-sheet.md`.

Prerequisites

- Authenticate with the Dodil CLI login.
- Select the service context with `dodil vbase db use <service_id>`.
- Confirm the current `db_name`, or pass `--db` explicitly on each command.

Collection Commands

`collection create <collection_name>`

Use cases:

- quick creation with the default schema
- explicit custom schema for production fields

Flags (default-schema mode):

| Flag | Default | Description |
| --- | --- | --- |
| `--db` | active config db | Target database name. |
| `--dynamic-fields` | `true` | Enable dynamic fields on the collection schema. |
| `--id-field` | `id` | Primary key field name. |
| `--id-type` | `varchar` | Primary key type (`varchar` or `int64`). |
| `--id-max-length` | `64` | Maximum primary key length; required when `--id-type` is `varchar`. |
| `--vector-field` | `vector` | Vector field name. |
| `--vector-type` | `float_vector` | Vector type (`float_vector` or `binary_vector`). |
| `--dim` | `8` | Vector dimension. |

Flags (custom-schema mode):

| Flag | Description |
| --- | --- |
| `--field` | Repeatable custom field spec: `name:type[:pk][:key=value,...]`. |

Custom field examples:

- `doc_id:varchar:pk:max_length=64`
- `embedding:float_vector:dim=768`
- `category:varchar:max_length=32`
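The following minimal Python sketch shows how the spec grammar splits into name, type, primary-key marker and options. `FieldSpec` and `parse_field_spec` are hypothetical names. The CLI's own parser may handle edge cases such as segment order, whitespace or escaping differently.

```python
from dataclasses import dataclass, field


@dataclass
class FieldSpec:
    name: str
    type: str
    is_pk: bool = False
    params: dict = field(default_factory=dict)


def parse_field_spec(spec: str) -> FieldSpec:
    """Split a name:type[:pk][:key=value,...] string (illustrative parser)."""
    parts = spec.split(":")
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError(f"expected name:type[...], got {spec!r}")
    parsed = FieldSpec(name=parts[0], type=parts[1])
    for part in parts[2:]:
        if part == "pk":
            # Optional primary-key marker.
            parsed.is_pk = True
            continue
        # Remaining segment: comma-separated key=value options such as dim=768.
        for pair in part.split(","):
            key, sep, value = pair.partition("=")
            if not sep or not key:
                raise ValueError(f"bad option {pair!r} in {spec!r}")
            parsed.params[key] = value
    return parsed


for spec in ("doc_id:varchar:pk:max_length=64",
             "embedding:float_vector:dim=768",
             "category:varchar:max_length=32"):
    print(parse_field_spec(spec))
```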

Examples:

Default-schema mode:


```bash
dodil vbase collection create docs \
  --id-field doc_id \
  --id-type varchar \
  --id-max-length 128 \
  --vector-field embedding \
  --vector-type float_vector \
  --dim 768 \
  --db <db_name>
```

Custom-schema mode:


```bash
dodil vbase collection create docs_custom \
  --field "doc_id:varchar:pk:max_length=64" \
  --field "embedding:float_vector:dim=768" \
  --field "category:varchar:max_length=32" \
  --db <db_name>
```
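This hypothetical helper assumes the default-schema flags map one-to-one onto custom field specs. `--dynamic-fields` has no `--field` counterpart, so it is not covered. The helper rewrites default-mode flags as `--field` strings, which can help when moving a collection definition from one mode to the other. It only formats strings and does not call the CLI.

```python
def default_schema_as_field_specs(
    id_field: str = "id",
    id_type: str = "varchar",
    id_max_length: int = 64,
    vector_field: str = "vector",
    vector_type: str = "float_vector",
    dim: int = 8,
) -> list:
    """Rewrite default-schema flags as --field specs (assumed equivalence)."""
    if dim <= 0:
        raise ValueError("default schema requires --dim > 0")
    if id_type not in ("varchar", "int64"):
        raise ValueError("--id-type must be varchar or int64")
    if vector_type not in ("float_vector", "binary_vector"):
        raise ValueError("--vector-type must be float_vector or binary_vector")
    id_spec = f"{id_field}:{id_type}:pk"
    if id_type == "varchar":
        # Only a varchar primary key carries a max_length option.
        id_spec += f":max_length={id_max_length}"
    return [id_spec, f"{vector_field}:{vector_type}:dim={dim}"]


# The default-schema example above, expressed as custom field specs.
print(default_schema_as_field_specs(
    id_field="doc_id", id_max_length=128, vector_field="embedding", dim=768))
```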

Validation behavior:

- exactly one custom field must be marked `pk`
- vector custom fields require `dim`
- default-schema mode requires `--dim` > 0
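A client-side pre-flight check for the custom-schema rules can catch spec mistakes before you call the CLI. This sketch is hypothetical and is not the CLI's validation code. It takes already-split fields as `(name, type, is_pk, options)` tuples.

```python
# Hypothetical pre-flight check mirroring the documented custom-schema rules.
VECTOR_TYPES = {"float_vector", "binary_vector"}


def validate_custom_fields(fields):
    """Return a list of rule violations; an empty list means the specs pass."""
    errors = []
    pk_count = sum(1 for _, _, is_pk, _ in fields if is_pk)
    if pk_count != 1:
        errors.append(f"exactly one field must be marked pk (found {pk_count})")
    for name, ftype, _, options in fields:
        if ftype in VECTOR_TYPES and "dim" not in options:
            errors.append(f"vector field {name!r} requires dim")
    return errors


docs_custom = [
    ("doc_id", "varchar", True, {"max_length": "64"}),
    ("embedding", "float_vector", False, {"dim": "768"}),
    ("category", "varchar", False, {"max_length": "32"}),
]
print(validate_custom_fields(docs_custom))
```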

Other collection commands

| Command | Syntax | Use case |
| --- | --- | --- |
| List | `dodil vbase collection list [--db <db_name>]` | View collection names. |
| Show | `dodil vbase collection show <name> [--db <db_name>]` | Inspect a collection and its indexes. |
| Drop | `dodil vbase collection drop <name> [--db <db_name>]` | Delete a collection. |
| Load | `dodil vbase collection load <name> [--db <db_name>]` | Load a collection into memory. |
| Release | `dodil vbase collection release <name> [--db <db_name>]` | Release a collection from memory. |

Data Commands

`data insert <collection_name>`

Flags:

| Flag | Required | Description |
| --- | --- | --- |
| `--db` | No | Target DB; defaults to the active context. |
| `--id` | Yes | Record ID value. |
| `--id-field` | No | ID field name (default `id`). |
| `--vector` | Yes | Comma-separated float values; the count must match the schema `dim`. |
| `--vector-field` | No | Vector field name (default `vector`). |

Example:


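```bash
dodil vbase data insert docs_custom \
  --id doc-1 \
  --id-field doc_id \
  --vector-field embedding \
  --vector "0.11,0.12,0.13,0.14,0.15,0.16,0.17,0.18" \
  --db <db_name>
```

The `--vector` value above is shortened to 8 values for readability. `docs_custom` declares `embedding` with `dim=768`, so a real insert must pass exactly 768 comma-separated values.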
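When scripting inserts from Python, you can build the argument list from the documented flags and check the vector length against the schema dimension first. `build_insert_argv` is a hypothetical helper. It constructs the command but does not run it, and you must supply `dim` from your own schema.

```python
import shlex


def build_insert_argv(collection, record_id, vector, dim, *,
                      id_field=None, vector_field=None, db=None):
    """Build (but do not run) a `dodil vbase data insert` command line."""
    if len(vector) != dim:
        raise ValueError(f"vector has {len(vector)} values but schema dim is {dim}")
    argv = ["dodil", "vbase", "data", "insert", collection,
            "--id", str(record_id),
            # repr() gives the shortest string that round-trips each float.
            "--vector", ",".join(repr(float(v)) for v in vector)]
    if id_field:
        argv += ["--id-field", id_field]
    if vector_field:
        argv += ["--vector-field", vector_field]
    if db:
        argv += ["--db", db]
    return argv


embedding = [0.01] * 768  # stand-in for a real 768-dimensional embedding
argv = build_insert_argv("docs_custom", "doc-1", embedding, 768,
                         id_field="doc_id", vector_field="embedding", db="<db_name>")
print(shlex.join(argv)[:120] + " ...")
```

To execute the command, replace `<db_name>` and pass the list to `subprocess.run(argv, check=True)`.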

`data search <collection_name>`

Flags:

| Flag | Required | Description |
| --- | --- | --- |
| `--db` | No | Target DB; defaults to the active context. |
| `--vector` | Yes | Query vector (comma-separated floats). |
| `--vector-field` | No | Vector field to search (`anns_field` is set from this). |
| `--id-field` | No | ID field returned in the output. |
| `--topk` | No | Number of results (default 10). |

Example:


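```bash
dodil vbase data search docs_custom \
  --vector "0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8" \
  --vector-field embedding \
  --id-field doc_id \
  --topk 5 \
  --db <db_name>
```

The query vector is shortened here for readability. It must contain as many values as the searched field's `dim`, which is 768 for `embedding` in `docs_custom`.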

Current behavior notes:

- the search metric is fixed to `L2` in the current command implementation
- the command does not convert text to embeddings; pass a precomputed query vector
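Because the metric is fixed to L2, results are ranked smallest distance first. The brute-force reference below is a hypothetical helper, not part of the CLI. On a small sample it returns what an exact L2 search, such as one on a `FLAT` index, should return. Approximate indexes may return slightly different neighbors, so it is useful for spot-checking results.

```python
import heapq
import math


def exact_l2_topk(query, records, topk=10):
    """Brute-force L2 search: score every record, smallest distance first.

    records maps record id -> vector; math.dist raises ValueError on a dim mismatch.
    """
    scored = ((rid, math.dist(query, vec)) for rid, vec in records.items())
    return heapq.nsmallest(topk, scored, key=lambda item: item[1])


records = {"a": [0.0, 0.0], "b": [1.0, 1.0], "c": [3.0, 4.0]}
print(exact_l2_topk([0.0, 0.0], records, topk=2))
```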

Similarity and Metric Fundamentals

Let query vector be $q \in \mathbb{R}^d$ and candidate vector be $x \in \mathbb{R}^d$.

L2 (Euclidean distance)

$$ d_{L2}(q, x) = \lVert q - x \rVert_2 = \sqrt{\sum_{i=1}^{d}(q_i - x_i)^2} $$

- Lower is better (0 means identical vectors).
- Sensitive to both direction and magnitude.

IP (Inner Product)

$$ s_{IP}(q, x) = q \cdot x = \sum_{i=1}^{d} q_i x_i $$

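- Higher is better.
- Mixes angle and length: $q \cdot x = \lVert q \rVert_2 \lVert x \rVert_2 \cos\theta$, where $\theta$ is the angle between $q$ and $x$. At a fixed angle with $\cos\theta > 0$, longer vectors score higher.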

COSINE (Angular similarity)

$$ s_{cos}(q, x) = \frac{q \cdot x}{\lVert q \rVert_2 \lVert x \rVert_2} $$

- Higher is better (range $[-1, 1]$).
- Depends only on direction: rescaling either vector by a positive factor leaves the score unchanged.
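The three metrics can rank the same candidates differently. This small Python check uses illustrative vectors and only the standard library. It shows one query where L2, IP and COSINE each produce a different order.

```python
import math


def l2_distance(q, x):
    return math.dist(q, x)  # lower = more similar


def inner_product(q, x):
    if len(q) != len(x):
        raise ValueError("dimension mismatch")
    return sum(qi * xi for qi, xi in zip(q, x))  # higher = more similar


def cosine_similarity(q, x):
    # Divide out both lengths so only the angle remains.
    return inner_product(q, x) / (math.hypot(*q) * math.hypot(*x))


q = [1.0, 0.0]
candidates = {
    "a": [0.9, 0.1],  # short, almost the same direction as q
    "b": [5.0, 5.0],  # long, 45 degrees away
    "c": [3.0, 0.0],  # exactly q's direction, but three times longer
}
by_l2 = sorted(candidates, key=lambda k: l2_distance(q, candidates[k]))
by_ip = sorted(candidates, key=lambda k: inner_product(q, candidates[k]), reverse=True)
by_cos = sorted(candidates, key=lambda k: cosine_similarity(q, candidates[k]), reverse=True)
print(by_l2, by_ip, by_cos)  # ['a', 'c', 'b'] ['b', 'c', 'a'] ['c', 'a', 'b']
```

Candidate `c` points exactly along `q`, so COSINE ranks it first, but it is long, so L2 penalizes its distance. Candidate `b` wins under IP purely because of its length.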

Why normalization matters

If vectors are unit-normalized ($\lVert q \rVert_2 = \lVert x \rVert_2 = 1$), then:

$$ \lVert q - x \rVert_2^2 = 2 - 2\,s_{cos}(q, x) $$

This means ranking by smallest L2 distance is equivalent to ranking by largest cosine similarity on normalized vectors.
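The identity follows from expanding the squared distance and applying the unit-norm assumption, under which $s_{cos}(q, x) = q \cdot x$:

$$
\begin{aligned}
\lVert q - x \rVert_2^2 &= \sum_{i=1}^{d}(q_i - x_i)^2 = \lVert q \rVert_2^2 - 2\,q \cdot x + \lVert x \rVert_2^2 \\
&= 1 - 2\,q \cdot x + 1 = 2 - 2\,s_{cos}(q, x)
\end{aligned}
$$

Since $2 - 2\,s$ decreases strictly as $s$ grows, sorting by ascending L2 distance gives the same order as sorting by descending cosine similarity. For unit vectors $s_{IP}(q, x) = s_{cos}(q, x)$, so the IP order coincides as well.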

Practical note for current CLI:

- `dodil vbase data search` currently sends `metric_type=L2`.
- If you need native COSINE or IP search behavior, use the RunCommand fallback and set `search_params.metric_type` explicitly.
- RunCommand workflow: `../06-workflows/04-run-command-fallback-grpc-http.md`
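On unit vectors, L2 ranking matches cosine ranking. That gives one workaround with the current CLI: L2-normalize every inserted vector and every query vector on the client side. The helper below is a hypothetical sketch of that step. It assumes your embeddings are not already normalized and that the index keeps `--metric L2`.

```python
import math


def unit_vector_arg(values):
    """L2-normalize an embedding and format it as a --vector argument.

    Hypothetical helper: apply it to every inserted vector and every query vector.
    """
    norm = math.sqrt(sum(v * v for v in values))
    if norm == 0.0:
        raise ValueError("a zero vector has no direction and cannot be normalized")
    return ",".join(repr(v / norm) for v in values)


print(unit_vector_arg([3.0, 4.0]))  # 0.6,0.8
```

Normalization must be applied consistently. Mixing normalized queries with unnormalized stored vectors, or the reverse, breaks the equivalence.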

Index Commands

`index create <collection_name> <field_name>`

Flags:

| Flag | Default | Description |
| --- | --- | --- |
| `--db` | active config db | Target DB name. |
| `--type` | `FLAT` | Index type (`FLAT`, `HNSW`, `IVF_FLAT`, `IVF_SQ8`, etc.). |
| `--metric` | `L2` | Metric type (`L2`, `IP`, `COSINE`). |

Index Type Primer

| Index type | Exact or approximate | Typical use case | Trade-off |
| --- | --- | --- | --- |
| `FLAT` | Exact | Small/medium datasets, quality baselines | Highest latency at scale |
| `HNSW` | Approximate | General low-latency ANN workloads | More memory than IVF families |
| `IVF_FLAT` | Approximate | Large datasets with balanced recall/latency | Requires IVF tuning for best quality |
| `IVF_SQ8` | Approximate + quantized | Very large datasets where memory pressure matters | Lower memory, more approximation loss |

Quick mental model:

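- FLAT: compare the query with every stored vector (brute force).
- HNSW: walk a layered proximity graph, following shortcuts toward closer neighbors.
- IVF_*: assign vectors to coarse partitions (clusters) first, then scan only the partitions closest to the query.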
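The IVF idea fits in a few lines of Python. This toy version uses hand-picked centroids instead of k-means training. `nprobe` is named after the common IVF search parameter, which the CLI does not expose. The example shows why IVF is approximate: probing too few partitions can miss the true nearest neighbor.

```python
import math


def ivf_search(query, centroids, partitions, nprobe, topk):
    """Toy IVF-style search over pre-assigned partitions.

    centroids[i] is the centre of partitions[i] (a dict id -> vector). Real IVF
    indexes learn centroids with k-means; here they are fixed by hand.
    """
    # Coarse step: keep the nprobe partitions whose centroids are nearest the query.
    order = sorted(range(len(centroids)), key=lambda i: math.dist(query, centroids[i]))
    probed = order[:nprobe]
    # Fine step: exact L2 scan, but only inside the probed partitions.
    hits = [(rid, math.dist(query, vec))
            for i in probed for rid, vec in partitions[i].items()]
    hits.sort(key=lambda hit: hit[1])
    return hits[:topk]


centroids = [[0.0, 0.0], [10.0, 10.0]]
partitions = [
    {"a": [0.5, 0.0], "b": [4.0, 4.0]},    # assigned to centroid 0
    {"c": [9.0, 9.5], "d": [10.0, 10.0]},  # assigned to centroid 1
]
query = [5.5, 5.5]
print(ivf_search(query, centroids, partitions, nprobe=1, topk=1))  # finds c, misses b
print(ivf_search(query, centroids, partitions, nprobe=2, topk=1))  # finds b, the true nearest
```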

CLI tuning limitation:

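- The current CLI exposes only `--type` and `--metric`.
- Advanced knobs have no first-class CLI flags. Examples are the IVF partition and probe counts and the HNSW graph-degree and search-breadth settings, commonly named `nlist`/`nprobe` and `M`/`efConstruction`/`ef`.
- Use RunCommand for advanced index and search tuning.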

Example:


```bash
dodil vbase index create docs_custom embedding --type HNSW --metric L2 --db <db_name>
```

`index drop <collection_name> <field_name>`

Example:


```bash
dodil vbase index drop docs_custom embedding --db <db_name>
```

Alignment Rules That Prevent Failures

- Use the same field names across the collection schema, index creation, insert, and search.
- If your schema uses custom names, always pass `--id-field` and `--vector-field` on data commands.
- Keep vector dimensionality consistent between the collection schema and insert/search payloads.
- Keep index and search metric semantics aligned. For first-class CLI search, prefer `--metric L2` on index creation.
- If your workload needs COSINE or IP at search time, use the RunCommand fallback for explicit metric control.
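These rules can be checked before running data commands. In the hypothetical sketch below, `CollectionPlan` is a record you fill in yourself, for example from `collection show` output. The keyword defaults mirror the documented `data insert` defaults, `id` and `vector`.

```python
from dataclasses import dataclass


@dataclass
class CollectionPlan:
    """Hypothetical record of what you created (check it with `collection show`)."""
    id_field: str
    vector_field: str
    dim: int
    index_metric: str  # value passed to `index create --metric`


def check_cli_alignment(plan, vector, *, id_field="id", vector_field="vector"):
    """List mismatches between a planned data command and the collection."""
    problems = []
    if id_field != plan.id_field:
        problems.append(f"--id-field {id_field!r} != schema {plan.id_field!r}")
    if vector_field != plan.vector_field:
        problems.append(f"--vector-field {vector_field!r} != schema {plan.vector_field!r}")
    if len(vector) != plan.dim:
        problems.append(f"vector has {len(vector)} values, schema dim is {plan.dim}")
    if plan.index_metric.upper() != "L2":
        problems.append(f"index metric {plan.index_metric} vs CLI search metric L2: use RunCommand")
    return problems


plan = CollectionPlan("doc_id", "embedding", 768, "L2")
print(check_cli_alignment(plan, [0.1] * 8))  # forgot custom names and used a short vector
```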