Skip to Content
We are live but in Staging 🎉
CLI GuideCollection, Data, and Index Commands

Collection, Data, and Index Commands

Last validated: 2026-05-11

These commands operate against the selected tenant endpoint (vbase.host:vbase.port) after db use.

For a fast one-page chooser, see 06-metric-and-index-selection-cheat-sheet.md.

Prerequisites

  1. Authenticate via Dodil CLI login.
  2. Select service context via dodil vbase db use <service_id>.
  3. Confirm current db_name (or pass --db explicitly).

Collection Commands

collection create <collection_name>

Use cases:

  • quick default schema creation
  • explicit custom schema creation for production fields

Flags (default-schema mode):

FlagDefaultDescription
--dbactive config dbTarget database name.
--dynamic-fieldstrueEnable dynamic fields on collection schema.
--id-fieldidPrimary key field name.
--id-typevarcharPrimary key type (varchar or int64).
--id-max-length64Required for varchar primary key.
--vector-fieldvectorVector field name.
--vector-typefloat_vectorVector type (float_vector or binary_vector).
--dim8Vector dimension.

Flags (custom-schema mode):

FlagDescription
--fieldRepeatable custom field spec: name:type[:pk][:key=value,...].

Custom field examples:

  • doc_id:varchar:pk:max_length=64
  • embedding:float_vector:dim=768
  • category:varchar:max_length=32

Examples:

Default-schema mode:

dodil vbase collection create docs \ --id-field doc_id \ --id-type varchar \ --id-max-length 128 \ --vector-field embedding \ --vector-type float_vector \ --dim 768 \ --db <db_name>

Custom-schema mode:

dodil vbase collection create docs_custom \ --field "doc_id:varchar:pk:max_length=64" \ --field "embedding:float_vector:dim=768" \ --field "category:varchar:max_length=32" \ --db <db_name>

Validation behavior:

  • exactly one custom field must be marked pk
  • vector custom fields require dim
  • default schema requires --dim > 0

Other collection commands

CommandSyntaxUse case
Listdodil vbase collection list [--db <db_name>]View collection names.
Showdodil vbase collection show <name> [--db <db_name>]Inspect collection and indexes.
Dropdodil vbase collection drop <name> [--db <db_name>]Delete collection.
Loaddodil vbase collection load <name> [--db <db_name>]Load into memory.
Releasedodil vbase collection release <name> [--db <db_name>]Release from memory.

Data Commands

data insert <collection_name>

Flags:

FlagRequiredDescription
--dbNoTarget DB; defaults to active context.
--idYesRecord ID value.
--id-fieldNoID field name (default id).
--vectorYesComma-separated float values.
--vector-fieldNoVector field name (default vector).

Example:

dodil vbase data insert docs_custom \ --id doc-1 \ --id-field doc_id \ --vector-field embedding \ --vector "0.11,0.12,0.13,0.14,0.15,0.16,0.17,0.18" \ --db <db_name>

data search <collection_name>

Flags:

FlagRequiredDescription
--dbNoTarget DB; defaults to active context.
--vectorYesQuery vector (comma-separated floats).
--vector-fieldNoVector field to search (anns_field is set from this).
--id-fieldNoID field returned in output.
--topkNoResult count (default 10).

Example:

dodil vbase data search docs_custom \ --vector "0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8" \ --vector-field embedding \ --id-field doc_id \ --topk 5 \ --db <db_name>

Current behavior notes:

  • search metric is fixed to L2 in current command implementation
  • text-to-embedding is not part of this command

Similarity and Metric Fundamentals

Let query vector be $q \in \mathbb{R}^d$ and candidate vector be $x \in \mathbb{R}^d$.

L2 (Euclidean distance)

$$ d_{L2}(q, x) = \lVert q - x \rVert_2 = \sqrt{\sum_{i=1}^{d}(q_i - x_i)^2} $$

  • Lower is better.
  • Sensitive to both direction and magnitude.

IP (Inner Product)

$$ s_{IP}(q, x) = q \cdot x = \sum_{i=1}^{d} q_i x_i $$

  • Higher is better.
  • Mixes angle and vector length effects.

COSINE (Angular similarity)

$$ s_{cos}(q, x) = \frac{q \cdot x}{\lVert q \rVert_2 \lVert x \rVert_2} $$

  • Higher is better.
  • Focuses on direction, less on magnitude.

Why normalization matters

If vectors are unit-normalized ($\lVert q \rVert_2 = \lVert x \rVert_2 = 1$), then:

$$ \lVert q - x \rVert_2^2 = 2 - 2,s_{cos}(q, x) $$

This means ranking by smallest L2 distance is equivalent to ranking by largest cosine similarity on normalized vectors.

Practical note for current CLI:

Index Commands

index create <collection_name> <field_name>

Flags:

FlagDefaultDescription
--dbactive config dbTarget DB name.
--typeFLATIndex type (FLAT, HNSW, IVF_FLAT, IVF_SQ8, etc.).
--metricL2Metric type (L2, IP, COSINE).

Index Type Primer

Index typeExact vs ApproxGood default use caseTrade-off
FLATExactSmall/medium datasets, quality baselinesHighest latency at scale
HNSWApproximateGeneral low-latency ANN workloadsMore memory than IVF families
IVF_FLATApproximateLarge datasets with balanced recall/latencyRequires IVF tuning for best quality
IVF_SQ8Approximate + quantizedVery large datasets where memory pressure mattersLower memory, more approximation loss

Quick mental model:

  • FLAT: compare with almost everything.
  • HNSW: graph shortcuts to neighbors.
  • IVF_*: coarse partition first, then search likely partitions.

CLI tuning limitation:

  • Current CLI exposes --type and --metric only.
  • Advanced knobs (for example IVF probe/count settings or HNSW depth knobs) are not exposed in first-class CLI flags.
  • Use RunCommand for advanced index/search tuning.

Example:

dodil vbase index create docs_custom embedding --type HNSW --metric L2 --db <db_name>

index drop <collection_name> <field_name>

Example:

dodil vbase index drop docs_custom embedding --db <db_name>

Alignment Rules That Prevent Failures

  • Use matching field names across collection schema, index creation, insert, and search.
  • If your schema uses custom names, always pass --id-field and --vector-field on data commands.
  • Keep vector dimensionality consistent between collection schema and insert/search payloads.
  • Keep index/search metric semantics aligned. For first-class CLI search, prefer --metric L2 on index creation.
  • If your workload needs COSINE or IP at search time, use RunCommand fallback for explicit metric control.