Workflow: Collection, Index, and Search (CLI)

Last validated: 2026-05-11

Use this workflow to create a collection with a vector field, index that field, insert records, and run a similarity search.

Need a compact reference first? See ../05-cli-guide/06-metric-and-index-selection-cheat-sheet.md.

Prerequisites

```shell
dodil login <service_account_id> <service_account_secret>
dodil vbase db use <service_id>
```
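
For scripts, a minimal sketch (assuming a POSIX shell) keeps the credentials in environment variables instead of writing the secret into the script itself. The variable names `DODIL_SA_ID`, `DODIL_SA_SECRET`, and `DODIL_SERVICE_ID`, the `dodil_setup` function, and the `DODIL` command override are hypothetical conveniences for this example, not CLI features.

```shell
# Hypothetical helper: dodil_setup and the DODIL_* variable names are
# illustrative; only `dodil login` and `dodil vbase db use` come from this page.
# DODIL defaults to the real CLI; set DODIL=echo to print commands instead.
DODIL="${DODIL:-dodil}"

dodil_setup() {
  # A non-interactive shell exits with this message if a variable is unset or empty.
  : "${DODIL_SA_ID:?set DODIL_SA_ID}"
  : "${DODIL_SA_SECRET:?set DODIL_SA_SECRET}"
  : "${DODIL_SERVICE_ID:?set DODIL_SERVICE_ID}"
  # Select the service only if login succeeded.
  $DODIL login "$DODIL_SA_ID" "$DODIL_SA_SECRET" &&
    $DODIL vbase db use "$DODIL_SERVICE_ID"
}
```

Export the three variables (for example from your secrets manager) and call `dodil_setup`; running it with `DODIL=echo` previews the two commands without executing them.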

Step 0: Choose Metric and Index Quickly

Current CLI behavior to remember first:

`dodil vbase data search` currently ranks results with the L2 metric.
If you need search-time control over COSINE or IP, use the RunCommand workflow: 04-run-command-fallback-grpc-http.md

Fast chooser

| If your goal is… | Start with | Why |
|---|---|---|
| Best correctness baseline | `FLAT + L2` | Exact search and easiest debugging |
| Good latency/recall balance | `HNSW + L2` | Practical ANN default |
| Very large dataset | `IVF_FLAT + L2` | Faster search via coarse partitioning |
| Large dataset with tighter memory | `IVF_SQ8 + L2` | Lower memory footprint at some quality cost |

Metric intuition (minimal math)

For vectors $q$ (query) and $x$ (candidate):

L2 distance: $d_{L2}(q,x)=\lVert q-x \rVert_2$ (smaller is better)
Inner product: $s_{IP}(q,x)=q\cdot x$ (larger is better)
Cosine similarity: $s_{cos}(q,x)=\frac{q\cdot x}{\lVert q\rVert_2\lVert x\rVert_2}$ (larger is better)
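
To try these formulas on concrete numbers, the following sketch (assuming a POSIX shell with `awk`) computes all three for two vectors in the same comma-separated format that `--vector` takes. `metrics` is a hypothetical helper for experimentation, not part of the dodil CLI.

```shell
# Hypothetical helper, not a dodil command. Prints L2 distance, inner product
# and cosine similarity for two comma-separated vectors of equal length.
metrics() {   # usage: metrics <q> <x>
  awk -v q="$1" -v x="$2" 'BEGIN {
    n = split(q, a, ","); m = split(x, b, ",")
    if (n != m) { print "dimension mismatch: " n " vs " m > "/dev/stderr"; exit 1 }
    for (i = 1; i <= n; i++) {
      d = a[i] - b[i]
      l2 += d * d            # squared L2 distance
      ip += a[i] * b[i]      # inner product q . x
      nq += a[i] * a[i]      # squared norm of q
      nx += b[i] * b[i]      # squared norm of x
    }
    if (nq == 0 || nx == 0) { print "cosine is undefined for a zero vector" > "/dev/stderr"; exit 1 }
    printf "L2=%.6f IP=%.6f COS=%.6f\n", sqrt(l2), ip, ip / sqrt(nq * nx)
  }'
}
```

For example, `metrics 1,2 2,4` prints `L2=2.236068 IP=10.000000 COS=1.000000`: the vectors point the same way (cosine 1) yet have a nonzero L2 distance, which is why L2 and cosine can rank differently on unnormalized data.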

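If vectors are unit-normalized, then $\lVert q-x\rVert_2^2 = 2 - 2s_{cos}(q,x)$ and $s_{IP}(q,x) = s_{cos}(q,x)$, so ascending L2 distance, descending cosine similarity, and descending inner product all produce the same ranking.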
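
One way to get cosine-equivalent ranking from the L2-only CLI search, following from the identity above, is to unit-normalize vectors before insert and search. This is a sketch under that assumption, using POSIX `awk`; `normalize` and `identity_check` are hypothetical helpers.

```shell
# Hypothetical helper: scales a comma-separated vector to unit L2 length.
normalize() {   # usage: normalize <vector>
  awk -v v="$1" 'BEGIN {
    n = split(v, a, ",")
    for (i = 1; i <= n; i++) s += a[i] * a[i]
    if (s == 0) { print "cannot normalize a zero vector" > "/dev/stderr"; exit 1 }
    norm = sqrt(s)
    for (i = 1; i <= n; i++) printf "%s%.6f", (i > 1 ? "," : ""), a[i] / norm
    printf "\n"
  }'
}

# Hypothetical helper: prints ||q-x||^2 and 2 - 2*cos(q,x) side by side.
identity_check() {   # usage: identity_check <q> <x>
  awk -v q="$1" -v x="$2" 'BEGIN {
    n = split(q, a, ","); split(x, b, ",")
    for (i = 1; i <= n; i++) {
      d = a[i] - b[i]; sq += d * d
      ip += a[i] * b[i]; nq += a[i] * a[i]; nx += b[i] * b[i]
    }
    if (nq == 0 || nx == 0) { print "cosine is undefined for a zero vector" > "/dev/stderr"; exit 1 }
    printf "%.6f %.6f\n", sq, 2 - 2 * ip / sqrt(nq * nx)
  }'
}
```

`identity_check 3,4 1,0` prints `20.000000 0.800000` (the identity does not hold for raw vectors), while `identity_check "$(normalize 3,4)" 1,0` prints `0.800000 0.800000`. Normalize vectors at insert time and normalize queries as well, so the identity applies to every query/candidate pair.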

Path A: Default Schema Flow

Step A1: Create collection


```shell
dodil vbase collection create docs \
  --id-field id \
  --id-type varchar \
  --id-max-length 64 \
  --vector-field vector \
  --vector-type float_vector \
  --dim 8 \
  --db <db_name>
```
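
If you script this path, a small sketch (assuming a POSIX shell) keeps the database name and dimension in variables so later steps reuse the same values. `DB_NAME`, `DIM`, `create_docs`, and the `DODIL` override are illustrative names; the flags are exactly those in Step A1.

```shell
# Illustrative names: DB_NAME, DIM, create_docs and the DODIL override are
# not defined by the CLI. Set DODIL=echo to print the command instead.
DODIL="${DODIL:-dodil}"
DB_NAME="${DB_NAME:-my_db}"   # hypothetical database name
DIM=8                         # every later --vector must have this many values

create_docs() {
  $DODIL vbase collection create docs \
    --id-field id \
    --id-type varchar \
    --id-max-length 64 \
    --vector-field vector \
    --vector-type float_vector \
    --dim "$DIM" \
    --db "$DB_NAME"
}
```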

Step A2: Create index


```shell
dodil vbase index create docs vector \
  --type HNSW \
  --metric L2 \
  --db <db_name>
```
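
Because first-class CLI search currently ranks with L2, a script can refuse to build a COSINE or IP index on this path. A minimal sketch, assuming a POSIX shell; `check_metric`, `create_index`, `DB_NAME`, and `DODIL` are hypothetical, and the accepted names are limited to the three metrics discussed on this page.

```shell
# Hypothetical guard: returns 0 for L2, 1 for metrics that need the
# RunCommand workflow, 2 for names not covered on this page.
check_metric() {
  case "$1" in
    L2) return 0 ;;
    COSINE|IP)
      echo "metric $1: 'dodil vbase data search' ranks with L2; use the RunCommand workflow for $1" >&2
      return 1 ;;
    *)
      echo "unknown metric: $1" >&2
      return 2 ;;
  esac
}

create_index() {   # usage: create_index [metric]   (defaults to L2)
  metric="${1:-L2}"
  check_metric "$metric" || return   # propagate the guard's status
  ${DODIL:-dodil} vbase index create docs vector \
    --type HNSW --metric "$metric" --db "${DB_NAME:-my_db}"
}
```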

Step A3: Insert records


```shell
dodil vbase data insert docs --id doc-1 --vector "0.11,0.12,0.13,0.14,0.15,0.16,0.17,0.18" --db <db_name>
dodil vbase data insert docs --id doc-2 --vector "0.81,0.82,0.83,0.84,0.85,0.86,0.87,0.88" --db <db_name>
```
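
For more than a handful of records, one approach is to loop over a file with one `<id> <vector>` pair per line and issue one insert per row, skipping rows whose length does not match the collection's `--dim`. This sketch assumes a POSIX shell; the file layout, `insert_rows`, and the `DODIL` override are assumptions, and each row still becomes one `dodil vbase data insert` call with the flags from Step A3.

```shell
# Hypothetical helper: reads "<id> <vector>" lines from stdin.
insert_rows() {   # usage: insert_rows <collection> <db> <dim> < rows.txt
  # "|| [ -n "$id" ]" also processes a final line that lacks a trailing newline.
  while read -r id vec || [ -n "$id" ]; do
    [ -z "$id" ] && continue   # skip blank lines
    got=$(printf '%s\n' "$vec" | awk -F, '{ print NF }')
    if [ "$got" -ne "$3" ]; then
      echo "skipping $id: $got values, expected $3" >&2
      continue
    fi
    # </dev/null stops the CLI from consuming the remaining input rows.
    ${DODIL:-dodil} vbase data insert "$1" --id "$id" --vector "$vec" --db "$2" </dev/null
  done
}
```

Usage: `insert_rows docs "$DB_NAME" 8 < rows.txt`, where `rows.txt` is a hypothetical file containing lines such as `doc-1 0.11,0.12,0.13,0.14,0.15,0.16,0.17,0.18`.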

Step A4: Search


```shell
dodil vbase data search docs \
  --vector "0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8" \
  --topk 5 \
  --db <db_name>
```

Path B: Custom Schema Flow

Use this path when your primary-key and vector fields are not named `id` and `vector`.

Step B1: Create collection with custom field specs


```shell
dodil vbase collection create docs_custom \
  --field "doc_id:varchar:pk:max_length=64" \
  --field "embedding:float_vector:dim=8" \
  --field "category:varchar:max_length=32" \
  --db <db_name>
```
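
Since the vector dimension lives inside the `--field` spec here, a script can read it back from the same string it passes to `collection create`. A sketch under the assumption that specs follow the colon-separated `name:type:option...` layout shown above; `spec_dim` and `EMBED_SPEC` are hypothetical.

```shell
# Hypothetical helper: prints the value of a dim=<n> option in a field spec
# and returns non-zero if the spec has no dim option.
spec_dim() {   # usage: spec_dim "<field spec>"
  printf '%s\n' "$1" | awk -F: '{
    for (i = 3; i <= NF; i++)
      if ($i ~ /^dim=/) { print substr($i, 5); found = 1 }
  } END { exit (found ? 0 : 1) }'
}

EMBED_SPEC="embedding:float_vector:dim=8"   # same spec string as in Step B1
DIM=$(spec_dim "$EMBED_SPEC")               # DIM is now 8
```

Passing `--field "$EMBED_SPEC"` to `collection create` and `"$DIM"` to a length check keeps both tied to one definition.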

Step B2: Create index on custom vector field


```shell
dodil vbase index create docs_custom embedding --type HNSW --metric L2 --db <db_name>
```

Step B3: Insert rows using matching field flags


```shell
dodil vbase data insert docs_custom \
  --id doc-1 \
  --id-field doc_id \
  --vector-field embedding \
  --vector "0.11,0.12,0.13,0.14,0.15,0.16,0.17,0.18" \
  --db <db_name>
```

Step B4: Search using matching field flags


```shell
dodil vbase data search docs_custom \
  --id-field doc_id \
  --vector-field embedding \
  --vector "0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8" \
  --topk 5 \
  --db <db_name>
```
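
Insert and search must name the same fields, so a script can define them once. A minimal sketch assuming a POSIX shell; `ID_FIELD`, `VECTOR_FIELD`, `custom_insert`, `custom_search`, `DB_NAME`, and `DODIL` are illustrative names, and the flags are the ones from Steps B3 and B4.

```shell
# Field names defined once so insert and search cannot drift apart.
ID_FIELD=doc_id
VECTOR_FIELD=embedding

custom_insert() {   # usage: custom_insert <id> <vector>
  ${DODIL:-dodil} vbase data insert docs_custom \
    --id "$1" --id-field "$ID_FIELD" --vector-field "$VECTOR_FIELD" \
    --vector "$2" --db "${DB_NAME:-my_db}"
}

custom_search() {   # usage: custom_search <vector> [topk]   (topk defaults to 5)
  ${DODIL:-dodil} vbase data search docs_custom \
    --id-field "$ID_FIELD" --vector-field "$VECTOR_FIELD" \
    --vector "$1" --topk "${2:-5}" --db "${DB_NAME:-my_db}"
}
```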

Validation and Inspection


```shell
dodil vbase collection show docs_custom --db <db_name>
dodil vbase collection list --db <db_name>
```
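
In automation, a check like the following can stop a script early when a collection is missing. This sketch assumes that `dodil vbase collection list` prints each collection name as a whitespace-separated word; that output format is an assumption, so adjust the match to what your CLI version prints. `collection_exists` and the `DODIL` override are hypothetical.

```shell
# Hypothetical helper: succeeds if <collection> appears as an exact word in
# the list output (so "docs" does not match "docs_custom").
collection_exists() {   # usage: collection_exists <collection> <db>
  ${DODIL:-dodil} vbase collection list --db "$2" |
    awk -v c="$1" '
      { for (i = 1; i <= NF; i++) if ($i == c) found = 1 }
      END { exit (found ? 0 : 1) }'
}
```

Usage: `collection_exists docs_custom "$DB_NAME" || { echo "docs_custom not found" >&2; exit 1; }` before running `collection show`.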

Why These Arguments Matter

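`--vector-field`: names the vector field that receives `--vector` on insert and that search runs against.
`--id-field`: names the primary-key field that receives `--id` on insert and that appears in search output.
`--dim`: must equal the number of comma-separated values in every `--vector` passed to insert and search (8 in these examples).
`--metric`: should match the metric you intend to search with.
Because first-class CLI search currently uses L2, an L2 index is the safest aligned choice.
For explicit COSINE or IP search behavior, switch to the RunCommand workflow.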
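
A pre-flight check on `--vector` values catches dimension mismatches before the CLI call. A sketch assuming a POSIX shell and `awk`; `check_dim` is hypothetical, and the number pattern it accepts (optional sign, decimal, optional exponent) is an assumption about acceptable input rather than the CLI's own parser.

```shell
# Hypothetical helper: succeeds only if <vector> has exactly <dim> numeric
# comma-separated components; reports each problem on stderr.
check_dim() {   # usage: check_dim <vector> <dim>
  printf '%s\n' "$1" | awk -F, -v want="$2" '{
    if (NF != want + 0) { printf "expected %d values, got %d\n", want, NF > "/dev/stderr"; bad = 1 }
    for (i = 1; i <= NF; i++)
      if ($i !~ /^-?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?$/) {
        printf "component %d is not a number: %s\n", i, $i > "/dev/stderr"; bad = 1
      }
  } END { exit (bad ? 1 : 0) }'
}
```

For example, `check_dim "0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8" 8 && dodil vbase data search docs --vector "0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8" --topk 5 --db <db_name>` runs the search only when the query length matches.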