Workflow: Collection, Index, and Search (CLI)

Last validated: 2026-05-11

Use this workflow to create a collection with a vector schema, build an index on it, ingest data, and run a similarity search.

Need a compact reference first? See ../05-cli-guide/06-metric-and-index-selection-cheat-sheet.md.

Prerequisites

  1. dodil login <service_account_id> <service_account_secret>
  2. dodil vbase db use <service_id>

Step 0: Choose Metric and Index Quickly

Current CLI behavior to remember first:

Fast chooser

| If your goal is… | Start with | Why |
| --- | --- | --- |
| Best correctness baseline | FLAT + L2 | Exact search and easiest debugging |
| Good latency/recall balance | HNSW + L2 | Practical ANN default |
| Very large dataset | IVF_FLAT + L2 | Faster search via coarse partitioning |
| Large dataset with tighter memory | IVF_SQ8 + L2 | Lower memory footprint at some quality cost |

Metric intuition (minimal math)

For vectors $q$ (query) and $x$ (candidate):

  • L2 distance: $d_{L2}(q,x)=\lVert q-x \rVert_2$ (smaller is better)
  • Inner product: $s_{IP}(q,x)=q\cdot x$ (larger is better)
  • Cosine similarity: $s_{cos}(q,x)=\frac{q\cdot x}{\lVert q\rVert_2\lVert x\rVert_2}$ (larger is better)
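To make the three definitions above concrete, here is a minimal sketch that computes them for two comma-separated vectors using plain sh and awk. The function name vector_metrics is hypothetical and not part of the dodil CLI; the sketch assumes both vectors are non-zero, since cosine similarity is undefined otherwise.

```shell
#!/bin/sh
# vector_metrics (hypothetical helper): print the L2 distance, inner product,
# and cosine similarity of two comma-separated vectors of equal length.
# Assumes both vectors are non-zero.
vector_metrics() {
  awk -v q="$1" -v x="$2" 'BEGIN {
    n = split(q, a, ",")
    m = split(x, b, ",")
    if (n != m) { print "dimension mismatch" > "/dev/stderr"; exit 1 }
    for (i = 1; i <= n; i++) {
      diff = a[i] - b[i]
      l2sq += diff * diff     # accumulates the squared L2 distance
      ip   += a[i] * b[i]     # accumulates the inner product
      nq   += a[i] * a[i]     # squared norm of q
      nx   += b[i] * b[i]     # squared norm of x
    }
    printf "l2=%.4f ip=%.4f cos=%.4f\n", sqrt(l2sq), ip, ip / (sqrt(nq) * sqrt(nx))
  }'
}

vector_metrics "1,0" "0,1"   # prints: l2=1.4142 ip=0.0000 cos=0.0000
```

Orthogonal unit vectors give the largest L2 distance two unit vectors can have without pointing in opposite directions, while their inner product and cosine similarity are both zero.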

If vectors are unit-normalized, then $\lVert q-x\rVert_2^2 = 2 - 2s_{cos}(q,x)$, so L2 and cosine produce the same ranking.
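The identity follows from expanding the squared norm. As a short derivation, assuming $\lVert q\rVert_2=\lVert x\rVert_2=1$:

$$
\lVert q-x\rVert_2^2 = (q-x)\cdot(q-x) = \lVert q\rVert_2^2 + \lVert x\rVert_2^2 - 2\,q\cdot x = 2 - 2\,s_{cos}(q,x)
$$

The last step uses $q\cdot x = s_{cos}(q,x)$ for unit vectors. Because the right-hand side decreases as $s_{cos}$ increases, sorting by ascending L2 distance gives the same order as sorting by descending cosine similarity.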

Path A: Default Schema Flow

Step A1: Create collection

dodil vbase collection create docs \
  --id-field id \
  --id-type varchar \
  --id-max-length 64 \
  --vector-field vector \
  --vector-type float_vector \
  --dim 8 \
  --db <db_name>

Step A2: Create index

dodil vbase index create docs vector \
  --type HNSW \
  --metric L2 \
  --db <db_name>

Step A3: Insert records

dodil vbase data insert docs --id doc-1 --vector "0.11,0.12,0.13,0.14,0.15,0.16,0.17,0.18" --db <db_name>
dodil vbase data insert docs --id doc-2 --vector "0.81,0.82,0.83,0.84,0.85,0.86,0.87,0.88" --db <db_name>

Step A4: Search

dodil vbase data search docs \
  --vector "0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8" \
  --topk 5 \
  --db <db_name>

Path B: Custom Schema Flow

Use this path when your ID and vector fields are not named id and vector.

Step B1: Create collection with custom field specs

dodil vbase collection create docs_custom \
  --field "doc_id:varchar:pk:max_length=64" \
  --field "embedding:float_vector:dim=8" \
  --field "category:varchar:max_length=32" \
  --db <db_name>

Step B2: Create index on custom vector field

dodil vbase index create docs_custom embedding --type HNSW --metric L2 --db <db_name>

Step B3: Insert rows using matching field flags

dodil vbase data insert docs_custom \
  --id doc-1 \
  --id-field doc_id \
  --vector-field embedding \
  --vector "0.11,0.12,0.13,0.14,0.15,0.16,0.17,0.18" \
  --db <db_name>
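When there are many rows, repeating the field flags by hand is error-prone. The sketch below is one possible way to generate the insert commands from a simple text input. It only prints the commands and does not run them. The function name emit_inserts, the "id;vector" input format, and the database name my_db are assumptions made for illustration. Only flags that appear in the insert command above are used.

```shell
#!/bin/sh
# emit_inserts (hypothetical helper): read "id;v1,v2,..." lines from stdin and
# print one `dodil vbase data insert` command per line. Nothing is executed.
# Values are not shell-escaped, so keep ids free of spaces and quotes.
emit_inserts() {
  collection=$1 id_field=$2 vector_field=$3 db=$4
  while IFS=';' read -r id vector; do
    [ -n "$id" ] || continue   # skip blank lines
    printf 'dodil vbase data insert %s --id %s --id-field %s --vector-field %s --vector "%s" --db %s\n' \
      "$collection" "$id" "$id_field" "$vector_field" "$vector" "$db"
  done
}

# Example: two rows for the docs_custom collection; my_db stands in for <db_name>.
printf '%s\n' \
  'doc-1;0.11,0.12,0.13,0.14,0.15,0.16,0.17,0.18' \
  'doc-2;0.81,0.82,0.83,0.84,0.85,0.86,0.87,0.88' |
  emit_inserts docs_custom doc_id embedding my_db
```

Review the printed commands before running them. Printing first keeps the sketch safe to try without touching a real collection.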

Step B4: Search using matching field flags

dodil vbase data search docs_custom \
  --id-field doc_id \
  --vector-field embedding \
  --vector "0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8" \
  --topk 5 \
  --db <db_name>

Validation and Inspection

dodil vbase collection show docs_custom --db <db_name>
dodil vbase collection list --db <db_name>

Why These Arguments Matter

  • --vector-field: names the vector field that the search runs against.
  • --id-field: names the field shown as the identifier in result output.
  • --dim: must equal the number of components in every vector you insert or search with.
  • --metric: should match the metric you intend to search with.
  • For current first-class CLI search, L2 is the safest aligned choice.
  • For explicit COSINE or IP search behavior, switch to the RunCommand workflow.
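Since a mismatch between --dim and the vector length is an easy mistake to make, a small local check before calling the CLI can help. The following is a minimal sketch in plain sh and awk. The helper names vector_dim and check_dim are hypothetical and not part of the dodil CLI.

```shell
#!/bin/sh
# vector_dim (hypothetical helper): print the number of comma-separated
# components in a vector string; an empty string counts as 0.
vector_dim() {
  awk -v v="$1" 'BEGIN { print split(v, parts, ",") }'
}

# check_dim (hypothetical helper): succeed only if the vector has exactly
# the expected number of components.
check_dim() {
  expected=$1 vector=$2
  actual=$(vector_dim "$vector")
  if [ "$actual" -ne "$expected" ]; then
    echo "vector has $actual components, expected $expected" >&2
    return 1
  fi
}

# Example: the 8-dimensional query vector used in this workflow passes.
check_dim 8 "0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8" && echo "dimension ok"
```

The check only counts components. It does not verify that each component is a valid number.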