Workflow: Collection, Index, and Search (CLI)
Last validated: 2026-05-11
Use this workflow to create a vector schema, index it, ingest data, and run similarity search.
Need a compact reference first? See ../05-cli-guide/06-metric-and-index-selection-cheat-sheet.md.
Prerequisites
dodil login <service_account_id> <service_account_secret>
dodil vbase db use <service_id>
Step 0: Choose Metric and Index Quickly
Current CLI behavior to remember first:
- dodil vbase data search currently uses the L2 metric.
- If you need true COSINE or IP search-time control, use the RunCommand workflow: 04-run-command-fallback-grpc-http.md
Fast chooser
| If your goal is… | Start with | Why |
|---|---|---|
| Best correctness baseline | FLAT + L2 | Exact search and easiest debugging |
| Good latency/recall balance | HNSW + L2 | Practical ANN default |
| Very large dataset | IVF_FLAT + L2 | Faster search via coarse partitioning |
| Large dataset with tighter memory | IVF_SQ8 + L2 | Lower memory footprint at some quality cost |
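The chooser table can be encoded as a small lookup so scripts pick the index type consistently. This is only a sketch: the function name and the goal keywords (baseline, balanced, large, large-lowmem) are hypothetical and not part of the dodil CLI; the index types are the ones listed in the table.

```shell
# Hypothetical helper: maps a goal keyword (made up for this sketch)
# to the index type suggested in the chooser table.
suggest_index() {
  case "$1" in
    baseline)     echo "FLAT" ;;      # exact search, easiest debugging
    balanced)     echo "HNSW" ;;      # practical ANN default
    large)        echo "IVF_FLAT" ;;  # coarse partitioning for very large data
    large-lowmem) echo "IVF_SQ8" ;;   # lower memory at some quality cost
    *) echo "unknown goal: $1" >&2; return 1 ;;
  esac
}

# Example usage (commented out; requires a configured dodil session):
# dodil vbase index create docs vector --type "$(suggest_index balanced)" --metric L2 --db <db_name>
```

The metric stays L2 in every row, so only the index type varies.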
Metric intuition (minimal math)
For vectors $q$ (query) and $x$ (candidate):
- L2 distance: $d_{L2}(q,x)=\lVert q-x \rVert_2$ (smaller is better)
- Inner product: $s_{IP}(q,x)=q\cdot x$ (larger is better)
- Cosine similarity: $s_{cos}(q,x)=\frac{q\cdot x}{\lVert q\rVert_2\lVert x\rVert_2}$ (larger is better)
If vectors are unit-normalized, then $\lVert q-x\rVert_2^2 = 2 - 2s_{cos}(q,x)$, so L2 and cosine produce the same ranking.
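Since the first-class search command uses L2, one way to get cosine-style ranking is to unit-normalize vectors before inserting and before searching, relying on the identity above. The following is a minimal sketch using awk; the helper names are hypothetical and not part of the dodil CLI, and values are rounded to six significant digits.

```shell
# Hypothetical helpers for illustration; they are not part of the dodil CLI.

# Scale a comma-separated vector to unit length (L2 norm = 1).
normalize_vector() {
  printf '%s\n' "$1" | awk -F',' '{
    norm = 0
    for (i = 1; i <= NF; i++) norm += $i * $i
    norm = sqrt(norm)
    if (norm == 0) { print "error: zero vector" > "/dev/stderr"; exit 1 }
    out = ""
    for (i = 1; i <= NF; i++) out = out (i > 1 ? "," : "") sprintf("%.6g", $i / norm)
    print out
  }'
}

# Squared L2 distance between two comma-separated vectors of equal length.
l2_squared() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, q, ","); m = split(b, x, ",")
    if (n != m) { print "error: length mismatch" > "/dev/stderr"; exit 1 }
    sum = 0
    for (i = 1; i <= n; i++) { d = q[i] - x[i]; sum += d * d }
    printf "%.6g\n", sum
  }'
}

# Dot product; equals cosine similarity when both vectors are unit length.
dot_product() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, q, ","); m = split(b, x, ",")
    if (n != m) { print "error: length mismatch" > "/dev/stderr"; exit 1 }
    sum = 0
    for (i = 1; i <= n; i++) sum += q[i] * x[i]
    printf "%.6g\n", sum
  }'
}
```

For the unit vectors (1,0) and (0.6,0.8), the dot product is 0.6 and the squared L2 distance is 0.8, which matches $2 - 2 \cdot 0.6$. A normalized string can be passed to the existing flag, for example --vector "$(normalize_vector "3,4")".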
Path A: Default Schema Flow
Step A1: Create collection
dodil vbase collection create docs \
--id-field id \
--id-type varchar \
--id-max-length 64 \
--vector-field vector \
--vector-type float_vector \
--dim 8 \
--db <db_name>
Step A2: Create index
dodil vbase index create docs vector \
--type HNSW \
--metric L2 \
--db <db_name>
Step A3: Insert records
dodil vbase data insert docs --id doc-1 --vector "0.11,0.12,0.13,0.14,0.15,0.16,0.17,0.18" --db <db_name>
dodil vbase data insert docs --id doc-2 --vector "0.81,0.82,0.83,0.84,0.85,0.86,0.87,0.88" --db <db_name>
Step A4: Search
dodil vbase data search docs \
--vector "0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8" \
--topk 5 \
--db <db_name>
Path B: Custom Schema Flow
Use this path when field names are not id and vector.
Step B1: Create collection with custom field specs
dodil vbase collection create docs_custom \
--field "doc_id:varchar:pk:max_length=64" \
--field "embedding:float_vector:dim=8" \
--field "category:varchar:max_length=32" \
--db <db_name>
Step B2: Create index on custom vector field
dodil vbase index create docs_custom embedding --type HNSW --metric L2 --db <db_name>
Step B3: Insert rows using matching field flags
dodil vbase data insert docs_custom \
--id doc-1 \
--id-field doc_id \
--vector-field embedding \
--vector "0.11,0.12,0.13,0.14,0.15,0.16,0.17,0.18" \
--db <db_name>
Step B4: Search using matching field flags
dodil vbase data search docs_custom \
--id-field doc_id \
--vector-field embedding \
--vector "0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8" \
--topk 5 \
--db <db_name>
Validation and Inspection
dodil vbase collection show docs_custom --db <db_name>
dodil vbase collection list --db <db_name>
Why These Arguments Matter
- --vector-field: determines where search is executed.
- --id-field: determines which field appears in result output.
- --dim: must match vector length at insert/search time.
- --metric: should match your intended search metric semantics.
- For current first-class CLI search, L2 is the safest aligned choice.
- For explicit COSINE or IP search behavior, switch to the RunCommand workflow.
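
The --dim constraint can be checked locally before a record is sent. Below is a minimal sketch of a guard around the insert command from Path A; the wrapper names and the EXPECTED_DIM variable are hypothetical, and only flags already shown in this workflow (--id, --vector, --db) are used.

```shell
# Hypothetical wrapper; not part of the dodil CLI.
EXPECTED_DIM=8   # assumed to equal the --dim value used at collection creation

# Count the components of a comma-separated vector string.
vector_dim() {
  printf '%s\n' "$1" | awk -F',' '{ print NF }'
}

# Insert one record only if its vector length matches EXPECTED_DIM.
# Usage: insert_checked <collection> <id> <vector> <db_name>
insert_checked() {
  local collection=$1
  local id=$2
  local vector=$3
  local db=$4
  local dim
  dim=$(vector_dim "$vector")
  if [ "$dim" -ne "$EXPECTED_DIM" ]; then
    echo "skip $id: vector has $dim values, expected $EXPECTED_DIM" >&2
    return 1
  fi
  dodil vbase data insert "$collection" --id "$id" --vector "$vector" --db "$db"
}
```

The same vector_dim check can be applied to a query vector before running a search, since the length must match there as well.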