Quickstart

Five minutes from here you’ll have a vector collection with one indexed document and a working search that returns ranked chunks.

We’ll take the pipeline-collection route: it is the easiest, with no embedding setup on your side. K3 ships text, code, and visual embedding templates from Scriptum; pick one and K3 chunks, embeds, and indexes for you. (To bring your own embeddings, see Recipes → External Collection.)

Prerequisites

The dodil CLI installed and dodil login completed (see CLI Basics)

A bucket. This guide uses kb-prod:


dodil k3 bucket create kb-prod -d "RAG corpus"

1. Configure the vector engine

Every bucket needs its vector engine configured once before you can create collections. The simplest path is auto mode — K3 provisions a VBase database on a managed cluster for you:


dodil k3 vector store create -b kb-prod -m auto

Verify it landed:


dodil k3 vector store get -b kb-prod -o json | jq '{status, mode, vbaseEndpoint, vbaseDbName}'

status progresses PENDING → PROVISIONING → ACTIVE — usually under a minute. Other modes (external for your own VBase, pick for an existing service) are documented at API Reference → Engine.
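If you script this quickstart, the next step may run before provisioning finishes. The sketch below polls the same vector store get command until the engine is ready. It is not an official helper: the function names engine_ready and wait_for_engine, the attempt count, and the 2-second interval are illustrative, and accepting either a bare ACTIVE or an enum-prefixed value ending in _ACTIVE is an assumption about how the status is serialized.

```shell
# Hypothetical helpers; only the `vector store get` call comes from this guide.

# True when the status is ACTIVE, bare or enum-prefixed (assumption).
engine_ready() {
  case "$1" in
    ACTIVE | *_ACTIVE) return 0 ;;
    *) return 1 ;;
  esac
}

# Poll the bucket's engine until it is ACTIVE or the attempts run out.
# Usage: wait_for_engine <bucket> [max_attempts]
wait_for_engine() {
  bucket=$1
  attempts=${2:-30}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    status=$(dodil k3 vector store get -b "$bucket" -o json | jq -r '.status')
    echo "engine status: $status" >&2
    if engine_ready "$status"; then
      return 0
    fi
    i=$((i + 1))
    sleep 2
  done
  echo "engine not ACTIVE after $attempts attempts" >&2
  return 1
}
```

Calling wait_for_engine kb-prod before step 3 keeps a script from creating a collection against an engine that is still provisioning. The function returns non-zero on timeout so the caller can stop.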

2. Discover a vector template

K3 ships a catalog of *_embedding_index Scriptum templates — text, code, visual, face, object. List the vector-compatible ones:


dodil k3 vector templates -o json | jq '.templates[] | {id, name, modalities, acceptedExtensions}'

For PDF / docx / HTML / plain text, pick text_embedding_index — the canonical RAG template. (Other choices: code_embedding_index for source code, visual_embedding_index for image / video / audio / PDF page renders.)

Inspect the contract to see what inputs it requires:


dodil k3 template get text_embedding_index -o json | jq '.contract.inputs'

3. Create the collection (pipeline-mode)

dodil k3 vector collection add is the pipeline-mode path — schema (dimensions, metric, sparse mode, embed_model) materializes from the template’s ScriptContract:


dodil k3 vector collection add docs --bucket kb-prod \
  --description "PDF / docx / HTML embeddings" \
  --template text_embedding_index \
  -o json

The response is a Collection row. You’ll need its ID later; the command below reads the collection back with get and extracts collectionId:


export COLLECTION_ID=$(dodil k3 vector collection get docs --bucket kb-prod -o json | jq -r '.collectionId')

What K3 created atomically:

A Milvus collection (schema lazy-materialized on first ingest)
A Scriptum pipeline bound to text_embedding_index
An auto-generated ingest rule whose globs come from the template’s acceptedExtensions (**/*.pdf, **/*.txt, …)

4. Upload a document

Upload any text or PDF file to the bucket; K3’s pipeline rule fires automatically:


curl -sSL https://arxiv.org/pdf/1706.03762.pdf -o attention.pdf
dodil k3 object create ./attention.pdf -b kb-prod -k papers/attention.pdf

5. Watch the ingest job


# Look up the embedding pipeline bound to this collection, then list its ingest jobs
PIPELINE_ID=$(dodil k3 vector collection get docs --bucket kb-prod -o json | jq -r '.embedPipelineId')

dodil k3 ingest jobs --bucket kb-prod --pipeline "$PIPELINE_ID" -o json \
  | jq '.jobs[] | {object: .object.key, status, chunksCreated, embeddingsWritten}'

Status path: PENDING → PROCESSING → COMPLETED. The JSON output reports the full enum name (for example INGEST_STATUS_COMPLETED). When the job is done you’ll see something like:


{
  "object": "papers/attention.pdf",
  "status": "INGEST_STATUS_COMPLETED",
  "chunksCreated": 47,
  "embeddingsWritten": 47
}

For replay / retry semantics, see Pipelines → Replay & Retry.
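To block a script until ingest finishes, you can poll the ingest jobs command from this step. The sketch below is a hedged illustration rather than a supported helper: all_completed and wait_for_ingest are hypothetical names, the 5-second interval and attempt count are arbitrary, and the accepted status strings (COMPLETED or INGEST_STATUS_COMPLETED) are taken from the sample output above. Failure statuses are not covered here, so a failed job ends the loop by timeout.

```shell
# Hypothetical helpers; the `ingest jobs` call and the .jobs[].status field
# come from this guide.

# Read one status per line on stdin. Succeed only if at least one status was
# read and every status is a completed one.
all_completed() {
  seen=0
  while IFS= read -r s; do
    seen=1
    case "$s" in
      COMPLETED | INGEST_STATUS_COMPLETED) ;;
      *) return 1 ;;
    esac
  done
  [ "$seen" -eq 1 ]
}

# Poll until every job for the pipeline is completed or the attempts run out.
# Usage: wait_for_ingest <bucket> <pipeline_id> [max_attempts]
wait_for_ingest() {
  bucket=$1
  pipeline=$2
  attempts=${3:-60}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if dodil k3 ingest jobs --bucket "$bucket" --pipeline "$pipeline" -o json \
        | jq -r '.jobs[].status' | all_completed; then
      return 0
    fi
    i=$((i + 1))
    sleep 5
  done
  echo "ingest not completed after $attempts attempts" >&2
  return 1
}
```

With PIPELINE_ID set as above, wait_for_ingest kb-prod "$PIPELINE_ID" returns once the job list is non-empty and fully completed. An empty job list counts as not done, which avoids searching before the ingest rule has fired.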

6. Search


dodil k3 search "what is multi-head attention" --bucket kb-prod --collection docs -o json \
  | jq '.results[] | {score, object: .object.key, content}'

You should see chunks from papers/attention.pdf ranked by similarity; the top result is likely the chunk that explains the multi-head attention mechanism.
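If you want to turn that expectation into a repeatable smoke check, one option is to compare the object key of the first result against the file you uploaded. This is a minimal sketch under stated assumptions: top_hit_key and assert_top_hit are hypothetical names, and the jq path .results[0].object.key assumes results arrive sorted best-first, as the ranked output in this step suggests.

```shell
# Hypothetical helpers; the search command and the .results[].object.key path
# come from this guide.

# Print the object key of the first result for a query.
# Usage: top_hit_key <bucket> <collection> <query>
top_hit_key() {
  dodil k3 search "$3" --bucket "$1" --collection "$2" -o json \
    | jq -r '.results[0].object.key'
}

# Succeed when the first result for the query comes from the expected object.
# Usage: assert_top_hit <bucket> <collection> <query> <expected_key>
assert_top_hit() {
  actual=$(top_hit_key "$1" "$2" "$3")
  if [ "$actual" = "$4" ]; then
    echo "ok: top hit is $actual"
  else
    echo "unexpected top hit: $actual (wanted $4)" >&2
    return 1
  fi
}
```

For this quickstart the call would be assert_top_hit kb-prod docs "what is multi-head attention" papers/attention.pdf. With only one document indexed the check is trivial, but it becomes more useful as the collection grows.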

The CLI’s search accepts only text queries today. Hybrid mode, rerank, pre-filters, multi-collection search, file-by-S3-key queries, and batch vector input are available only through the API. See the Search API reference for the full surface.

What you just built

Step	RPC	Entity created
1	`ConfigureEngine`	`Engine` (status ACTIVE)
3	`AddVectorPipeline`	`Collection` + Scriptum pipeline + ingest rule
4 → 5	(object upload + auto-ingest)	`IngestJob`
6	`Search`	`SearchResult[]`

Every subsequent PDF you upload follows the same chain: its chunks land in docs and become searchable as soon as the ingest job completes.
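To load more than one file, you can wrap the object create command from step 4 in a loop. The helper below is a sketch, not part of the CLI: upload_pdfs is a hypothetical name, and the choice to build each key as the prefix plus the file's base name is an assumption you may want to change.

```shell
# Hypothetical helper; only `object create` with -b and -k comes from this guide.

# Upload every PDF in a local directory under a key prefix.
# Usage: upload_pdfs <local_dir> <bucket> <key_prefix>
upload_pdfs() {
  dir=$1
  bucket=$2
  prefix=$3
  for f in "$dir"/*.pdf; do
    # The glob stays literal when nothing matches, so skip non-existent paths.
    [ -e "$f" ] || continue
    dodil k3 object create "$f" -b "$bucket" -k "$prefix/$(basename "$f")"
  done
}
```

For example, upload_pdfs ./papers kb-prod papers uploads each PDF in ./papers under the papers/ prefix, and each upload triggers the auto-generated ingest rule just as the single file did.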

Cleanup


# Delete the collection (Milvus collection + bound pipeline + auto rule)
dodil k3 vector collection delete "$COLLECTION_ID" --bucket kb-prod

# Delete the engine (optional; keep it if you'll create more collections)
dodil k3 vector store delete -b kb-prod

# Delete the object
dodil k3 object remove papers/attention.pdf -b kb-prod

By default, vector store delete does not drop the underlying VBase database (this applies to auto mode only). To drop the database as well, call the API with delete_database: true; see Engine API.

Next steps

Core Concepts — every type signature, sparse modes, embedding types, distance metrics
API Reference → Search — the heavy page; multi-collection, hybrid + rerank, pre-filters, multimodal, batch vector queries, Milvus-native tuning
Recipes — pipeline collection / external collection / multi-collection search / hybrid + rerank / multimodal search
VBase — for direct Milvus access (advanced)