Quickstart
Five minutes from here you’ll have a vector collection with one indexed document and a working search that returns ranked chunks.
We’ll take the pipeline-collection route, the easiest one, with no embedding setup on your side. K3 ships text, code, and visual embedding templates from Scriptum; pick one and K3 chunks, embeds, and indexes your documents for you. (For bring-your-own embeddings, see Recipes → External Collection.)
Prerequisites
- `dodil` CLI installed and `dodil login` done (see CLI Basics)
- A bucket, `kb-prod`:

```shell
dodil k3 bucket create kb-prod -d "RAG corpus"
```
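Before starting, a quick preflight can confirm the tools are on your `PATH`. This is a minimal POSIX `sh` sketch; `check_prereqs` is a hypothetical helper, and it assumes you also want `jq` (used throughout to read `-o json` output) and `curl` (used for the sample download in step 4). It does not check whether `dodil login` succeeded.

```shell
# check_prereqs TOOL...  (hypothetical helper)
# Reports every named tool missing from PATH on stderr.
# Returns 0 when all tools are found, 1 otherwise.
check_prereqs() {
  _missing=0
  for _tool in "$@"; do
    if ! command -v "$_tool" >/dev/null 2>&1; then
      echo "missing: $_tool" >&2
      _missing=1
    fi
  done
  return "$_missing"
}

# Assumption: jq and curl are wanted alongside dodil for this quickstart.
# check_prereqs dodil jq curl && echo "ready"
```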
1. Configure the vector engine
Every bucket needs its vector engine configured once before you can create collections. The simplest path is auto mode — K3 provisions a VBase database on a managed cluster for you:
```shell
dodil k3 vector store create -b kb-prod -m auto
```

Verify it landed:

```shell
dodil k3 vector store get -b kb-prod -o json | jq '{status, mode, vbaseEndpoint, vbaseDbName}'
```

`status` progresses `PENDING` → `PROVISIONING` → `ACTIVE`, usually in under a minute. Other modes (`external` for your own VBase, `pick` for an existing service) are documented at API Reference → Engine.
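If you're scripting the setup, you can poll instead of re-running `get` by hand. A minimal sketch, assuming `jq -r '.status'` prints the bare status string; because the ingest job JSON in step 5 reports statuses with a prefix (`INGEST_STATUS_COMPLETED`), the helper also accepts any value ending in `_<WANT>`. `wait_for_status` and `engine_status` are hypothetical names, not part of the CLI.

```shell
# wait_for_status WANT MAX_TRIES SLEEP_SECS CMD [ARGS...]  (hypothetical helper)
# Runs CMD until it prints WANT (or a value ending in _WANT), at most MAX_TRIES times.
wait_for_status() {
  _want=$1; _tries=$2; _pause=$3
  shift 3
  _i=1
  while :; do
    _got=$("$@")
    echo "status: $_got" >&2
    case "$_got" in
      "$_want"|*_"$_want") return 0 ;;
    esac
    if [ "$_i" -ge "$_tries" ]; then
      return 1   # gave up: still not in the wanted state
    fi
    _i=$((_i + 1))
    sleep "$_pause"
  done
}

# Hypothetical wrapper: print only the engine's status field for kb-prod.
engine_status() {
  dodil k3 vector store get -b kb-prod -o json | jq -r '.status'
}

# Usage: up to 30 checks, 5 seconds apart.
# wait_for_status ACTIVE 30 5 engine_status
```

Matching on `_ACTIVE` rather than a bare substring keeps a value like `INACTIVE` from counting as ready.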
2. Discover a vector template
K3 ships a catalog of `*_embedding_index` Scriptum templates: text, code, visual, face, object. List the vector-compatible ones:

```shell
dodil k3 vector templates -o json | jq '.templates[] | {id, name, modalities, acceptedExtensions}'
```

For PDF, docx, HTML, or plain text, pick `text_embedding_index`, the canonical RAG template. (Other choices: `code_embedding_index` for source code, `visual_embedding_index` for image, video, audio, or PDF page renders.)
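When a corpus mixes file types, a small lookup can suggest a template per file. This is a hypothetical helper whose mapping only mirrors the guidance above; the code and visual extension lists are illustrative guesses, and the authoritative list is each template's `acceptedExtensions`. PDFs default to `text_embedding_index` here even though `visual_embedding_index` also handles PDF page renders.

```shell
# template_for_file PATH  (hypothetical helper)
# Suggests a template from the file extension; returns 1 (no output) if unknown.
# Code/visual extension lists are illustrative; check acceptedExtensions for the real ones.
template_for_file() {
  _ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$_ext" in
    pdf|docx|html|txt)          echo text_embedding_index ;;
    py|go|rs|java|ts|js|c|cpp)  echo code_embedding_index ;;
    png|jpg|jpeg|mp4|mp3|wav)   echo visual_embedding_index ;;
    *)                          return 1 ;;
  esac
}

# Usage:
# template_for_file papers/attention.pdf   # prints text_embedding_index
```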
Inspect the contract to see what inputs it requires:
```shell
dodil k3 template get text_embedding_index -o json | jq '.contract.inputs'
```

3. Create the collection (pipeline-mode)
`dodil k3 vector collection add` is the pipeline-mode path: the schema (dimensions, metric, sparse mode, `embed_model`) materializes from the template’s `ScriptContract`:

```shell
dodil k3 vector collection add docs --bucket kb-prod \
  --description "PDF / docx / HTML embeddings" \
  --template text_embedding_index \
  -o json
```

The response is a `Collection` row. Capture its ID for later:

```shell
export COLLECTION_ID=$(dodil k3 vector collection get docs --bucket kb-prod -o json | jq -r '.collectionId')
```

What K3 created atomically:
- A Milvus collection (schema lazy-materialized on first ingest)
- A Scriptum pipeline bound to `text_embedding_index`
- An auto-generated ingest rule whose globs come from the template’s `acceptedExtensions` (`**/*.pdf`, `**/*.txt`, …)
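To preview which object keys the auto-generated rule will pick up, you can test keys against the same globs locally. A rough sketch, assuming doublestar-style semantics where `**/` matches zero or more directories; K3's actual matcher may behave differently, so treat a result here as a hint only. `matches_any_glob` is a hypothetical helper.

```shell
# matches_any_glob KEY GLOB...  (hypothetical helper)
# Returns 0 if KEY matches any GLOB. Assumption: a leading "**/" may also
# match zero directories, so "**/*.pdf" matches both "a.pdf" and "papers/a.pdf".
matches_any_glob() {
  _key=$1
  shift
  for _glob in "$@"; do
    # An unquoted expansion in a case pattern is used as a shell pattern.
    case "$_key" in
      $_glob) return 0 ;;
    esac
    # Retry without the leading "**/" to allow a zero-directory match.
    case "$_glob" in
      '**/'*)
        _rest=${_glob#'**/'}
        case "$_key" in
          $_rest) return 0 ;;
        esac
        ;;
    esac
  done
  return 1
}

# Usage (quote the globs so your shell doesn't expand them):
# matches_any_glob papers/attention.pdf '**/*.pdf' '**/*.txt' && echo "likely ingested"
```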
4. Upload a document
Drop any text/PDF into the bucket — K3’s pipeline rule fires automatically:
```shell
curl -sSL https://arxiv.org/pdf/1706.03762.pdf -o attention.pdf
dodil k3 object create ./attention.pdf -b kb-prod -k papers/attention.pdf
```

5. Watch the ingest job
```shell
# Filter by pipeline: find the embedding job for this collection
PIPELINE_ID=$(dodil k3 vector collection get docs --bucket kb-prod -o json | jq -r '.embedPipelineId')
dodil k3 ingest jobs --bucket kb-prod --pipeline "$PIPELINE_ID" -o json \
  | jq '.jobs[] | {object: .object.key, status, chunksCreated, embeddingsWritten}'
```

Status path: `PENDING` → `PROCESSING` → `COMPLETED`. When the job is done you’ll see something like:

```json
{
  "object": "papers/attention.pdf",
  "status": "INGEST_STATUS_COMPLETED",
  "chunksCreated": 47,
  "embeddingsWritten": 47
}
```

For replay / retry semantics, see Pipelines → Replay & Retry.
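Once you upload more than one file, a tally is easier to read than raw JSON. A minimal sketch: `summarize_jobs` (hypothetical) reads `key<TAB>status` lines such as those produced by jq's `@tsv` filter, and assumes statuses end in `PENDING`, `PROCESSING`, or `COMPLETED`, with or without the `INGEST_STATUS_` prefix seen above. Anything else is counted as `other` and echoed to stderr for a closer look.

```shell
# summarize_jobs  (hypothetical helper): reads "key<TAB>status" lines on stdin.
summarize_jobs() {
  _done=0; _busy=0; _other=0
  # "|| [ -n ... ]" also handles a final line without a trailing newline.
  while IFS="$(printf '\t')" read -r _key _status || [ -n "$_key" ]; do
    [ -z "$_key" ] && continue   # skip blank lines
    case "$_status" in
      COMPLETED|*_COMPLETED) _done=$((_done + 1)) ;;
      PENDING|*_PENDING|PROCESSING|*_PROCESSING) _busy=$((_busy + 1)) ;;
      *) _other=$((_other + 1)); echo "check: $_key ($_status)" >&2 ;;
    esac
  done
  echo "completed=$_done in_progress=$_busy other=$_other"
}

# Usage:
# dodil k3 ingest jobs --bucket kb-prod --pipeline "$PIPELINE_ID" -o json \
#   | jq -r '.jobs[] | [.object.key, .status] | @tsv' \
#   | summarize_jobs
```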
6. Search
```shell
dodil k3 search "what is multi-head attention" --bucket kb-prod --collection docs -o json \
  | jq '.results[] | {score, object: .object.key, content}'
```

You should see chunks from `papers/attention.pdf` ranked by similarity; the top result is likely the chunk explaining the multi-head attention mechanism.
The CLI’s `search` only takes text queries today. Hybrid mode, rerank, pre-filters, multi-collection search, file-by-S3-key queries, and batch vector input are all available through the API; see the Search API reference for the full surface.
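Until you move to the API, running several text queries from the CLI is just a loop. A sketch using only the flags shown above; `search_each` is a hypothetical helper, and `DRY_RUN=1` prints the commands instead of running them. Each `dodil` call reads stdin from `/dev/null` so it cannot swallow the remaining queries.

```shell
# search_each BUCKET COLLECTION  (hypothetical helper)
# Runs one CLI text search per non-empty line of stdin.
search_each() {
  _bucket=$1; _collection=$2
  while IFS= read -r _q || [ -n "$_q" ]; do
    [ -z "$_q" ] && continue
    if [ -n "${DRY_RUN:-}" ]; then
      printf 'dodil k3 search %s --bucket %s --collection %s -o json\n' \
        "'$_q'" "$_bucket" "$_collection"
    else
      echo "== $_q"
      dodil k3 search "$_q" --bucket "$_bucket" --collection "$_collection" -o json </dev/null
    fi
  done
}

# Usage:
# printf '%s\n' "what is multi-head attention" "positional encoding" \
#   | search_each kb-prod docs
```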
What you just built
| Step | RPC | Entity created |
|---|---|---|
| 1 | ConfigureEngine | Engine (status ACTIVE) |
| 3 | AddVectorPipeline | Collection + Scriptum pipeline + ingest rule |
| 4 → 5 | (object upload + auto-ingest) | IngestJob |
| 6 | Search | SearchResult[] |
Every subsequent PDF you upload follows the same chain: chunks land in `docs` and become searchable as soon as the ingest job completes.
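To index a whole folder, loop over the same `object create` call from step 4. A minimal sketch: `upload_pdfs` is a hypothetical helper, it keys every file as `papers/<filename>`, and `DRY_RUN=1` prints the commands instead of uploading.

```shell
# upload_pdfs DIR BUCKET  (hypothetical helper)
# Uploads every *.pdf directly under DIR as papers/<filename>.
upload_pdfs() {
  _dir=$1; _bucket=$2
  for _f in "$_dir"/*.pdf; do
    [ -e "$_f" ] || continue   # the glob matched nothing
    _key="papers/$(basename "$_f")"
    if [ -n "${DRY_RUN:-}" ]; then
      echo "dodil k3 object create $_f -b $_bucket -k $_key"
    else
      dodil k3 object create "$_f" -b "$_bucket" -k "$_key"
    fi
  done
}

# Usage: DRY_RUN=1 upload_pdfs ./pdfs kb-prod   # review, then rerun without DRY_RUN
```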
Cleanup
```shell
# Delete the collection (Milvus collection + bound pipeline + auto rule)
dodil k3 vector collection delete "$COLLECTION_ID" --bucket kb-prod
# Delete the engine (optional; skip it if you'll create more collections)
dodil k3 vector store delete -b kb-prod
# Delete the object
dodil k3 object remove papers/attention.pdf -b kb-prod
```
`vector store delete` does not drop the underlying VBase database by default (auto mode only). To also drop the database, call the API with `delete_database: true`; see Engine API.
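The cleanup can also be wrapped in one function for scripted teardown. A sketch under these assumptions: if `COLLECTION_ID` is unset (for example in a new shell), it is looked up by name with the same `collection get` call as step 3; engine deletion runs only when `DELETE_ENGINE=1`; and `DRY_RUN=1` prints the commands. `run` and `cleanup_quickstart` are hypothetical names.

```shell
# run (hypothetical): print the command when DRY_RUN is set, otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# cleanup_quickstart [BUCKET]  (hypothetical): the cleanup steps above, in order.
cleanup_quickstart() {
  _bucket=${1:-kb-prod}
  if [ -z "${COLLECTION_ID:-}" ]; then
    # Same lookup as step 3; jq -r prints "null" if the field is missing.
    COLLECTION_ID=$(dodil k3 vector collection get docs --bucket "$_bucket" -o json | jq -r '.collectionId')
  fi
  case "$COLLECTION_ID" in
    ''|null) echo "collection 'docs' not found in $_bucket" >&2; return 1 ;;
  esac
  run dodil k3 vector collection delete "$COLLECTION_ID" --bucket "$_bucket"
  # Engine deletion stays optional: only when DELETE_ENGINE=1.
  if [ "${DELETE_ENGINE:-0}" = 1 ]; then
    run dodil k3 vector store delete -b "$_bucket"
  fi
  run dodil k3 object remove papers/attention.pdf -b "$_bucket"
}

# Usage: DRY_RUN=1 cleanup_quickstart   # review first, then run without DRY_RUN
```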
Next steps
- Core Concepts — every type signature, sparse modes, embedding types, distance metrics
- API Reference → Search — the heavy page; multi-collection, hybrid + rerank, pre-filters, multimodal, batch vector queries, Milvus-native tuning
- Recipes — pipeline collection / external collection / multi-collection search / hybrid + rerank / multimodal search
- VBase — for direct Milvus access (advanced)