Quickstart

Five minutes from here you’ll have a vector collection with one indexed document and a working search that returns ranked chunks.

We’ll go the pipeline-collection route — easiest, no embedding setup on your side. K3 ships text/code/visual embedding templates from Scriptum; pick one and K3 chunks + embeds + indexes for you. (For BYO-embeddings, see Recipes → External Collection.)

Prerequisites

  • dodil CLI installed and dodil login done — CLI Basics
  • A bucket — kb-prod:
    dodil k3 bucket create kb-prod -d "RAG corpus"

1. Configure the vector engine

Every bucket needs its vector engine configured once before you can create collections. The simplest path is auto mode — K3 provisions a VBase database on a managed cluster for you:

dodil k3 vector store create -b kb-prod -m auto

Verify it landed:

dodil k3 vector store get -b kb-prod -o json | jq '{status, mode, vbaseEndpoint, vbaseDbName}'

status progresses PENDING → PROVISIONING → ACTIVE — usually under a minute. Other modes (external for your own VBase, pick for an existing service) are documented at API Reference → Engine.
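
If you are scripting this step, you will probably want to block until the engine is ready instead of re-running the command by hand. Here is a minimal sketch of a polling loop. wait_until and engine_status are hypothetical helper names, not part of the dodil CLI, and the suffix match is an assumption that covers the case where the JSON status carries an enum-style prefix.

```shell
# wait_until EXPECTED MAX_ATTEMPTS CMD [ARGS...]
# Hypothetical helper (not part of dodil): re-runs CMD until its output ends
# with EXPECTED, sleeping POLL_INTERVAL seconds (default 2) between attempts.
wait_until() {
  expected=$1
  max_attempts=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # A failing CMD counts as "no status yet", not as a fatal error.
    current=$("$@" 2>/dev/null) || current=""
    case "$current" in
      *"$expected") return 0 ;;
    esac
    attempt=$((attempt + 1))
    sleep "${POLL_INTERVAL:-2}"
  done
  echo "gave up waiting for $expected (last status: ${current:-unknown})" >&2
  return 1
}

# Prints the engine status for a bucket, reusing the get + jq call shown above.
engine_status() {
  dodil k3 vector store get -b "$1" -o json | jq -r '.status'
}
```

With these defined, wait_until ACTIVE 30 engine_status kb-prod polls for roughly a minute at the default interval and returns non-zero if the engine never reports ACTIVE.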

2. Discover a vector template

K3 ships a catalog of *_embedding_index Scriptum templates — text, code, visual, face, object. List the vector-compatible ones:

dodil k3 vector templates -o json | jq '.templates[] | {id, name, modalities, acceptedExtensions}'

For PDF / docx / HTML / plain text, pick text_embedding_index — the canonical RAG template. (Other choices: code_embedding_index for source code, visual_embedding_index for image / video / audio / PDF page renders.)
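
If you are unsure which template covers a given file type, you can filter the same listing on acceptedExtensions. The sketch below assumes that field is an array of strings such as ".pdf" or "pdf" (the exact format is not shown on this page). templates_for_ext is a hypothetical helper name.

```shell
# templates_for_ext EXT
# Hypothetical helper: reads the template listing JSON on stdin and prints the
# id of every template whose acceptedExtensions contains EXT.
# Assumption: acceptedExtensions is an array of strings like ".pdf" or "pdf".
templates_for_ext() {
  # Normalize the argument: drop a leading dot and lowercase it.
  ext=$(printf '%s' "${1#.}" | tr '[:upper:]' '[:lower:]')
  jq -r --arg ext "$ext" '
    .templates[]
    | select(any(.acceptedExtensions[]?; ascii_downcase | ltrimstr(".") == $ext))
    | .id
  '
}
```

Usage: dodil k3 vector templates -o json | templates_for_ext pdf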

Inspect the contract to see what inputs it requires:

dodil k3 template get text_embedding_index -o json | jq '.contract.inputs'

3. Create the collection (pipeline-mode)

dodil k3 vector collection add is the pipeline-mode path — schema (dimensions, metric, sparse mode, embed_model) materializes from the template’s ScriptContract:

dodil k3 vector collection add docs --bucket kb-prod \
  --description "PDF / docx / HTML embeddings" \
  --template text_embedding_index \
  -o json

The response is a Collection row — capture its ID for later:

export COLLECTION_ID=$(dodil k3 vector collection get docs --bucket kb-prod -o json | jq -r '.collectionId')

What K3 created atomically:

  • A Milvus collection (schema lazy-materialized on first ingest)
  • A Scriptum pipeline bound to text_embedding_index
  • An auto-generated ingest rule whose globs come from the template’s acceptedExtensions (**/*.pdf, **/*.txt, …)

4. Upload a document

Drop any text/PDF into the bucket — K3’s pipeline rule fires automatically:

curl -sSL https://arxiv.org/pdf/1706.03762.pdf -o attention.pdf
dodil k3 object create ./attention.pdf -b kb-prod -k papers/attention.pdf
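
To load more than one file, you can wrap the same upload command in a loop. A minimal sketch, assuming a local directory of PDFs. upload_dir is a hypothetical helper name, and it uses only the object create flags shown above.

```shell
# upload_dir DIR BUCKET PREFIX
# Hypothetical helper: uploads every *.pdf in DIR to BUCKET under PREFIX/,
# with the same `dodil k3 object create` flags as the single-file command.
upload_dir() {
  dir=$1
  bucket=$2
  prefix=$3
  for f in "$dir"/*.pdf; do
    [ -e "$f" ] || continue   # the glob matched nothing
    # Stop at the first failed upload so the caller can see which file broke.
    dodil k3 object create "$f" -b "$bucket" -k "$prefix/$(basename "$f")" || return 1
  done
}
```

Usage: upload_dir ./papers kb-prod papers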

5. Watch the ingest job

# Filter by pipeline to find the embedding job for this collection
PIPELINE_ID=$(dodil k3 vector collection get docs --bucket kb-prod -o json | jq -r '.embedPipelineId')
dodil k3 ingest jobs --bucket kb-prod --pipeline "$PIPELINE_ID" -o json \
  | jq '.jobs[] | {object: .object.key, status, chunksCreated, embeddingsWritten}'

Status path: PENDING → PROCESSING → COMPLETED. In JSON output the status carries an INGEST_STATUS_ prefix, as in the sample below. When done you’ll see something like:

{
  "object": "papers/attention.pdf",
  "status": "INGEST_STATUS_COMPLETED",
  "chunksCreated": 47,
  "embeddingsWritten": 47
}
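
In a script, you can turn the job listing into a yes/no answer before moving on to search. The sketch below reads the JSON from the ingest jobs command and succeeds only when at least one job exists and every job has reached a status ending in COMPLETED. ingest_done is a hypothetical helper name, and the suffix match is an assumption based on the sample status above.

```shell
# ingest_done
# Hypothetical helper: reads `dodil k3 ingest jobs ... -o json` output on stdin.
# Exit 0 when there is at least one job and all job statuses end in COMPLETED;
# exit non-zero otherwise (jq -e maps a false result to a failing exit code).
ingest_done() {
  jq -e '
    (.jobs | length) > 0
    and all(.jobs[]; .status | endswith("COMPLETED"))
  ' >/dev/null
}
```

Usage: dodil k3 ingest jobs --bucket kb-prod --pipeline "$PIPELINE_ID" -o json | ingest_done && echo "ready to search"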

For replay / retry semantics, see Pipelines → Replay & Retry.

6. Search

dodil k3 search "what is multi-head attention" --bucket kb-prod --collection docs -o json \
  | jq '.results[] | {score, object: .object.key, content}'

You should see chunks from papers/attention.pdf ranked by similarity — top-1 likely the chunk explaining the multi-head attention mechanism.
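
For a quick read of the top hits, or to paste them into a prompt, you can flatten the JSON into plain text. A minimal sketch using the score, object.key, and content fields from the search output. top_chunks is a hypothetical helper name, and it assumes the results array is already ordered best-first.

```shell
# top_chunks N
# Hypothetical helper: reads search JSON on stdin and prints the first N
# results as a "[score] object-key" line followed by the chunk text.
top_chunks() {
  jq -r --argjson n "$1" '
    .results[:$n][]
    | "[\(.score)] \(.object.key)\n\(.content)\n"
  '
}
```

Usage: dodil k3 search "what is multi-head attention" --bucket kb-prod --collection docs -o json | top_chunks 3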

The CLI’s search only takes text queries today. Hybrid mode, rerank, pre-filters, multi-collection, file-by-S3-key, batch vector input — all on the API. See the Search API reference for the full surface.

What you just built

Step     RPC                              Entity created
1        ConfigureEngine                  Engine (status ACTIVE)
3        AddVectorPipeline                Collection + Scriptum pipeline + ingest rule
4 → 5    (object upload + auto-ingest)    IngestJob
6        Search                           SearchResult[]

Every subsequent PDF you upload follows the same chain — chunks land in docs, searchable immediately.

Cleanup

# Delete the collection (Milvus collection + bound pipeline + auto rule)
dodil k3 vector collection delete "$COLLECTION_ID" --bucket kb-prod

# Delete the engine (optional; keep it if you'll create more collections)
dodil k3 vector store delete -b kb-prod

# Delete the object
dodil k3 object remove papers/attention.pdf -b kb-prod

For auto-mode engines, vector store delete leaves the underlying VBase database in place by default. To drop the database as well, call the API with delete_database: true; see Engine API.

Next steps

  • Core Concepts — every type signature, sparse modes, embedding types, distance metrics
  • API Reference → Search — the heavy page; multi-collection, hybrid + rerank, pre-filters, multimodal, batch vector queries, Milvus-native tuning
  • Recipes — pipeline collection / external collection / multi-collection search / hybrid + rerank / multimodal search
  • VBase  — for direct Milvus access (advanced)