RAG Knowledge Base

Goal: stand up a working RAG (retrieval-augmented generation) corpus on K3. Upload PDFs / docs / text → K3 chunks + embeds + indexes them automatically → query with hybrid search + rerank.

Primitives used: Storage (the bucket + S3 upload) → Pipelines (the auto-generated rule wired by Vector at collection-create time) → Vector (the collection + search).

Shape:


   ┌──────────┐
   │  Your    │
   │ documents│
   └────┬─────┘
        │  aws s3 cp / dodil k3 object create
        ▼
   ┌──────────────────────────────────────────────────┐
   │ Storage — kb-platform bucket                      │
   └──────────────────────────────────────────────────┘
        │  auto-rule fires (globs from template's acceptedExtensions)
        ▼
   ┌──────────────────────────────────────────────────┐
   │ Pipelines — text_embedding_index Scriptum         │
   │ runs per uploaded object → chunks + embeds        │
   └──────────────────────────────────────────────────┘
        │
        ▼
   ┌──────────────────────────────────────────────────┐
   │ Vector — `docs` collection                        │
   │ (Milvus, pipeline-mode, BM25 enabled)             │
   └──────────────────────────────────────────────────┘
        │
        ▼
   ┌──────────────────────────────────────────────────┐
   │ App layer: Search RPC → top-K chunks → LLM        │
   └──────────────────────────────────────────────────┘

Prerequisites

dodil CLI installed + dodil login — see CLI Basics
(Optional) aws-cli configured against K3 for S3-style uploads — see Storage → S3 Compatibility

1. Create the bucket + configure the vector engine


# Storage primitive — create the bucket
dodil k3 bucket create kb-platform -d "Production RAG knowledge base"
 
# Vector primitive — configure engine (auto mode = K3 provisions VBase)
dodil k3 vector store create -b kb-platform -m auto
 
# Wait for engine ACTIVE (typically < 60 s)
until dodil k3 vector store get -b kb-platform -o json | jq -e '.status == "ENGINE_STATUS_ACTIVE"' > /dev/null; do
  echo "  engine status: $(dodil k3 vector store get -b kb-platform -o json | jq -r .status) — waiting..."
  sleep 5
done
echo "✅ engine ACTIVE"

The Tables engine is also auto-enabled on every bucket: dodil k3 bucket create wires up both the Storage entitlements and the Tables engine, so you don’t need to enable it explicitly. Only the Vector engine needs an explicit vector store create.
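The polling loop in step 1 never gives up if provisioning stalls. Below is a minimal Python sketch of the same poll with a deadline. It assumes the JSON from `dodil k3 vector store get -o json` carries a top-level `status` field, as the `jq` filter suggests; `cli_status` and `wait_for_engine` are hypothetical helper names, not part of the dodil CLI.

```python
import json
import subprocess
import time
from typing import Callable

ACTIVE = "ENGINE_STATUS_ACTIVE"


def cli_status(bucket: str) -> str:
    """Read the engine status via the same CLI call the shell loop uses (needs dodil on PATH)."""
    out = subprocess.run(
        ["dodil", "k3", "vector", "store", "get", "-b", bucket, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)["status"]


def wait_for_engine(
    get_status: Callable[[], str],
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status() until it reports ACTIVE; raise TimeoutError once timeout_s has elapsed."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status == ACTIVE:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"engine still {status} after {timeout_s:.0f} s")
        print(f"  engine status: {status}, waiting...")
        sleep(interval_s)


# Usage: wait_for_engine(lambda: cli_status("kb-platform"))
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting; in normal use the defaults apply.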

2. Pick a template + create the collection


# Browse vector-pillar templates
dodil k3 vector templates -o json | jq '.templates[] | {id, modalities, acceptedExtensions}'

For PDF, docx, HTML, and plain text, pick text_embedding_index. Inspect its contract to confirm it has no required runtime inputs:


dodil k3 template get text_embedding_index -o json | jq '.contract.inputs'

Create a pipeline-mode collection — K3 atomically creates the collection + a Scriptum pipeline + an auto-generated ingest rule:


dodil k3 vector collection add docs -b kb-platform \
  --description "Production RAG corpus" \
  --template text_embedding_index
 
export COLLECTION_ID=$(dodil k3 vector collection get docs -b kb-platform -o json | jq -r '.collectionId')
export PIPELINE_ID=$(dodil k3 vector collection get docs -b kb-platform -o json | jq -r '.embedPipelineId')

Confirm the auto-rule is enabled:


dodil k3 ingest list -b kb-platform -p "$PIPELINE_ID" -o json \
  | jq '.rules[] | {ruleId, name, includePatterns, enabled}'

Expect something like includePatterns: ["**/*.pdf", "**/*.txt", "**/*.docx", "**/*.html"] and enabled: true.
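Before a bulk upload, it can help to check which keys will actually trigger the rule. The sketch below assumes K3 follows conventional globstar semantics, consistent with the gotcha table later on (`**/*.pdf` is recursive, `*.pdf` is not): `**/` spans zero or more directories, while `*` and `?` never cross `/`. Whether `**/*.pdf` also matches a top-level key such as `attention.pdf` is part of that assumption. Character classes like `[abc]` are not handled; `glob_to_regex` and `matches_any` are hypothetical names.

```python
import re


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a glob to a regex under conventional globstar rules (assumed, not K3-confirmed):
    '**/' spans zero or more directories; '*' and '?' never cross '/'."""
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def matches_any(key: str, patterns: list) -> bool:
    """True if the object key fully matches at least one include pattern."""
    return any(glob_to_regex(p).fullmatch(key) for p in patterns)


# Usage: matches_any("papers/attention.pdf", ["**/*.pdf", "**/*.txt", "**/*.docx", "**/*.html"])
```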

3. Upload documents — three ways

Via CLI (one-off):


curl -sSL https://arxiv.org/pdf/1706.03762.pdf -o attention.pdf
dodil k3 object create ./attention.pdf -b kb-platform -k papers/attention.pdf

Via aws-cli (bulk, S3-style — works because K3 speaks native S3):


# One-time setup
aws configure --profile dodil-k3
# AWS Access Key ID: <your Dodil service-account ID>
# AWS Secret Access Key: <your Dodil service-account secret>
# Default region name: us-east-1
# Default output format: json
 
# Bulk-upload an existing folder
aws s3 sync ./my-docs/ s3://kb-platform/papers/ \
  --endpoint-url https://k3.dev.dodil.io \
  --profile dodil-k3

Via boto3 / @aws-sdk/client-s3 — same S3 SDKs, same auth. See Storage → S3 Compatibility for setup snippets.
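A minimal boto3 sketch of the same bulk upload, assuming the `dodil-k3` profile configured above holds your service-account credentials and region, and that the endpoint is the one used with aws-cli. Unlike `aws s3 sync`, it uploads every file rather than only changed ones. `make_k3_s3_client` and `upload_folder` are hypothetical helper names.

```python
from pathlib import Path

K3_ENDPOINT = "https://k3.dev.dodil.io"


def make_k3_s3_client(profile: str = "dodil-k3"):
    """Build an S3 client pointed at K3, reusing the aws-cli profile configured above."""
    import boto3  # imported here so upload_folder() stays usable without boto3 installed
    session = boto3.Session(profile_name=profile)
    return session.client("s3", endpoint_url=K3_ENDPOINT)


def upload_folder(s3, local_dir: str, bucket: str, prefix: str = "") -> list:
    """Upload every file under local_dir to bucket, keyed as prefix + relative POSIX path."""
    root = Path(local_dir)
    keys = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            key = prefix + path.relative_to(root).as_posix()
            # Keys that match the auto-rule's include patterns then trigger ingestion
            s3.upload_file(str(path), bucket, key)
            keys.append(key)
    return keys


# Usage: upload_folder(make_k3_s3_client(), "./my-docs", "kb-platform", "papers/")
```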

Every upload (CLI or S3 SDK) whose key matches the rule’s include patterns fires the auto-generated rule → spawns an ingest job → runs text_embedding_index → writes chunks + embeddings to the docs collection.

4. Watch the ingest pipeline


dodil k3 ingest jobs -b kb-platform -p "$PIPELINE_ID" -o json \
  | jq '.jobs[] | {object: .object.key, status, chunksCreated, embeddingsWritten}'

Status path: PENDING → PROCESSING → COMPLETED. Happy path: chunksCreated == embeddingsWritten. If embeddingsWritten is lower, see Pipelines → Replay & Retry.
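To triage a large job list, you can bucket jobs into complete, short (fewer embeddings than chunks), and not yet completed. This sketch assumes the JSON shape implied by the `jq` filter above. It also assumes two things the source does not state: counts may arrive as strings (protobuf JSON commonly encodes 64-bit integers that way), and status names may carry an enum prefix, so it only checks the `COMPLETED` suffix. `summarize_jobs` and `load_jobs` are hypothetical names.

```python
import json
import subprocess


def summarize_jobs(jobs_json: dict) -> dict:
    """Split ingest jobs into complete, short (missing embeddings), and not-yet-completed."""
    summary = {"complete": [], "short": [], "not_completed": []}
    for job in jobs_json.get("jobs", []):
        key = (job.get("object") or {}).get("key")
        status = job.get("status", "")
        chunks = int(job.get("chunksCreated") or 0)       # may be a string in JSON
        written = int(job.get("embeddingsWritten") or 0)
        if not status.endswith("COMPLETED"):
            summary["not_completed"].append((key, status))
        elif written < chunks:
            summary["short"].append((key, chunks - written))  # candidates for replay
        else:
            summary["complete"].append(key)
    return summary


def load_jobs(bucket: str, pipeline_id: str) -> dict:
    """Fetch the job list through the CLI (needs dodil on PATH)."""
    out = subprocess.run(
        ["dodil", "k3", "ingest", "jobs", "-b", bucket, "-p", pipeline_id, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


# Usage: summarize_jobs(load_jobs("kb-platform", os.environ["PIPELINE_ID"]))
```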

5. Search — three variants of increasing sophistication

A. Quick text search (CLI)


dodil k3 search "what is multi-head attention" -b kb-platform -c docs -o json \
  | jq '.results[] | {score, object: .object.key}'

Default freshness (eventual), default mode (VECTOR — dense only). Good enough for a smoke test.

B. Hybrid + rerank (via API — CLI gap)

For production RAG, you want hybrid (dense + BM25) + Jina rerank. Drop to the API:


curl -sS -X POST "https://k3.dev.dodil.io/kb-platform/vector/search" \
  -H "Authorization: Bearer $DODIL_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "bucket": "kb-platform",
    "collectionName": "docs",
    "text": "what is multi-head attention",
    "topK": 5,
    "searchMode": "SEARCH_MODE_AUTO",
    "rerank": true,
    "includeContent": true
  }' | jq '{
       searchModeUsed,
       tookMs,
       results: [.results[] | {
         score,
         object: .object.key,
         content: (.content | .[0:200] + "...")
       }]
     }'

SEARCH_MODE_AUTO picks hybrid because text_embedding_index enables BM25 by default. rerank: true adds Jina cross-encoder scoring over the top-K. includeContent: true returns the chunk text — what your LLM consumes downstream. See Vector → Hybrid + Rerank for the full benchmark + tier breakdown.
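The same request can be built and summarized from Python. This sketch reuses only the request fields and response fields that appear in the curl and `jq` example above; the helper names are hypothetical. The summary mirrors the `jq` filter, except that it appends `...` only when it actually truncates.

```python
K3 = "https://k3.dev.dodil.io"


def build_search_request(bucket: str, collection: str, text: str, top_k: int = 5,
                         rerank: bool = True, include_content: bool = True,
                         mode: str = "SEARCH_MODE_AUTO", pre_filter: dict = None) -> dict:
    """Assemble the JSON body for the vector search endpoint."""
    body = {
        "bucket": bucket,
        "collectionName": collection,
        "text": text,
        "topK": top_k,
        "searchMode": mode,
        "rerank": rerank,
        "includeContent": include_content,
    }
    if pre_filter is not None:
        body["preFilter"] = pre_filter
    return body


def summarize(response: dict, preview_chars: int = 200) -> dict:
    """Keep the mode used, latency, and score/key/content preview per result."""
    def preview(text: str) -> str:
        return text[:preview_chars] + "..." if len(text) > preview_chars else text
    return {
        "searchModeUsed": response.get("searchModeUsed"),
        "tookMs": response.get("tookMs"),
        "results": [
            {"score": r.get("score"),
             "object": (r.get("object") or {}).get("key"),
             "content": preview(r.get("content") or "")}
            for r in response.get("results", [])
        ],
    }


def search(bucket: str, body: dict, token: str) -> dict:
    """POST the body to the bucket's search endpoint (requires the requests package)."""
    import requests  # imported lazily so the pure helpers above work without it
    resp = requests.post(
        f"{K3}/{bucket}/vector/search",
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        json=body,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()


# Usage:
# body = build_search_request("kb-platform", "docs", "what is multi-head attention")
# print(summarize(search("kb-platform", body, os.environ["DODIL_TOKEN"])))
```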

C. Filtered search: only objects under papers/


curl -sS -X POST "https://k3.dev.dodil.io/kb-platform/vector/search" \
  -H "Authorization: Bearer $DODIL_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "bucket": "kb-platform",
    "collectionName": "docs",
    "text": "attention mechanism",
    "topK": 5,
    "searchMode": "SEARCH_MODE_AUTO",
    "rerank": true,
    "includeContent": true,
    "preFilter": {
      "op": "LOGICAL_OP_AND",
      "filters": [
        { "field": "source_key", "op": "FILTER_OP_CONTAINS", "value": "papers/" }
      ]
    }
  }'

For the full pre-filter operator table, see Vector → Search → Pre-filter.
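If you assemble filters in code, small builders keep the nesting straight. This sketch uses only the two operators shown above (`FILTER_OP_CONTAINS` and `LOGICAL_OP_AND`) and the `source_key` field; the helper names are hypothetical. The second example assumes `FILTER_OP_CONTAINS` is a substring match, as its name suggests, so `"papers/"` would also match a key such as `archive/papers/x.pdf`.

```python
import json


def contains(field: str, value: str) -> dict:
    """One leaf filter: `field` contains `value`."""
    return {"field": field, "op": "FILTER_OP_CONTAINS", "value": value}


def all_of(*filters: dict) -> dict:
    """Combine leaf filters with LOGICAL_OP_AND."""
    return {"op": "LOGICAL_OP_AND", "filters": list(filters)}


# Same preFilter as the curl example above
papers_only = all_of(contains("source_key", "papers/"))

# Two leaves ANDed together: PDFs under papers/ (substring semantics assumed)
papers_pdfs = all_of(contains("source_key", "papers/"), contains("source_key", ".pdf"))

print(json.dumps({"preFilter": papers_pdfs}, indent=2))
```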

6. Wire into your application

A typical RAG loop in Python:


import os

import openai
import requests

K3 = "https://k3.dev.dodil.io"
HEADERS = {
    "Authorization": f"Bearer {os.environ['DODIL_TOKEN']}",
    "Content-Type": "application/json",
}

def rag_query(question: str) -> str:
    # 1. Retrieve the top-5 chunks from K3 with hybrid search + rerank
    resp = requests.post(
        f"{K3}/kb-platform/vector/search",
        headers=HEADERS,
        json={
            "bucket": "kb-platform",
            "collectionName": "docs",
            "text": question,
            "topK": 5,
            "searchMode": "SEARCH_MODE_AUTO",
            "rerank": True,
            "includeContent": True,
        },
        timeout=30,
    )
    resp.raise_for_status()  # fail loudly on auth or request errors
    results = resp.json()["results"]

    # 2. Build the context; tag each chunk with its object key so the LLM can cite it
    context = "\n\n---\n\n".join(
        f"[source: {hit['object']['key']}]\n{hit['content']}" for hit in results
    )

    # 3. Hand the context to your LLM (the OpenAI client reads OPENAI_API_KEY from the environment)
    completion = openai.OpenAI().chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer the question using the provided context. Cite sources by file name."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return completion.choices[0].message.content

if __name__ == "__main__":
    print(rag_query("Explain self-attention"))

The same loop ports directly to Node, Go, or Rust: the K3 HTTP API speaks pbjson, so any HTTP client works.

7. Operational maintenance

Add new documents

Use the same upload commands. Every new object whose key matches the rule’s include patterns goes through the auto-generated rule → ingest job → vectors land in docs. No reconfiguration needed.

Backfill after rule changes

Edit the rule to broaden coverage (e.g. add .md to includePatterns), then retroactively re-ingest objects that now match:


RULE_ID=$(dodil k3 ingest list -b kb-platform -p "$PIPELINE_ID" -o json | jq -r '.rules[0].ruleId')
 
# Add .md; restate every existing pattern (including .html) so none is dropped
dodil k3 ingest update "$RULE_ID" -b kb-platform \
  --include "**/*.pdf" --include "**/*.docx" --include "**/*.txt" --include "**/*.html" --include "**/*.md"
 
# Re-discover the source (internal-S3) and dispatch ingestion for matched objects
SOURCE_ID=$(curl -sS "https://k3.dev.dodil.io/kb-platform/sources" \
  -H "Authorization: Bearer $DODIL_TOKEN" \
  | jq -r '.sources[] | select(.name == "internal") | .sourceId')
 
dodil k3 ingest trigger-discovery -b kb-platform -s "$SOURCE_ID" --full-sync
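Because the update command above restates every pattern, it is easy to drop one by accident. A minimal sketch that reads the rule's current `includePatterns`, merges in new ones, and prints the resulting command for review. It assumes, as the command above implies, that repeated `--include` flags set the full pattern list; `current_patterns`, `merged_patterns`, and `update_args` are hypothetical names.

```python
import json
import shlex
import subprocess


def current_patterns(bucket: str, pipeline_id: str, rule_id: str) -> list:
    """Read the rule's includePatterns through the CLI (needs dodil on PATH)."""
    out = subprocess.run(
        ["dodil", "k3", "ingest", "list", "-b", bucket, "-p", pipeline_id, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    rule = next(r for r in json.loads(out)["rules"] if r["ruleId"] == rule_id)
    return rule["includePatterns"]


def merged_patterns(existing: list, additions: list) -> list:
    """Order-preserving union: existing patterns first, new ones appended, no duplicates."""
    return list(dict.fromkeys([*existing, *additions]))


def update_args(rule_id: str, bucket: str, patterns: list) -> list:
    """Build the argv for `dodil k3 ingest update` with one --include per pattern."""
    args = ["dodil", "k3", "ingest", "update", rule_id, "-b", bucket]
    for p in patterns:
        args += ["--include", p]
    return args


# Usage (dry run; pass the list to subprocess.run(..., check=True) once it looks right):
# pats = merged_patterns(current_patterns("kb-platform", pipeline_id, rule_id), ["**/*.md"])
# print(shlex.join(update_args(rule_id, "kb-platform", pats)))
```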

For replay of failed jobs specifically, see Pipelines → Replay & Retry.

Pause ingestion temporarily


# Disable the rule (RULE_ID as set in the backfill step): uploads still succeed but are not ingested
dodil k3 ingest update "$RULE_ID" -b kb-platform --enabled=false
 
# Re-enable when ready
dodil k3 ingest update "$RULE_ID" -b kb-platform --enabled=true

Inspect what’s indexed


# Inspect collection metadata (status, embedding dimensions, model, sparse mode)
dodil k3 vector collection get docs -b kb-platform -o json \
  | jq '{name, status, dimensions, embedModel, sparseMode}'
 
# Per-object-key chunk status — via Storage's ObjectInfo
dodil k3 object show papers/attention.pdf -b kb-platform -o json \
  | jq '.pipelineStatuses[]'   # one entry per rule that ran on this object

Common gotchas

Symptom	Cause	Fix
`vector collection add` fails with “engine not active”	Engine still provisioning	Wait for `ENGINE_STATUS_ACTIVE` (poll `vector store get`)
Upload succeeds but no ingest job spawns	Object path doesn’t match auto-rule globs	List the rule’s `includePatterns`; remember `**/*.pdf` is recursive, `*.pdf` is not
Jobs `COMPLETED` but `search` returns nothing	First-ingest Milvus index build still in progress	Wait 10–30 s after first ingest, then re-search
Search returns stale results after re-upload	Vector index updates async after re-ingest	Wait for the re-ingest job to reach `COMPLETED` before re-searching, or delete + re-upload under a new key (unlike Tables, Vector has no freshness selector to force a fresh read; rely on re-ingest semantics)
Different doc types yield very different chunk counts	`text_embedding_index` chunks by token count; long docs → many chunks	Adjust `chunk_size` / `chunk_overlap` per pipeline via `dodil k3 pipeline update`
Latency spikes after corpus grows past ~100K chunks	HNSW `ef` default is `top_k * 2` — too low for large corpora	Use pre-embedded vector queries with `vector.searchParams.ef = "256"` — see Vector → Search → Worked example: tuned HNSW

Cleanup


# Pause ingestion first
dodil k3 ingest update "$RULE_ID" -b kb-platform --enabled=false
 
# Delete in this order (no cascade)
dodil k3 ingest delete "$RULE_ID" -b kb-platform
dodil k3 pipeline delete "$PIPELINE_ID" -b kb-platform
dodil k3 vector collection delete "$COLLECTION_ID" -b kb-platform
dodil k3 vector store delete -b kb-platform
 
# (Optional) Drop the bucket + objects
dodil k3 bucket delete kb-platform