RAG Knowledge Base
Goal: stand up a working RAG (retrieval-augmented generation) corpus on K3. Upload PDFs / docs / text → K3 chunks + embeds + indexes them automatically → query with hybrid search + rerank.
Primitives used: Storage (the bucket + S3 upload) → Pipelines (the auto-generated rule wired by Vector at collection-create time) → Vector (the collection + search).
Shape:
┌──────────┐
│ Your │
│ documents│
└────┬─────┘
│ aws s3 cp / dodil k3 object create
▼
┌──────────────────────────────────────────────────┐
│ Storage — kb-platform bucket │
└──────────────────────────────────────────────────┘
│ auto-rule fires (globs from template's acceptedExtensions)
▼
┌──────────────────────────────────────────────────┐
│ Pipelines — text_embedding_index Scriptum │
│ runs per uploaded object → chunks + embeds │
└──────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────┐
│ Vector — `docs` collection │
│ (Milvus, pipeline-mode, BM25 enabled) │
└──────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────┐
│ App layer: Search RPC → top-K chunks → LLM │
└──────────────────────────────────────────────────┘
Prerequisites
- dodil CLI installed + dodil login (see CLI Basics)
- (Optional) aws-cli configured against K3 for S3-style uploads (see Storage → S3 Compatibility)
1. Create the bucket + configure the vector engine
# Storage primitive — create the bucket
dodil k3 bucket create kb-platform -d "Production RAG knowledge base"
# Vector primitive — configure engine (auto mode = K3 provisions VBase)
dodil k3 vector store create -b kb-platform -m auto
# Wait for engine ACTIVE (typically < 60 s)
until dodil k3 vector store get -b kb-platform -o json | jq -e '.status == "ENGINE_STATUS_ACTIVE"' > /dev/null; do
echo " engine status: $(dodil k3 vector store get -b kb-platform -o json | jq -r .status) — waiting..."
sleep 5
done
echo "✅ engine ACTIVE"
The Tables engine is auto-enabled on every bucket too: dodil k3 bucket create wires both Storage entitlements and the Tables engine. You don’t need to enable it explicitly. Only the Vector engine needs an explicit vector store create.
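The shell loop in this step polls forever if the engine never becomes active. A minimal Python sketch of the same poll with a deadline follows. It shells out to the same `dodil k3 vector store get` command used above; the function names, the 120 s timeout, and the injectable `fetch` / `sleep` / `clock` parameters are assumptions made for illustration, not part of the K3 tooling.

```python
import json
import subprocess
import time


def engine_status(bucket: str) -> str:
    """Read the engine status via the same CLI call the shell loop uses."""
    out = subprocess.run(
        ["dodil", "k3", "vector", "store", "get", "-b", bucket, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)["status"]


def wait_for_active(bucket, fetch=engine_status, timeout_s=120.0,
                    interval_s=5.0, sleep=time.sleep, clock=time.monotonic):
    """Poll until the engine reports ENGINE_STATUS_ACTIVE or the deadline passes.

    `timeout_s` and `interval_s` are illustrative defaults, not K3 guarantees.
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch(bucket)
        if status == "ENGINE_STATUS_ACTIVE":
            return status
        if clock() >= deadline:
            raise TimeoutError(
                f"engine for {bucket!r} still {status} after {timeout_s}s"
            )
        sleep(interval_s)
```

Calling `wait_for_active("kb-platform")` blocks until the engine is active and raises `TimeoutError` otherwise, which is easier to handle in CI than an unbounded loop.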
2. Pick a template + create the collection
# Browse vector-pillar templates
dodil k3 vector templates -o json | jq '.templates[] | {id, modalities, acceptedExtensions}'
For PDF + docx + HTML + plain text, pick text_embedding_index. Inspect its contract to confirm it has no required runtime inputs:
dodil k3 template get text_embedding_index -o json | jq '.contract.inputs'
Create a pipeline-mode collection. K3 atomically creates the collection + a Scriptum pipeline + an auto-generated ingest rule:
dodil k3 vector collection add docs -b kb-platform \
--description "Production RAG corpus" \
--template text_embedding_index
export COLLECTION_ID=$(dodil k3 vector collection get docs -b kb-platform -o json | jq -r '.collectionId')
export PIPELINE_ID=$(dodil k3 vector collection get docs -b kb-platform -o json | jq -r '.embedPipelineId')
Confirm the auto-rule is enabled:
dodil k3 ingest list -b kb-platform -p "$PIPELINE_ID" -o json \
| jq '.rules[] | {ruleId, name, includePatterns, enabled}'
Expect something like includePatterns: ["**/*.pdf", "**/*.txt", "**/*.docx", "**/*.html"] and enabled: true.
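To predict which object keys a rule will pick up before uploading, the include patterns can be checked locally. The sketch below assumes conventional glob semantics (`*` stays inside one path segment, a `**/` prefix spans zero or more directories); K3's actual matcher is not documented here and may differ. `glob_to_regex` and `key_matches` are hypothetical helper names, and only the `**/` form of `**` is handled.

```python
import re


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a glob into a regex (assumed semantics, not K3's own matcher)."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append(r"(?:[^/]+/)*")   # zero or more directory segments
            i += 3
        elif pattern[i] == "*":
            parts.append(r"[^/]*")         # anything except a path separator
            i += 1
        elif pattern[i] == "?":
            parts.append(r"[^/]")          # exactly one non-separator character
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def key_matches(key: str, patterns) -> bool:
    """True if the object key matches at least one include pattern."""
    return any(glob_to_regex(p).match(key) for p in patterns)
```

Under these assumptions, `key_matches("papers/attention.pdf", ["**/*.pdf"])` is true while the same key against `["*.pdf"]` is false, which is the recursive versus non-recursive distinction that matters when an upload does not trigger ingestion.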
3. Upload documents — three ways
Via CLI (one-off):
curl -sSL https://arxiv.org/pdf/1706.03762.pdf -o attention.pdf
dodil k3 object create ./attention.pdf -b kb-platform -k papers/attention.pdf
Via aws-cli (bulk, S3-style; works because K3 speaks native S3):
# One-time setup
aws configure --profile dodil-k3
# AWS Access Key ID: <your Dodil service-account ID>
# AWS Secret Access Key: <your Dodil service-account secret>
# Default region name: us-east-1
# Default output format: json
# Bulk-upload an existing folder
aws s3 sync ./my-docs/ s3://kb-platform/papers/ \
--endpoint-url https://k3.dev.dodil.io \
--profile dodil-k3
Via boto3 / @aws-sdk/client-s3: same S3 SDKs, same auth. See Storage → S3 Compatibility for setup snippets.
Every upload (CLI or S3 SDK) fires through the auto-generated rule → spawns an ingest job → runs text_embedding_index → writes chunks + embeddings to the docs collection.
4. Watch the ingest pipeline
dodil k3 ingest jobs -b kb-platform -p "$PIPELINE_ID" -o json \
| jq '.jobs[] | {object: .object.key, status, chunksCreated, embeddingsWritten}'
Status path: PENDING → PROCESSING → COMPLETED. Happy path: chunksCreated == embeddingsWritten. If embeddingsWritten is lower, see Pipelines → Replay & Retry.
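The happy-path check above can be automated over the JSON that `dodil k3 ingest jobs ... -o json` prints. The sketch below uses only the field names shown in the jq filter (`jobs`, `object.key`, `status`, `chunksCreated`, `embeddingsWritten`); the function name and the output shape are hypothetical. Counters are coerced with `int()` on the assumption that a pbjson API may serialize 64-bit integers as strings.

```python
IN_FLIGHT = {"PENDING", "PROCESSING"}  # statuses from the documented status path


def jobs_needing_attention(payload: dict) -> list:
    """Return finished jobs whose embeddings fall short of their chunk count,
    plus any job that ended in a status other than COMPLETED."""
    flagged = []
    for job in payload.get("jobs", []):
        status = job.get("status", "")
        if status in IN_FLIGHT:
            continue  # still running, nothing to judge yet
        chunks = int(job.get("chunksCreated", 0) or 0)
        written = int(job.get("embeddingsWritten", 0) or 0)
        if status == "COMPLETED" and written == chunks:
            continue  # happy path
        flagged.append({
            "key": job.get("object", {}).get("key"),
            "status": status,
            "missing": chunks - written,
        })
    return flagged
```

Feeding it `json.load(sys.stdin)` from the CLI output gives a short list of object keys worth replaying.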
5. Search — three variants of increasing sophistication
A. Quick text search (CLI)
dodil k3 search "what is multi-head attention" -b kb-platform -c docs -o json \
| jq '.results[] | {score, object: .object.key}'
This uses the default freshness (eventual) and the default mode (VECTOR, dense only). Good enough for a smoke test.
B. Hybrid + rerank (via API — CLI gap)
For production RAG, you want hybrid (dense + BM25) + Jina rerank. Drop to the API:
curl -sS -X POST "https://k3.dev.dodil.io/kb-platform/vector/search" \
-H "Authorization: Bearer $DODIL_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"bucket": "kb-platform",
"collectionName": "docs",
"text": "what is multi-head attention",
"topK": 5,
"searchMode": "SEARCH_MODE_AUTO",
"rerank": true,
"includeContent": true
}' | jq '{
searchModeUsed,
tookMs,
results: [.results[] | {
score,
object: .object.key,
content: (.content | .[0:200] + "...")
}]
}'
SEARCH_MODE_AUTO picks hybrid because text_embedding_index enables BM25 by default. rerank: true adds Jina cross-encoder scoring over the top-K. includeContent: true returns the chunk text, which is what your LLM consumes downstream. See Vector → Hybrid + Rerank for the full benchmark + tier breakdown.
C. Filtered search (only keys under papers/)
curl -sS -X POST "https://k3.dev.dodil.io/kb-platform/vector/search" \
-H "Authorization: Bearer $DODIL_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"bucket": "kb-platform",
"collectionName": "docs",
"text": "attention mechanism",
"topK": 5,
"searchMode": "SEARCH_MODE_AUTO",
"rerank": true,
"includeContent": true,
"preFilter": {
"op": "LOGICAL_OP_AND",
"filters": [
{ "field": "source_key", "op": "FILTER_OP_CONTAINS", "value": "papers/" }
]
}
}'
For the full pre-filter operator table, see Vector → Search → Pre-filter.
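When the search body is assembled in application code, a small builder keeps the filter structure in one place. This is a sketch that reuses only the fields and operators shown in the curl examples above (`LOGICAL_OP_AND`, `FILTER_OP_CONTAINS`, `source_key`); `build_search_body` and its `key_contains` parameter are hypothetical names, and other operators should be taken from the pre-filter reference rather than guessed.

```python
def build_search_body(text, *, bucket="kb-platform", collection="docs",
                      top_k=5, key_contains=None):
    """Build the JSON body for the search endpoint used in variants B and C.

    `key_contains` is an optional list of substrings; each becomes a
    FILTER_OP_CONTAINS filter on source_key, combined with LOGICAL_OP_AND.
    """
    body = {
        "bucket": bucket,
        "collectionName": collection,
        "text": text,
        "topK": top_k,
        "searchMode": "SEARCH_MODE_AUTO",
        "rerank": True,
        "includeContent": True,
    }
    if key_contains:
        body["preFilter"] = {
            "op": "LOGICAL_OP_AND",
            "filters": [
                {"field": "source_key", "op": "FILTER_OP_CONTAINS", "value": value}
                for value in key_contains
            ],
        }
    return body
```

`build_search_body("attention mechanism", key_contains=["papers/"])` produces the same body as the filtered curl call, and omitting `key_contains` gives the unfiltered hybrid request.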
6. Wire into your application
A typical RAG loop in Python:
import os

import openai
import requests

K3 = "https://k3.dev.dodil.io"
HEADERS = {
    "Authorization": f"Bearer {os.environ['DODIL_TOKEN']}",
    "Content-Type": "application/json",
}

def rag_query(question: str) -> str:
    # 1. Retrieve top-5 chunks from K3 with hybrid + rerank
    resp = requests.post(
        f"{K3}/kb-platform/vector/search",
        headers=HEADERS,
        json={
            "bucket": "kb-platform",
            "collectionName": "docs",
            "text": question,
            "topK": 5,
            "searchMode": "SEARCH_MODE_AUTO",
            "rerank": True,
            "includeContent": True,
        },
        timeout=30,
    )
    resp.raise_for_status()  # fail loudly on auth or request errors
    hits = resp.json()["results"]
    # 2. Build the context
    chunks = [hit["content"] for hit in hits]
    context = "\n\n---\n\n".join(chunks)
    # 3. Hand to your LLM
    completion = openai.OpenAI().chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer the question using the provided context. Cite sources by file name."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return completion.choices[0].message.content

print(rag_query("Explain self-attention"))
Node / Go / Rust equivalents are drop-in: the K3 HTTP API speaks pbjson, so any HTTP client works.
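The system prompt in the loop above asks the model to cite sources by file name, but the joined context carries only chunk text. One way to close that gap is to label each chunk with its object key before joining. The sketch below relies on the `object.key` and `content` fields visible in the search responses earlier on this page; `build_context` and the `max_chars` budget are hypothetical additions.

```python
def build_context(results, max_chars=8000):
    """Join search hits into one context string, labelling each chunk with
    its source key so the LLM has a file name to cite.

    `max_chars` is an illustrative budget; the first chunk is always kept.
    """
    blocks = []
    used = 0
    for hit in results:
        key = hit.get("object", {}).get("key", "unknown")
        content = (hit.get("content") or "").strip()
        if not content:
            continue  # skip hits that came back without chunk text
        block = f"[source: {key}]\n{content}"
        if blocks and used + len(block) > max_chars:
            break  # stay within the context budget
        blocks.append(block)
        used += len(block)
    return "\n\n---\n\n".join(blocks)
```

In the loop above, this would replace the two lines that build `chunks` and `context`, with the search response's `results` list passed in.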
7. Operational maintenance
Add new documents
Same upload commands — every new object hits the auto-generated rule → ingest job → vectors land in docs. No re-configuration needed.
Backfill after rule changes
Edit the rule to broaden coverage (e.g. add .md to includePatterns), then retroactively re-ingest objects that now match:
RULE_ID=$(dodil k3 ingest list -b kb-platform -p "$PIPELINE_ID" -o json | jq -r '.rules[0].ruleId')
# Add .md to the include patterns
dodil k3 ingest update "$RULE_ID" -b kb-platform \
--include "**/*.pdf" --include "**/*.docx" --include "**/*.txt" --include "**/*.html" --include "**/*.md"
# Re-discover the source (internal-S3) and dispatch ingestion for matched objects
SOURCE_ID=$(curl -sS "https://k3.dev.dodil.io/kb-platform/sources" \
-H "Authorization: Bearer $DODIL_TOKEN" \
| jq -r '.sources[] | select(.name == "internal") | .sourceId')
dodil k3 ingest trigger-discovery -b kb-platform -s "$SOURCE_ID" --full-sync
To replay failed jobs specifically, see Pipelines → Replay & Retry.
Pause ingestion temporarily
# Disable the rule — uploads still happen, just no ingestion
dodil k3 ingest update "$RULE_ID" -b kb-platform --enabled=false
# Re-enable when ready
dodil k3 ingest update "$RULE_ID" -b kb-platform --enabled=true
Inspect what’s indexed
# Collection metadata (name, status, dimensions, embedding model, sparse mode)
dodil k3 vector collection get docs -b kb-platform -o json \
| jq '{name, status, dimensions, embedModel, sparseMode}'
# Per-object-key chunk status — via Storage's ObjectInfo
dodil k3 object show papers/attention.pdf -b kb-platform -o json \
| jq '.pipelineStatuses[]' # one entry per rule that ran on this object
Common gotchas
| Symptom | Cause | Fix |
|---|---|---|
| vector collection add fails with “engine not active” | Engine still provisioning | Wait for ENGINE_STATUS_ACTIVE (poll vector store get) |
| Upload succeeds but no ingest job spawns | Object path doesn’t match auto-rule globs | List the rule’s includePatterns; remember **/*.pdf is recursive, *.pdf is not |
| Jobs COMPLETED but search returns nothing | First-ingest Milvus index build still in progress | Wait 10–30 s after first ingest, then re-search |
| Search returns stale results after re-upload | Vector index updates async after re-ingest | Delete + re-upload as a new key. Vector doesn’t have a freshness selector like Tables, so rely on re-ingest semantics |
| Different doc types yielding very different chunk counts | text_embedding_index chunks by token count; long docs → many chunks | Adjust chunk_size / chunk_overlap per-pipeline via dodil k3 pipeline update |
| Latency spikes after corpus grows past ~100K chunks | HNSW ef default is top_k * 2 — too low for large corpora | Use pre-embedded vector queries with vector.searchParams.ef = "256" — see Vector → Search → Worked example: tuned HNSW |
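The chunk-count gotcha in the table is easier to reason about with a quick estimate. The sketch below assumes a plain sliding window over tokens, where each chunk holds `chunk_size` tokens and consecutive chunks share `chunk_overlap` tokens. That is a common chunking scheme, but it is an assumption about text_embedding_index rather than its documented behavior, and `estimate_chunks` is a hypothetical helper.

```python
import math


def estimate_chunks(total_tokens: int, chunk_size: int, chunk_overlap: int) -> int:
    """Estimate how many chunks a document yields under sliding-window chunking.

    Assumes each new chunk starts (chunk_size - chunk_overlap) tokens after
    the previous one; the real template may split differently.
    """
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    if total_tokens <= 0:
        return 0
    if total_tokens <= chunk_size:
        return 1
    stride = chunk_size - chunk_overlap
    return 1 + math.ceil((total_tokens - chunk_size) / stride)
```

For example, with a 512-token window and 64 tokens of overlap, a 1,000-token memo gives 3 chunks and a 10,000-token paper gives 23, so chunk counts grow roughly linearly with document length, and a larger overlap raises the count further.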
Cleanup
# Pause ingestion first
dodil k3 ingest update "$RULE_ID" -b kb-platform --enabled=false
# Delete in this order (no cascade)
dodil k3 ingest delete "$RULE_ID" -b kb-platform
dodil k3 pipeline delete "$PIPELINE_ID" -b kb-platform
dodil k3 vector collection delete "$COLLECTION_ID" -b kb-platform
dodil k3 vector store delete -b kb-platform
# (Optional) Drop the bucket + objects
dodil k3 bucket delete kb-platform
See also
- Storage → S3 Compatibility — bulk-upload via aws-cli / boto3 / @aws-sdk/client-s3
- Pipelines → Replay & Retry — recover from failed ingests
- Vector → Pipeline Collection — deeper on the Vector primitive alone
- Vector → Hybrid + Rerank — when each retrieval tier helps; latency cost
- Vector → Search — the full Search RPC reference
- Document Intake — same upload path, but ALSO routes through a triage table — when you need both semantic recall AND structured decisions per doc