Reviews Dashboard + Semantic Search
Goal: every uploaded customer review gets analyzed AND indexed in parallel — sentiment / topics / toxicity land in a warehouse table (for SQL dashboards), while semantic embeddings land in a vector collection (for “find reviews similar to this complaint”). One upload, two destinations, zero glue code.
Primitives used: Storage + Pipelines (two pipelines, same source) + Tables + Vector. The fan-out happens because K3 lets you have multiple ingest rules on a single source, each routing to its own pipeline + destination.
Shape:
review.json ──upload──► Storage bucket
│
▼ internal-S3 source fires two matching rules
┌───────┴───────┐
│ │
▼ ▼
Pipeline A Pipeline B
(review_analysis) (text_embedding_index)
│ │
▼ ▼
Tables: reviews Vector: reviews_vec
(sentiment, topics, (semantic embeddings
keywords, toxicity) for recall)
│ │
▼ ▼
SQL queries / BI Search RPC / RAG
dashboards / "similar to X"Prerequisites
dodilCLI +dodil login- A bucket —
kb-reviews:dodil k3 bucket create kb-reviews -d "Customer review intake" - Tables engine is auto-enabled at bucket creation.
- Vector engine needs explicit config:
dodil k3 vector store create -b kb-reviews -m auto # Wait for ACTIVE
1. Create the two destinations
A. Pipeline-bound table for sentiment
Pick the review_analysis template — sentiment + toxicity + keyword extraction, emits structured rows:
# Browse — confirm review_analysis is warehouse-compatible
dodil k3 table templates --search review -o json | jq '.templates[] | {id, modalities}'
# Create the table with pipeline binding
dodil k3 table create reviews -b kb-reviews \
--description "Customer reviews — sentiment analyzed" \
--source pipeline_generated \
--pipeline-template-id review_analysisK3 atomically creates: the table row + a Scriptum pipeline + an auto-generated ingest rule. Schema lazy-materializes on first ingest.
B. Pipeline-mode vector collection for semantic recall
dodil k3 vector collection add reviews_vec -b kb-reviews \
--description "Customer reviews — semantic embeddings" \
--template text_embedding_indexSame pattern: collection + pipeline + auto-rule. Now you have two pipelines in kb-reviews, both pointing at the bucket’s internal-S3 source. Both will fire on every matching review upload.
Capture identifiers for later operational steps:
export TABLE_PIPELINE_ID=$(dodil k3 table get reviews -b kb-reviews -o json | jq -r '.pipelineId')
export VECTOR_PIPELINE_ID=$(dodil k3 vector collection get reviews_vec -b kb-reviews -o json | jq -r '.embedPipelineId')
# Confirm both rules are enabled and have matching globs
dodil k3 ingest list -b kb-reviews -o json \
| jq '.rules[] | {ruleId, name, pipelineId, includePatterns, enabled}'You should see two rules, both enabled: true, with overlapping include patterns (both will match **/*.json, **/*.txt, etc.). Each rule points at its own pipeline.
Why two rules instead of one shared rule? Each pipeline owns its destination, and rules carry
pipeline_id. K3 dispatches one rule → one pipeline per match. To fan out to N pipelines, you need N rules. K3 generates them for you when you create pipeline-bound destinations via the high-level RPCs (CreateTablePipelineandAddVectorPipeline).
2. Upload a review
Stage some sample review JSON:
cat > review-001.json <<'EOF'
{
"id": "REV-001",
"customer_id": "cust-91872",
"product_sku": "BAG-LEATHER-BROWN-01",
"rating": 2,
"title": "Hardware tarnished after one week",
"body": "Bought this thinking it'd last. Within a week the gold-tone hardware started oxidizing. Customer service was slow and unhelpful. Disappointed for the price."
}
EOF
dodil k3 object create ./review-001.json -b kb-reviews -k reviews/2026/05/REV-001.jsonSingle upload, two pipelines fire in parallel.
3. Watch both ingest jobs
# Job from the Tables pipeline
dodil k3 ingest jobs -b kb-reviews -p "$TABLE_PIPELINE_ID" -o json \
| jq '.jobs[] | {pipeline: "table", object: .object.key, status, rowsWritten}'
# Job from the Vector pipeline
dodil k3 ingest jobs -b kb-reviews -p "$VECTOR_PIPELINE_ID" -o json \
| jq '.jobs[] | {pipeline: "vector", object: .object.key, status, chunksCreated, embeddingsWritten}'You should see one job from each pipeline for the same object key. The Tables job’s rows_written counts how many structured rows landed in reviews; the Vector job’s chunks_created / embeddings_written count what landed in reviews_vec.
Cross-check via the object’s pipelineStatuses:
dodil k3 object show reviews/2026/05/REV-001.json -b kb-reviews -o json \
| jq '.pipelineStatuses[]'Two entries, one per rule that fired.
4. Query the dashboard table
Once the Tables drain completes, the structured rows are queryable. First, describe the table to see the resolved schema (template-driven, lazy):
dodil k3 table describe reviews -b kb-reviews -o json \
| jq '{
tableName, version,
columns: [.columns[] | {name, type}],
rowCount
}'For review_analysis you’ll typically see columns like source_key, review_id, sentiment_score, sentiment_label, intent, topics, toxicity_score, keywords, extracted_at. Exact column names depend on the template’s ScriptContract.
Now run SQL — the kind of queries that drive a BI dashboard:
# Overall sentiment distribution
dodil k3 table query "
SELECT sentiment_label, COUNT(*) AS n
FROM reviews
GROUP BY sentiment_label
ORDER BY n DESC
" -b kb-reviews
# Top complaint topics in the last week
dodil k3 table query "
SELECT topic, COUNT(*) AS n
FROM (SELECT UNNEST(topics) AS topic FROM reviews WHERE sentiment_label = 'negative')
GROUP BY topic
ORDER BY n DESC
LIMIT 10
" -b kb-reviews
# Toxicity outliers — needs CX team review
dodil k3 table query "
SELECT source_key, sentiment_score, toxicity_score
FROM reviews
WHERE toxicity_score > 0.5
ORDER BY toxicity_score DESC
" -b kb-reviewsFor analytical reads against large corpora, set freshness: STRONG only when you need read-your-writes — see Tables → SQL Compatibility.
5. Query the semantic-search collection
Semantic search complements the SQL dashboard: “find reviews similar to this one specific complaint” is something SQL keyword-matching does badly.
curl -sS -X POST "https://k3.dev.dodil.io/kb-reviews/vector/search" \
-H "Authorization: Bearer $DODIL_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"bucket": "kb-reviews",
"collectionName": "reviews_vec",
"text": "hardware oxidized after a week",
"topK": 10,
"searchMode": "SEARCH_MODE_AUTO",
"rerank": true,
"includeContent": true
}' | jq '.results[] | {score, object: .object.key, content: (.content | .[0:150])}'You’ll get back reviews whose content semantically resembles “hardware tarnished after a week” — including ones that use different wording ("clasp oxidized", "finish wore off", etc.) that a SQL keyword search would miss.
6. The killer combo — JOIN-like pattern across pillars
The real power comes from using both pillars together. Two-step pattern:
- Tables SQL to narrow by structured criteria (sentiment, toxicity, date, product SKU).
- Vector search for semantic recall within the narrowed set.
A. Negative reviews semantically similar to a specific complaint
# Step 1: find object keys of negative reviews in the last 30 days
NEGATIVE_KEYS=$(dodil k3 table query "
SELECT source_key
FROM reviews
WHERE sentiment_label = 'negative'
AND extracted_at > NOW() - INTERVAL 30 DAY
" -b kb-reviews -o json | jq -r '.results[][0]')
# Step 2: vector search, pre-filtered to those keys
KEYS_CSV=$(echo "$NEGATIVE_KEYS" | tr '\n' ',' | sed 's/,$//')
curl -sS -X POST "https://k3.dev.dodil.io/kb-reviews/vector/search" \
-H "Authorization: Bearer $DODIL_TOKEN" \
-H "Content-Type: application/json" \
-d "{
\"bucket\": \"kb-reviews\",
\"collectionName\": \"reviews_vec\",
\"text\": \"hardware quality issues\",
\"topK\": 20,
\"preFilter\": {
\"op\": \"LOGICAL_OP_AND\",
\"filters\": [
{ \"field\": \"source_key\", \"op\": \"FILTER_OP_IN\", \"value\": \"$KEYS_CSV\" }
]
}
}"Pre-filter limits vector search to the keys SQL pre-narrowed. Combine analytical filtering with semantic recall in one query loop.
B. Triage flow — surface negative-and-toxic reviews
For a CX-team notification pipeline:
import requests, sys
# Find toxic + very negative reviews in the last 24h
toxic = requests.post(f"{K3}/kb-reviews/tables/_query",
headers=HEADERS,
json={
"bucket": "kb-reviews", "tableName": "reviews",
"sql": """
SELECT source_key, sentiment_score, toxicity_score
FROM {table}
WHERE toxicity_score > 0.4
AND sentiment_score < -0.5
AND extracted_at > NOW() - INTERVAL 1 DAY
""",
"freshness": "FRESHNESS_STRONG",
},
).json()
for row in toxic["rows"]:
source_key, sentiment, toxicity = row[0], row[1], row[2]
# For each toxic review, find 3 semantically similar ones for context
similar = requests.post(f"{K3}/kb-reviews/vector/search",
headers=HEADERS,
json={
"bucket": "kb-reviews", "collectionName": "reviews_vec",
"text": "", # Use the review's own embedding by referencing the source object
"s3Key": source_key,
"topK": 4, # 1 is the review itself; we want 3 similar
"includeContent": True,
},
).json()
print(f"TOXIC ALERT: {source_key} (sentiment={sentiment}, toxicity={toxicity})")
for s in similar["results"][1:]: # skip self
print(f" similar: {s['object']['key']} (score={s['score']:.3f})")This is what makes K3 different from “object storage with embeddings on the side” — the same upload feeds both structured analysis and semantic recall, queryable side-by-side.
7. Operational maintenance
Replay one pipeline without affecting the other
The two pipelines are independent — replay flows scope to one at a time:
TABLE_RULE_ID=$(dodil k3 ingest list -b kb-reviews -p "$TABLE_PIPELINE_ID" -o json | jq -r '.rules[0].ruleId')
VECTOR_RULE_ID=$(dodil k3 ingest list -b kb-reviews -p "$VECTOR_PIPELINE_ID" -o json | jq -r '.rules[0].ruleId')
SOURCE_ID=$(curl -sS "https://k3.dev.dodil.io/kb-reviews/sources" \
-H "Authorization: Bearer $DODIL_TOKEN" \
| jq -r '.sources[] | select(.name == "internal") | .sourceId')
# Replay only the Tables pipeline (e.g. after a template version bump)
dodil k3 ingest trigger -b kb-reviews -s "$SOURCE_ID" --rule "$TABLE_RULE_ID" --retry-failedSee Pipelines → Replay & Retry for the full replay matrix.
Disable one side without dropping data
# Stop semantic-indexing but keep dashboard ingest flowing
dodil k3 ingest update "$VECTOR_RULE_ID" -b kb-reviews --enabled=falseThe Vector collection retains its existing rows; new uploads only feed Tables until you re-enable.
Common gotchas
| Symptom | Cause | Fix |
|---|---|---|
| Only ONE pipeline fires on upload | The other rule’s globs don’t match the path/extension | List both rules’ includePatterns; both should cover your upload paths. Update the missing one via dodil k3 ingest update --include "**/*.json" |
| Tables shows row but Vector hasn’t indexed yet (or vice versa) | Pipelines run async + independently — different latency per template | Normal. Re-query after the slower pipeline drains. Inspect `dodil k3 object show … -o json |
| Re-uploading the same key creates duplicate rows in Tables | Every PUT (including overwrites) fires the rule | For idempotent ingestion, design your template’s primary key — pipeline-mode tables can deduplicate on PK at MERGE time. For Vector, embeddings get added (no PK dedup); use UpsertVectors patterns if you go EXTERNAL |
source_key in the table doesn’t match the Vector collection’s source_key metadata | Templates emit metadata using their own schema — the field names may differ slightly | Cross-reference using the object key (always present as object.key in vector results and as source_key in table rows by convention) |
| One pipeline keeps failing while the other succeeds | Schema mismatch / template-specific input requirement | Inspect the failing pipeline’s jobs via dodil k3 ingest jobs -p $PIPELINE_ID; read errorDetails. See Pipelines → Replay & Retry for diagnosis patterns |
Cleanup
# Pause both rules
dodil k3 ingest update "$TABLE_RULE_ID" -b kb-reviews --enabled=false
dodil k3 ingest update "$VECTOR_RULE_ID" -b kb-reviews --enabled=false
# Delete in dependency-friendly order (no cascade across pillars)
dodil k3 ingest delete "$TABLE_RULE_ID" -b kb-reviews
dodil k3 ingest delete "$VECTOR_RULE_ID" -b kb-reviews
dodil k3 pipeline delete "$TABLE_PIPELINE_ID" -b kb-reviews
dodil k3 pipeline delete "$VECTOR_PIPELINE_ID" -b kb-reviews
dodil k3 table delete reviews -b kb-reviews
VEC_COL=$(dodil k3 vector collection get reviews_vec -b kb-reviews -o json | jq -r '.collectionId')
dodil k3 vector collection delete "$VEC_COL" -b kb-reviews
dodil k3 vector store delete -b kb-reviews
dodil k3 bucket delete kb-reviewsSee also
- Document Intake — same fan-out shape; uses
document_triageinstead ofreview_analysisand intersects intake routing with semantic recall - Pipelines → Documents → Warehouse — deeper on the Tables-bound pipeline pattern alone
- Tables → Pipeline-bound Table — Tables perspective on pipeline-mode tables
- Vector → Pipeline Collection — Vector perspective on pipeline-mode collections
- Pipelines → Replay & Retry — operating two-pipeline fan-outs in production