Skip to Content
We are live but in Staging 🎉
RecipesReviews Dashboard

Reviews Dashboard + Semantic Search

Goal: every uploaded customer review gets analyzed AND indexed in parallel — sentiment / topics / toxicity land in a warehouse table (for SQL dashboards), while semantic embeddings land in a vector collection (for “find reviews similar to this complaint”). One upload, two destinations, zero glue code.

Primitives used: Storage + Pipelines (two pipelines, same source) + Tables + Vector. The fan-out happens because K3 lets you have multiple ingest rules on a single source, each routing to its own pipeline + destination.

Shape:

review.json ──upload──► Storage bucket ▼ internal-S3 source fires two matching rules ┌───────┴───────┐ │ │ ▼ ▼ Pipeline A Pipeline B (review_analysis) (text_embedding_index) │ │ ▼ ▼ Tables: reviews Vector: reviews_vec (sentiment, topics, (semantic embeddings keywords, toxicity) for recall) │ │ ▼ ▼ SQL queries / BI Search RPC / RAG dashboards / "similar to X"

Prerequisites

  • dodil CLI + dodil login
  • A bucket — kb-reviews:
    dodil k3 bucket create kb-reviews -d "Customer review intake"
  • Tables engine is auto-enabled at bucket creation.
  • Vector engine needs explicit config:
    dodil k3 vector store create -b kb-reviews -m auto # Wait for ACTIVE

1. Create the two destinations

A. Pipeline-bound table for sentiment

Pick the review_analysis template — sentiment + toxicity + keyword extraction, emits structured rows:

# Browse — confirm review_analysis is warehouse-compatible dodil k3 table templates --search review -o json | jq '.templates[] | {id, modalities}' # Create the table with pipeline binding dodil k3 table create reviews -b kb-reviews \ --description "Customer reviews — sentiment analyzed" \ --source pipeline_generated \ --pipeline-template-id review_analysis

K3 atomically creates: the table row + a Scriptum pipeline + an auto-generated ingest rule. Schema lazy-materializes on first ingest.

B. Pipeline-mode vector collection for semantic recall

dodil k3 vector collection add reviews_vec -b kb-reviews \ --description "Customer reviews — semantic embeddings" \ --template text_embedding_index

Same pattern: collection + pipeline + auto-rule. Now you have two pipelines in kb-reviews, both pointing at the bucket’s internal-S3 source. Both will fire on every matching review upload.

Capture identifiers for later operational steps:

export TABLE_PIPELINE_ID=$(dodil k3 table get reviews -b kb-reviews -o json | jq -r '.pipelineId') export VECTOR_PIPELINE_ID=$(dodil k3 vector collection get reviews_vec -b kb-reviews -o json | jq -r '.embedPipelineId') # Confirm both rules are enabled and have matching globs dodil k3 ingest list -b kb-reviews -o json \ | jq '.rules[] | {ruleId, name, pipelineId, includePatterns, enabled}'

You should see two rules, both enabled: true, with overlapping include patterns (both will match **/*.json, **/*.txt, etc.). Each rule points at its own pipeline.

Why two rules instead of one shared rule? Each pipeline owns its destination, and rules carry pipeline_id. K3 dispatches one rule → one pipeline per match. To fan out to N pipelines, you need N rules. K3 generates them for you when you create pipeline-bound destinations via the high-level RPCs (CreateTablePipeline and AddVectorPipeline).

2. Upload a review

Stage some sample review JSON:

cat > review-001.json <<'EOF' { "id": "REV-001", "customer_id": "cust-91872", "product_sku": "BAG-LEATHER-BROWN-01", "rating": 2, "title": "Hardware tarnished after one week", "body": "Bought this thinking it'd last. Within a week the gold-tone hardware started oxidizing. Customer service was slow and unhelpful. Disappointed for the price." } EOF dodil k3 object create ./review-001.json -b kb-reviews -k reviews/2026/05/REV-001.json

Single upload, two pipelines fire in parallel.

3. Watch both ingest jobs

# Job from the Tables pipeline dodil k3 ingest jobs -b kb-reviews -p "$TABLE_PIPELINE_ID" -o json \ | jq '.jobs[] | {pipeline: "table", object: .object.key, status, rowsWritten}' # Job from the Vector pipeline dodil k3 ingest jobs -b kb-reviews -p "$VECTOR_PIPELINE_ID" -o json \ | jq '.jobs[] | {pipeline: "vector", object: .object.key, status, chunksCreated, embeddingsWritten}'

You should see one job from each pipeline for the same object key. The Tables job’s rows_written counts how many structured rows landed in reviews; the Vector job’s chunks_created / embeddings_written count what landed in reviews_vec.

Cross-check via the object’s pipelineStatuses:

dodil k3 object show reviews/2026/05/REV-001.json -b kb-reviews -o json \ | jq '.pipelineStatuses[]'

Two entries, one per rule that fired.

4. Query the dashboard table

Once the Tables drain completes, the structured rows are queryable. First, describe the table to see the resolved schema (template-driven, lazy):

dodil k3 table describe reviews -b kb-reviews -o json \ | jq '{ tableName, version, columns: [.columns[] | {name, type}], rowCount }'

For review_analysis you’ll typically see columns like source_key, review_id, sentiment_score, sentiment_label, intent, topics, toxicity_score, keywords, extracted_at. Exact column names depend on the template’s ScriptContract.

Now run SQL — the kind of queries that drive a BI dashboard:

# Overall sentiment distribution dodil k3 table query " SELECT sentiment_label, COUNT(*) AS n FROM reviews GROUP BY sentiment_label ORDER BY n DESC " -b kb-reviews # Top complaint topics in the last week dodil k3 table query " SELECT topic, COUNT(*) AS n FROM (SELECT UNNEST(topics) AS topic FROM reviews WHERE sentiment_label = 'negative') GROUP BY topic ORDER BY n DESC LIMIT 10 " -b kb-reviews # Toxicity outliers — needs CX team review dodil k3 table query " SELECT source_key, sentiment_score, toxicity_score FROM reviews WHERE toxicity_score > 0.5 ORDER BY toxicity_score DESC " -b kb-reviews

For analytical reads against large corpora, set freshness: STRONG only when you need read-your-writes — see Tables → SQL Compatibility.

5. Query the semantic-search collection

Semantic search complements the SQL dashboard: “find reviews similar to this one specific complaint” is something SQL keyword-matching does badly.

curl -sS -X POST "https://k3.dev.dodil.io/kb-reviews/vector/search" \ -H "Authorization: Bearer $DODIL_TOKEN" \ -H "Content-Type: application/json" \ -d '{ "bucket": "kb-reviews", "collectionName": "reviews_vec", "text": "hardware oxidized after a week", "topK": 10, "searchMode": "SEARCH_MODE_AUTO", "rerank": true, "includeContent": true }' | jq '.results[] | {score, object: .object.key, content: (.content | .[0:150])}'

You’ll get back reviews whose content semantically resembles “hardware tarnished after a week” — including ones that use different wording ("clasp oxidized", "finish wore off", etc.) that a SQL keyword search would miss.

6. The killer combo — JOIN-like pattern across pillars

The real power comes from using both pillars together. Two-step pattern:

  1. Tables SQL to narrow by structured criteria (sentiment, toxicity, date, product SKU).
  2. Vector search for semantic recall within the narrowed set.

A. Negative reviews semantically similar to a specific complaint

# Step 1: find object keys of negative reviews in the last 30 days NEGATIVE_KEYS=$(dodil k3 table query " SELECT source_key FROM reviews WHERE sentiment_label = 'negative' AND extracted_at > NOW() - INTERVAL 30 DAY " -b kb-reviews -o json | jq -r '.results[][0]') # Step 2: vector search, pre-filtered to those keys KEYS_CSV=$(echo "$NEGATIVE_KEYS" | tr '\n' ',' | sed 's/,$//') curl -sS -X POST "https://k3.dev.dodil.io/kb-reviews/vector/search" \ -H "Authorization: Bearer $DODIL_TOKEN" \ -H "Content-Type: application/json" \ -d "{ \"bucket\": \"kb-reviews\", \"collectionName\": \"reviews_vec\", \"text\": \"hardware quality issues\", \"topK\": 20, \"preFilter\": { \"op\": \"LOGICAL_OP_AND\", \"filters\": [ { \"field\": \"source_key\", \"op\": \"FILTER_OP_IN\", \"value\": \"$KEYS_CSV\" } ] } }"

Pre-filter limits vector search to the keys SQL pre-narrowed. Combine analytical filtering with semantic recall in one query loop.

B. Triage flow — surface negative-and-toxic reviews

For a CX-team notification pipeline:

import requests, sys # Find toxic + very negative reviews in the last 24h toxic = requests.post(f"{K3}/kb-reviews/tables/_query", headers=HEADERS, json={ "bucket": "kb-reviews", "tableName": "reviews", "sql": """ SELECT source_key, sentiment_score, toxicity_score FROM {table} WHERE toxicity_score > 0.4 AND sentiment_score < -0.5 AND extracted_at > NOW() - INTERVAL 1 DAY """, "freshness": "FRESHNESS_STRONG", }, ).json() for row in toxic["rows"]: source_key, sentiment, toxicity = row[0], row[1], row[2] # For each toxic review, find 3 semantically similar ones for context similar = requests.post(f"{K3}/kb-reviews/vector/search", headers=HEADERS, json={ "bucket": "kb-reviews", "collectionName": "reviews_vec", "text": "", # Use the review's own embedding by referencing the source object "s3Key": source_key, "topK": 4, # 1 is the review itself; we want 3 similar "includeContent": True, }, ).json() print(f"TOXIC ALERT: {source_key} (sentiment={sentiment}, toxicity={toxicity})") for s in similar["results"][1:]: # skip self print(f" similar: {s['object']['key']} (score={s['score']:.3f})")

This is what makes K3 different from “object storage with embeddings on the side” — the same upload feeds both structured analysis and semantic recall, queryable side-by-side.

7. Operational maintenance

Replay one pipeline without affecting the other

The two pipelines are independent — replay flows scope to one at a time:

TABLE_RULE_ID=$(dodil k3 ingest list -b kb-reviews -p "$TABLE_PIPELINE_ID" -o json | jq -r '.rules[0].ruleId') VECTOR_RULE_ID=$(dodil k3 ingest list -b kb-reviews -p "$VECTOR_PIPELINE_ID" -o json | jq -r '.rules[0].ruleId') SOURCE_ID=$(curl -sS "https://k3.dev.dodil.io/kb-reviews/sources" \ -H "Authorization: Bearer $DODIL_TOKEN" \ | jq -r '.sources[] | select(.name == "internal") | .sourceId') # Replay only the Tables pipeline (e.g. after a template version bump) dodil k3 ingest trigger -b kb-reviews -s "$SOURCE_ID" --rule "$TABLE_RULE_ID" --retry-failed

See Pipelines → Replay & Retry for the full replay matrix.

Disable one side without dropping data

# Stop semantic-indexing but keep dashboard ingest flowing dodil k3 ingest update "$VECTOR_RULE_ID" -b kb-reviews --enabled=false

The Vector collection retains its existing rows; new uploads only feed Tables until you re-enable.

Common gotchas

SymptomCauseFix
Only ONE pipeline fires on uploadThe other rule’s globs don’t match the path/extensionList both rules’ includePatterns; both should cover your upload paths. Update the missing one via dodil k3 ingest update --include "**/*.json"
Tables shows row but Vector hasn’t indexed yet (or vice versa)Pipelines run async + independently — different latency per templateNormal. Re-query after the slower pipeline drains. Inspect `dodil k3 object show … -o json
Re-uploading the same key creates duplicate rows in TablesEvery PUT (including overwrites) fires the ruleFor idempotent ingestion, design your template’s primary key — pipeline-mode tables can deduplicate on PK at MERGE time. For Vector, embeddings get added (no PK dedup); use UpsertVectors patterns if you go EXTERNAL
source_key in the table doesn’t match the Vector collection’s source_key metadataTemplates emit metadata using their own schema — the field names may differ slightlyCross-reference using the object key (always present as object.key in vector results and as source_key in table rows by convention)
One pipeline keeps failing while the other succeedsSchema mismatch / template-specific input requirementInspect the failing pipeline’s jobs via dodil k3 ingest jobs -p $PIPELINE_ID; read errorDetails. See Pipelines → Replay & Retry for diagnosis patterns

Cleanup

# Pause both rules dodil k3 ingest update "$TABLE_RULE_ID" -b kb-reviews --enabled=false dodil k3 ingest update "$VECTOR_RULE_ID" -b kb-reviews --enabled=false # Delete in dependency-friendly order (no cascade across pillars) dodil k3 ingest delete "$TABLE_RULE_ID" -b kb-reviews dodil k3 ingest delete "$VECTOR_RULE_ID" -b kb-reviews dodil k3 pipeline delete "$TABLE_PIPELINE_ID" -b kb-reviews dodil k3 pipeline delete "$VECTOR_PIPELINE_ID" -b kb-reviews dodil k3 table delete reviews -b kb-reviews VEC_COL=$(dodil k3 vector collection get reviews_vec -b kb-reviews -o json | jq -r '.collectionId') dodil k3 vector collection delete "$VEC_COL" -b kb-reviews dodil k3 vector store delete -b kb-reviews dodil k3 bucket delete kb-reviews

See also