Reviews Dashboard + Semantic Search

Goal: every uploaded customer review gets analyzed AND indexed in parallel — sentiment / topics / toxicity land in a warehouse table (for SQL dashboards), while semantic embeddings land in a vector collection (for “find reviews similar to this complaint”). One upload, two destinations, zero glue code.

Primitives used: Storage + Pipelines (two pipelines, same source) + Tables + Vector. The fan-out happens because K3 lets you have multiple ingest rules on a single source, each routing to its own pipeline + destination.

Shape:


   review.json  ──upload──►  Storage bucket
                                    │
                                    ▼  internal-S3 source fires two matching rules
                            ┌───────┴───────┐
                            │               │
                            ▼               ▼
                Pipeline A           Pipeline B
            (review_analysis)   (text_embedding_index)
                            │               │
                            ▼               ▼
                Tables: reviews     Vector: reviews_vec
                (sentiment, topics, (semantic embeddings
                 keywords, toxicity)  for recall)
                            │               │
                            ▼               ▼
              SQL queries / BI    Search RPC / RAG
              dashboards          / "similar to X"

Prerequisites

dodil CLI + dodil login

A bucket — kb-reviews:


dodil k3 bucket create kb-reviews -d "Customer review intake"

Tables engine is auto-enabled at bucket creation.

Vector engine needs explicit config:


dodil k3 vector store create -b kb-reviews -m auto
# Wait for ACTIVE

1. Create the two destinations

A. Pipeline-bound table for sentiment

Pick the review_analysis template — sentiment + toxicity + keyword extraction, emits structured rows:


# Browse — confirm review_analysis is warehouse-compatible
dodil k3 table templates --search review -o json | jq '.templates[] | {id, modalities}'
 
# Create the table with pipeline binding
dodil k3 table create reviews -b kb-reviews \
  --description "Customer reviews — sentiment analyzed" \
  --source pipeline_generated \
  --pipeline-template-id review_analysis

K3 atomically creates: the table row + a Scriptum pipeline + an auto-generated ingest rule. Schema lazy-materializes on first ingest.

B. Pipeline-mode vector collection for semantic recall


dodil k3 vector collection add reviews_vec -b kb-reviews \
  --description "Customer reviews — semantic embeddings" \
  --template text_embedding_index

Same pattern: collection + pipeline + auto-rule. Now you have two pipelines in kb-reviews, both pointing at the bucket’s internal-S3 source. Both will fire on every matching review upload.

Capture identifiers for later operational steps:


export TABLE_PIPELINE_ID=$(dodil k3 table get reviews -b kb-reviews -o json | jq -r '.pipelineId')
export VECTOR_PIPELINE_ID=$(dodil k3 vector collection get reviews_vec -b kb-reviews -o json | jq -r '.embedPipelineId')
 
# Confirm both rules are enabled and have matching globs
dodil k3 ingest list -b kb-reviews -o json \
  | jq '.rules[] | {ruleId, name, pipelineId, includePatterns, enabled}'

You should see two rules, both enabled: true, with overlapping include patterns (both will match **/*.json, **/*.txt, etc.). Each rule points at its own pipeline.

Why two rules instead of one shared rule? Each pipeline owns its destination, and rules carry pipeline_id. K3 dispatches one rule → one pipeline per match. To fan out to N pipelines, you need N rules. K3 generates them for you when you create pipeline-bound destinations via the high-level RPCs (CreateTablePipeline and AddVectorPipeline).

2. Upload a review

Stage some sample review JSON:


cat > review-001.json <<'EOF'
{
  "id": "REV-001",
  "customer_id": "cust-91872",
  "product_sku": "BAG-LEATHER-BROWN-01",
  "rating": 2,
  "title": "Hardware tarnished after one week",
  "body": "Bought this thinking it'd last. Within a week the gold-tone hardware started oxidizing. Customer service was slow and unhelpful. Disappointed for the price."
}
EOF
 
dodil k3 object create ./review-001.json -b kb-reviews -k reviews/2026/05/REV-001.json

Single upload, two pipelines fire in parallel.

3. Watch both ingest jobs


# Job from the Tables pipeline
dodil k3 ingest jobs -b kb-reviews -p "$TABLE_PIPELINE_ID" -o json \
  | jq '.jobs[] | {pipeline: "table", object: .object.key, status, rowsWritten}'
 
# Job from the Vector pipeline
dodil k3 ingest jobs -b kb-reviews -p "$VECTOR_PIPELINE_ID" -o json \
  | jq '.jobs[] | {pipeline: "vector", object: .object.key, status, chunksCreated, embeddingsWritten}'

You should see one job from each pipeline for the same object key. The Tables job’s rows_written counts how many structured rows landed in reviews; the Vector job’s chunks_created / embeddings_written count what landed in reviews_vec.

Cross-check via the object’s pipelineStatuses:


dodil k3 object show reviews/2026/05/REV-001.json -b kb-reviews -o json \
  | jq '.pipelineStatuses[]'

Two entries, one per rule that fired.

4. Query the dashboard table

Once the Tables drain completes, the structured rows are queryable. First, describe the table to see the resolved schema (template-driven, lazy):


dodil k3 table describe reviews -b kb-reviews -o json \
  | jq '{
      tableName, version,
      columns: [.columns[] | {name, type}],
      rowCount
    }'

For review_analysis you’ll typically see columns like source_key, review_id, sentiment_score, sentiment_label, intent, topics, toxicity_score, keywords, extracted_at. Exact column names depend on the template’s ScriptContract.

Now run SQL — the kind of queries that drive a BI dashboard:


# Overall sentiment distribution
dodil k3 table query "
  SELECT sentiment_label, COUNT(*) AS n
  FROM reviews
  GROUP BY sentiment_label
  ORDER BY n DESC
" -b kb-reviews
 
# Top complaint topics in the last week
dodil k3 table query "
  SELECT topic, COUNT(*) AS n
  FROM (SELECT UNNEST(topics) AS topic FROM reviews WHERE sentiment_label = 'negative')
  GROUP BY topic
  ORDER BY n DESC
  LIMIT 10
" -b kb-reviews
 
# Toxicity outliers — needs CX team review
dodil k3 table query "
  SELECT source_key, sentiment_score, toxicity_score
  FROM reviews
  WHERE toxicity_score > 0.5
  ORDER BY toxicity_score DESC
" -b kb-reviews

For analytical reads against large corpora, set freshness: STRONG only when you need read-your-writes — see Tables → SQL Compatibility.

5. Query the semantic-search collection

Semantic search complements the SQL dashboard: “find reviews similar to this one specific complaint” is something SQL keyword-matching does badly.


curl -sS -X POST "https://k3.dev.dodil.io/kb-reviews/vector/search" \
  -H "Authorization: Bearer $DODIL_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "bucket": "kb-reviews",
    "collectionName": "reviews_vec",
    "text": "hardware oxidized after a week",
    "topK": 10,
    "searchMode": "SEARCH_MODE_AUTO",
    "rerank": true,
    "includeContent": true
  }' | jq '.results[] | {score, object: .object.key, content: (.content | .[0:150])}'

You’ll get back reviews whose content semantically resembles “hardware tarnished after a week” — including ones that use different wording ("clasp oxidized", "finish wore off", etc.) that a SQL keyword search would miss.

6. The killer combo — JOIN-like pattern across pillars

The real power comes from using both pillars together. Two-step pattern:

Tables SQL to narrow by structured criteria (sentiment, toxicity, date, product SKU).
Vector search for semantic recall within the narrowed set.

A. Negative reviews semantically similar to a specific complaint


# Step 1: find object keys of negative reviews in the last 30 days
NEGATIVE_KEYS=$(dodil k3 table query "
  SELECT source_key
  FROM reviews
  WHERE sentiment_label = 'negative'
    AND extracted_at > NOW() - INTERVAL 30 DAY
" -b kb-reviews -o json | jq -r '.results[][0]')
 
# Step 2: vector search, pre-filtered to those keys
KEYS_CSV=$(echo "$NEGATIVE_KEYS" | tr '\n' ',' | sed 's/,$//')
 
curl -sS -X POST "https://k3.dev.dodil.io/kb-reviews/vector/search" \
  -H "Authorization: Bearer $DODIL_TOKEN" \
  -H "Content-Type: application/json" \
  -d "{
    \"bucket\": \"kb-reviews\",
    \"collectionName\": \"reviews_vec\",
    \"text\": \"hardware quality issues\",
    \"topK\": 20,
    \"preFilter\": {
      \"op\": \"LOGICAL_OP_AND\",
      \"filters\": [
        { \"field\": \"source_key\", \"op\": \"FILTER_OP_IN\", \"value\": \"$KEYS_CSV\" }
      ]
    }
  }"

Pre-filter limits vector search to the keys SQL pre-narrowed. Combine analytical filtering with semantic recall in one query loop.

B. Triage flow — surface negative-and-toxic reviews

For a CX-team notification pipeline:


import requests, sys
 
# Find toxic + very negative reviews in the last 24h
toxic = requests.post(f"{K3}/kb-reviews/tables/_query",
    headers=HEADERS,
    json={
        "bucket": "kb-reviews", "tableName": "reviews",
        "sql": """
            SELECT source_key, sentiment_score, toxicity_score
            FROM {table}
            WHERE toxicity_score > 0.4
              AND sentiment_score < -0.5
              AND extracted_at > NOW() - INTERVAL 1 DAY
        """,
        "freshness": "FRESHNESS_STRONG",
    },
).json()
 
for row in toxic["rows"]:
    source_key, sentiment, toxicity = row[0], row[1], row[2]
    # For each toxic review, find 3 semantically similar ones for context
    similar = requests.post(f"{K3}/kb-reviews/vector/search",
        headers=HEADERS,
        json={
            "bucket": "kb-reviews", "collectionName": "reviews_vec",
            "text": "",  # Use the review's own embedding by referencing the source object
            "s3Key": source_key,
            "topK": 4,  # 1 is the review itself; we want 3 similar
            "includeContent": True,
        },
    ).json()
 
    print(f"TOXIC ALERT: {source_key} (sentiment={sentiment}, toxicity={toxicity})")
    for s in similar["results"][1:]:  # skip self
        print(f"  similar: {s['object']['key']} (score={s['score']:.3f})")

This is what makes K3 different from “object storage with embeddings on the side” — the same upload feeds both structured analysis and semantic recall, queryable side-by-side.

7. Operational maintenance

Replay one pipeline without affecting the other

The two pipelines are independent — replay flows scope to one at a time:


TABLE_RULE_ID=$(dodil k3 ingest list -b kb-reviews -p "$TABLE_PIPELINE_ID" -o json | jq -r '.rules[0].ruleId')
VECTOR_RULE_ID=$(dodil k3 ingest list -b kb-reviews -p "$VECTOR_PIPELINE_ID" -o json | jq -r '.rules[0].ruleId')
 
SOURCE_ID=$(curl -sS "https://k3.dev.dodil.io/kb-reviews/sources" \
  -H "Authorization: Bearer $DODIL_TOKEN" \
  | jq -r '.sources[] | select(.name == "internal") | .sourceId')
 
# Replay only the Tables pipeline (e.g. after a template version bump)
dodil k3 ingest trigger -b kb-reviews -s "$SOURCE_ID" --rule "$TABLE_RULE_ID" --retry-failed

See Pipelines → Replay & Retry for the full replay matrix.

Disable one side without dropping data


# Stop semantic-indexing but keep dashboard ingest flowing
dodil k3 ingest update "$VECTOR_RULE_ID" -b kb-reviews --enabled=false

The Vector collection retains its existing rows; new uploads only feed Tables until you re-enable.

Common gotchas

Symptom	Cause	Fix
Only ONE pipeline fires on upload	The other rule’s globs don’t match the path/extension	List both rules’ `includePatterns`; both should cover your upload paths. Update the missing one via `dodil k3 ingest update --include "*/.json"`
Tables shows row but Vector hasn’t indexed yet (or vice versa)	Pipelines run async + independently — different latency per template	Normal. Re-query after the slower pipeline drains. Inspect `dodil k3 object show … -o json
Re-uploading the same key creates duplicate rows in Tables	Every PUT (including overwrites) fires the rule	For idempotent ingestion, design your template’s primary key — pipeline-mode tables can deduplicate on PK at MERGE time. For Vector, embeddings get added (no PK dedup); use `UpsertVectors` patterns if you go EXTERNAL
`source_key` in the table doesn’t match the Vector collection’s `source_key` metadata	Templates emit metadata using their own schema — the field names may differ slightly	Cross-reference using the object key (always present as `object.key` in vector results and as `source_key` in table rows by convention)
One pipeline keeps failing while the other succeeds	Schema mismatch / template-specific input requirement	Inspect the failing pipeline’s jobs via `dodil k3 ingest jobs -p $PIPELINE_ID`; read `errorDetails`. See Pipelines → Replay & Retry for diagnosis patterns

Cleanup


# Pause both rules
dodil k3 ingest update "$TABLE_RULE_ID" -b kb-reviews --enabled=false
dodil k3 ingest update "$VECTOR_RULE_ID" -b kb-reviews --enabled=false
 
# Delete in dependency-friendly order (no cascade across pillars)
dodil k3 ingest delete "$TABLE_RULE_ID" -b kb-reviews
dodil k3 ingest delete "$VECTOR_RULE_ID" -b kb-reviews
 
dodil k3 pipeline delete "$TABLE_PIPELINE_ID" -b kb-reviews
dodil k3 pipeline delete "$VECTOR_PIPELINE_ID" -b kb-reviews
 
dodil k3 table delete reviews -b kb-reviews
VEC_COL=$(dodil k3 vector collection get reviews_vec -b kb-reviews -o json | jq -r '.collectionId')
dodil k3 vector collection delete "$VEC_COL" -b kb-reviews
 
dodil k3 vector store delete -b kb-reviews
dodil k3 bucket delete kb-reviews