Workflow: Operations Checklist

Use this checklist for day-2 operations and incident triage.

1. Service Health and Readiness

curl -sS "https://k3.dev.dodil.io/health"
curl -sS "https://k3.dev.dodil.io/healthz"
curl -sS "https://k3.dev.dodil.io/ready"
curl -sS "https://k3.dev.dodil.io/readyz"
curl -sS "https://k3.dev.dodil.io/metrics"
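
For repeated checks, the five probes can be wrapped in a small loop that prints only the HTTP status of each endpoint. This is a minimal sketch, not part of the documented tooling: `K3_BASE_URL`, `probe`, and `probe_all` are hypothetical names, and treating anything other than 200 as unhealthy is an assumption.

```shell
# Sketch: probe each endpoint and print its HTTP status code.
# K3_BASE_URL is a hypothetical override; the default is the host used above.
K3_BASE_URL="${K3_BASE_URL:-https://k3.dev.dodil.io}"

# Print "<path> <status>" for one endpoint; 000 means no HTTP response.
probe() {
  local path="$1" code
  code=$(curl -sS -o /dev/null --max-time 10 -w '%{http_code}' \
    "${K3_BASE_URL}${path}" 2>/dev/null) || code="000"
  printf '%s %s\n' "$path" "$code"
}

# Probe every endpoint; return non-zero if any did not answer 200 (assumed healthy code).
probe_all() {
  local path line rc=0
  for path in /health /healthz /ready /readyz /metrics; do
    line=$(probe "$path")
    echo "$line"
    [ "${line##* }" = "200" ] || rc=1
  done
  return "$rc"
}
```

Calling `probe_all` prints one line per endpoint and exits non-zero if any probe fails, which makes it usable as a gate in a script.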

2. Bucket and Engine Baseline

dodil k3 bucket list
dodil k3 vector-store get --bucket "$K3_BUCKET"
dodil k3 engine get --bucket "$K3_BUCKET"

3. Ingest Pipeline Health

dodil k3 ingest list --bucket "$K3_BUCKET"
dodil k3 ingest jobs --bucket "$K3_BUCKET" -o json

Focus on:

  • repeated FAILED or RETRYING jobs
  • spikes in pending jobs
  • mismatched source/rule/pipeline IDs
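
To spot repeated FAILED or RETRYING jobs quickly, the JSON output can be tallied with standard text tools. This is a rough sketch: it assumes only that job states appear as JSON string values such as "FAILED", since the actual field layout of the output is not described here. `count_bad_jobs` is a hypothetical helper name.

```shell
# Sketch: tally FAILED and RETRYING occurrences in a JSON job listing read from stdin.
# Assumes states appear as quoted JSON strings; the real schema is not confirmed.
count_bad_jobs() {
  grep -oE '"(FAILED|RETRYING)"' \
    | tr -d '"' \
    | sort \
    | uniq -c \
    | awk '{ print $2 "=" $1 }'   # print as STATE=count, one per line
}
```

A possible invocation is `dodil k3 ingest jobs --bucket "$K3_BUCKET" -o json | count_bad_jobs`. Empty output means neither state was found.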

4. Search Quality Checks

dodil k3 \
  search "health probe query" --bucket "$K3_BUCKET" --top-k 5

If no results:

  1. Verify that the collection exists.
  2. Verify that the ingest jobs completed.
  3. Lower --min-score and retest.
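
The last step above can be sketched as a loop that reruns the same probe query at progressively lower thresholds. The threshold values are illustrative only, and the assumption that `--min-score` accepts a decimal between 0 and 1 is not confirmed by this checklist. `retest_search` is a hypothetical helper name.

```shell
# Sketch: rerun a search query at successively lower --min-score values.
# The thresholds below are illustrative; the valid range of --min-score
# is assumed to be 0..1 and should be checked against the CLI help.
retest_search() {
  local query="$1" score
  for score in 0.5 0.3 0.1; do
    echo "== --min-score $score"
    dodil k3 search "$query" --bucket "$K3_BUCKET" --top-k 5 --min-score "$score"
  done
}
```

For example, `retest_search "health probe query"` shows at which threshold results start to appear, if they appear at all.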

5. Table Drain and Maintenance Checks

dodil k3 \
  table describe events --bucket "$K3_BUCKET"
dodil k3 \
  table compact events --bucket "$K3_BUCKET"

Focus on:

  • WAL backlog and drain lag in describe output
  • post-compaction stability of query results
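
One way to check post-compaction stability is to capture the output of a read-only query before and after the compact command and compare the two. The sketch below takes the query command from the caller, so it assumes nothing about table-query syntax. `check_compaction_stable` is a hypothetical helper name.

```shell
# Sketch: compare a query's output before and after compacting the events table.
# The query command is passed as arguments; no table-query syntax is assumed.
# Returns 0 if output is identical, 1 if it changed, 2 if a command failed.
check_compaction_stable() {
  local before after
  before=$("$@") || return 2
  dodil k3 table compact events --bucket "$K3_BUCKET" >/dev/null || return 2
  after=$("$@") || return 2
  if [ "$before" = "$after" ]; then
    echo "stable"
  else
    echo "changed"
    return 1
  fi
}
```

Pass whatever read-only query you normally run against the table as the arguments. Note that writes landing between the two runs can legitimately change the output, so a "changed" result is a prompt to investigate rather than proof of a fault.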

6. Frequent Failure Patterns

  1. Auth errors: token expired or wrong org header.
  2. Object upload failure: missing https:// in API endpoint.
  3. Ingest idle: discovery not run, or source/rule mismatch.
  4. Table query errors: engine not enabled in bucket.
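
The missing-scheme failure in item 2 is cheap to catch before any upload is attempted. The following is a minimal sketch of such a pre-flight check; `check_endpoint` is a hypothetical helper, and how the endpoint is actually configured for the CLI is not covered here.

```shell
# Sketch: fail fast when an API endpoint string lacks the https:// scheme.
# The endpoint is passed as an argument; where it is configured is up to you.
check_endpoint() {
  case "$1" in
    https://*)
      echo "ok: $1"
      ;;
    *)
      echo "endpoint must start with https://: $1" >&2
      return 1
      ;;
  esac
}
```

For example, `check_endpoint "k3.dev.dodil.io"` returns non-zero and prints the reason to stderr, while `check_endpoint "https://k3.dev.dodil.io"` passes.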
