Skip to Content
We are live but in Staging 🎉
TablesRecipesManual Table

Manual Table

Goal: build a structured table from scratch — schema you control, data you push, queries you write — then keep it healthy with periodic maintenance. The canonical SQL-first workflow.

Shape:

CreateTable (manual) → Insert / Merge / Update / DeleteRows ┌──────────┼──────────┐ ▼ ▼ ▼ Query Compact Optimize (eventual (drain log (bin-pack + strong) → Delta) files)

Prerequisites

  • dodil CLI installed and dodil login done — CLI Basics.
  • A bucket — kb-prod:
    dodil k3 bucket create kb-prod -d "Events warehouse"
  • Engine is already enabled — every bucket gets one on CreateBucket.

1. Create the table

We’ll model an events table with a composite primary key, a partition column, and a JSON column for arbitrary payload:

dodil k3 table create events --bucket kb-prod \ --description "Click + purchase events" \ --columns-json '[ {"name":"id", "type":"COLUMN_TYPE_LONG", "nullable":false}, {"name":"user_id", "type":"COLUMN_TYPE_STRING", "nullable":false}, {"name":"occurred_at", "type":"COLUMN_TYPE_TIMESTAMP", "nullable":false}, {"name":"event_type", "type":"COLUMN_TYPE_STRING", "nullable":false}, {"name":"payload", "type":"COLUMN_TYPE_JSON", "nullable":true} ]' \ --partition-column event_type \ --merge-key id --merge-key user_id

The two key choices:

ChoiceEffect
Composite PK (--merge-key id --merge-key user_id)Writes whose predicate matches both columns route through the write log → drain → Delta MERGE. Without a PK, writes go direct to Delta — fast but no read-your-writes via freshness=STRONG.
Partition column (event_type)Delta physically groups rows by this column. Reads that filter on it can prune entire partitions; large partitioned tables get FEDERATED_* read strategies.

Verify the table is ACTIVE:

dodil k3 table describe events --bucket kb-prod -o json \ | jq '{name, status, columns: [.columns[] | {name, type, nullable}], partitionColumns, pkColumns}'

Expect status: TABLE_STATUS_ACTIVE and pkColumns: ["id", "user_id"]. The pkColumns here is authoritative — it’s what the planner uses to decide write strategies.

2. Insert rows

Bulk insert via the typed shortcut:

dodil k3 table insert events --bucket kb-prod \ --row '{"id":1,"user_id":"u-101","occurred_at":1716840000000000,"event_type":"click", "payload":{"page":"/pricing"}}' \ --row '{"id":2,"user_id":"u-101","occurred_at":1716840060000000,"event_type":"click", "payload":{"page":"/signup"}}' \ --row '{"id":3,"user_id":"u-102","occurred_at":1716840120000000,"event_type":"purchase","payload":{"sku":"A-12","amount":49.99}}'

Sample response:

{ "rowsWritten": "3", "version": "0", "pendingDrain": true }
  • rowsWritten: 3 — three write-log appends accepted.
  • version: 0 — Delta unchanged until the compactor drains; 0 because no Delta commit has happened from these writes yet.
  • pendingDrain: true — confirms keyed routing (writes landed in the log; will MERGE on next drain).

timestamp values are microseconds since epoch UTC. Alternative wire shape: "2026-05-27 10:00:00" string (same column). The CLI’s --row is a literal JSON object; either works.

3. Query — eventual freshness (default)

dodil k3 table query \ "SELECT event_type, COUNT(*) AS n FROM events GROUP BY event_type ORDER BY n DESC" \ --bucket kb-prod

If you run this immediately after step 2 — you may see zero rows. That’s FRESHNESS_EVENTUAL (the default) reading Delta only; the writes are still in the log waiting for the compactor.

You have three options to see the rows:

  1. Wait for automatic compaction (background job — happens periodically).
  2. Force a compaction explicitly:
    dodil k3 table compact events --bucket kb-prod
  3. Switch to strong freshness for this query — covered in step 4.

4. Query — strong freshness (read-your-writes)

The CLI doesn’t yet have a --freshness flag, so use the API directly for FRESHNESS_STRONG:

curl -sS -X POST "https://k3.dev.dodil.io/kb-prod/tables/_execute" \ -H "Authorization: Bearer $DODIL_TOKEN" \ -H "Content-Type: application/json" \ -d '{ "bucket": "kb-prod", "sql": "SELECT event_type, COUNT(*) AS n FROM events GROUP BY event_type ORDER BY n DESC", "freshness": "FRESHNESS_STRONG" }' | jq '.query | {strategy, servedBy, rowCount, rows}'

Expect:

{ "strategy": "QUERY_STRATEGY_UNARY_MERGED_STRONG", "servedBy": "SERVED_BY_MERGED", "rowCount": "2", "rows": [ "[\"click\", 2]", "[\"purchase\", 1]" ] }

servedBy: SERVED_BY_MERGED confirms the query merged the write log with Delta. Pays a small overhead vs EVENTUAL — use sparingly on hot paths, freely for “did my write land?” checks.

5. Upsert with MERGE

Replace an existing row + add a new one in one call:

dodil k3 table merge events --bucket kb-prod \ --match-column id --match-column user_id \ --row '{"id":1,"user_id":"u-101","occurred_at":1716840000000000,"event_type":"click_pricing","payload":{"page":"/pricing","variant":"B"}}' \ --row '{"id":4,"user_id":"u-103","occurred_at":1716840180000000,"event_type":"signup","payload":{"plan":"pro"}}'

Sample response:

{ "rowsWritten": "2", "version": "0", "pendingDrain": true }

rowsWritten: 2 counts source rows accepted into the log — regardless of whether each row will ultimately MATCH (id=1) or NOT MATCH (id=4) on drain. The post-drain breakdown (insert/update/delete) emerges via dodil k3 table describe’s lastDrain* counters.

Verify via strong-freshness query:

curl -sS -X POST "https://k3.dev.dodil.io/kb-prod/tables/_execute" \ -H "Authorization: Bearer $DODIL_TOKEN" \ -H "Content-Type: application/json" \ -d '{ "bucket": "kb-prod", "sql": "SELECT id, user_id, event_type FROM events WHERE user_id IN ('\''u-101'\'', '\''u-103'\'') ORDER BY id", "freshness": "FRESHNESS_STRONG" }' | jq '.query.rows'

You should see id=1 with event_type=click_pricing (updated), and id=4 with signup (inserted).

6. Update + delete (keyed)

Update via PK (predicate matches merge_keys):

dodil k3 table update events --bucket kb-prod \ --predicate "id = 2 AND user_id = 'u-101'" \ --updates-json '{"event_type":"click_signup"}'

Routes through the write log (KEYED_UPDATE strategy). Same pendingDrain: true semantics.

Delete via PK:

dodil k3 table delete-rows events --bucket kb-prod \ --predicate "id = 3 AND user_id = 'u-102'"

Writes a tombstone into the log. Compactor applies the Delta DELETE on next drain.

7. Maintenance — compact + optimize

After bulk writes, drain the log to Delta and pack the files:

# Step 1 — drain the write log into Delta while dodil k3 table compact events --bucket kb-prod -o json \ | jq -e '.truncated == true' > /dev/null; do echo " more entries to drain..." done # Step 2 — bin-pack small Delta files dodil k3 table optimize events --bucket kb-prod -o json \ | jq '{filesRemoved, filesAdded, bytesRemoved, bytesAdded, partitionsOptimized}'

You don’t usually need to run these manually — K3 maintains the drain in the background — but explicit calls are useful:

  • After large bulk writes when you want rows visible to EVENTUAL reads immediately
  • As the CompactOptimize canonical post-batch sequence
  • In tests / e2e to assert behavior against describe’s drain counters

8. Describe to see drain stats

describe is the single richest read on a table — schema, Delta location + version, write-log backlog, drain-lag SLO, last-drain row classification:

dodil k3 table describe events --bucket kb-prod -o json \ | jq '{ tableName, version, walObjectCount, walTotalBytes, drainLagSecs, lastDrainInsertedRows, lastDrainUpdatedRows, lastDrainDeletedRows }'

After step 7’s compact loop, expect:

{ "tableName": "events", "version": "5", "walObjectCount": "0", "walTotalBytes": "0", "drainLagSecs": "0", "lastDrainInsertedRows": "1", "lastDrainUpdatedRows": "3", "lastDrainDeletedRows": "1" }

The lastDrain* counters are the truthful post-drain row classification — they tell you how many rows the compactor actually inserted/updated/deleted on the last drain. This is what rowsWritten can’t synchronously surface (the pre-drain log doesn’t yet know match-vs-no-match).

Cleanup

dodil k3 table delete events --bucket kb-prod

Common gotchas

SymptomCauseFix
Insert returns pendingDrain: false and you expected trueTable has no merge_keys — writes go direct to DeltaConfirm via dodil k3 table describepkColumns populated; recreate with --merge-key if needed
EVENTUAL query returns 0 rows but you just insertedWrites are in the write log, not yet drainedUse FRESHNESS_STRONG (API) or run compact
MERGE reports rowsWritten: N but describe.lastDrainInsertedRows is differentPre-drain rowsWritten is source-row count; post-drain insert/update split emerges after the compactor MERGERun compact, then re-read describe — counters realign
UPDATE with non-PK predicate succeeds but doesn’t appear in EVENTUAL readsNON_KEYED_UPDATE writes directly to Delta — should appear immediatelyIf still missing: log entries for the same PKs may have un-drained writes that overwrote on next drain. Run compact first then re-run the UPDATE. See Execute → UPDATE for the safe pattern.
optimize shows partitionsOptimized: 0Table was already well-packed (typical for routinely-maintained HTAP tables)Not an error — no-op is expected
compact keeps returning truncated: trueLog has more than --batch-size entriesLoop until truncated: false (see step 7)

Variations

VariationWhat changes
No-PK table (audit log style)Skip --merge-key; writes go direct to Delta (NON_KEYED_INSERT). No upserts, no FRESHNESS_STRONG read-your-writes — but faster writes for append-only workloads.
Multi-column partitionPass --partition-column multiple times. Reads that filter on all partition columns prune most aggressively.
JSON-heavy schemaReduce typed columns; lean on payload JSON. Trade-off: JSON ops (->>, json_extract) are slower than typed-column predicates. Promote hot JSON fields to typed columns when you query them often.
Wider PKMore --merge-key flags = composite PK. Predicates must match all PK columns for keyed routing — partial-PK predicates fall to non-keyed.
Default expressionsAdd "defaultExpression": "GENERATE_UUID()" (or "CURRENT_TIMESTAMP()", "ULID()", literal "42", etc.) to a column in --columns-json — applied at row commit when the column is absent. See SQL Compatibility → default_expression.

See also