Manual Table

Goal: build a structured table from scratch — schema you control, data you push, queries you write — then keep it healthy with periodic maintenance. The canonical SQL-first workflow.

Shape:


CreateTable (manual)   →   Insert / Merge / Update / DeleteRows
                                       │
                            ┌──────────┼──────────┐
                            ▼          ▼          ▼
                         Query     Compact     Optimize
                        (eventual  (drain log  (bin-pack
                         + strong)  → Delta)    files)

Prerequisites

dodil CLI installed and dodil login done — CLI Basics.

A bucket — kb-prod:


dodil k3 bucket create kb-prod -d "Events warehouse"

Engine is already enabled — every bucket gets one on CreateBucket.

1. Create the table

We’ll model an events table with a composite primary key, a partition column, and a JSON column for arbitrary payload:


dodil k3 table create events --bucket kb-prod \
  --description "Click + purchase events" \
  --columns-json '[
    {"name":"id",          "type":"COLUMN_TYPE_LONG",      "nullable":false},
    {"name":"user_id",     "type":"COLUMN_TYPE_STRING",    "nullable":false},
    {"name":"occurred_at", "type":"COLUMN_TYPE_TIMESTAMP", "nullable":false},
    {"name":"event_type",  "type":"COLUMN_TYPE_STRING",    "nullable":false},
    {"name":"payload",     "type":"COLUMN_TYPE_JSON",      "nullable":true}
  ]' \
  --partition-column event_type \
  --merge-key id --merge-key user_id

The two key choices:

Choice	Effect
Composite PK (`--merge-key id --merge-key user_id`)	Writes whose predicate matches both columns route through the write log → drain → Delta MERGE. Without a PK, writes go direct to Delta — fast but no read-your-writes via `freshness=STRONG`.
Partition column (`event_type`)	Delta physically groups rows by this column. Reads that filter on it can prune entire partitions; large partitioned tables get `FEDERATED_*` read strategies.

Verify the table is ACTIVE:


dodil k3 table describe events --bucket kb-prod -o json \
  | jq '{name, status, columns: [.columns[] | {name, type, nullable}], partitionColumns, pkColumns}'

Expect status: TABLE_STATUS_ACTIVE and pkColumns: ["id", "user_id"]. The pkColumns here is authoritative — it’s what the planner uses to decide write strategies.

2. Insert rows

Bulk insert via the typed shortcut:


dodil k3 table insert events --bucket kb-prod \
  --row '{"id":1,"user_id":"u-101","occurred_at":1716840000000000,"event_type":"click",   "payload":{"page":"/pricing"}}' \
  --row '{"id":2,"user_id":"u-101","occurred_at":1716840060000000,"event_type":"click",   "payload":{"page":"/signup"}}' \
  --row '{"id":3,"user_id":"u-102","occurred_at":1716840120000000,"event_type":"purchase","payload":{"sku":"A-12","amount":49.99}}'

Sample response:


{
  "rowsWritten": "3",
  "version": "0",
  "pendingDrain": true
}

rowsWritten: 3 — three write-log appends accepted.
version: 0 — Delta unchanged until the compactor drains; 0 because no Delta commit has happened from these writes yet.
pendingDrain: true — confirms keyed routing (writes landed in the log; will MERGE on next drain).

timestamp values are microseconds since epoch UTC. Alternative wire shape: "2026-05-27 10:00:00" string (same column). The CLI’s --row is a literal JSON object; either works.

3. Query — eventual freshness (default)


dodil k3 table query \
  "SELECT event_type, COUNT(*) AS n FROM events GROUP BY event_type ORDER BY n DESC" \
  --bucket kb-prod

If you run this immediately after step 2 — you may see zero rows. That’s FRESHNESS_EVENTUAL (the default) reading Delta only; the writes are still in the log waiting for the compactor.

You have three options to see the rows:

Wait for automatic compaction (background job — happens periodically).

Force a compaction explicitly:


dodil k3 table compact events --bucket kb-prod

Switch to strong freshness for this query — covered in step 4.

4. Query — strong freshness (read-your-writes)

The CLI doesn’t yet have a --freshness flag, so use the API directly for FRESHNESS_STRONG:


curl -sS -X POST "https://k3.dev.dodil.io/kb-prod/tables/_execute" \
  -H "Authorization: Bearer $DODIL_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "bucket": "kb-prod",
    "sql": "SELECT event_type, COUNT(*) AS n FROM events GROUP BY event_type ORDER BY n DESC",
    "freshness": "FRESHNESS_STRONG"
  }' | jq '.query | {strategy, servedBy, rowCount, rows}'

Expect:


{
  "strategy": "QUERY_STRATEGY_UNARY_MERGED_STRONG",
  "servedBy": "SERVED_BY_MERGED",
  "rowCount": "2",
  "rows": [
    "[\"click\", 2]",
    "[\"purchase\", 1]"
  ]
}

servedBy: SERVED_BY_MERGED confirms the query merged the write log with Delta. Pays a small overhead vs EVENTUAL — use sparingly on hot paths, freely for “did my write land?” checks.

5. Upsert with `MERGE`

Replace an existing row + add a new one in one call:


dodil k3 table merge events --bucket kb-prod \
  --match-column id --match-column user_id \
  --row '{"id":1,"user_id":"u-101","occurred_at":1716840000000000,"event_type":"click_pricing","payload":{"page":"/pricing","variant":"B"}}' \
  --row '{"id":4,"user_id":"u-103","occurred_at":1716840180000000,"event_type":"signup","payload":{"plan":"pro"}}'

Sample response:


{
  "rowsWritten": "2",
  "version": "0",
  "pendingDrain": true
}

rowsWritten: 2 counts source rows accepted into the log — regardless of whether each row will ultimately MATCH (id=1) or NOT MATCH (id=4) on drain. The post-drain breakdown (insert/update/delete) emerges via dodil k3 table describe’s lastDrain* counters.

Verify via strong-freshness query:


curl -sS -X POST "https://k3.dev.dodil.io/kb-prod/tables/_execute" \
  -H "Authorization: Bearer $DODIL_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "bucket": "kb-prod",
    "sql": "SELECT id, user_id, event_type FROM events WHERE user_id IN ('\''u-101'\'', '\''u-103'\'') ORDER BY id",
    "freshness": "FRESHNESS_STRONG"
  }' | jq '.query.rows'

You should see id=1 with event_type=click_pricing (updated), and id=4 with signup (inserted).

6. Update + delete (keyed)

Update via PK (predicate matches merge_keys):


dodil k3 table update events --bucket kb-prod \
  --predicate "id = 2 AND user_id = 'u-101'" \
  --updates-json '{"event_type":"click_signup"}'

Routes through the write log (KEYED_UPDATE strategy). Same pendingDrain: true semantics.

Delete via PK:


dodil k3 table delete-rows events --bucket kb-prod \
  --predicate "id = 3 AND user_id = 'u-102'"

Writes a tombstone into the log. Compactor applies the Delta DELETE on next drain.

7. Maintenance — compact + optimize

After bulk writes, drain the log to Delta and pack the files:


# Step 1 — drain the write log into Delta
while dodil k3 table compact events --bucket kb-prod -o json \
        | jq -e '.truncated == true' > /dev/null; do
  echo "  more entries to drain..."
done
 
# Step 2 — bin-pack small Delta files
dodil k3 table optimize events --bucket kb-prod -o json \
  | jq '{filesRemoved, filesAdded, bytesRemoved, bytesAdded, partitionsOptimized}'

You don’t usually need to run these manually — K3 maintains the drain in the background — but explicit calls are useful:

After large bulk writes when you want rows visible to EVENTUAL reads immediately
As the Compact → Optimize canonical post-batch sequence
In tests / e2e to assert behavior against describe’s drain counters

8. Describe to see drain stats

describe is the single richest read on a table — schema, Delta location + version, write-log backlog, drain-lag SLO, last-drain row classification:


dodil k3 table describe events --bucket kb-prod -o json \
  | jq '{
      tableName, version,
      walObjectCount, walTotalBytes, drainLagSecs,
      lastDrainInsertedRows, lastDrainUpdatedRows, lastDrainDeletedRows
    }'

After step 7’s compact loop, expect:


{
  "tableName": "events",
  "version": "5",
  "walObjectCount": "0",
  "walTotalBytes": "0",
  "drainLagSecs": "0",
  "lastDrainInsertedRows": "1",
  "lastDrainUpdatedRows": "3",
  "lastDrainDeletedRows": "1"
}

The lastDrain* counters are the truthful post-drain row classification — they tell you how many rows the compactor actually inserted/updated/deleted on the last drain. This is what rowsWritten can’t synchronously surface (the pre-drain log doesn’t yet know match-vs-no-match).

Cleanup


dodil k3 table delete events --bucket kb-prod

Common gotchas

Symptom	Cause	Fix
Insert returns `pendingDrain: false` and you expected `true`	Table has no `merge_keys` — writes go direct to Delta	Confirm via `dodil k3 table describe` → `pkColumns` populated; recreate with `--merge-key` if needed
`EVENTUAL` query returns 0 rows but you just inserted	Writes are in the write log, not yet drained	Use `FRESHNESS_STRONG` (API) or run `compact`
`MERGE` reports `rowsWritten: N` but `describe.lastDrainInsertedRows` is different	Pre-drain `rowsWritten` is source-row count; post-drain insert/update split emerges after the compactor MERGE	Run `compact`, then re-read `describe` — counters realign
`UPDATE` with non-PK predicate succeeds but doesn’t appear in `EVENTUAL` reads	`NON_KEYED_UPDATE` writes directly to Delta — should appear immediately	If still missing: log entries for the same PKs may have un-drained writes that overwrote on next drain. Run `compact` first then re-run the UPDATE. See Execute → UPDATE for the safe pattern.
`optimize` shows `partitionsOptimized: 0`	Table was already well-packed (typical for routinely-maintained HTAP tables)	Not an error — no-op is expected
`compact` keeps returning `truncated: true`	Log has more than `--batch-size` entries	Loop until `truncated: false` (see step 7)

Variations

Variation	What changes
No-PK table (audit log style)	Skip `--merge-key`; writes go direct to Delta (`NON_KEYED_INSERT`). No upserts, no `FRESHNESS_STRONG` read-your-writes — but faster writes for append-only workloads.
Multi-column partition	Pass `--partition-column` multiple times. Reads that filter on all partition columns prune most aggressively.
JSON-heavy schema	Reduce typed columns; lean on `payload` JSON. Trade-off: JSON ops (`->>`, `json_extract`) are slower than typed-column predicates. Promote hot JSON fields to typed columns when you query them often.
Wider PK	More `--merge-key` flags = composite PK. Predicates must match all PK columns for keyed routing — partial-PK predicates fall to non-keyed.
Default expressions	Add `"defaultExpression": "GENERATE_UUID()"` (or `"CURRENT_TIMESTAMP()"`, `"ULID()"`, literal `"42"`, etc.) to a column in `--columns-json` — applied at row commit when the column is absent. See SQL Compatibility → default_expression.