Manual Table
Goal: build a structured table from scratch — schema you control, data you push, queries you write — then keep it healthy with periodic maintenance. The canonical SQL-first workflow.
Shape:
CreateTable (manual) → Insert / Merge / Update / DeleteRows
│
┌──────────┼──────────┐
▼ ▼ ▼
Query Compact Optimize
(eventual (drain log (bin-pack
+ strong) → Delta) files)Prerequisites
dodilCLI installed anddodil logindone — CLI Basics.- A bucket —
kb-prod:dodil k3 bucket create kb-prod -d "Events warehouse" - Engine is already enabled — every bucket gets one on
CreateBucket.
1. Create the table
We’ll model an events table with a composite primary key, a partition column, and a JSON column for arbitrary payload:
dodil k3 table create events --bucket kb-prod \
--description "Click + purchase events" \
--columns-json '[
{"name":"id", "type":"COLUMN_TYPE_LONG", "nullable":false},
{"name":"user_id", "type":"COLUMN_TYPE_STRING", "nullable":false},
{"name":"occurred_at", "type":"COLUMN_TYPE_TIMESTAMP", "nullable":false},
{"name":"event_type", "type":"COLUMN_TYPE_STRING", "nullable":false},
{"name":"payload", "type":"COLUMN_TYPE_JSON", "nullable":true}
]' \
--partition-column event_type \
--merge-key id --merge-key user_idThe two key choices:
| Choice | Effect |
|---|---|
Composite PK (--merge-key id --merge-key user_id) | Writes whose predicate matches both columns route through the write log → drain → Delta MERGE. Without a PK, writes go direct to Delta — fast but no read-your-writes via freshness=STRONG. |
Partition column (event_type) | Delta physically groups rows by this column. Reads that filter on it can prune entire partitions; large partitioned tables get FEDERATED_* read strategies. |
Verify the table is ACTIVE:
dodil k3 table describe events --bucket kb-prod -o json \
| jq '{name, status, columns: [.columns[] | {name, type, nullable}], partitionColumns, pkColumns}'Expect status: TABLE_STATUS_ACTIVE and pkColumns: ["id", "user_id"]. The pkColumns here is authoritative — it’s what the planner uses to decide write strategies.
2. Insert rows
Bulk insert via the typed shortcut:
dodil k3 table insert events --bucket kb-prod \
--row '{"id":1,"user_id":"u-101","occurred_at":1716840000000000,"event_type":"click", "payload":{"page":"/pricing"}}' \
--row '{"id":2,"user_id":"u-101","occurred_at":1716840060000000,"event_type":"click", "payload":{"page":"/signup"}}' \
--row '{"id":3,"user_id":"u-102","occurred_at":1716840120000000,"event_type":"purchase","payload":{"sku":"A-12","amount":49.99}}'Sample response:
{
"rowsWritten": "3",
"version": "0",
"pendingDrain": true
}rowsWritten: 3— three write-log appends accepted.version: 0— Delta unchanged until the compactor drains;0because no Delta commit has happened from these writes yet.pendingDrain: true— confirms keyed routing (writes landed in the log; will MERGE on next drain).
timestampvalues are microseconds since epoch UTC. Alternative wire shape:"2026-05-27 10:00:00"string (same column). The CLI’s--rowis a literal JSON object; either works.
3. Query — eventual freshness (default)
dodil k3 table query \
"SELECT event_type, COUNT(*) AS n FROM events GROUP BY event_type ORDER BY n DESC" \
--bucket kb-prodIf you run this immediately after step 2 — you may see zero rows. That’s FRESHNESS_EVENTUAL (the default) reading Delta only; the writes are still in the log waiting for the compactor.
You have three options to see the rows:
- Wait for automatic compaction (background job — happens periodically).
- Force a compaction explicitly:
dodil k3 table compact events --bucket kb-prod - Switch to strong freshness for this query — covered in step 4.
4. Query — strong freshness (read-your-writes)
The CLI doesn’t yet have a --freshness flag, so use the API directly for FRESHNESS_STRONG:
curl -sS -X POST "https://k3.dev.dodil.io/kb-prod/tables/_execute" \
-H "Authorization: Bearer $DODIL_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"bucket": "kb-prod",
"sql": "SELECT event_type, COUNT(*) AS n FROM events GROUP BY event_type ORDER BY n DESC",
"freshness": "FRESHNESS_STRONG"
}' | jq '.query | {strategy, servedBy, rowCount, rows}'Expect:
{
"strategy": "QUERY_STRATEGY_UNARY_MERGED_STRONG",
"servedBy": "SERVED_BY_MERGED",
"rowCount": "2",
"rows": [
"[\"click\", 2]",
"[\"purchase\", 1]"
]
}servedBy: SERVED_BY_MERGED confirms the query merged the write log with Delta. Pays a small overhead vs EVENTUAL — use sparingly on hot paths, freely for “did my write land?” checks.
5. Upsert with MERGE
Replace an existing row + add a new one in one call:
dodil k3 table merge events --bucket kb-prod \
--match-column id --match-column user_id \
--row '{"id":1,"user_id":"u-101","occurred_at":1716840000000000,"event_type":"click_pricing","payload":{"page":"/pricing","variant":"B"}}' \
--row '{"id":4,"user_id":"u-103","occurred_at":1716840180000000,"event_type":"signup","payload":{"plan":"pro"}}'Sample response:
{
"rowsWritten": "2",
"version": "0",
"pendingDrain": true
}rowsWritten: 2 counts source rows accepted into the log — regardless of whether each row will ultimately MATCH (id=1) or NOT MATCH (id=4) on drain. The post-drain breakdown (insert/update/delete) emerges via dodil k3 table describe’s lastDrain* counters.
Verify via strong-freshness query:
curl -sS -X POST "https://k3.dev.dodil.io/kb-prod/tables/_execute" \
-H "Authorization: Bearer $DODIL_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"bucket": "kb-prod",
"sql": "SELECT id, user_id, event_type FROM events WHERE user_id IN ('\''u-101'\'', '\''u-103'\'') ORDER BY id",
"freshness": "FRESHNESS_STRONG"
}' | jq '.query.rows'You should see id=1 with event_type=click_pricing (updated), and id=4 with signup (inserted).
6. Update + delete (keyed)
Update via PK (predicate matches merge_keys):
dodil k3 table update events --bucket kb-prod \
--predicate "id = 2 AND user_id = 'u-101'" \
--updates-json '{"event_type":"click_signup"}'Routes through the write log (KEYED_UPDATE strategy). Same pendingDrain: true semantics.
Delete via PK:
dodil k3 table delete-rows events --bucket kb-prod \
--predicate "id = 3 AND user_id = 'u-102'"Writes a tombstone into the log. Compactor applies the Delta DELETE on next drain.
7. Maintenance — compact + optimize
After bulk writes, drain the log to Delta and pack the files:
# Step 1 — drain the write log into Delta
while dodil k3 table compact events --bucket kb-prod -o json \
| jq -e '.truncated == true' > /dev/null; do
echo " more entries to drain..."
done
# Step 2 — bin-pack small Delta files
dodil k3 table optimize events --bucket kb-prod -o json \
| jq '{filesRemoved, filesAdded, bytesRemoved, bytesAdded, partitionsOptimized}'You don’t usually need to run these manually — K3 maintains the drain in the background — but explicit calls are useful:
- After large bulk writes when you want rows visible to
EVENTUALreads immediately - As the
Compact→Optimizecanonical post-batch sequence - In tests / e2e to assert behavior against
describe’s drain counters
8. Describe to see drain stats
describe is the single richest read on a table — schema, Delta location + version, write-log backlog, drain-lag SLO, last-drain row classification:
dodil k3 table describe events --bucket kb-prod -o json \
| jq '{
tableName, version,
walObjectCount, walTotalBytes, drainLagSecs,
lastDrainInsertedRows, lastDrainUpdatedRows, lastDrainDeletedRows
}'After step 7’s compact loop, expect:
{
"tableName": "events",
"version": "5",
"walObjectCount": "0",
"walTotalBytes": "0",
"drainLagSecs": "0",
"lastDrainInsertedRows": "1",
"lastDrainUpdatedRows": "3",
"lastDrainDeletedRows": "1"
}The lastDrain* counters are the truthful post-drain row classification — they tell you how many rows the compactor actually inserted/updated/deleted on the last drain. This is what rowsWritten can’t synchronously surface (the pre-drain log doesn’t yet know match-vs-no-match).
Cleanup
dodil k3 table delete events --bucket kb-prodCommon gotchas
| Symptom | Cause | Fix |
|---|---|---|
Insert returns pendingDrain: false and you expected true | Table has no merge_keys — writes go direct to Delta | Confirm via dodil k3 table describe → pkColumns populated; recreate with --merge-key if needed |
EVENTUAL query returns 0 rows but you just inserted | Writes are in the write log, not yet drained | Use FRESHNESS_STRONG (API) or run compact |
MERGE reports rowsWritten: N but describe.lastDrainInsertedRows is different | Pre-drain rowsWritten is source-row count; post-drain insert/update split emerges after the compactor MERGE | Run compact, then re-read describe — counters realign |
UPDATE with non-PK predicate succeeds but doesn’t appear in EVENTUAL reads | NON_KEYED_UPDATE writes directly to Delta — should appear immediately | If still missing: log entries for the same PKs may have un-drained writes that overwrote on next drain. Run compact first then re-run the UPDATE. See Execute → UPDATE for the safe pattern. |
optimize shows partitionsOptimized: 0 | Table was already well-packed (typical for routinely-maintained HTAP tables) | Not an error — no-op is expected |
compact keeps returning truncated: true | Log has more than --batch-size entries | Loop until truncated: false (see step 7) |
Variations
| Variation | What changes |
|---|---|
| No-PK table (audit log style) | Skip --merge-key; writes go direct to Delta (NON_KEYED_INSERT). No upserts, no FRESHNESS_STRONG read-your-writes — but faster writes for append-only workloads. |
| Multi-column partition | Pass --partition-column multiple times. Reads that filter on all partition columns prune most aggressively. |
| JSON-heavy schema | Reduce typed columns; lean on payload JSON. Trade-off: JSON ops (->>, json_extract) are slower than typed-column predicates. Promote hot JSON fields to typed columns when you query them often. |
| Wider PK | More --merge-key flags = composite PK. Predicates must match all PK columns for keyed routing — partial-PK predicates fall to non-keyed. |
| Default expressions | Add "defaultExpression": "GENERATE_UUID()" (or "CURRENT_TIMESTAMP()", "ULID()", literal "42", etc.) to a column in --columns-json — applied at row commit when the column is absent. See SQL Compatibility → default_expression. |
See also
- Pipeline-bound Table — same primitive but Scriptum-template-bound for auto-extraction from documents
- CTAS & Materialize — derive a new table from a SELECT
- Time Travel & Restore — recover from bad writes
- Quickstart — the abbreviated version of this recipe
- SQL Compatibility — DuckDB dialect details
- Core Concepts → WriteStrategy — full routing table for keyed vs non-keyed