Core Concepts — Tables
Every type signature below is verbatim from proto/proto-k3/k3_tables.proto. Package: dodil.k3.tables.v1. Wire encodings: gRPC follows proto types directly; HTTP uses pbjson (camelCase, int64 as JSON strings, enums as wire-name strings).
The six types: Engine, Table, Column, QueryStrategy / Freshness / ServedBy (read-side), WriteStrategy (write-side), HistoryEntry (time travel). How they relate:
bucket ──┬── Engine (auto-enabled per bucket; 1:1)
│
└── Tables ─── Columns (incl. JSON, partitioning, PK)
│
├── Writes ─► WriteStrategy (KeyedInsert / Merge / NonKeyed…)
│
├── Reads ─► QueryStrategy (UnaryWarehouse / UnaryMergedStrong / Federated…)
│ + Freshness (Eventual / Strong / OltpOnly)
│
└── History ── time-travel via RestoreThe two-layer HTAP model in plain terms: writes append to a transactional write log inside the bucket; an asynchronous compactor drains the log into the table’s Delta Lake commit history. Reads can scan Delta only (eventual), merge the log with Delta (strong), or hit the log only (rare). Maintenance ops (Optimize / Vacuum / Compact / Restore / History) operate on the Delta side directly.
Engine
The per-bucket table backend. Auto-enabled on CreateBucket; you usually don’t touch it.
HTTP
{
"engineId": "eng_a1b2…",
"bucket": "kb-prod",
"status": "ENGINE_STATUS_ACTIVE",
"settings": "{}",
"tableCount": 4,
"createdAt": "1716840000000",
"updatedAt": "1716843600000"
}Key facts:
- One engine per bucket.
engine_idis internal; APIs that touch the engine takebucketand look it up. - Status only flips when you explicitly call
DisableEngine/EnableEngine. The default isACTIVEfrom bucket creation. settingsis a JSON string for forward-compatible advanced knobs. You almost never set it.
Table
A structured dataset in a bucket. Has columns, partition columns, merge keys, and one of two sources — manual (you defined the schema) or pipeline-generated (a Scriptum template owns the schema).
HTTP
{
"tableId": "tbl_a1b2…",
"bucket": "kb-prod",
"engineId": "eng_a1b2…",
"name": "events",
"description": "Click + purchase events",
"columns": [
{ "name": "id", "type": "COLUMN_TYPE_LONG", "nullable": false },
{ "name": "user_id", "type": "COLUMN_TYPE_STRING", "nullable": false },
{ "name": "occurred_at", "type": "COLUMN_TYPE_TIMESTAMP", "nullable": false },
{ "name": "event_type", "type": "COLUMN_TYPE_STRING", "nullable": false },
{ "name": "payload", "type": "COLUMN_TYPE_JSON", "nullable": true }
],
"partitionColumns": ["event_type"],
"mergeKeys": ["id", "user_id"],
"source": "TABLE_SOURCE_MANUAL",
"deltaVersion": "42",
"rowCount": "1247",
"sizeBytes": "8923471",
"status": "TABLE_STATUS_ACTIVE",
"createdAt": "1716840000000",
"updatedAt": "1716843600000"
}Two creation modes
| Manual | Pipeline-generated | |
|---|---|---|
| RPC | CreateTable | CreateTablePipeline |
| Who owns the schema | You — pass columns explicitly | The Scriptum template — schema materializes lazily on first ingest |
source | TABLE_SOURCE_MANUAL | TABLE_SOURCE_PIPELINE_GENERATED |
pipeline_id populated | no | yes — auto-bound to the template’s pipeline |
| Auto-generated rule | no | yes — globs derived from the template’s accepted_extensions |
| When to pick | Structured data + SQL-first workflow | Unstructured documents → auto-extracted rows |
Other key facts:
merge_keysis the table’s primary key (composite supported). Drives the write planner — writes with predicates that matchmerge_keysroute through the keyed strategies (WAL → drain MERGE); other predicates route throughnon_keyedstrategies (warehouse-only — see WriteStrategy).partition_columnscontrols physical layout in Delta. Reads that filter on a partition column can prune entire directories.delta_versionadvances on every commit (writes, optimize, vacuum, restore). Use it as the cursor forHistorypagination.row_count/size_bytesare sidecar stats — accurate after the last drain, slightly stale during heavy writes.
Column
message Column {
string name = 1;
ColumnType type = 2;
bool nullable = 3;
// BigQuery-aligned. Empty = no default.
// Examples: "GENERATE_UUID()", "CURRENT_TIMESTAMP()", "ULID()", "42", "'pending'"
string default_expression = 4;
}
enum ColumnType {
COLUMN_TYPE_UNSPECIFIED = 0;
COLUMN_TYPE_STRING = 1; // utf-8 string
COLUMN_TYPE_INT = 2; // int32
COLUMN_TYPE_LONG = 3; // int64
COLUMN_TYPE_SHORT = 4; // int16
COLUMN_TYPE_FLOAT = 5; // float32
COLUMN_TYPE_DOUBLE = 6; // float64
COLUMN_TYPE_BOOLEAN = 7;
COLUMN_TYPE_DATE = 8; // YYYY-MM-DD
COLUMN_TYPE_TIMESTAMP = 9; // microseconds since epoch UTC
COLUMN_TYPE_BINARY = 10; // bytes
COLUMN_TYPE_JSON = 11; // arbitrary JSON, DuckDB-queryable
}Key facts:
- JSON columns are first-class — DuckDB’s native JSON ops (
->,->>,json_extract,json_each,from_json) all work directly on them. For high-volume read paths, promote hot fields to typed columns; reservejsonfor long-tail / metadata. default_expressionis BigQuery-aligned SQL text applied at row commit when the column is absent from the input row. Validated against the column type at table creation.- Empty
default_expressionmeans the column is required at write time unlessnullable = true(in which case absence yields SQLNULL).
Read-side observability — QueryStrategy / Freshness / ServedBy
When you run a SELECT through Execute, the response tells you exactly which strategy the planner picked and which tier(s) answered:
// Freshness selector on the request — picks the read path.
enum Freshness {
FRESHNESS_UNSPECIFIED = 0; // treated as EVENTUAL
FRESHNESS_EVENTUAL = 1; // Delta only (default — OLAP-optimized)
FRESHNESS_STRONG = 2; // write log + Delta merged (read-your-writes)
FRESHNESS_OLTP_ONLY = 3; // write log only (niche; point lookups during heavy backlog)
}
// Strategy chosen by the planner (returned on the response).
enum QueryStrategy {
QUERY_STRATEGY_UNSPECIFIED = 0;
QUERY_STRATEGY_UNARY_WAREHOUSE = 1; // single Delta scan
QUERY_STRATEGY_UNARY_MERGED_STRONG = 2; // single-pod write-log + Delta merge
QUERY_STRATEGY_FEDERATED_AGGREGATE = 3; // partitions fanned out, results reduced
QUERY_STRATEGY_FEDERATED_SCAN = 4; // partitions fanned out, results concatenated
QUERY_STRATEGY_FEDERATED_TO_SINGLE = 5; // planner downgraded to single (small table, no partitions)
}
// Which tier(s) actually answered.
enum ServedBy {
SERVED_BY_UNSPECIFIED = 0;
SERVED_BY_WAREHOUSE = 1; // Delta only
SERVED_BY_MERGED = 2; // write log + Delta
SERVED_BY_OLTP = 3; // write log only
}How they interact:
| You request… | Planner picks… | Tier… |
|---|---|---|
Default (EVENTUAL) | UNARY_WAREHOUSE or FEDERATED_* depending on partitioning + size | WAREHOUSE |
STRONG | UNARY_MERGED_STRONG | MERGED |
OLTP_ONLY | (single-pod write-log scan) | OLTP |
| Large partitioned table + aggregate query | FEDERATED_AGGREGATE | WAREHOUSE (fanned-out) |
| Large partitioned table + non-aggregate SELECT | FEDERATED_SCAN | WAREHOUSE (fanned-out + concatenated) |
| Federated plan but table too small / unpartitioned | FEDERATED_TO_SINGLE (strategy_reason explains) | WAREHOUSE |
The strategy + tier come back on ExecuteQueryResult.strategy / served_by for every read, alongside an optional human-readable strategy_reason for downgrades.
Write-side observability — WriteStrategy
The mirror enum for writes — every Insert / Merge / Update / DeleteRows / Execute(SQL) write returns the strategy the planner picked:
enum WriteStrategy {
WRITE_STRATEGY_UNSPECIFIED = 0;
// ── PK present + WHERE matches the PK ── (write-log → drain MERGE)
WRITE_STRATEGY_KEYED_INSERT_SINGLE = 1;
WRITE_STRATEGY_KEYED_INSERT_BULK = 2;
WRITE_STRATEGY_KEYED_INSERT_FROM_SELECT = 3;
WRITE_STRATEGY_KEYED_UPDATE = 4;
WRITE_STRATEGY_KEYED_RANGE = 5;
WRITE_STRATEGY_KEYED_FROM_SUBQUERY = 6;
WRITE_STRATEGY_KEYED_DELETE = 7;
WRITE_STRATEGY_KEYED_RANGE_DELETE = 8;
WRITE_STRATEGY_KEYED_DELETE_FROM_SUBQUERY = 9;
// ── No PK or WHERE doesn't match PK ── (Delta-only — ⚠️ WAL-bypass)
WRITE_STRATEGY_NON_KEYED_UPDATE = 10;
WRITE_STRATEGY_NON_KEYED_DELETE = 11;
WRITE_STRATEGY_NON_KEYED_INSERT = 15;
WRITE_STRATEGY_NON_KEYED_INSERT_FROM_SELECT = 16;
// ── MERGE ──
WRITE_STRATEGY_MERGE_ROWS = 12;
WRITE_STRATEGY_MERGE_QUERY = 13;
WRITE_STRATEGY_MERGE_TABLE = 14;
}Key facts:
- Keyed strategies route through the write log → the compactor drains them into Delta on the next tick. Predicates must match the table’s
merge_keys. - Non-keyed strategies bypass the write log and write directly to Delta. Faster for one-shot writes but carry a risk: if pending log entries for the same rows haven’t compacted yet, the next drain may overwrite the Delta-side change. Use after
Compact, or restrict to tables / paths where you control timing. - MERGE strategies distinguish source shape: explicit rows (
MERGE_ROWS), source query (MERGE_QUERY), or source table (MERGE_TABLE). - The strategy is returned on every
Execute(write SQL)response — useful for confirming the planner picked the cheap path.
For the full SQL-shape → strategy mapping, see Execute → Update (keyed/non-keyed UPDATE) or any of the Execute sub-pages.
HistoryEntry
The Delta Lake commit log. Every write, optimize, vacuum, restore, schema change produces one entry. Pagination cursor is version — strictly increasing.
message HistoryEntry {
int64 version = 5;
string timestamp = 1; // epoch-milliseconds (string)
string operation = 2; // "WRITE" | "MERGE" | "UPDATE" | "DELETE" |
// "OPTIMIZE" | "VACUUM START" | "VACUUM END" |
// "RESTORE" | "CREATE TABLE" | …
google.protobuf.Struct parameters = 3; // op-specific (e.g. {"predicate":"id = 4"})
google.protobuf.Struct operation_metrics = 6;
google.protobuf.Struct user_metadata = 7;
string engine_info = 8;
int64 read_version = 4; // version this op read from; 0 on initial CREATE
}Key facts:
versionis the cursor — useHistoryRequest.before_versionto page backwards.timestampis epoch-ms-as-string; order byversion, not timestamp — clock skew across writers can cause tiny non-monotonicity.operation_metricscarries the Delta-native counters per op (numTargetRowsInserted/Updated/Deletedfor MERGE/UPDATE;numFilesAdded/Removedfor OPTIMIZE; …).user_metadatais whatever you attached at commit time viacommitInfo.userMetadata. Empty when not set.- Time travel via
Restoretakes aversionfrom this log — see Recipes → Time travel + restore.
When ingest runs (pipeline-bound tables)
Pipeline-generated tables (CreateTablePipeline) ingest automatically — same trigger model as Pipelines:
- Direct upload of a matching object via S3 → Scriptum template runs → rows land in the table
- Source sync (Preview) → discovered objects → template runs → rows land
- One-shot manual via
TriggerIngest→ forces a single object through the bound pipeline
The auto-generated ingest rule (created with CreateTablePipeline) is a normal IngestRule — list / inspect / edit it via the Pipelines APIs.
See also
- Quickstart — see all six types in 5 minutes
- SQL Compatibility — DuckDB dialect + statement shapes
- API Reference → Execute — full strategy table + SQL examples
- API Reference → Tables — both creation modes
- Pipelines → Templates → The catalog — warehouse-compatible Scriptum templates for
CreateTablePipeline