SQL Compatibility

K3 Tables speak DuckDB-flavored SQL. Anything DuckDB parses — joins, CTEs, window functions, JSON operators, set ops, subqueries — K3 can plan and dispatch. There are a few documented refusals (transactions, multi-statement), and the planner picks a strategy per statement based on the table’s primary key, partitioning, and the query shape.

This page is the SQL contract: what dialect, what types, what statement shapes are supported, what’s not, and how to connect a native client.

Dialect


Base SQL dialect	DuckDB SQL
Driven by	`Execute` RPC (`POST /:bucket/tables/_execute`) — sends one SQL string, returns structured results
Where DuckDB documents the dialect	duckdb.org/docs/sql
K3-specific divergences	See Refusals below — the rest is DuckDB

Why DuckDB: it’s a single, well-documented, embeddable SQL parser + planner with strong analytical-workload focus. K3 hands DuckDB the SQL, gets a parsed plan back, then routes the operation across the write log + Delta tiers via its own dispatcher. You write standard DuckDB SQL; K3 worries about the tier routing.

Column types

Eleven column types — all are real DuckDB types under the hood, all serialize losslessly to Delta:

K3 type	DuckDB type	Wire shape on `Insert` / `Merge`
`string`	`VARCHAR`	JSON string
`int`	`INTEGER` (int32)	JSON number
`long`	`BIGINT` (int64)	JSON number or string (large values)
`short`	`SMALLINT` (int16)	JSON number
`float`	`FLOAT` (32-bit)	JSON number
`double`	`DOUBLE` (64-bit)	JSON number
`boolean`	`BOOLEAN`	JSON boolean
`date`	`DATE`	JSON string `"YYYY-MM-DD"`
`timestamp`	`TIMESTAMP` (microseconds UTC)	JSON number (µs since epoch) or `"YYYY-MM-DD HH:MM:SS"`
`binary`	`BLOB`	base64 string
`json`	`VARCHAR` (with Arrow JSON extension tag)	JSON value (object / array / scalar)

long values that exceed safe-integer range (±2^53 − 1) — pass as a JSON string to preserve precision. K3 stores int64 natively.

`default_expression`

Every column may carry a default expression — literal SQL applied at row-commit time when the column is absent from the input row. Validated at table creation:

Default expression	Effect
`"GENERATE_UUID()"`	UUID string per row
`"CURRENT_TIMESTAMP()"`	Insert-time timestamp
`"ULID()"`	Lexicographically-sortable ID
`"42"` / `"'pending'"` / `"true"`	Literal constants
`""` (empty)	Column required at write time unless `nullable = true`

Statement shapes — supported

What you can put in an Execute SQL string:

DDL


-- Create a table with a composite primary key + partition column + JSON column
CREATE TABLE events (
  id         BIGINT NOT NULL,
  user_id    VARCHAR NOT NULL,
  occurred_at TIMESTAMP NOT NULL,
  event_type VARCHAR NOT NULL,
  payload    JSON,
  PRIMARY KEY (id, user_id)
) PARTITIONED BY (event_type);
 
-- Add a new column
ALTER TABLE events ADD COLUMN session_id VARCHAR;
 
-- Drop a table
DROP TABLE events;
DROP TABLE IF EXISTS events;
 
-- CTAS (CREATE TABLE AS SELECT) — schema inferred from the SELECT
CREATE TABLE click_summary AS
  SELECT user_id, COUNT(*) AS n
  FROM events
  WHERE event_type = 'click'
  GROUP BY user_id;

Notes:

PRIMARY KEY is honored — inline column-level (id BIGINT PRIMARY KEY) and table-level (PRIMARY KEY (id, user_id)) both work. The key flows to the planner so subsequent writes get keyed routing for free.
PARTITIONED BY (col, …) controls Delta partitioning. Reads that filter on a partition column can prune entire partitions.
CTAS infers types from the SELECT — int-shaped JSON numbers → long, fractional → double, strings → string, bools → boolean. For typed migrations, pass explicit columns instead via the structured CreateTable RPC.

DML — writes


-- Single-row insert (keyed)
INSERT INTO events (id, user_id, occurred_at, event_type, payload)
  VALUES (1, 'u-101', TIMESTAMP '2026-05-27 10:00:00', 'click', '{"page":"/pricing"}');
 
-- Bulk insert (keyed)
INSERT INTO events (id, user_id, occurred_at, event_type, payload) VALUES
  (1, 'u-101', TIMESTAMP '2026-05-27 10:00:00', 'click',    '{"page":"/pricing"}'),
  (2, 'u-101', TIMESTAMP '2026-05-27 10:01:00', 'click',    '{"page":"/signup"}'),
  (3, 'u-102', TIMESTAMP '2026-05-27 10:02:00', 'purchase', '{"sku":"A-12","amount":49.99}');
 
-- INSERT … SELECT (keyed-from-select)
INSERT INTO click_summary
  SELECT user_id, COUNT(*) FROM events WHERE event_type = 'click' GROUP BY user_id;
 
-- UPDATE via PK (keyed → write log)
UPDATE events SET event_type = 'click_pricing'
  WHERE id = 1 AND user_id = 'u-101';
 
-- UPDATE via non-PK predicate (non-keyed → ⚠️ Delta-only, see warnings)
UPDATE events SET event_type = 'archived' WHERE occurred_at < TIMESTAMP '2025-01-01 00:00:00';
 
-- DELETE via PK (keyed)
DELETE FROM events WHERE id = 1 AND user_id = 'u-101';
 
-- DELETE via non-PK predicate (non-keyed → ⚠️)
DELETE FROM events WHERE occurred_at < TIMESTAMP '2025-01-01 00:00:00';
 
-- MERGE (upsert) — source is inline rows / a query / another table
MERGE INTO events AS t
USING (SELECT * FROM events_staging) AS s
ON t.id = s.id AND t.user_id = s.user_id
WHEN MATCHED THEN UPDATE SET event_type = s.event_type, payload = s.payload
WHEN NOT MATCHED THEN INSERT (id, user_id, occurred_at, event_type, payload)
  VALUES (s.id, s.user_id, s.occurred_at, s.event_type, s.payload);

Routing rule of thumb: predicates / ON clauses that match the table’s merge_keys route through the write log (the planner picks a KEYED_* strategy). Predicates that don’t route directly to Delta — fast but bypass the log, which carries the WAL-overwrite risk discussed in Execute → UPDATE.

DML — reads


-- Simple SELECT
SELECT * FROM events WHERE event_type = 'click' LIMIT 100;
 
-- Aggregates (federated across partitions when applicable)
SELECT user_id, COUNT(*) AS n
FROM events
GROUP BY user_id
ORDER BY n DESC;
 
-- Joins
SELECT e.user_id, u.email, COUNT(*) AS clicks
FROM events e JOIN users u ON e.user_id = u.id
WHERE e.event_type = 'click'
GROUP BY e.user_id, u.email;
 
-- Window functions
SELECT user_id, occurred_at,
       LAG(occurred_at) OVER (PARTITION BY user_id ORDER BY occurred_at) AS prev_event
FROM events;
 
-- CTEs (WITH …)
WITH purchases AS (
  SELECT user_id, payload->>'sku' AS sku, (payload->>'amount')::double AS amount
  FROM events WHERE event_type = 'purchase'
)
SELECT sku, SUM(amount) AS total FROM purchases GROUP BY sku;
 
-- Subqueries
SELECT * FROM events WHERE user_id IN (SELECT id FROM users WHERE tier = 'pro');
 
-- Set ops
(SELECT user_id FROM events WHERE event_type = 'click')
INTERSECT
(SELECT user_id FROM events WHERE event_type = 'purchase');

All standard DuckDB read shapes work. The planner picks UNARY_WAREHOUSE, UNARY_MERGED_STRONG, or FEDERATED_* depending on freshness, partitioning, and aggregate-vs-scan shape — see Execute → SELECT.

Read freshness

For SELECT plans, you control freshness via the freshness field on ExecuteRequest — not in the SQL itself:

Mode	What it does	When to use
`EVENTUAL` (default)	Delta-only scan; planner may distribute across partitions	Analytical reads; OLAP workloads. Fast but misses uncompacted writes.
`STRONG`	Single-pod merge of write log + Delta	Read-your-writes. Pays log-listing cost; scope tightly.
`OLTP_ONLY`	Write log only (no Delta scan)	Niche — point lookups during heavy un-drained backlog. Most callers want `STRONG`.

JSON columns

K3’s json column is a first-class semantic type. Stored as canonical JSON text under the hood, tagged so DuckDB recognizes it. Every DuckDB JSON operator works:


-- Field access
SELECT payload->'sku' FROM events;            -- returns JSON
SELECT payload->>'sku' FROM events;           -- returns TEXT
 
-- JSON path
SELECT json_extract(payload, '$.amount') FROM events;
SELECT json_extract_string(payload, '$.sku') FROM events;
 
-- Iterate over array elements / object entries
SELECT json_each(payload) FROM events;
 
-- Cast extracted scalars
SELECT (payload->>'amount')::double AS amount FROM events;
 
-- Round-trip a Postgres-style cast
SELECT payload::json->'tags'->>0 FROM events;

For high-volume read paths, promote hot fields to typed columns; reserve json for the long tail / unstructured metadata.

Refusals

The following are valid DuckDB SQL but K3 refuses them:

Refused	Why	What to do instead
`BEGIN` / `COMMIT` / `ROLLBACK`	No transaction coordinator in K3 today	For transactional workloads requiring multi-statement ACID, use Postgres. Individual `Execute` calls are atomic against Delta.
Multi-statement bodies (`stmt1; stmt2;`)	One SQL per `Execute` call	Chain on the caller side — send one statement at a time.
`ALTER TABLE DROP COLUMN`	Not implemented (parsed, then refused at planning)	Recreate the table without the column (via `CREATE TABLE AS SELECT`) or leave the column nullable.
`ALTER TABLE RENAME COLUMN`	Not implemented (parsed, then refused)	Same as above — recreate via CTAS with the new name.

Multiple ADD COLUMNs in a single ALTER TABLE are supported.

Native clients

Two ways to send SQL to K3 today — both speak the same dialect:

HTTP — `Execute` RPC


curl -sS -X POST "https://k3.dev.dodil.io/kb-prod/tables/_execute" \
  -H "Authorization: Bearer $DODIL_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "bucket": "kb-prod",
    "sql": "SELECT event_type, COUNT(*) AS n FROM events GROUP BY event_type",
    "freshness": "FRESHNESS_EVENTUAL"
  }'

Response is a oneof — query / write / ddl — depending on the SQL kind. See API Reference → Execute.

CLI — `dodil k3 table query`


# Default freshness (eventual)
dodil k3 table query --bucket kb-prod --sql "
  SELECT event_type, COUNT(*) AS n FROM events GROUP BY event_type
"
 
# Read-your-writes
dodil k3 table query --bucket kb-prod --sql "
  SELECT * FROM events WHERE user_id = 'u-101'
" --freshness strong
 
# JSON output for scripting
dodil k3 table query --bucket kb-prod --sql "SELECT * FROM events LIMIT 5" -o json

The CLI’s table query command handles both reads and writes — it picks the right RPC based on what the planner returns.

Structured shortcuts (no SQL string)

If you’d rather not assemble SQL strings, the typed RPCs (Insert / Merge / Update / DeleteRows / Query) and their CLI commands (dodil k3 table insert / merge / update / delete-rows / query) accept JSON rows + predicates / match-columns. The planner routes them through the same strategies — they’re just shortcuts for callers that prefer typed inputs to SQL templating. See API Reference → Data.

Coming-from-X cheatsheet

Coming from	K3 equivalent
Postgres `INSERT ... ON CONFLICT (pk) DO UPDATE`	Use `MERGE INTO ... USING ... ON t.pk = s.pk WHEN MATCHED THEN UPDATE ... WHEN NOT MATCHED THEN INSERT ...`
Postgres transactions	Not supported — wrap retry / idempotency in your app layer
MySQL `INSERT ... ON DUPLICATE KEY UPDATE`	Same — use `MERGE`
BigQuery `CREATE TABLE AS SELECT`	Same — `CREATE TABLE … AS SELECT …`
BigQuery `MERGE`	Identical syntax
DuckDB itself	Identical syntax; K3 adds the HTAP routing layer
Spark SQL / Iceberg `MERGE`	Similar — `MERGE INTO ... USING ... WHEN MATCHED ... WHEN NOT MATCHED ...`
Delta Lake external tools	Tables are real Delta tables — external Delta-capable engines can read them by pointing at the bucket’s Delta directory. Writes are the planner’s job; don’t write to the Delta dir externally.