SQL Compatibility
K3 Tables speak DuckDB-flavored SQL. Anything DuckDB parses — joins, CTEs, window functions, JSON operators, set ops, subqueries — K3 can plan and dispatch. There are a few documented refusals (transactions, multi-statement), and the planner picks a strategy per statement based on the table’s primary key, partitioning, and the query shape.
This page is the SQL contract: what dialect, what types, what statement shapes are supported, what’s not, and how to connect a native client.
Dialect
| Base SQL dialect | DuckDB SQL |
| Driven by | Execute RPC (POST /:bucket/tables/_execute) — sends one SQL string, returns structured results |
| Where DuckDB documents the dialect | duckdb.org/docs/sql |
| K3-specific divergences | See Refusals below — the rest is DuckDB |
Why DuckDB: it’s a single, well-documented, embeddable SQL parser + planner with strong analytical-workload focus. K3 hands DuckDB the SQL, gets a parsed plan back, then routes the operation across the write log + Delta tiers via its own dispatcher. You write standard DuckDB SQL; K3 worries about the tier routing.
Column types
Eleven column types — all are real DuckDB types under the hood, all serialize losslessly to Delta:
| K3 type | DuckDB type | Wire shape on Insert / Merge |
|---|---|---|
string | VARCHAR | JSON string |
int | INTEGER (int32) | JSON number |
long | BIGINT (int64) | JSON number or string (large values) |
short | SMALLINT (int16) | JSON number |
float | FLOAT (32-bit) | JSON number |
double | DOUBLE (64-bit) | JSON number |
boolean | BOOLEAN | JSON boolean |
date | DATE | JSON string "YYYY-MM-DD" |
timestamp | TIMESTAMP (microseconds UTC) | JSON number (µs since epoch) or "YYYY-MM-DD HH:MM:SS" |
binary | BLOB | base64 string |
json | VARCHAR (with Arrow JSON extension tag) | JSON value (object / array / scalar) |
longvalues that exceed safe-integer range (±2^53 − 1) — pass as a JSON string to preserve precision. K3 stores int64 natively.
default_expression
Every column may carry a default expression — literal SQL applied at row-commit time when the column is absent from the input row. Validated at table creation:
| Default expression | Effect |
|---|---|
"GENERATE_UUID()" | UUID string per row |
"CURRENT_TIMESTAMP()" | Insert-time timestamp |
"ULID()" | Lexicographically-sortable ID |
"42" / "'pending'" / "true" | Literal constants |
"" (empty) | Column required at write time unless nullable = true |
Statement shapes — supported
What you can put in an Execute SQL string:
DDL
-- Create a table with a composite primary key + partition column + JSON column
CREATE TABLE events (
id BIGINT NOT NULL,
user_id VARCHAR NOT NULL,
occurred_at TIMESTAMP NOT NULL,
event_type VARCHAR NOT NULL,
payload JSON,
PRIMARY KEY (id, user_id)
) PARTITIONED BY (event_type);
-- Add a new column
ALTER TABLE events ADD COLUMN session_id VARCHAR;
-- Drop a table
DROP TABLE events;
DROP TABLE IF EXISTS events;
-- CTAS (CREATE TABLE AS SELECT) — schema inferred from the SELECT
CREATE TABLE click_summary AS
SELECT user_id, COUNT(*) AS n
FROM events
WHERE event_type = 'click'
GROUP BY user_id;Notes:
PRIMARY KEYis honored — inline column-level (id BIGINT PRIMARY KEY) and table-level (PRIMARY KEY (id, user_id)) both work. The key flows to the planner so subsequent writes get keyed routing for free.PARTITIONED BY (col, …)controls Delta partitioning. Reads that filter on a partition column can prune entire partitions.- CTAS infers types from the SELECT —
int-shaped JSON numbers →long, fractional →double, strings →string, bools →boolean. For typed migrations, pass explicitcolumnsinstead via the structuredCreateTableRPC.
DML — writes
-- Single-row insert (keyed)
INSERT INTO events (id, user_id, occurred_at, event_type, payload)
VALUES (1, 'u-101', TIMESTAMP '2026-05-27 10:00:00', 'click', '{"page":"/pricing"}');
-- Bulk insert (keyed)
INSERT INTO events (id, user_id, occurred_at, event_type, payload) VALUES
(1, 'u-101', TIMESTAMP '2026-05-27 10:00:00', 'click', '{"page":"/pricing"}'),
(2, 'u-101', TIMESTAMP '2026-05-27 10:01:00', 'click', '{"page":"/signup"}'),
(3, 'u-102', TIMESTAMP '2026-05-27 10:02:00', 'purchase', '{"sku":"A-12","amount":49.99}');
-- INSERT … SELECT (keyed-from-select)
INSERT INTO click_summary
SELECT user_id, COUNT(*) FROM events WHERE event_type = 'click' GROUP BY user_id;
-- UPDATE via PK (keyed → write log)
UPDATE events SET event_type = 'click_pricing'
WHERE id = 1 AND user_id = 'u-101';
-- UPDATE via non-PK predicate (non-keyed → ⚠️ Delta-only, see warnings)
UPDATE events SET event_type = 'archived' WHERE occurred_at < TIMESTAMP '2025-01-01 00:00:00';
-- DELETE via PK (keyed)
DELETE FROM events WHERE id = 1 AND user_id = 'u-101';
-- DELETE via non-PK predicate (non-keyed → ⚠️)
DELETE FROM events WHERE occurred_at < TIMESTAMP '2025-01-01 00:00:00';
-- MERGE (upsert) — source is inline rows / a query / another table
MERGE INTO events AS t
USING (SELECT * FROM events_staging) AS s
ON t.id = s.id AND t.user_id = s.user_id
WHEN MATCHED THEN UPDATE SET event_type = s.event_type, payload = s.payload
WHEN NOT MATCHED THEN INSERT (id, user_id, occurred_at, event_type, payload)
VALUES (s.id, s.user_id, s.occurred_at, s.event_type, s.payload);Routing rule of thumb: predicates / ON clauses that match the table’s merge_keys route through the write log (the planner picks a KEYED_* strategy). Predicates that don’t route directly to Delta — fast but bypass the log, which carries the WAL-overwrite risk discussed in Execute → UPDATE.
DML — reads
-- Simple SELECT
SELECT * FROM events WHERE event_type = 'click' LIMIT 100;
-- Aggregates (federated across partitions when applicable)
SELECT user_id, COUNT(*) AS n
FROM events
GROUP BY user_id
ORDER BY n DESC;
-- Joins
SELECT e.user_id, u.email, COUNT(*) AS clicks
FROM events e JOIN users u ON e.user_id = u.id
WHERE e.event_type = 'click'
GROUP BY e.user_id, u.email;
-- Window functions
SELECT user_id, occurred_at,
LAG(occurred_at) OVER (PARTITION BY user_id ORDER BY occurred_at) AS prev_event
FROM events;
-- CTEs (WITH …)
WITH purchases AS (
SELECT user_id, payload->>'sku' AS sku, (payload->>'amount')::double AS amount
FROM events WHERE event_type = 'purchase'
)
SELECT sku, SUM(amount) AS total FROM purchases GROUP BY sku;
-- Subqueries
SELECT * FROM events WHERE user_id IN (SELECT id FROM users WHERE tier = 'pro');
-- Set ops
(SELECT user_id FROM events WHERE event_type = 'click')
INTERSECT
(SELECT user_id FROM events WHERE event_type = 'purchase');All standard DuckDB read shapes work. The planner picks UNARY_WAREHOUSE, UNARY_MERGED_STRONG, or FEDERATED_* depending on freshness, partitioning, and aggregate-vs-scan shape — see Execute → SELECT.
Read freshness
For SELECT plans, you control freshness via the freshness field on ExecuteRequest — not in the SQL itself:
| Mode | What it does | When to use |
|---|---|---|
EVENTUAL (default) | Delta-only scan; planner may distribute across partitions | Analytical reads; OLAP workloads. Fast but misses uncompacted writes. |
STRONG | Single-pod merge of write log + Delta | Read-your-writes. Pays log-listing cost; scope tightly. |
OLTP_ONLY | Write log only (no Delta scan) | Niche — point lookups during heavy un-drained backlog. Most callers want STRONG. |
JSON columns
K3’s json column is a first-class semantic type. Stored as canonical JSON text under the hood, tagged so DuckDB recognizes it. Every DuckDB JSON operator works:
-- Field access
SELECT payload->'sku' FROM events; -- returns JSON
SELECT payload->>'sku' FROM events; -- returns TEXT
-- JSON path
SELECT json_extract(payload, '$.amount') FROM events;
SELECT json_extract_string(payload, '$.sku') FROM events;
-- Iterate over array elements / object entries
SELECT json_each(payload) FROM events;
-- Cast extracted scalars
SELECT (payload->>'amount')::double AS amount FROM events;
-- Round-trip a Postgres-style cast
SELECT payload::json->'tags'->>0 FROM events;For high-volume read paths, promote hot fields to typed columns; reserve json for the long tail / unstructured metadata.
Refusals
The following are valid DuckDB SQL but K3 refuses them:
| Refused | Why | What to do instead |
|---|---|---|
BEGIN / COMMIT / ROLLBACK | No transaction coordinator in K3 today | For transactional workloads requiring multi-statement ACID, use Postgres. Individual Execute calls are atomic against Delta. |
Multi-statement bodies (stmt1; stmt2;) | One SQL per Execute call | Chain on the caller side — send one statement at a time. |
ALTER TABLE DROP COLUMN | Not implemented (parsed, then refused at planning) | Recreate the table without the column (via CREATE TABLE AS SELECT) or leave the column nullable. |
ALTER TABLE RENAME COLUMN | Not implemented (parsed, then refused) | Same as above — recreate via CTAS with the new name. |
Multiple ADD COLUMNs in a single ALTER TABLE are supported.
Native clients
Two ways to send SQL to K3 today — both speak the same dialect:
HTTP — Execute RPC
curl -sS -X POST "https://k3.dev.dodil.io/kb-prod/tables/_execute" \
-H "Authorization: Bearer $DODIL_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"bucket": "kb-prod",
"sql": "SELECT event_type, COUNT(*) AS n FROM events GROUP BY event_type",
"freshness": "FRESHNESS_EVENTUAL"
}'Response is a oneof — query / write / ddl — depending on the SQL kind. See API Reference → Execute.
CLI — dodil k3 table query
# Default freshness (eventual)
dodil k3 table query --bucket kb-prod --sql "
SELECT event_type, COUNT(*) AS n FROM events GROUP BY event_type
"
# Read-your-writes
dodil k3 table query --bucket kb-prod --sql "
SELECT * FROM events WHERE user_id = 'u-101'
" --freshness strong
# JSON output for scripting
dodil k3 table query --bucket kb-prod --sql "SELECT * FROM events LIMIT 5" -o jsonThe CLI’s table query command handles both reads and writes — it picks the right RPC based on what the planner returns.
Structured shortcuts (no SQL string)
If you’d rather not assemble SQL strings, the typed RPCs (Insert / Merge / Update / DeleteRows / Query) and their CLI commands (dodil k3 table insert / merge / update / delete-rows / query) accept JSON rows + predicates / match-columns. The planner routes them through the same strategies — they’re just shortcuts for callers that prefer typed inputs to SQL templating. See API Reference → Data.
Coming-from-X cheatsheet
| Coming from | K3 equivalent |
|---|---|
Postgres INSERT ... ON CONFLICT (pk) DO UPDATE | Use MERGE INTO ... USING ... ON t.pk = s.pk WHEN MATCHED THEN UPDATE ... WHEN NOT MATCHED THEN INSERT ... |
| Postgres transactions | Not supported — wrap retry / idempotency in your app layer |
MySQL INSERT ... ON DUPLICATE KEY UPDATE | Same — use MERGE |
BigQuery CREATE TABLE AS SELECT | Same — CREATE TABLE … AS SELECT … |
BigQuery MERGE | Identical syntax |
| DuckDB itself | Identical syntax; K3 adds the HTAP routing layer |
Spark SQL / Iceberg MERGE | Similar — MERGE INTO ... USING ... WHEN MATCHED ... WHEN NOT MATCHED ... |
| Delta Lake external tools | Tables are real Delta tables — external Delta-capable engines can read them by pointing at the bucket’s Delta directory. Writes are the planner’s job; don’t write to the Delta dir externally. |
See also
- API Reference → Execute — every SQL shape → strategy + runnable examples
- Core Concepts —
QueryStrategy,WriteStrategy,Freshness,Columntypes - Quickstart — end-to-end in 5 minutes
- DuckDB SQL reference — the upstream dialect docs