Time Travel & Restore
Goal: recover from a bad write by rolling the table back to a known-good Delta version. Understand how Vacuum interacts with Restore so you don’t accidentally lock yourself out of recovery.
Why this matters: Delta Lake’s commit log makes every write reversible — until Vacuum reclaims the old file versions past retention. Knowing the relationship between the three RPCs (History, Restore, Vacuum) is what turns a panic into a one-line fix.
Shape:
write → write → write (versions advance monotonically)
│ │ │
▼ ▼ ▼
┌──────────────────────┐
│ commit log: 0..N │ ◄── History returns this
└──────────────────────┘
│
├─► Restore version=K → creates version N+1 (= state at K)
│
└─► Vacuum (≥168h old) → permanently removes files for old versions
(no restore to those versions afterwards)
Prerequisites
- A bucket + table with at least a few writes (we’ll reuse the events table from the Manual Table recipe).
- dodil CLI + dodil login.
Setting the scene — what we’ll roll back
Let’s simulate a bad bulk-write that we want to undo. Starting from a freshly-built events table:
# Look at the current state
dodil k3 table describe events --bucket kb-prod -o json \
| jq '{version, lastDrainTargetVersion, drainLagSecs}'
Say it returns version: 5.
Now imagine you do a destructive write — e.g. an UPDATE with a too-broad predicate:
# OOPS — meant id = 99 but typo dropped the filter
dodil k3 table update events --bucket kb-prod \
--predicate "id > 0" \
--updates-json '{"event_type":"archived"}'
# (also drain it so it lands in Delta — otherwise we'd just need to wait)
dodil k3 table compact events --bucket kb-prod
Now every row says event_type = 'archived'. We want the previous state back.
1. Inspect the commit log
History returns the Delta commit log — every write, optimize, vacuum, restore, schema change. Newest first; paginate backwards via before_version.
# Most recent activity
curl -sS "https://k3.dev.dodil.io/kb-prod/tables/events/history?limit=10" \
-H "Authorization: Bearer $DODIL_TOKEN" \
| jq '{
currentVersion,
hasMore,
entries: [.entries[] | {version, operation, params: .parameters}]
}'
Sample output:
{
"currentVersion": "7",
"hasMore": true,
"entries": [
{
"version": "7",
"operation": "WRITE",
"params": { "predicate": "id > 0" }
},
{
"version": "6",
"operation": "OPTIMIZE",
"params": { "targetSize": "134217728" }
},
{
"version": "5",
"operation": "MERGE",
"params": { "predicate": "id = source.id AND user_id = source.user_id" }
},
{
"version": "4",
"operation": "WRITE",
"params": { "predicate": null }
}
]
}
Version 7 is the bad WRITE. Version 5, the last MERGE that produced the data state we want, is our restore target.
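The sample response reports hasMore: true, so older commits sit on further pages. Below is a minimal sketch of walking the whole log with before_version. It assumes before_version is exclusive (returns only entries strictly older than the value passed) and that jq is installed. fetch_history_page and walk_history are hypothetical helper names, not part of the dodil CLI.

```shell
# fetch_history_page [BEFORE_VERSION]
# Prints one History response (JSON). URL and header are copied from the
# request above; before_version is only sent when an argument is given.
fetch_history_page() {
  local before="${1:-}"
  local url="https://k3.dev.dodil.io/kb-prod/tables/events/history?limit=10"
  if [ -n "$before" ]; then
    url="$url&before_version=$before"
  fi
  curl -sS "$url" -H "Authorization: Bearer $DODIL_TOKEN"
}

# walk_history
# Prints "version operation" for every commit, newest first, following
# hasMore until the log is exhausted.
walk_history() {
  local before=""
  local page
  while :; do
    page="$(fetch_history_page "$before")" || return 1
    printf '%s' "$page" | jq -r '.entries[] | "\(.version) \(.operation)"'
    [ "$(printf '%s' "$page" | jq -r '.hasMore')" = "true" ] || break
    # Assumption: before_version is exclusive, so pass the oldest version seen.
    # Versions are JSON strings, hence tonumber before taking the minimum.
    before="$(printf '%s' "$page" | jq -r '[.entries[].version | tonumber] | min')"
    # An empty page yields "null"; stop rather than loop forever.
    [ "$before" != "null" ] || break
  done
}
```

Calling walk_history prints one line per commit. If before_version turns out to be inclusive, the boundary commit would show up twice; subtract one from the value before passing it.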
Want richer detail? Include operationMetrics (Delta-native row counters per op) and userMetadata:
curl -sS "https://k3.dev.dodil.io/kb-prod/tables/events/history?limit=10" \
-H "Authorization: Bearer $DODIL_TOKEN" \
| jq '.entries[] | {
version, operation,
params: .parameters,
metrics: .operationMetrics,
engine: .engineInfo,
userMetadata
}'
operationMetrics keys vary per op:
| Operation | Useful metrics |
|---|---|
| WRITE / MERGE / UPDATE | numTargetRowsInserted, numTargetRowsUpdated, numTargetRowsDeleted |
| OPTIMIZE | numFilesAdded, numFilesRemoved, numBatches |
| VACUUM START / END | numCopiedRows, file paths |
| RESTORE | numCopiedRows, restored version |
| CREATE TABLE | numFiles (initially 0) |
Order by version, not timestamp — clock skew across writers can make timestamp slightly non-monotonic. The version is the authoritative sequence.
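Because versions arrive as JSON strings, a plain string sort would put "10" before "9"; comparing them as numbers avoids that. The sketch below picks a restore target from one History page: the highest version below the bad commit, skipping OPTIMIZE and VACUUM entries on the assumption that they rewrite or delete files without changing rows. previous_version is a hypothetical helper name, and jq is assumed to be installed.

```shell
# previous_version BAD_VERSION
# Reads a History response on stdin and prints the highest version that is
# numerically below BAD_VERSION and is not an OPTIMIZE / VACUUM commit.
# Prints nothing if the page holds no such entry.
previous_version() {
  jq -r --argjson bad "$1" '
    [ .entries[]
      | select((.version | tonumber) < $bad)                  # strictly older than the bad commit
      | select(.operation | test("^(OPTIMIZE|VACUUM)") | not) # assumed not to change row data
      | .version | tonumber ]
    | if length == 0 then empty else max end
  '
}
```

With the sample output above, piping the response into `previous_version 7` prints 5. If it prints nothing, the target lives on an older page of the log.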
2. Restore by version
The Restore RPC isn’t in the CLI yet — drop to the API:
curl -sS -X POST "https://k3.dev.dodil.io/kb-prod/tables/events/restore" \
-H "Authorization: Bearer $DODIL_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"bucket": "kb-prod",
"tableName": "events",
"version": "5"
}'
Sample response:
{
"versionBefore": "7",
"versionAfter": "8"
}
Note: a restore is itself a commit. It produces a new Delta version (versionAfter: 8) whose state equals the state at the target version (5). The intervening versions (6, 7) still exist in history, so you can also restore forward to them later (until Vacuum removes their files).
Restore by timestamp
If you don’t have a clean version number but know the wall-clock time:
curl -sS -X POST "https://k3.dev.dodil.io/kb-prod/tables/events/restore" \
-H "Authorization: Bearer $DODIL_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"bucket": "kb-prod",
"tableName": "events",
"restoreTimestamp": "2026-05-27T08:00:00Z"
}'
K3 picks the latest version committed at or before the timestamp. restoreTimestamp is mutually exclusive with version; pass exactly one.
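Since the two selectors are mutually exclusive, it can help to build the request body through a single entry point that only ever emits one of them. This is a minimal sketch: restore_body is a hypothetical helper (not part of the dodil CLI), the field names are copied from the requests above, and the values are assumed not to need JSON escaping.

```shell
# restore_body BUCKET TABLE SELECTOR VALUE
# SELECTOR is "version" or "timestamp"; exactly one of the two fields
# (version / restoreTimestamp) ends up in the printed JSON body.
restore_body() {
  local bucket="$1" table="$2" selector="$3" value="$4"
  case "$selector" in
    version)
      # Versions are non-negative integers, sent as strings as in the examples.
      case "$value" in
        ''|*[!0-9]*)
          echo "restore_body: version must be a non-negative integer" >&2
          return 2
          ;;
      esac
      printf '{"bucket":"%s","tableName":"%s","version":"%s"}\n' \
        "$bucket" "$table" "$value"
      ;;
    timestamp)
      if [ -z "$value" ]; then
        echo "restore_body: timestamp must not be empty" >&2
        return 2
      fi
      printf '{"bucket":"%s","tableName":"%s","restoreTimestamp":"%s"}\n' \
        "$bucket" "$table" "$value"
      ;;
    *)
      echo "restore_body: selector must be 'version' or 'timestamp'" >&2
      return 2
      ;;
  esac
}
```

The result can be passed straight to the Restore call, for example `-d "$(restore_body kb-prod events version 5)"`.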
3. Verify the restore
# 1. Check that the version advanced (new commit)
dodil k3 table describe events --bucket kb-prod -o json \
| jq '{version}'
# expect 8
# 2. Confirm data is back to its pre-bad-write state
dodil k3 table query \
"SELECT event_type, COUNT(*) AS n FROM events GROUP BY event_type ORDER BY n DESC" \
--bucket kb-prod
# expect the original distribution (click, purchase, signup) — not all "archived"
# 3. History should now show RESTORE at the head
curl -sS "https://k3.dev.dodil.io/kb-prod/tables/events/history?limit=5" \
-H "Authorization: Bearer $DODIL_TOKEN" \
| jq '.entries[] | {version, operation, params: .parameters}'
You should see a RESTORE entry at version: 8 with parameters.version: 5.
4. Understand the vacuum retention window
Vacuum permanently deletes Delta file versions past the retention window (default 168 h = 7 days). Once a version’s underlying files are gone, you can no longer restore to it.
Visualize the safety windows:
time (newest → oldest)
v_now (live)
│
│ ◄── always restorable
│
v - 168h
│ ═══════════════ DELTA SAFETY BAND ═══════════════
│ Delta refuses to vacuum files this fresh by default
│ (--disable-retention-check overrides)
│
v - retention_hours (your configured cutoff)
│
│ candidates for vacuum
│
v - ∞
Two retention numbers matter:
| | Default | Meaning |
|---|---|---|
| Delta safety band | 168 h (7 days) | Hard floor; Delta refuses to vacuum files younger than this because in-flight readers / time-travel queries break |
| --retention-hours flag | 168 h | Your chosen cutoff; must be ≥ 168 h unless --disable-retention-check is set |
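The two rules reduce to simple arithmetic, sketched below. Both helpers are hypothetical (not part of the dodil CLI). The sketch assumes a file's age is counted from the moment a later commit stopped referencing it, and that the cutoff is inclusive (≥), matching the diagram at the top of this recipe.

```shell
SAFETY_BAND_HOURS=168   # default safety band from the table above

# retention_allowed HOURS [DISABLE_CHECK]
# Succeeds if the cutoff would be accepted: at least 168 h, or any value
# when the retention check is disabled (DISABLE_CHECK=1).
retention_allowed() {
  local hours="$1" disable_check="${2:-0}"
  [ "$disable_check" -eq 1 ] || [ "$hours" -ge "$SAFETY_BAND_HOURS" ]
}

# vacuum_candidate REMOVED_EPOCH NOW_EPOCH RETENTION_HOURS
# Succeeds if a file superseded at REMOVED_EPOCH (Unix seconds) is at least
# RETENTION_HOURS old at NOW_EPOCH, i.e. a candidate for deletion.
vacuum_candidate() {
  local removed="$1" now="$2" hours="$3"
  [ $(( now - removed )) -ge $(( hours * 3600 )) ]
}
```

For example, a file superseded on day 1 is exactly 7 × 86400 = 604800 seconds old on day 8, so `vacuum_candidate 0 604800 168` succeeds, while `vacuum_candidate 0 604800 720` fails and the file survives under a 30-day retention.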
5. Vacuum safely — dry run first
Always dry-run before vacuuming production tables:
dodil k3 table vacuum events --bucket kb-prod --dry-run -o json \
| jq '{
filesDeleted,
sampleFiles: .filesDeletedPaths[0:5]
}'
Sample output:
{
"filesDeleted": "42",
"sampleFiles": [
"s3://kb-prod/.../events/event_type=click/part-00000-old-uuid.parquet",
"s3://kb-prod/.../events/event_type=purchase/part-00000-old-uuid.parquet"
]
}
Confirm the count + paths match expectations. Then for the real run:
dodil k3 table vacuum events --bucket kb-prod
Or with a custom retention:
# Vacuum anything older than 30 days
dodil k3 table vacuum events --bucket kb-prod --retention-hours 720
6. The vacuum-restore interaction (the trap)
The pattern that bites people:
# Day 1 — bad write at v7
dodil k3 table update events --bucket kb-prod --predicate "id > 0" --updates-json '...'
dodil k3 table compact events --bucket kb-prod
# Day 8 — someone runs vacuum with the 168h default (= 7 day retention)
dodil k3 table vacuum events --bucket kb-prod
# Day 9 — you realize the bad write at v7 and try to restore to v5
curl ... /restore -d '{"version":"5"}'
# ❌ ERROR: files for version 5 were vacuumed
The fix is a policy: don’t run Vacuum aggressively unless you have a separate audit / backup story for the data range you’re vacuuming.
Safer vacuum patterns
- Stretch retention for tables where time-travel matters:
  # 30-day retention: gives you a month-long restore window
  dodil k3 table vacuum events --bucket kb-prod --retention-hours 720
- Dry-run + audit before the real run, logging the file count + sample paths:
  dodil k3 table vacuum events --bucket kb-prod --dry-run -o json \
    | tee /var/log/k3-vacuum-events-$(date +%Y%m%d).json
- Pin a restore point: if you know you’ll want to restore to a specific version, materialize it as a separate table first via Materialize (see CTAS & Materialize). The materialized copy is an independent Delta table with its own history, so vacuum on the source doesn’t affect it.
When you DO need to vacuum younger files
For forced cleanups on quiesced tables (no in-flight readers, no time-travel needed):
dodil k3 table vacuum events --bucket kb-prod \
--retention-hours 24 \
--disable-retention-check
--disable-retention-check bypasses Delta’s safety band. Only run it on tables you’ve explicitly stopped writing to: in-flight readers and time-travel queries against pre-vacuum versions will fail once the files disappear.
7. Per-operation restore patterns
| Scenario | Approach |
|---|---|
| Bad UPDATE with wrong predicate | Find the version BEFORE the UPDATE via History; Restore version=<previous> |
| Bad DELETE | Same — restore to the version before the DELETE commit |
| Bad MERGE | Restore to the version before the MERGE; for surgical MERGE-undo, materialize the pre-MERGE snapshot as a temp table, then re-MERGE selectively |
| Schema mistake (ALTER added the wrong column) | Restore to the pre-ALTER version; existing readers won’t see the dropped column anymore |
| Accidental DROP TABLE | Cannot restore via Restore — the table row is gone. Recreate via CREATE TABLE and bulk-load from any backup / mirror you have |
| Catastrophic compactor bug | Restore + then run Vacuum to physically remove the bad files (after the safety band) |
Common gotchas
| Symptom | Cause | Fix |
|---|---|---|
| Restore version=X returns an error mentioning “missing data files” | Vacuum already reclaimed files for that version | Pick a more recent version via History; otherwise the data is unrecoverable |
| History shows commits older than retention but you can’t restore to them | Files were vacuumed; only the metadata entry survives in the commit log | Same as above |
| Restore succeeds but query returns unexpected data | The restore target was older than you thought | Re-check the History output: version is the source of truth, not timestamp |
| Vacuum --dry-run shows 0 files | Either nothing to vacuum (table fresh + retention high) or retention cutoff is within the safety band | Lower --retention-hours (≥ 168), or use --disable-retention-check for forced cleanup |
| Vacuum succeeds but space wasn’t reclaimed on S3 | Vacuum issues DELETE operations against the underlying object store; some S3 implementations have eventual-consistency on space-accounting (incl. lifecycle policies) | Wait; space accounting catches up. Check the object store directly if urgent. |
See also
- Maintenance (API Reference): full Optimize / Vacuum / Compact / Restore / History surface
- CTAS & Materialize: pin restore points by materializing them
- Manual Table: the schema we restored in this recipe
- Core Concepts → HistoryEntry: the type behind History responses