
Time Travel & Restore

Goal: recover from a bad write by rolling the table back to a known-good Delta version. Understand how Vacuum interacts with Restore so you don’t accidentally lock yourself out of recovery.

Why this matters: Delta Lake’s commit log makes every write reversible — until Vacuum reclaims the old file versions past retention. Knowing the relationship between the three RPCs (History, Restore, Vacuum) is what turns a panic into a one-line fix.

Shape:

```text
write → write → write            (versions advance monotonically)
  │       │       │
  ▼       ▼       ▼
┌──────────────────────┐
│  commit log: 0..N    │ ◄── History returns this
└──────────────────────┘
   ├─► Restore version=K  → creates version N+1 (= state at K)
   └─► Vacuum (≥168h old) → permanently removes files for old versions
                            (no restore to those versions afterwards)
```

Prerequisites

  • A bucket + table with at least a few writes (we’ll reuse the events table from the Manual Table recipe).
  • The dodil CLI, logged in with dodil login. The API calls below also read a bearer token from $DODIL_TOKEN.

Setting the scene — what we’ll roll back

Let’s simulate a bad bulk-write that we want to undo. Starting from a freshly-built events table:

```bash
# Look at the current state
dodil k3 table describe events --bucket kb-prod -o json \
  | jq '{version, lastDrainTargetVersion, drainLagSecs}'
```

Say it returns version: 5.

Now imagine you do a destructive write — e.g. an UPDATE with a too-broad predicate:

```bash
# OOPS: meant id = 99, but a typo dropped the filter
dodil k3 table update events --bucket kb-prod \
  --predicate "id > 0" \
  --updates-json '{"event_type":"archived"}'

# (also drain it so it lands in Delta; otherwise we'd just need to wait)
dodil k3 table compact events --bucket kb-prod
```

Now every row says event_type = 'archived'. We want the previous state back.

1. Inspect the commit log

History returns the Delta commit log: every write, optimize, vacuum, restore, and schema change, newest first. Paginate backwards with before_version.

```bash
# Most recent activity
curl -sS "https://k3.dev.dodil.io/kb-prod/tables/events/history?limit=10" \
  -H "Authorization: Bearer $DODIL_TOKEN" \
  | jq '{
      currentVersion,
      hasMore,
      entries: [.entries[] | {version, operation, params: .parameters}]
    }'
```

Sample output:

```json
{
  "currentVersion": "7",
  "hasMore": true,
  "entries": [
    { "version": "7", "operation": "WRITE", "params": { "predicate": "id > 0" } },
    { "version": "6", "operation": "OPTIMIZE", "params": { "targetSize": "134217728" } },
    { "version": "5", "operation": "MERGE", "params": { "predicate": "id = source.id AND user_id = source.user_id" } },
    { "version": "4", "operation": "WRITE", "params": { "predicate": null } }
  ]
}
```

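Version 7 is the bad WRITE. Version 5, the last MERGE that produced the data state we want, is our restore target. Version 6 is an OPTIMIZE, which rewrites files without changing rows, so restoring to it would yield the same data.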
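If you want to script the choice of restore target, the sketch below follows the reasoning above. It looks at History entries older than the bad commit and picks the newest one that changed data, skipping OPTIMIZE and VACUUM commits. The function name restore_target and its tab-separated input format are hypothetical. Treating OPTIMIZE and VACUUM as data-neutral is an assumption that holds for this walkthrough. The function needs only bash and awk; the trailing comment shows one way to feed it from the History API with jq.

```bash
# Hypothetical helper. Input on stdin: one "version<TAB>operation" line per
# History entry, in any order. Prints the newest version older than BAD whose
# operation changed data; OPTIMIZE and VACUUM commits are skipped on the
# assumption that they leave rows untouched.
# Usage: restore_target BAD_VERSION
restore_target() {
  local bad=$1
  awk -F'\t' -v bad="$bad" '
    $1 + 0 < bad + 0 && $2 !~ /^(OPTIMIZE|VACUUM)/ {
      if (best == "" || $1 + 0 > best + 0) best = $1
    }
    END { if (best == "") exit 1; print best }
  '
}

# Feeding it from the History API (needs curl and jq):
#   curl -sS "https://k3.dev.dodil.io/kb-prod/tables/events/history?limit=20" \
#     -H "Authorization: Bearer $DODIL_TOKEN" \
#     | jq -r '.entries[] | "\(.version)\t\(.operation)"' \
#     | restore_target 7
```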

Want richer detail? Include operationMetrics (Delta-native row counters per op) and userMetadata:

```bash
curl -sS "https://k3.dev.dodil.io/kb-prod/tables/events/history?limit=10" \
  -H "Authorization: Bearer $DODIL_TOKEN" \
  | jq '.entries[] | {
      version,
      operation,
      params: .parameters,
      metrics: .operationMetrics,
      user: .userMetadata,
      engine: .engineInfo
    }'
```

operationMetrics keys vary per op:

| Operation | Useful metrics |
|---|---|
| WRITE / MERGE / UPDATE | numTargetRowsInserted, numTargetRowsUpdated, numTargetRowsDeleted |
| OPTIMIZE | numFilesAdded, numFilesRemoved, numBatches |
| VACUUM START / END | numCopiedRows, file paths |
| RESTORE | numCopiedRows, restored version |
| CREATE TABLE | numFiles (initially 0) |

Order by version, not timestamp — clock skew across writers can make timestamp slightly non-monotonic. The version is the authoritative sequence.
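Here is a minimal sketch of walking the whole log with before_version and then ordering the result by version. It assumes before_version is a query parameter on the history endpoint and that it is exclusive. A cursor guard stops the loop if the cursor does not move. history_url, fetch_all_history, sort_by_version and K3_BASE are illustrative names, not part of K3. Versions come back as JSON strings, which is why the sort is explicitly numeric.

```bash
# Assumptions: before_version is a query parameter on the history endpoint and
# is exclusive. K3_BASE defaults to the dev endpoint used in this recipe.
K3_BASE="${K3_BASE:-https://k3.dev.dodil.io}"

# Usage: history_url BUCKET TABLE LIMIT [BEFORE_VERSION]
history_url() {
  local bucket=$1 table=$2 limit=$3 before=${4:-}
  local url="$K3_BASE/$bucket/tables/$table/history?limit=$limit"
  if [ -n "$before" ]; then
    url="$url&before_version=$before"
  fi
  printf '%s\n' "$url"
}

# Usage: fetch_all_history BUCKET TABLE
# Prints every History entry as one compact JSON object per line (needs curl, jq).
fetch_all_history() {
  local bucket=$1 table=$2 before="" page next
  while :; do
    page=$(curl -fsS "$(history_url "$bucket" "$table" 100 "$before")" \
      -H "Authorization: Bearer $DODIL_TOKEN") || return 1
    printf '%s\n' "$page" | jq -c '.entries[]'
    [ "$(printf '%s\n' "$page" | jq -r '.hasMore')" = "true" ] || break
    # Next cursor: the oldest version on this page.
    next=$(printf '%s\n' "$page" | jq -r '[.entries[].version | tonumber] | min')
    # Stop instead of looping forever if the cursor does not move.
    [ "$next" != "$before" ] || break
    before=$next
  done
}

# Versions are JSON strings, so sort numerically on the first tab-separated
# field; a plain string sort would put "10" before "9".
sort_by_version() {
  sort -t "$(printf '\t')" -k1,1n
}

# Example (oldest first):
#   fetch_all_history kb-prod events \
#     | jq -r '"\(.version)\t\(.operation)"' | sort_by_version
```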

2. Restore by version

The Restore RPC isn’t in the CLI yet — drop to the API:

```bash
curl -sS -X POST "https://k3.dev.dodil.io/kb-prod/tables/events/restore" \
  -H "Authorization: Bearer $DODIL_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "bucket": "kb-prod", "tableName": "events", "version": "5" }'
```

Sample response:

```json
{ "versionBefore": "7", "versionAfter": "8" }
```

Note: a restore is itself a commit. It produces a new Delta version (versionAfter: 8) whose state equals the state at the target version (5). The intervening versions (6, 7) stay in history, so you can restore forward to them too, until Vacuum removes their files.
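To make the "restore is a commit" behaviour concrete, here is a toy bash model. It is not K3 code. In it, states[v] holds a label for the table's data at version v. Restoring appends a new version that copies the target's state, and nothing earlier is removed.

```bash
# Toy model, not K3's implementation: states[v] labels the table's logical
# data at Delta version v. v6 (OPTIMIZE) keeps v5's data; v7 is the bad write.
states=(v0 v1 v2 v3 v4 good good archived)

# Restore never rewrites history: it appends a NEW version whose state is a
# copy of the target's, and records the before/after versions.
restore_to() {
  local target=$1
  case $target in
    '' | *[!0-9]*) echo "not a version: $target" >&2; return 1 ;;
  esac
  if [ "$target" -ge "${#states[@]}" ]; then
    echo "no such version: $target" >&2
    return 1
  fi
  VERSION_BEFORE=$(( ${#states[@]} - 1 ))
  states+=("${states[$target]}")
  VERSION_AFTER=$(( ${#states[@]} - 1 ))
}

restore_to 5
echo "versionBefore=$VERSION_BEFORE versionAfter=$VERSION_AFTER state=${states[$VERSION_AFTER]}"
# → versionBefore=7 versionAfter=8 state=good
```

Calling restore_to 7 afterwards would append version 9 with the "archived" state. This is the "restore forward" case, and it works because version 7 was never deleted.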

Restore by timestamp

If you don’t have a clean version number but know the wall-clock time:

```bash
curl -sS -X POST "https://k3.dev.dodil.io/kb-prod/tables/events/restore" \
  -H "Authorization: Bearer $DODIL_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "bucket": "kb-prod", "tableName": "events", "restoreTimestamp": "2026-05-27T08:00:00Z" }'
```

K3 picks the latest version committed at or before the timestamp. The restoreTimestamp field is mutually exclusive with version: pass exactly one.
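A small wrapper can enforce the exactly-one rule by construction. This is a sketch that assumes plain version numbers and timestamps, because no JSON escaping is done. restore_body and k3_restore are hypothetical helper names. The endpoint and body fields are the ones shown above.

```bash
# Hypothetical helpers; endpoint and body fields are the ones shown above.
# Values are inserted verbatim (no JSON escaping), which is fine for plain
# version numbers and ISO-8601 timestamps.
# Usage: restore_body BUCKET TABLE --version N
#        restore_body BUCKET TABLE --timestamp 2026-05-27T08:00:00Z
restore_body() {
  if [ $# -ne 4 ] || [ -z "$4" ]; then
    echo "usage: restore_body BUCKET TABLE --version N | --timestamp TS" >&2
    return 2
  fi
  local bucket=$1 table=$2 flag=$3 value=$4
  case $flag in
    --version)
      printf '{"bucket":"%s","tableName":"%s","version":"%s"}\n' \
        "$bucket" "$table" "$value" ;;
    --timestamp)
      printf '{"bucket":"%s","tableName":"%s","restoreTimestamp":"%s"}\n' \
        "$bucket" "$table" "$value" ;;
    *)
      echo "unknown flag: $flag (use --version or --timestamp)" >&2
      return 2 ;;
  esac
}

# Usage: k3_restore BUCKET TABLE --version N | --timestamp TS
k3_restore() {
  local bucket=$1 table=$2 body
  body=$(restore_body "$@") || return
  curl -sS -X POST "https://k3.dev.dodil.io/$bucket/tables/$table/restore" \
    -H "Authorization: Bearer $DODIL_TOKEN" \
    -H "Content-Type: application/json" \
    -d "$body"
}
```

For example, k3_restore kb-prod events --version 5 reproduces the request from step 2. Passing both a version and a timestamp fails the argument-count check before any request is sent.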

3. Verify the restore

```bash
# 1. Check that the version advanced (new commit)
dodil k3 table describe events --bucket kb-prod -o json \
  | jq '{version}'   # expect 8

# 2. Confirm data is back to its pre-bad-write state
dodil k3 table query \
  "SELECT event_type, COUNT(*) AS n FROM events GROUP BY event_type ORDER BY n DESC" \
  --bucket kb-prod
# expect the original distribution (click, purchase, signup), not all "archived"

# 3. History should now show RESTORE at the head
curl -sS "https://k3.dev.dodil.io/kb-prod/tables/events/history?limit=5" \
  -H "Authorization: Bearer $DODIL_TOKEN" \
  | jq '.entries[] | {version, operation, params: .parameters}'
```

You should see a RESTORE entry at version: 8 with parameters.version: 5.

4. Understand the vacuum retention window

Vacuum permanently deletes data files that the current table version no longer references once they are older than the retention window (default 168 h = 7 days). Once the files behind a version are gone, you can no longer restore to it.

Visualize the safety windows:

```text
time (newest → oldest)

v_now (live)          │ ◄── always restorable
                      │
v - 168h              │ ═══════════ DELTA SAFETY BAND ═══════════
                      │ Delta refuses to vacuum files this fresh by default
                      │ (--disable-retention-check overrides)
v - retention_hours   │ (your configured cutoff)
                      │ candidates for vacuum
v - ∞
```

Two retention numbers matter:

| Setting | Default | Meaning |
|---|---|---|
| Delta safety band | 168 h (7 days) | Hard floor; Delta refuses to vacuum files younger than this because in-flight readers / time-travel queries break |
| --retention-hours flag | 168 h | Your chosen cutoff; must be ≥ 168 h unless --disable-retention-check is set |
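The two numbers in the table combine into a simple rule. Below it is sketched as a pre-flight check you could run before calling vacuum. retention_allowed, retention_cutoff and DELTA_SAFETY_HOURS are hypothetical names. The 168 h floor and the override behaviour come from the table, and the cutoff arithmetic assumes epoch seconds.

```bash
# Hypothetical pre-flight check mirroring the retention table:
# anything below the 168 h Delta safety band is refused unless the
# --disable-retention-check override is being used.
DELTA_SAFETY_HOURS=168

# Usage: retention_allowed HOURS [true|false]   (2nd arg = override in use)
retention_allowed() {
  local hours=$1 disable_check=${2:-false}
  [ "$disable_check" = true ] || [ "$hours" -ge "$DELTA_SAFETY_HOURS" ]
}

# Usage: retention_cutoff NOW_EPOCH HOURS
# Prints the epoch-seconds instant HOURS before NOW_EPOCH; files that stopped
# being referenced before this instant are the vacuum candidates.
retention_cutoff() {
  local now_epoch=$1 hours=$2
  echo $(( now_epoch - hours * 3600 ))
}

# Example: the 30-day retention used later in this recipe passes the check.
if retention_allowed 720; then
  echo "720 h ok; cutoff is $(retention_cutoff "$(date +%s)" 720)"
fi
```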

5. Vacuum safely — dry run first

Always dry-run before vacuuming production tables:

```bash
dodil k3 table vacuum events --bucket kb-prod --dry-run -o json \
  | jq '{ filesDeleted, sampleFiles: .filesDeletedPaths[0:5] }'
```

Sample output:

```json
{
  "filesDeleted": "42",
  "sampleFiles": [
    "s3://kb-prod/.../events/event_type=click/part-00000-old-uuid.parquet",
    "s3://kb-prod/.../events/event_type=purchase/part-00000-old-uuid.parquet"
  ]
}
```

Confirm the count + paths match expectations. Then for the real run:

```bash
dodil k3 table vacuum events --bucket kb-prod
```

Or with a custom retention:

```bash
# Vacuum anything older than 30 days
dodil k3 table vacuum events --bucket kb-prod --retention-hours 720
```
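One way to build the dry-run habit into a script is a hypothetical guarded_vacuum wrapper. It dry-runs with the same retention, reads filesDeleted, and refuses the real run if more files would be deleted than a budget you set. It assumes --dry-run and --retention-hours can be combined and that jq is installed. A missing or non-numeric count makes the check fail, so the wrapper errs on the side of not vacuuming.

```bash
# Hypothetical wrapper; requires the dodil CLI and jq.
# Usage: within_budget COUNT MAX  -> succeeds when COUNT <= MAX
within_budget() {
  local count=$1 max=$2
  [ "$count" -le "$max" ]
}

# Usage: guarded_vacuum TABLE BUCKET MAX_FILES [RETENTION_HOURS]
guarded_vacuum() {
  local table=$1 bucket=$2 max=$3 hours=${4:-168} count
  # Dry run first, with the same retention as the real run.
  count=$(dodil k3 table vacuum "$table" --bucket "$bucket" \
    --retention-hours "$hours" --dry-run -o json | jq -r '.filesDeleted')
  # A missing or non-numeric count fails the test below, so we refuse.
  if ! within_budget "$count" "$max" 2>/dev/null; then
    echo "refusing: dry run reported '$count' files (budget $max)" >&2
    return 1
  fi
  dodil k3 table vacuum "$table" --bucket "$bucket" --retention-hours "$hours"
}

# Example: guarded_vacuum events kb-prod 100 720
```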

6. The vacuum-restore interaction (the trap)

The pattern that bites people:

```bash
# Day 1: bad write at v7
dodil k3 table update events --bucket kb-prod --predicate "id > 0" --updates-json '...'
dodil k3 table compact events --bucket kb-prod

# Day 8: someone runs vacuum with the 168h default (= 7-day retention)
dodil k3 table vacuum events --bucket kb-prod

# Day 9: you notice the bad write at v7 and try to restore to v5
curl ... /restore -d '{"version":"5"}'
# ❌ ERROR: files for version 5 were vacuumed
```

The fix is a policy: don’t run Vacuum aggressively unless you have a separate audit / backup story for the data range you’re vacuuming.
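The trap comes down to timing. The sketch below uses a simplified model, which is an assumption and not K3's exact rule. In it, vacuum deletes files whose removal was committed more than the retention window before the vacuum ran. A version therefore stays restorable only if the commits that superseded its files are still inside that window. The function still_restorable and the day-1/day-8 timestamps are illustrative.

```bash
# Simplified model, not K3's exact rule. Times are epoch seconds.
# Usage: still_restorable SUPERSEDED_AT VACUUM_AT RETENTION_HOURS
# Succeeds when the commit that superseded a version's files is still inside
# the retention window at the moment vacuum runs.
still_restorable() {
  local superseded_at=$1 vacuum_at=$2 retention_hours=$3
  [ "$superseded_at" -ge $(( vacuum_at - retention_hours * 3600 )) ]
}

DAY=86400
bad_write_at=$(( 1 * DAY + 9 * 3600 ))   # day 1, 09:00: bad write + compact supersede v5's files
vacuum_at=$(( 8 * DAY + 10 * 3600 ))     # day 8, 10:00: someone runs vacuum

for hours in 168 720; do
  if still_restorable "$bad_write_at" "$vacuum_at" "$hours"; then
    echo "retention ${hours}h: v5 still restorable"
  else
    echo "retention ${hours}h: v5 files vacuumed"
  fi
done
# → retention 168h: v5 files vacuumed
# → retention 720h: v5 still restorable
```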

Safer vacuum patterns

  1. Stretch retention for tables where time-travel matters:

     ```bash
     # 30-day retention: gives you a month-long restore window
     dodil k3 table vacuum events --bucket kb-prod --retention-hours 720
     ```
  2. Dry-run + audit before real run — log the file count + sample paths:

     ```bash
     dodil k3 table vacuum events --bucket kb-prod --dry-run -o json \
       | tee /var/log/k3-vacuum-events-$(date +%Y%m%d).json
     ```
  3. Pin a restore point — if you know you’ll want to restore to a specific version, materialize it as a separate table first via Materialize (see CTAS & Materialize). The materialized copy is now an independent Delta table with its own history — vacuum on the source doesn’t affect it.

When you DO need to vacuum younger files

For forced cleanups on quiesced tables (no in-flight readers, no time-travel needed):

```bash
dodil k3 table vacuum events --bucket kb-prod \
  --retention-hours 24 \
  --disable-retention-check
```

--disable-retention-check bypasses Delta’s safety band. Only run on tables you’ve explicitly stopped writing to — in-flight readers / time-travel queries against pre-vacuum versions will fail once files disappear.
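If you script forced cleanups, a guard can keep --disable-retention-check from being used by accident. This hypothetical forced_vacuum helper adds the flag only when the operator has set K3_TABLE_QUIESCED to the table name after stopping writers and readers. That variable is an illustrative convention, not something K3 reads.

```bash
# Hypothetical guard. K3_TABLE_QUIESCED is an illustrative convention, not
# something K3 reads: set it to the table name once writers and readers are stopped.
# Usage: forced_vacuum TABLE BUCKET RETENTION_HOURS
forced_vacuum() {
  local table=$1 bucket=$2 hours=$3
  if [ "${K3_TABLE_QUIESCED:-}" != "$table" ]; then
    echo "refusing: set K3_TABLE_QUIESCED=$table after stopping writers and readers" >&2
    return 1
  fi
  dodil k3 table vacuum "$table" --bucket "$bucket" \
    --retention-hours "$hours" \
    --disable-retention-check
}

# Example: K3_TABLE_QUIESCED=events forced_vacuum events kb-prod 24
```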

7. Per-operation restore patterns

| Scenario | Approach |
|---|---|
| Bad UPDATE with wrong predicate | Find the version BEFORE the UPDATE via History; Restore `version=<previous>` |
| Bad DELETE | Same: restore to the version before the DELETE commit |
| Bad MERGE | Restore to the version before the MERGE; for surgical MERGE-undo, materialize the pre-MERGE snapshot as a temp table, then re-MERGE selectively |
| Schema mistake (ALTER added the wrong column) | Restore to the pre-ALTER version; readers will no longer see the wrongly added column |
| Accidental DROP TABLE | Cannot restore via Restore, because the table row is gone. Recreate via CREATE TABLE and bulk-load from any backup / mirror you have |
| Catastrophic compactor bug | Restore, then run Vacuum to physically remove the bad files (after the safety band) |

Common gotchas

| Symptom | Cause | Fix |
|---|---|---|
| Restore version=X returns an error mentioning "missing data files" | Vacuum already reclaimed files for that version | Pick a more recent version via History; otherwise the data is unrecoverable |
| History shows commits older than retention, but you can't restore to them | Files were vacuumed; only the metadata entry survives in the commit log | Same as above |
| Restore succeeds but the query returns unexpected data | The restore target was older than you thought | Re-check the History output; version is the source of truth, not timestamp |
| Vacuum --dry-run shows 0 files | Either there is nothing to vacuum (fresh table + high retention) or the retention cutoff falls within the safety band | Lower --retention-hours (≥ 168), or use --disable-retention-check for forced cleanup |
| Vacuum succeeds but space wasn't reclaimed on S3 | Vacuum issues DELETE operations against the underlying object store; some S3 implementations update space accounting only eventually (lifecycle policies included) | Wait for space accounting to catch up; check the object store directly if urgent |

See also