
Multipart upload for large files

Goal: upload files larger than ~100 MB efficiently — parallel parts, resumable on failure.

Why: a single PUT ties up one TCP connection and aborts the whole upload on any failure. Multipart breaks the file into 5 MB+ parts that upload in parallel and retry independently.

K3 supports: CreateMultipartUpload, UploadPart, CompleteMultipartUpload, AbortMultipartUpload, ListMultipartUploads — see S3 Compatibility.

Heuristics: use multipart for files over ~100 MB. The S3 hard ceiling for a single PUT is 5 GB — anything larger must use multipart.
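The heuristic above can be sketched as a small planning function. This is only an illustration: `planUpload` and its constants are hypothetical names, and the 10,000-part cap is S3's documented limit, assumed here to apply to K3 as well.

```typescript
const MiB = 1024 * 1024;

// Assumed limits, taken from S3; confirm them against K3's S3 Compatibility page.
const MULTIPART_THRESHOLD = 100 * MiB; // heuristic from this recipe
const MIN_PART_SIZE = 5 * MiB;         // minimum size of every part except the last
const MAX_PARTS = 10000;               // S3's per-upload part limit (assumption for K3)

interface UploadPlan {
  multipart: boolean;
  partSize: number;  // bytes; 0 when a single PUT is used
  partCount: number; // 1 for a single PUT
}

// Hypothetical helper: choose between a single PUT and multipart, and pick a part size.
function planUpload(fileSize: number, preferredPartSize: number = 16 * MiB): UploadPlan {
  if (fileSize <= MULTIPART_THRESHOLD) {
    return { multipart: false, partSize: 0, partCount: 1 };
  }
  // Grow the part size if the preferred one would need more than MAX_PARTS parts.
  const minForCount = Math.ceil(fileSize / MAX_PARTS);
  const partSize = Math.max(preferredPartSize, MIN_PART_SIZE, minForCount);
  return { multipart: true, partSize, partCount: Math.ceil(fileSize / partSize) };
}
```

With the default 16 MiB preference, a 1 GiB file is planned as 64 parts; very large files get a bigger part size so the part count stays within the cap.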

The easy path — let the SDK do it

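The official S3 SDKs ship high-level transfer helpers (for example boto3's upload_file, or Upload from @aws-sdk/lib-storage) that switch to multipart automatically above a configurable size. You almost never write the dance by hand.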

aws s3 cp and aws s3 sync auto-multipart by default — no flags needed:

```sh
# Multipart kicks in automatically above ~8 MB
aws s3 cp ./large-dataset.parquet s3://kb-prod/datasets/large.parquet
```

Tune via ~/.aws/config:

```ini
# multipart_threshold: start multipart above this size
# multipart_chunksize: part size
# max_concurrent_requests: parallel parts
[profile dodil-k3]
s3 =
  multipart_threshold = 64MB
  multipart_chunksize = 16MB
  max_concurrent_requests = 10
```

Don’t use raw aws s3api put-object for large files — it does a single PUT regardless of size.

Manual multipart — when you need resume / progress

Drop down to raw multipart when you need:

  • Resume across process restarts — persist the UploadId and completed parts somewhere durable
  • Per-part progress for a UI, at a finer granularity than the SDK reports
  • Browser uploads of files too large to send reliably in a single PUT

The shape is the same as S3: every UploadPart returns an ETag, and you complete the upload by sending the list of (PartNumber, ETag) pairs:

```typescript
import {
  S3Client,
  CreateMultipartUploadCommand,
  UploadPartCommand,
  CompleteMultipartUploadCommand,
  AbortMultipartUploadCommand,
} from "@aws-sdk/client-s3";

async function multipartUpload(s3: S3Client, bucket: string, key: string, file: Buffer) {
  const PART = 16 * 1024 * 1024; // 16 MB

  // 1. Initiate
  const init = await s3.send(new CreateMultipartUploadCommand({
    Bucket: bucket,
    Key: key,
  }));
  const uploadId = init.UploadId!;

  try {
    // 2. Upload each part; parallelize this with p-limit / Promise.all in production
    const parts: { PartNumber: number; ETag: string }[] = [];
    for (let i = 0, num = 1; i < file.length; i += PART, num++) {
      const body = file.subarray(i, Math.min(i + PART, file.length));
      const res = await s3.send(new UploadPartCommand({
        Bucket: bucket,
        Key: key,
        UploadId: uploadId,
        PartNumber: num,
        Body: body,
      }));
      parts.push({ PartNumber: num, ETag: res.ETag! });
    }

    // 3. Complete
    await s3.send(new CompleteMultipartUploadCommand({
      Bucket: bucket,
      Key: key,
      UploadId: uploadId,
      MultipartUpload: { Parts: parts },
    }));
  } catch (err) {
    // 4. On any failure, abort to free server-side state
    await s3.send(new AbortMultipartUploadCommand({
      Bucket: bucket,
      Key: key,
      UploadId: uploadId,
    }));
    throw err;
  }
}
```

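Resumability: persist (uploadId, parts[]) after every successful UploadPart. On restart, reuse the same uploadId to upload only the parts that are still missing, then pass it to CompleteMultipartUploadCommand with the full parts list (saved plus newly uploaded). Note that the catch block in the example above aborts on any failure, which discards the uploaded parts; skip the abort for errors you intend to resume from.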
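As a rough sketch of the restart step, the helper below works out which parts still need an UploadPart call. The `SavedUpload` shape and the `pendingParts` name are hypothetical; the only real requirement is that the part size used on resume matches the one used originally, so part numbers map to the same byte ranges.

```typescript
// Shape of the state persisted after every successful UploadPart (hypothetical).
interface SavedUpload {
  uploadId: string;
  partSize: number; // must not change between runs
  parts: { PartNumber: number; ETag: string }[];
}

interface PendingPart {
  PartNumber: number;
  start: number; // inclusive byte offset
  end: number;   // exclusive byte offset
}

// List the parts that still need an UploadPart call after a restart.
function pendingParts(fileSize: number, saved: SavedUpload): PendingPart[] {
  const done = new Set(saved.parts.map((p) => p.PartNumber));
  const pending: PendingPart[] = [];
  for (let start = 0, num = 1; start < fileSize; start += saved.partSize, num++) {
    if (!done.has(num)) {
      pending.push({
        PartNumber: num,
        start,
        end: Math.min(start + saved.partSize, fileSize),
      });
    }
  }
  return pending;
}
```

Each pending entry maps to file.subarray(start, end) in an UploadPart call with the saved uploadId. Once the list is empty, sort the combined parts by PartNumber before completing.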

Browser-side multipart with presigned URLs

The browser pattern presigns each UploadPart URL on the backend, hands them to the browser, and the browser PUTs parts directly to K3:

```text
[browser] --(1) POST /api/start-upload (filename, size)-----> [backend]
[backend] -- CreateMultipartUpload ------------------------> [k3]
[backend] -- presign UploadPart × N -----------------------> [k3]
[browser] <--(2) { uploadId, partUrls: [...] }-------------- [backend]
[browser] --(3) PUT partUrls[i] + chunk × N---------------> [k3] (parallel)
[browser] --(4) POST /api/complete-upload (parts: [...])--> [backend]
[backend] -- CompleteMultipartUpload ----------------------> [k3]
```

```typescript
// Backend: presign each part URL
import { S3Client, UploadPartCommand } from "@aws-sdk/client-s3";
import { getSignedUrl } from "@aws-sdk/s3-request-presigner";

async function presignParts(
  s3: S3Client, // client configured for your K3 endpoint
  bucket: string,
  key: string,
  uploadId: string,
  partCount: number,
) {
  return Promise.all(
    Array.from({ length: partCount }, (_, i) =>
      getSignedUrl(
        s3,
        new UploadPartCommand({
          Bucket: bucket,
          Key: key,
          UploadId: uploadId,
          PartNumber: i + 1, // part numbers start at 1
        }),
        { expiresIn: 3600 }, // URL lifetime in seconds
      ),
    ),
  );
}
```

Browser side, slice and PUT in parallel — collect ETags, send back to the backend for CompleteMultipartUpload. Combine this with the Browser Upload recipe for the auth flow.
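Parallel PUTs finish in arbitrary order, while the completion call expects parts in ascending PartNumber order. A minimal sketch of the collection step follows; `PartResult` and `toCompletedParts` are hypothetical names. It also assumes K3 follows S3's CORS behavior, where the browser can read the ETag response header only if the bucket's CORS configuration exposes it.

```typescript
interface PartResult {
  PartNumber: number;
  ETag: string | null; // value of the ETag response header, or null if unreadable
}

// Hypothetical helper: turn out-of-order PUT results into the parts list for Complete.
function toCompletedParts(results: PartResult[], expectedCount: number) {
  const sorted = [...results].sort((a, b) => a.PartNumber - b.PartNumber);
  if (sorted.length !== expectedCount) {
    throw new Error(`expected ${expectedCount} parts, got ${sorted.length}`);
  }
  return sorted.map((r, i) => {
    if (r.PartNumber !== i + 1) {
      throw new Error(`missing or duplicate part ${i + 1}`);
    }
    if (!r.ETag) {
      throw new Error(`no ETag for part ${r.PartNumber}`);
    }
    // Keep the ETag verbatim, surrounding quotes included.
    return { PartNumber: r.PartNumber, ETag: r.ETag };
  });
}
```

In the browser, each result would come from something like `res.headers.get("ETag")` after the PUT to partUrls[i]. The returned array is what step (4) posts to the backend.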

Common gotchas

| Symptom | Cause | Fix |
| --- | --- | --- |
| EntityTooSmall on Complete | Any non-last part is smaller than 5 MiB | Minimum part size is 5 MiB for every part except the last; increase the part size |
| InvalidPart on Complete | ETag mismatch in your parts list | Collect the exact ETag string returned by UploadPart (including quotes if the SDK returns them quoted) |
| Stalled / abandoned upload eating quota | Never called AbortMultipartUpload | List with ListMultipartUploads (GET /:bucket?uploads) and abort stragglers periodically |
| Pipeline didn’t fire | Ingest only fires on CompleteMultipartUpload, never on individual parts | Make sure you call Complete, not Abort |
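For the periodic cleanup of stragglers, the selection step might look like the sketch below. `staleUploads` is a hypothetical helper; the field names mirror the entries the AWS SDK returns for ListMultipartUploads, and the age cutoff is an arbitrary choice to tune for your workload.

```typescript
// Subset of the fields each entry in a ListMultipartUploads response carries.
interface InProgressUpload {
  Key?: string;
  UploadId?: string;
  Initiated?: Date;
}

// Hypothetical helper: select uploads that were started more than maxAgeMs ago.
function staleUploads(uploads: InProgressUpload[], now: Date, maxAgeMs: number) {
  const targets: { Key: string; UploadId: string }[] = [];
  for (const u of uploads) {
    if (!u.Key || !u.UploadId || !u.Initiated) continue; // skip incomplete entries
    if (now.getTime() - u.Initiated.getTime() > maxAgeMs) {
      targets.push({ Key: u.Key, UploadId: u.UploadId });
    }
  }
  return targets;
}
```

Each returned pair can be passed to AbortMultipartUploadCommand together with the bucket name. Pick a cutoff comfortably longer than your slowest legitimate upload so that in-flight transfers are not aborted.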

See also