Multipart upload for large files
Goal: upload files larger than ~100 MB efficiently — parallel parts, resumable on failure.
Why: a single PUT ties up one TCP connection and aborts the whole upload on any failure. Multipart breaks the file into 5 MB+ parts that upload in parallel and retry independently.
K3 supports: CreateMultipartUpload, UploadPart, CompleteMultipartUpload, AbortMultipartUpload, ListMultipartUploads — see S3 Compatibility.
Heuristics: use multipart for files over ~100 MB. The S3 hard ceiling for a single PUT is 5 GB; anything larger must use multipart.
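The sizing rules above can be turned into a small planning helper. The following is only a sketch: planParts, the 16 MiB preferred part size, and the exact 100 MiB cut-off are illustrative choices, and the 10,000-part ceiling is the S3 limit, assumed here to apply to K3 as well (check S3 Compatibility before relying on it).

```typescript
const MiB = 1024 * 1024;
const MIN_PART = 5 * MiB;              // minimum size for every part except the last
const MAX_PARTS = 10000;               // S3 limit; assumed (not confirmed) for K3
const MULTIPART_THRESHOLD = 100 * MiB; // the ~100 MB heuristic from this page

interface UploadPlan {
  multipart: boolean;
  partSize: number;  // bytes per part (0 when a single PUT is used)
  partCount: number; // 1 when a single PUT is used
}

// Hypothetical helper: decide between a single PUT and multipart, and size the parts.
function planParts(fileSize: number, preferredPartSize: number = 16 * MiB): UploadPlan {
  if (fileSize <= MULTIPART_THRESHOLD) {
    return { multipart: false, partSize: 0, partCount: 1 };
  }
  // Start from the preferred size, never below the 5 MiB minimum.
  let partSize = Math.max(preferredPartSize, MIN_PART);
  // Grow the part size if the file would otherwise need more than MAX_PARTS parts.
  const smallestAllowed = Math.ceil(fileSize / MAX_PARTS);
  if (partSize < smallestAllowed) {
    // Round up to a whole MiB to keep part boundaries tidy.
    partSize = Math.ceil(smallestAllowed / MiB) * MiB;
  }
  return { multipart: true, partSize, partCount: Math.ceil(fileSize / partSize) };
}

// A 1 GiB file with the default 16 MiB parts needs 64 parts.
const plan = planParts(1024 * MiB);
console.log(plan.partCount); // prints 64
```

The backend in a presigned-URL flow could use the same calculation to decide how many part URLs to hand out.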
The easy path — let the SDK do it
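The high-level transfer helpers in the official AWS SDKs (for example upload_file in boto3, or Upload from @aws-sdk/lib-storage in JavaScript) switch to multipart automatically once the payload exceeds a configurable size. You almost never write the dance by hand.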
aws-cli
aws s3 cp and aws s3 sync auto-multipart by default — no flags needed:
# Multipart kicks in automatically above ~8 MB
aws s3 cp ./large-dataset.parquet s3://kb-prod/datasets/large.parquet

Tune via ~/.aws/config:
[profile dodil-k3]
s3 =
  # start multipart above this
  multipart_threshold = 64MB
  # part size
  multipart_chunksize = 16MB
  # parallel parts
  max_concurrent_requests = 10

Don’t use raw aws s3api put-object for large files: it does a single PUT regardless of size.
Manual multipart — when you need resume / progress
Drop down to raw multipart when you need:
- Resume across process restarts: persist the UploadId and completed parts somewhere durable
- Per-part progress to surface to a UI, more granular than the SDK gives you
- Browser uploads of files larger than the network round-trip will tolerate in one PUT
The shape is the same as S3: every UploadPart returns an ETag, and you complete the upload by sending the full list of (PartNumber, ETag) pairs:
import {
  S3Client,
  CreateMultipartUploadCommand,
  UploadPartCommand,
  CompleteMultipartUploadCommand,
  AbortMultipartUploadCommand,
} from "@aws-sdk/client-s3";

async function multipartUpload(s3: S3Client, bucket: string, key: string, file: Buffer) {
  const PART = 16 * 1024 * 1024; // 16 MiB; every part except the last must be at least 5 MiB

  // 1. Initiate
  const init = await s3.send(new CreateMultipartUploadCommand({
    Bucket: bucket, Key: key,
  }));
  const uploadId = init.UploadId!;

  try {
    // 2. Upload each part sequentially; parallelize this with p-limit / Promise.all in production
    const parts: { PartNumber: number; ETag: string }[] = [];
    for (let i = 0, num = 1; i < file.length; i += PART, num++) {
      const body = file.subarray(i, Math.min(i + PART, file.length));
      const res = await s3.send(new UploadPartCommand({
        Bucket: bucket, Key: key, UploadId: uploadId,
        PartNumber: num, Body: body,
      }));
      parts.push({ PartNumber: num, ETag: res.ETag! });
    }

    // 3. Complete, passing the parts in ascending PartNumber order
    await s3.send(new CompleteMultipartUploadCommand({
      Bucket: bucket, Key: key, UploadId: uploadId,
      MultipartUpload: { Parts: parts },
    }));
  } catch (err) {
    // 4. On any failure, abort to free server-side state.
    //    Skip this step if you intend to resume: aborting discards the uploaded parts.
    await s3.send(new AbortMultipartUploadCommand({
      Bucket: bucket, Key: key, UploadId: uploadId,
    })).catch(() => { /* best effort: surface the original error, not the abort failure */ });
    throw err;
  }
}

Resumability: persist (uploadId, parts[]) after every successful UploadPart. On restart, reuse the saved uploadId to upload only the parts that are still missing, then pass it to CompleteMultipartUploadCommand with the full parts list. This only works while the upload has not been aborted, so a resumable uploader should not abort on transient failures the way the example above does.
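To make the resume step concrete, here is a minimal sketch of the bookkeeping only, with no SDK calls. The SavedUpload shape and the remainingParts and partsForComplete helpers are hypothetical names, not part of K3 or the AWS SDK, and the sketch assumes every part except the last uses the same partSize for the whole upload.

```typescript
interface CompletedPart { PartNumber: number; ETag: string }

// Hypothetical shape of the state persisted after every successful UploadPart.
interface SavedUpload {
  uploadId: string;
  partSize: number; // bytes per part, fixed for the whole upload
  fileSize: number; // total bytes in the file
  parts: CompletedPart[];
}

interface PendingPart { PartNumber: number; start: number; end: number }

// Returns the parts (with byte ranges, end exclusive) that still need an UploadPart call.
function remainingParts(state: SavedUpload): PendingPart[] {
  const done = new Set(state.parts.map((p) => p.PartNumber));
  const total = Math.ceil(state.fileSize / state.partSize);
  const pending: PendingPart[] = [];
  for (let num = 1; num <= total; num++) {
    if (done.has(num)) continue; // already uploaded before the restart
    const start = (num - 1) * state.partSize;
    const end = Math.min(start + state.partSize, state.fileSize);
    pending.push({ PartNumber: num, start, end });
  }
  return pending;
}

// S3 expects the Complete request to list parts in ascending PartNumber order,
// which matters once parts finish out of order.
function partsForComplete(parts: CompletedPart[]): CompletedPart[] {
  return [...parts].sort((a, b) => a.PartNumber - b.PartNumber);
}
```

After a restart, each PendingPart would be uploaded with the saved uploadId (for example via file.subarray(start, end)), its ETag appended to the saved parts, and the sorted list sent with Complete.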
Browser-side multipart with presigned URLs
In the browser pattern, the backend presigns a URL for each UploadPart and hands the URLs to the browser, which PUTs the parts directly to K3:
[browser] --(1) POST /api/start-upload (filename, size)-----> [backend]
[backend] -- CreateMultipartUpload ------------------------> [k3]
[backend] -- presign UploadPart × N -----------------------> [k3]
[browser] <--(2) { uploadId, partUrls: [...] }-------------- [backend]
[browser] --(3) PUT partUrls[i] + chunk × N---------------> [k3] (parallel)
[browser] --(4) POST /api/complete-upload (parts: [...])--> [backend]
[backend] -- CompleteMultipartUpload ----------------------> [k3]

// Backend: presign each part URL
import { S3Client, UploadPartCommand } from "@aws-sdk/client-s3";
import { getSignedUrl } from "@aws-sdk/s3-request-presigner";

async function presignParts(s3: S3Client, bucket: string, key: string, uploadId: string, partCount: number) {
  return Promise.all(
    // One URL per part; part numbers start at 1
    Array.from({ length: partCount }, (_, i) =>
      getSignedUrl(
        s3,
        new UploadPartCommand({
          Bucket: bucket,
          Key: key,
          UploadId: uploadId,
          PartNumber: i + 1,
        }),
        { expiresIn: 3600 }, // each URL stays valid for one hour
      ),
    ),
  );
}

Browser side, slice the file and PUT the parts in parallel, collect the ETag from each response, and send the list back to the backend for CompleteMultipartUpload. Combine this with the Browser Upload recipe for the auth flow.
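As a rough sketch of that browser side, the function below slices a Blob, PUTs each slice to its presigned URL with a small worker pool, and returns the (PartNumber, ETag) list. The names uploadPartsToUrls, fetchPutPart, and the injectable putPart parameter are hypothetical, and the default concurrency of 4 is arbitrary. It also assumes the bucket's CORS settings let the page read the ETag response header, which should be verified for K3.

```typescript
interface CompletedPart { PartNumber: number; ETag: string }

// Uploads one chunk and resolves to the ETag returned for it.
type PutPart = (url: string, body: Blob) => Promise<string>;

// Default transport: PUT the chunk to its presigned URL and read the ETag header.
const fetchPutPart: PutPart = async (url, body) => {
  const res = await fetch(url, { method: "PUT", body });
  if (!res.ok) throw new Error(`part upload failed: HTTP ${res.status}`);
  const etag = res.headers.get("ETag");
  if (!etag) throw new Error("no ETag header on the UploadPart response");
  return etag; // keep the value exactly as returned, quotes included
};

async function uploadPartsToUrls(
  file: Blob,
  partUrls: string[], // partUrls[i] is the presigned URL for PartNumber i + 1
  partSize: number,
  concurrency: number = 4,
  putPart: PutPart = fetchPutPart,
): Promise<CompletedPart[]> {
  const parts: CompletedPart[] = new Array(partUrls.length);
  let next = 0; // index of the next part nobody has claimed yet

  // Each worker keeps claiming the next unclaimed part until none are left.
  async function worker(): Promise<void> {
    while (next < partUrls.length) {
      const i = next++;
      const chunk = file.slice(i * partSize, Math.min((i + 1) * partSize, file.size));
      const etag = await putPart(partUrls[i], chunk);
      parts[i] = { PartNumber: i + 1, ETag: etag };
    }
  }

  const workerCount = Math.min(concurrency, partUrls.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return parts; // indexed by part, so already in ascending PartNumber order
}
```

The returned array is what the browser would POST to the backend's complete-upload endpoint. A failed part rejects the whole call here; a production uploader would more likely retry that part before giving up.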
Common gotchas
| Symptom | Cause | Fix |
|---|---|---|
| EntityTooSmall on Complete | Any non-last part < 5 MB | Minimum part size is 5 MiB except the last one — bump partSize |
| InvalidPart on Complete | ETag mismatch in your parts list | Ensure you collect the exact ETag string returned by UploadPart (including quotes if the SDK returns them quoted) |
| Stalled / abandoned upload eating quota | Never called AbortMultipartUpload | List with ListMultipartUploads (GET /:bucket?uploads) and abort stragglers periodically |
| Pipeline didn’t fire | Ingest only fires on CompleteMultipartUpload, never on individual parts | Make sure the upload ends with Complete, not Abort |
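For the abandoned-upload row, the periodic cleanup comes down to picking the stale entries out of a ListMultipartUploads response. The sketch below covers only that selection step. selectStaleUploads and the 24-hour cut-off are illustrative assumptions, and PendingUpload mirrors the Key, UploadId, and Initiated fields S3 returns for each listed upload, assumed here to be present in K3's response too.

```typescript
// Subset of the per-upload fields in an S3 ListMultipartUploads response.
interface PendingUpload {
  Key: string;
  UploadId: string;
  Initiated: Date; // when CreateMultipartUpload was called
}

// Hypothetical helper: return the uploads started more than maxAgeMs before `now`.
function selectStaleUploads(
  uploads: PendingUpload[],
  now: Date,
  maxAgeMs: number,
): PendingUpload[] {
  return uploads.filter((u) => now.getTime() - u.Initiated.getTime() > maxAgeMs);
}

// Example: with a 24-hour cut-off, only the upload from two days earlier is selected.
const DAY_MS = 24 * 60 * 60 * 1000;
const now = new Date("2024-01-03T00:00:00Z");
const stale = selectStaleUploads(
  [
    { Key: "datasets/old.parquet", UploadId: "u-old", Initiated: new Date("2024-01-01T00:00:00Z") },
    { Key: "datasets/new.parquet", UploadId: "u-new", Initiated: new Date("2024-01-02T12:00:00Z") },
  ],
  now,
  DAY_MS,
);
console.log(stale.map((u) => u.UploadId)); // prints [ 'u-old' ]
```

Each selected entry's Key and UploadId would then go to AbortMultipartUpload. Pick a cut-off comfortably longer than your slowest legitimate upload, so that in-flight uploads are not aborted.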
See also
- Browser Upload — the auth flow for serving presigned URLs to a browser
- S3 Compatibility — what S3 multipart actions K3 supports
- API Reference — Objects — byte-plane endpoints