Pipelines
K3 doesn’t just store objects — it processes them. Every bucket can attach sources, pipelines, and rules that turn any new object into structured rows, vector embeddings, or any other artifact, automatically and asynchronously.
This is the K3-specific differentiator over plain S3: you PUT a PDF; ten seconds later it’s chunked, embedded, and searchable in a vector collection. You write a CSV to a bucket; rows land in a warehouse table. No glue code.
The production path: just upload
The fast version: every bucket comes with an auto-created internal S3 source. You don’t register it, you don’t authenticate it — it’s there from CreateBucket. Every direct upload to the bucket fires through your rules into your pipelines. That’s the production-ready ingest channel today.
External sources (Google Drive, SharePoint, Confluence, GitHub) and their OAuth + credential flows are available in the API but are still in Preview and not yet hardened for production use. The rest of these docs cover the full surface; the Preview pieces are clearly marked.
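As a rough illustration of the direct-upload path, the sketch below PUTs one object through an S3-style client. It assumes K3 accepts standard S3 `PutObject` calls at some endpoint you supply; the endpoint URL, bucket name, and key are placeholders, and `upload_document` and `make_client` are hypothetical helper names, not part of K3.

```python
from typing import Any, Protocol


class S3Client(Protocol):
    """Minimal slice of an S3 client; boto3's S3 client satisfies it."""

    def put_object(self, *, Bucket: str, Key: str, Body: bytes,
                   ContentType: str) -> Any: ...


def upload_document(client: S3Client, bucket: str, key: str, data: bytes,
                    content_type: str = "application/pdf") -> None:
    """PUT one object. No source registration happens here: per the text
    above, the bucket's auto-created internal S3 source handles the rest."""
    client.put_object(Bucket=bucket, Key=key, Body=data,
                      ContentType=content_type)


def make_client(endpoint_url: str) -> S3Client:
    """Build a boto3 client against a K3 endpoint.

    Assumption: the endpoint speaks the S3 API; the URL is yours to supply.
    """
    import boto3  # imported lazily so the sketch loads without boto3

    return boto3.client("s3", endpoint_url=endpoint_url)
```

With a real client, a call such as `upload_document(make_client(url), "docs", "reports/q1.pdf", pdf_bytes)` would be the whole ingest step on the caller's side; which pipelines then run depends on the rules attached to the bucket.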
The six entities
| Entity | What it is | Status | Proto package |
|---|---|---|---|
| Source | Where objects come from. InternalS3 is auto-created per bucket (Production). GoogleDrive, SharePoint, Confluence, GitHub are Preview. | Mixed | dodil.k3.source.v1 |
| Credential | How K3 authenticates to a source. Used only for external sources today. | Preview | dodil.k3.source.v1 |
| Pipeline | The recipe — a Scriptum template + options + optional destination | Production | dodil.k3.pipeline.v1 |
| Template | Catalog entry from Scriptum — what new pipelines can spawn from. K3 ships a rich production catalog (core / embedding / vision / ecommerce) — see Templates | Production | dodil.k3.pipeline.v1 |
| Rule | The trigger — glob/MIME/size filters that bind a source to a pipeline | Production | dodil.k3.ingest.v1 |
| Ingest job | One execution of one pipeline against one object | Production | dodil.k3.ingest.v1 |
How it works
your object ──upload──► bucket
│
│ rules match (path / MIME / size)
▼
pipeline runs
(Scriptum template
+ your options)
│
▼
destination
(vector collection /
warehouse table /
nothing: free pipeline)
- An object lands: a direct S3 `PUT` to the bucket, or a sync from an external source (Google Drive, GitHub, …) discovers a new file.
- K3 evaluates the bucket’s rules against the object: does its path match the globs? Does its MIME type or size pass the filters?
- For each matching rule, K3 runs the bound pipeline against the object. A pipeline is a Scriptum template (the recipe) plus your per-pipeline options.
- Results land in the pipeline’s destination: a vector collection, a warehouse table, or nothing (a free pipeline just runs the script and surfaces its output via the job record).
- Every run is tracked as an ingest job you can inspect, retry, or replay. Transient failures are retried automatically.
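The rule-evaluation step can be pictured with a small sketch. This is not K3's implementation: the `Rule` field names are hypothetical, and Python's `fnmatch` stands in for whatever glob dialect K3 actually uses (in `fnmatch`, `*` also matches `/`, which may differ).

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass(frozen=True)
class Rule:
    # Hypothetical field names; the real Rule message may differ.
    pipeline: str
    path_globs: tuple[str, ...] = ("*",)
    mime_types: tuple[str, ...] = ()   # empty means "any MIME type"
    min_size: int = 0
    max_size: Optional[int] = None     # None means "no upper bound"


def rule_matches(rule: Rule, key: str, mime: str, size: int) -> bool:
    """True when the object passes the rule's path, MIME, and size filters."""
    if not any(fnmatchcase(key, glob) for glob in rule.path_globs):
        return False
    if rule.mime_types and mime not in rule.mime_types:
        return False
    if size < rule.min_size:
        return False
    return rule.max_size is None or size <= rule.max_size


def pipelines_for(rules: list[Rule], key: str, mime: str, size: int) -> list[str]:
    """Every matching rule contributes its bound pipeline, in rule order."""
    return [r.pipeline for r in rules if rule_matches(r, key, mime, size)]
```

Under these assumptions, a rule with glob `reports/*.pdf` and MIME `application/pdf` would select its pipeline for `reports/q1.pdf`, while a CSV rule capped at 50 MB would skip a 60 MB export. Priority ordering between rules is left out because its exact semantics are not described here.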
Pipeline kinds
A pipeline’s kind is derived from its destination:
- `vector`: destination is a vector collection (see Vector). Outputs are embeddings + chunks indexed for similarity search.
- `warehouse`: destination is a table (see Tables). Outputs are structured rows.
- `free`: no persistent destination. The Scriptum script runs and its outputs are observed via the ingest job; useful for one-off computation, side effects, or pipelines whose output you’ll fan out yourself.
The kind affects which counters on IngestJob get populated (chunks_created / embeddings_created for vector, rows_written for warehouse) but everything else — rules, triggers, replay — is identical across kinds.
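To make the "kind is derived from destination" idea concrete, here is a minimal sketch. The `Destination` shape and the function name are hypothetical; only the kind names and the counter names are taken from the text above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Destination:
    # Hypothetical shape: at most one of the two fields is set.
    vector_collection: Optional[str] = None
    warehouse_table: Optional[str] = None


def pipeline_kind(destination: Optional[Destination]) -> str:
    """Derive the kind from the destination instead of storing it separately."""
    if destination is None:
        return "free"
    if destination.vector_collection is not None:
        return "vector"
    if destination.warehouse_table is not None:
        return "warehouse"
    return "free"  # an empty destination behaves like no destination


# Counter names come from the text above; the mapping itself is illustrative.
COUNTERS_BY_KIND = {
    "vector": ("chunks_created", "embeddings_created"),
    "warehouse": ("rows_written",),
    "free": (),
}
```

Deriving the kind this way means there is no separate field to keep in sync: changing a pipeline's destination changes its kind and, with it, which job counters are worth reading.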
What you can do
- Define multiple pipelines per bucket, each pointed at a different destination
- Glob + MIME + size routing via rules with priority ordering
- Manual triggers for testing or backfill — single object, or all pending objects on a source
- Inspect every run — chunk counts, embedding counts, row counts, retry attempts, Scriptum thread IDs
- Replay against a new pipeline version by re-binding the rule and re-triggering
In this section
- Quickstart: index your first PDF as embeddings in 5 minutes
- Core Concepts: proto-grounded type signatures for every entity + the event flow
- API Reference: Source + Pipeline + Ingest services (gRPC + HTTP), grouped per resource
- CLI Guide: `dodil k3 source` · `pipeline` · `template` · `ingest` · `credential`
See also
- Object Storage: where every K3 pipeline starts (objects)
- Vector: the destination for `vector` pipelines (Milvus collections)
- Tables: the destination for `warehouse` pipelines (Delta tables)
- Conventions: auth, headers, error envelope
- CLI Basics: install + common flags