Pipelines

K3 doesn’t just store objects — it processes them. Every bucket can attach sources, pipelines, and rules that turn any new object into structured rows, vector embeddings, or any other artifact, automatically and asynchronously.

This is the K3-specific differentiator over plain S3: you PUT a PDF; ten seconds later it’s chunked, embedded, and searchable in a vector collection. You write a CSV to a bucket; rows land in a warehouse table. No glue code.

The production path: just upload

In short: every bucket comes with an auto-created internal S3 source. You don’t register it or authenticate it; it exists from CreateBucket onward. Every direct upload to the bucket is evaluated against your rules and routed to the matching pipelines. That is the production-ready ingest channel today.

External sources (Google Drive, SharePoint, Confluence, GitHub) and their OAuth + credential flows are available in the API but are in Preview and still being hardened for production use. The rest of these docs cover the full surface; the Preview pieces are clearly marked.

The six entities

Entity	What it is	Status	Proto package
Source	Where objects come from. `InternalS3` is auto-created per bucket (Production). `GoogleDrive`, `SharePoint`, `Confluence`, `GitHub` are Preview.	mixed	`dodil.k3.source.v1`
Credential	How K3 authenticates to a source. Used only for external sources today.	Preview	`dodil.k3.source.v1`
Pipeline	The recipe — a Scriptum template + options + optional destination	Production	`dodil.k3.pipeline.v1`
Template	Catalog entry from Scriptum — what new pipelines can spawn from. K3 ships a rich production catalog (core / embedding / vision / ecommerce) — see Templates	Production	`dodil.k3.pipeline.v1`
Rule	The trigger — glob/MIME/size filters that bind a source to a pipeline	Production	`dodil.k3.ingest.v1`
Ingest job	One execution of one pipeline against one object	Production	`dodil.k3.ingest.v1`

How it works


   your object  ──upload──►  bucket
                                │
                                │  rules match (path / MIME / size)
                                ▼
                         pipeline runs
                       (Scriptum template
                        + your options)
                                │
                                ▼
                          destination
                     (vector collection /
                      warehouse table /
                      nothing — free pipeline)

1. An object lands: either a direct S3 PUT to the bucket, or a sync from an external source (Google Drive, GitHub, …) discovers a new file.
2. K3 evaluates the bucket’s rules against the object: does its path match the globs, and do its MIME type and size pass the filters?
3. For each matching rule, K3 runs the bound pipeline against the object. A pipeline is a Scriptum template (the recipe) plus your per-pipeline options.
4. Results land in the pipeline’s destination: a vector collection, a warehouse table, or nothing (a free pipeline just runs the script and surfaces its output via the job record).
5. Every run is tracked as an ingest job you can inspect, retry, or replay. Transient failures are retried automatically.
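
To make the rule-evaluation step concrete, here is a minimal local sketch of glob / MIME / size matching. It is not K3 code: the `Rule` class, its field names, and the pipeline names are hypothetical, and it assumes shell-style globs as implemented by Python's `fnmatch` (where `*` also matches `/`), which may differ from K3's actual glob semantics.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Optional


@dataclass
class Rule:
    """Hypothetical stand-in for a K3 rule; field names are illustrative only."""
    pipeline: str                                   # name of the bound pipeline
    globs: list[str] = field(default_factory=lambda: ["*"])
    mime_types: Optional[set[str]] = None           # None means "any MIME type"
    max_size_bytes: Optional[int] = None            # None means "no size limit"

    def matches(self, path: str, mime: str, size: int) -> bool:
        # The path must match at least one glob (case-sensitive).
        if not any(fnmatchcase(path, g) for g in self.globs):
            return False
        # The MIME filter applies only when one is configured.
        if self.mime_types is not None and mime not in self.mime_types:
            return False
        # The size filter applies only when one is configured.
        if self.max_size_bytes is not None and size > self.max_size_bytes:
            return False
        return True


def pipelines_for(rules: list[Rule], path: str, mime: str, size: int) -> list[str]:
    """Return the pipeline bound to every rule the object satisfies."""
    return [r.pipeline for r in rules if r.matches(path, mime, size)]


rules = [
    Rule(pipeline="pdf-embeddings", globs=["docs/*.pdf"], mime_types={"application/pdf"}),
    Rule(pipeline="csv-to-table", globs=["exports/*.csv"], max_size_bytes=50_000_000),
]

matched = pipelines_for(rules, "docs/handbook.pdf", "application/pdf", 1_200_000)
print(matched)  # ['pdf-embeddings']
```

Because every matching rule contributes its pipeline, one object can fan out to several pipelines, which mirrors the "for each matching rule" wording above.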

Pipeline kinds

A pipeline’s kind is derived from its destination:

vector — destination is a vector collection (see Vector). Outputs are embeddings + chunks indexed for similarity search.
warehouse — destination is a table (see Tables). Outputs are structured rows.
free — no persistent destination. The Scriptum script runs and its outputs are observed via the ingest job; useful for one-off computation, side-effects, or pipelines whose output you’ll fan out yourself.

The kind affects which counters on IngestJob get populated (chunks_created / embeddings_created for vector, rows_written for warehouse) but everything else — rules, triggers, replay — is identical across kinds.
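
The relationship between destination, kind, and counters can be sketched as follows. This is an illustration under assumptions: the `Destination` class and its field names are hypothetical, and only the counter names (`chunks_created`, `embeddings_created`, `rows_written`) come from the text above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Destination:
    """Hypothetical destination reference; at most one field is expected to be set."""
    vector_collection: Optional[str] = None
    warehouse_table: Optional[str] = None


def pipeline_kind(destination: Optional[Destination]) -> str:
    """Derive the kind from the destination; no destination means a free pipeline."""
    if destination is None:
        return "free"
    if destination.vector_collection:
        return "vector"
    if destination.warehouse_table:
        return "warehouse"
    return "free"


# Counters the text says are populated on the ingest job for each kind.
COUNTERS_BY_KIND = {
    "vector": ("chunks_created", "embeddings_created"),
    "warehouse": ("rows_written",),
    "free": (),
}

kind = pipeline_kind(Destination(vector_collection="handbook-chunks"))
print(kind, COUNTERS_BY_KIND[kind])
```

Nothing else branches on the kind here, which matches the point that rules, triggers, and replay behave the same regardless of destination.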

What you can do

Define multiple pipelines per bucket, each pointed at a different destination
Route by glob, MIME type, and size via rules with priority ordering
Manual triggers for testing or backfill — single object, or all pending objects on a source
Inspect every run — chunk counts, embedding counts, row counts, retry attempts, Scriptum thread IDs
Replay against a new pipeline version by re-binding the rule and re-triggering
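
As a rough illustration of priority ordering and of re-binding a rule for replay, consider the sketch below. The `BoundRule` class and the rule and pipeline names are hypothetical, and it assumes that a lower priority number is evaluated first; the source does not say which direction K3 uses.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BoundRule:
    """Hypothetical view of a rule: a name, a priority, and its bound pipeline."""
    name: str
    priority: int
    pipeline: str


def in_priority_order(rules):
    # ASSUMPTION: a lower number means higher priority; K3's real ordering may differ.
    # Ties are broken by name so the order is deterministic.
    return sorted(rules, key=lambda r: (r.priority, r.name))


def rebind(rule, new_pipeline):
    """Return a copy of the rule pointed at another pipeline (e.g. a new version)."""
    return replace(rule, pipeline=new_pipeline)


rules = [
    BoundRule("catch-all", 100, "generic-text"),
    BoundRule("pdf-only", 10, "pdf-embeddings-v1"),
]

ordered = in_priority_order(rules)
upgraded = rebind(ordered[0], "pdf-embeddings-v2")

print([r.name for r in ordered])  # ['pdf-only', 'catch-all']
print(upgraded.pipeline)          # pdf-embeddings-v2
```

After re-binding, a manual trigger over the affected objects would run them through the new pipeline version; that trigger call is left out here because its exact API shape is documented in the API Reference rather than on this page.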

In this section

Quickstart — index your first PDF as embeddings in 5 minutes
Core Concepts — proto-grounded type signatures for every entity + the event flow
API Reference — Source + Pipeline + Ingest services (gRPC + HTTP), grouped per resource
CLI Guide — dodil k3 source · pipeline · template · ingest · credential