K3 Service Overview

Last validated: 2026-05-11

K3 is an augmented S3 platform. It keeps standard object storage semantics, then adds ingestion pipelines and query services so data can be searched and processed as:

Vector data (semantic and hybrid search)
Structured table data (SQL and HTAP workflows)
Source-driven ingestion jobs (discovery, indexing, re-ingestion)

K3 currently runs as two binaries:

k3-api: HTTP + gRPC control/data plane
k3-worker: asynchronous execution plane (NATS consumers, schedulers, maintenance)

Core Concepts

Bucket

A bucket is both a storage namespace and a knowledge namespace. Most K3 APIs are bucket scoped.

Source

A source defines where content is synced from (internal S3 or external provider connectors).

Pipeline

A pipeline defines what processing should run: a Scriptum template or script, its options, and an optional destination entity.

Rule

A rule binds source matching conditions to a pipeline. Rules trigger discovery and ingestion behavior.
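
The relationship between sources, pipelines, and rules can be sketched as a small data model. This is a minimal illustration, not K3's actual schema: the class names, the fields, and the glob-style key_pattern match condition are all assumptions made for the example.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass(frozen=True)
class Pipeline:
    # Hypothetical fields: a Scriptum template name and an optional destination entity.
    name: str
    template: str
    destination: Optional[str] = None


@dataclass(frozen=True)
class Rule:
    # Hypothetical match condition: a source name plus a glob over object keys.
    source: str
    key_pattern: str
    pipeline: Pipeline

    def matches(self, source: str, key: str) -> bool:
        return source == self.source and fnmatchcase(key, self.key_pattern)


def pipelines_for(rules: list[Rule], source: str, key: str) -> list[Pipeline]:
    """Return the pipelines whose rules match one object from one source."""
    return [rule.pipeline for rule in rules if rule.matches(source, key)]


embed_docs = Pipeline(name="embed-docs", template="text-embedding", destination="docs_collection")
rules = [Rule(source="internal-s3", key_pattern="reports/*.pdf", pipeline=embed_docs)]

matched = pipelines_for(rules, "internal-s3", "reports/q1.pdf")    # one pipeline
unmatched = pipelines_for(rules, "internal-s3", "images/logo.png")  # no pipelines
```

Keeping the match condition on the rule, rather than on the source or the pipeline, lets one pipeline be reused by several rules.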

Vector Engine and Collections

The vector engine is the per-bucket vector backend configuration. Collections are the searchable vector datasets.

Table Engine and Tables

The table engine is the per-bucket HTAP/tables backend configuration. Tables are structured datasets supporting query and maintenance operations.

Ingest Job

An ingest job is one execution of a pipeline over one object, started either manually or by discovery.
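
The one-job-per-object granularity can be illustrated with a short sketch. The IngestJob fields and the Trigger enum are hypothetical names chosen for the example and are not taken from the K3 codebase.

```python
from dataclasses import dataclass
from enum import Enum


class Trigger(Enum):
    # The two trigger kinds named in the definition above.
    MANUAL = "manual"
    DISCOVERY = "discovery"


@dataclass(frozen=True)
class IngestJob:
    # One job covers exactly one pipeline and one object (hypothetical fields).
    bucket: str
    pipeline: str
    object_key: str
    trigger: Trigger


def jobs_for_discovery(bucket: str, pipeline: str, keys: list[str]) -> list[IngestJob]:
    """Fan out a discovery result into one job per discovered object."""
    return [IngestJob(bucket, pipeline, key, Trigger.DISCOVERY) for key in keys]


jobs = jobs_for_discovery("research", "embed-docs", ["a.pdf", "b.pdf"])
manual = IngestJob("research", "embed-docs", "c.pdf", Trigger.MANUAL)
```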

Service Decomposition

The K3 API is split into six service domains rather than exposed as a single monolithic K3 service:

StorageService
SourceService
PipelineService
IngestService
VectorService
TableService

This domain split is reflected in:

Proto package namespaces (dodil.k3.<domain>.v1)
HTTP route grouping under bin/api/src/http/api/mod.rs
gRPC registration in bin/api/src/main.rs
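
As an illustration of the namespace pattern, the sketch below derives one package name per service domain. It assumes the <domain> segment is the service name lowercased with the "Service" suffix removed. The source does not confirm that mapping, so treat the resulting names as hypothetical and check the proto files for the real ones.

```python
SERVICES = [
    "StorageService",
    "SourceService",
    "PipelineService",
    "IngestService",
    "VectorService",
    "TableService",
]


def proto_package(service: str) -> str:
    """Build a package name following the dodil.k3.<domain>.v1 pattern.

    Assumption: <domain> is the service name lowercased with "Service" removed.
    """
    domain = service.removesuffix("Service").lower()
    return f"dodil.k3.{domain}.v1"


packages = {service: proto_package(service) for service in SERVICES}
```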

Runtime Data Flow

Write and ingest path

The client uploads data through the S3-compatible proxy or another source path.
K3 publishes discovery/ingest events to NATS JetStream.
Worker consumes events and invokes Scriptum-driven processing.
Outputs are persisted in vector collections, table backends, and/or job records.
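
The write and ingest path above can be simulated in a few lines. This is a sketch only: an in-process queue.Queue stands in for NATS JetStream, run_pipeline is a placeholder for Scriptum-driven processing, and every function and field name here is hypothetical.

```python
import queue
from dataclasses import dataclass


@dataclass(frozen=True)
class IngestEvent:
    bucket: str
    object_key: str


# Stand-in for a NATS JetStream stream; a real deployment would use a NATS client.
events: "queue.Queue[IngestEvent]" = queue.Queue()
# Stand-in for vector collections and job records.
outputs: dict[str, list[str]] = {"collections": [], "job_records": []}


def upload(bucket: str, object_key: str) -> None:
    """Accept an upload, then publish an ingest event for it."""
    events.put(IngestEvent(bucket, object_key))


def run_pipeline(event: IngestEvent) -> str:
    """Placeholder for Scriptum-driven processing of one object."""
    return f"processed:{event.bucket}/{event.object_key}"


def worker_drain() -> int:
    """Consume queued events, process each one, and persist the outputs."""
    handled = 0
    while not events.empty():
        event = events.get()
        outputs["collections"].append(run_pipeline(event))
        outputs["job_records"].append(event.object_key)
        handled += 1
    return handled


upload("research", "reports/q1.pdf")
upload("research", "reports/q2.pdf")
handled = worker_drain()
```

The queue between upload and worker_drain mirrors the split between k3-api and k3-worker: the upload call returns as soon as the event is published, and processing happens separately.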

Query path

Client calls vector or table APIs.
The API routes the request to the corresponding service handler.
Service executes backend operations (Milvus, HTAP/Ignite, metadata DB, S3).
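
The query path amounts to a dispatch from an incoming operation to a per-domain handler. The sketch below shows that shape with hypothetical operation keys and placeholder handlers. It does not reproduce K3's real routes, which are defined in the K3 repo.

```python
from typing import Callable


def vector_search(payload: dict) -> dict:
    # Placeholder for a vector backend call (the text above names Milvus).
    return {"service": "VectorService", "query": payload["query"]}


def table_query(payload: dict) -> dict:
    # Placeholder for a table backend call (the text above names HTAP/Ignite).
    return {"service": "TableService", "sql": payload["sql"]}


# Hypothetical operation keys used only for this example.
HANDLERS: dict[str, Callable[[dict], dict]] = {
    "vector.search": vector_search,
    "table.query": table_query,
}


def route(operation: str, payload: dict) -> dict:
    """Dispatch a client request to the handler for its service domain."""
    handler = HANDLERS.get(operation)
    if handler is None:
        raise KeyError(f"no handler registered for {operation!r}")
    return handler(payload)


vector_result = route("vector.search", {"query": "quarterly revenue"})
table_result = route("table.query", {"sql": "SELECT 1"})
```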

What Is Live vs Planned

Live, code-validated surfaces:

Storage and bucket management
Source and credential management
Pipeline and ingest rule/job workflows
Vector engine/collections/search and external vector writes
Table engine/tables/query/maintenance/execute flows

Design docs in the K3 repo also describe future or evolving capabilities. Treat those as roadmap unless confirmed in code paths and active routes.

For exact feature coverage and mismatches, see:

docs/07-feature-status.md