Document intelligence

The outcome

You have a pile of unstructured documents — PDFs, contracts, invoices, forms, scanned tickets — and the useful information is locked inside them. Document intelligence turns that pile into structured, queryable data: extract the entities and fields that matter, route and triage each document, then search across the results and run SQL over them.

Why it matters:

No manual data entry. Fields, parties, dates, totals, and identifiers come out automatically as documents arrive — nobody re-keys a contract into a spreadsheet.
Searchable archives. Find the document you need by what’s in it, not by remembering its filename.
Analytics on documents. Once the contents are rows, your documents become a dataset — aggregate, filter, join, and report on them like any other table.

What you build on Dodil

The flow is: documents land, a pipeline extracts structure on upload, and the results become rows you can query. You don’t write an ETL job or stitch services together — the extraction runs automatically as files arrive.

Intake: drop documents into K3 Storage. Every upload is an event the rest of the flow reacts to.
Extraction: K3 pipelines run extraction templates (e.g. entity extraction or document triage) against each new file and write structured rows into K3 Tables, with no glue code in between.
Query: run SQL directly over the extracted rows in K3 Tables to filter, aggregate, join, and build dashboards on top of your documents.
Semantic retrieval (optional): also index the documents for meaning-based search via K3 Vector, so you can find passages by what they say rather than by exact keywords.
Inference: extraction and any reasoning over documents are powered by Ignite Models, the managed inference catalog behind the platform.
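
To make the shape of that flow concrete, here is a minimal local sketch in Python. It is not the Dodil API: the in-memory SQLite database stands in for K3 Tables, the `on_upload` function stands in for the upload event, and the regex-based `extract_invoice_fields` is a hypothetical placeholder for a model-driven extraction template. The table name, columns, and sample documents are all made up for illustration.

```python
import re
import sqlite3


def extract_invoice_fields(text: str) -> dict:
    """Hypothetical stand-in for an extraction template.

    A real pipeline would use model-driven extraction; plain regular
    expressions are used here only so the sketch runs locally.
    """
    vendor = re.search(r"Vendor:\s*(.+)", text)
    total = re.search(r"Total:\s*\$?([\d.]+)", text)
    date = re.search(r"Date:\s*(\d{4}-\d{2}-\d{2})", text)
    return {
        "vendor": vendor.group(1).strip() if vendor else None,
        "total": float(total.group(1)) if total else None,
        "invoice_date": date.group(1) if date else None,
    }


# In-memory SQLite plays the role of the table that holds extracted rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices "
    "(filename TEXT, vendor TEXT, total REAL, invoice_date TEXT)"
)


def on_upload(filename: str, text: str) -> None:
    """Simulate the upload event: extract fields and write one row."""
    fields = extract_invoice_fields(text)
    conn.execute(
        "INSERT INTO invoices VALUES (?, ?, ?, ?)",
        (filename, fields["vendor"], fields["total"], fields["invoice_date"]),
    )


# Three made-up documents "landing" in storage.
on_upload("a.txt", "Vendor: Acme Corp\nDate: 2024-01-15\nTotal: $120.50")
on_upload("b.txt", "Vendor: Acme Corp\nDate: 2024-02-03\nTotal: $79.50")
on_upload("c.txt", "Vendor: Globex\nDate: 2024-02-10\nTotal: $300.00")


def spend_by_vendor() -> list:
    """Aggregate the extracted rows like any other table."""
    return conn.execute(
        "SELECT vendor, COUNT(*), SUM(total) FROM invoices "
        "GROUP BY vendor ORDER BY vendor"
    ).fetchall()


for vendor, doc_count, total in spend_by_vendor():
    print(f"{vendor}: {doc_count} invoice(s), total {total:.2f}")
```

The point of the sketch is the division of labour: the upload handler is the only place extraction happens, and everything downstream is ordinary SQL over rows. On the platform the handler and the table are managed for you, so only the final query would be yours to write.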

Why it’s faster and cheaper here

Without Dodil, document intelligence is a stack you assemble and operate piece by piece:

an OCR / extraction service to pull text out of PDFs and scans,
an LLM to structure that raw text into fields and entities,
an ETL pipeline to move and shape the output,
a warehouse or database to hold the structured results,
a vector store for semantic retrieval,
the glue code that wires all of it together,
and an auth story spanning every one of those services.

Each is a separate vendor, account, bill, and on-call rotation — and the integration code between them is yours to build and maintain forever.

On Dodil that stack collapses into one platform. Documents land in Storage, and pipelines extract on upload straight into Tables: extraction, structuring, and loading happen automatically as files arrive, with no ETL to author. The structured results are immediately queryable with SQL, and the same documents can be indexed in Vector for semantic search. Inference comes from Ignite Models, and everything authenticates with a single token.

The result is faster to ship — there’s no multi-service integration to design and debug; you point a pipeline at a bucket and rows appear in a table. And it’s cheaper to run — one vendor and one bill instead of a half-dozen, no glue to maintain, and no idle services to pay for between batches.

Build it

RAG knowledge base — the end-to-end ingest-and-retrieve mechanics: get documents in, index them, and retrieve by meaning.
K3 Documents → Warehouse recipe — the extraction path: a pipeline that pulls entities and fields out of each uploaded document and writes structured rows into a K3 Tables table you can query with SQL.
