Customer-support assistant
The outcome
An assistant that answers questions from your own help docs, knowledge base, and past tickets — not a generic model guessing. Every answer is grounded in your content (retrieval-augmented), so it’s accurate and traceable, and it stays current: when a doc or ticket changes, the index updates and the next answer reflects it.
Point it at customers for self-serve deflection — fewer tickets reach a human — and at agents as a copilot that surfaces the right passage and a drafted reply in seconds, for faster resolution. Same content, same pipeline, two audiences.
What you build on Dodil
The architecture is the standard RAG loop: documents land in object storage, a pipeline chunks and embeds them on upload, questions are embedded and matched against the vector index, and a chat model writes a grounded answer from the retrieved passages. On Dodil each role maps to one product:
- Object storage + auto-embedding pipelines + vector search — K3 : drop docs and tickets in a bucket, and K3 chunks, embeds, and indexes them for you. Want to own the schema and index instead? Drive a managed Milvus database directly with VBase .
- Embeddings + chat — Ignite Models : OpenAI/Cohere-compatible inference for embedding both your content and incoming questions, and for generating the final answer.
- Serving the endpoint (optional) — Ignite Compute : run the answer endpoint serverless, next to your data, with no servers to manage.
Why it’s faster and cheaper here
Without Dodil you’d assemble and operate: a vector DB vendor, an embeddings/LLM API, ingestion glue that re-chunks and re-embeds on every doc change, object storage, compute to serve it, and auth wired across all of them. Each is a separate account, integration, and bill — and the ingestion glue is code you own forever, since support content changes constantly.
On Dodil that collapses to one platform, one IAM token, pipelines that embed on upload automatically, and serverless compute. The pieces are built to fit, so there’s nothing to integrate between them.
- Faster — ship in days, not weeks. Less code to write (no ingestion or embedding glue), and one auth instead of credentials wired across four to six services.
- Cheaper — fewer vendors and fewer bills, no glue to maintain, no idle compute to pay for, and far less ops surface to monitor and keep in sync.
These wins are structural — they come from removing vendors, glue, and idle capacity, not from a pricing trick.
Build it
Follow the step-by-step build in RAG knowledge base. It covers both paths end to end: a CLI path where K3 handles ingest, chunking, and embedding for you, and a programmatic path (Python) that composes Ignite Models and VBase yourself for full control over schema, retrieval, and prompt.
See also
- Use Cases — what teams build on Dodil, and why it ships faster and runs cheaper
- RAG knowledge base — the step-by-step build behind this use case
- K3 — object storage, auto-embedding pipelines, and managed vector search
- Ignite Models — OpenAI/Cohere-compatible embeddings and chat
- Ignite Compute — serverless compute to serve the endpoint
- VBase — managed Milvus when you want to own the schema and index
- Get an Access Token — the single IAM token used across all of it