Template Patterns And Recipes

Last validated: 2026-05-14

This page captures recurring Scriptum authoring patterns from production templates in dodil-scriptum/templates.

Validated template sources used for this page:

  • templates/core/classification.scriptum
  • templates/core/ocr_extraction.scriptum
  • templates/core/translation.scriptum
  • templates/embedding/text_embedding_index.scriptum
  • templates/vision/object_detection.scriptum

Pattern 1: Canonical Source Resolution

Most ingestion templates normalize source routing into one reusable branch structure:

    decide "Resolve URL" with rules
      when url != "" -> "Public URL"
      when source.provider != "Public" -> "Transfer Private"
      otherwise -> "Public Source"

      "Public URL":
        let resolved_url = url
      "Transfer Private":
        do "Transfer to cache" with transfer_source
          source = source
          object = object
          stream = true
          -> transferred
        let resolved_url = transferred.content.cache_url
      "Public Source":
        let resolved_url = object.value

Why this is used:

  • isolates private-source transfer concerns
  • keeps downstream extraction blocks format-agnostic
  • gives a stable resolved_url contract for all later steps
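
As a rough analogy outside Scriptum, the routing above can be sketched in Python. This is not Scriptum syntax or a Scriptum API: the `Source`, `ObjectRef`, and `fake_transfer` names are hypothetical stand-ins, and `fake_transfer` only imitates what `transfer_source` is assumed to return (a cache URL).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Source:
    provider: str  # "Public" or the name of a private provider


@dataclass
class ObjectRef:
    value: str  # for public sources, assumed to hold the URL itself


def resolve_url(url: str, source: Source, obj: ObjectRef,
                transfer: Callable[[Source, ObjectRef], str]) -> str:
    """Return a single resolved URL, whichever way the input arrived."""
    if url != "":
        return url                    # "Public URL" route
    if source.provider != "Public":
        return transfer(source, obj)  # "Transfer Private" route
    return obj.value                  # "Public Source" route


def fake_transfer(source: Source, obj: ObjectRef) -> str:
    # Hypothetical stand-in for the transfer step; returns a made-up cache URL.
    return "cache://" + obj.value
```

Every route returns the same kind of value, which is the point of the pattern: later steps only ever see one resolved URL.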

Pattern 2: Shape Normalization After Branches

Branches that can produce different payload structures often normalize to one shape before downstream calls:

    decide "Get text content" with rules
      when raw_text != "" -> "Has Text"
      when contains(content_type, "pdf") -> "PDF"
      otherwise -> "Read Source"

      "Has Text":
        do "Normalize raw" with normalize
          text = raw_text
          operation = "TrimAndLower"
          -> doc_text
      "Read Source":
        do "Read source" with read_source
          source = source
          object = { type: "Url", value: resolved_url }
          format = "String"
          -> raw_data
        do "Normalize fetched" with normalize
          text = raw_data.content
          operation = "TrimAndLower"
          -> doc_text

Guideline: ensure all route branches bind to the same output variable (for example doc_text) so later steps do not need branch-aware logic.
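
The guideline can be illustrated with a small Python sketch (an analogy, not Scriptum code). It assumes that `TrimAndLower` strips surrounding whitespace and lowercases the text, and that the read step returns a mapping with a `content` field; both are assumptions made for illustration.

```python
from typing import Callable


def trim_and_lower(text: str) -> str:
    # Assumed behavior of the "TrimAndLower" operation.
    return text.strip().lower()


def get_doc_text(raw_text: str, read_source: Callable[[], dict]) -> str:
    """Both branches bind the same variable, doc_text, with the same shape."""
    if raw_text != "":
        doc_text = trim_and_lower(raw_text)             # "Has Text" branch
    else:
        raw_data = read_source()                        # "Read Source" branch
        doc_text = trim_and_lower(raw_data["content"])
    return doc_text
```

Callers receive a plain string either way and never need to know which branch ran.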

Pattern 3: Enum-Driven Configuration Contracts

Templates use enums to constrain runtime mode selections and model names:

    enum OutputDest = "return" | "s3" | "warehouse" | "s3_and_warehouse"
    enum DetectModel = "yolov8m"
    enum EmbeddingType : text = "float" | "binary" | "float16" | "bfloat16"

Benefits:

  • keeps input contracts self-documenting
  • prevents typo-driven mode errors
  • simplifies API and CLI validation feedback
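
To show why enum contracts make validation feedback simpler, here is a minimal Python sketch using the standard `enum` module. It mirrors the `OutputDest` values above; the `parse_output_dest` helper is hypothetical and is not part of any Scriptum tooling.

```python
from enum import Enum


class OutputDest(str, Enum):
    RETURN = "return"
    S3 = "s3"
    WAREHOUSE = "warehouse"
    S3_AND_WAREHOUSE = "s3_and_warehouse"


def parse_output_dest(raw: str) -> OutputDest:
    """Reject unknown modes early and list the allowed values in the error."""
    try:
        return OutputDest(raw)
    except ValueError:
        allowed = ", ".join(d.value for d in OutputDest)
        raise ValueError(
            f"invalid output destination {raw!r}; expected one of: {allowed}"
        ) from None
```

A typo such as `"s33"` fails at the input boundary with a message that names every valid option, instead of surfacing later as an unexpected branch.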

Pattern 4: Schema-First Writes

For warehouse and vector persistence, templates generally ensure schema before write operations:

    do "Ensure Schema" with k3_table_ensure_schema
      bucket = bucket
      table_name = table_name
      org_id = org_id
      columns = { input_ref: { type: "text", nullable: false } }
      merge_keys = merge_keys
      -> ensured

    do "Write to warehouse" with k3_table_insert
      bucket = bucket
      table_name = table_name
      org_id = org_id
      rows = warehouse_rows
      mode = warehouse_mode
      match_columns = merge_keys

For vector pipelines, this appears as vector_store_ensure_schema before chunk/embed/insert.
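
The ordering matters because a write against a missing or incomplete schema should fail loudly rather than silently create mismatched data. The Python sketch below models this with a hypothetical `InMemoryTable`; it is an illustration of the ensure-then-write ordering, not how `k3_table_ensure_schema` or `k3_table_insert` are implemented.

```python
class InMemoryTable:
    """Toy table used only to illustrate ensure-schema-before-write."""

    def __init__(self) -> None:
        self.columns: dict = {}
        self.rows: list = []

    def ensure_schema(self, columns: dict) -> None:
        # Adds missing columns only, so repeated calls are harmless.
        for name, spec in columns.items():
            self.columns.setdefault(name, spec)

    def insert(self, rows: list) -> None:
        # Validate every row before storing any of them.
        for row in rows:
            unknown = set(row) - set(self.columns)
            if unknown:
                raise KeyError(f"columns not in schema: {sorted(unknown)}")
            for name, spec in self.columns.items():
                if not spec.get("nullable", True) and row.get(name) is None:
                    raise ValueError(f"column {name!r} is not nullable")
        self.rows.extend(rows)
```

Calling `ensure_schema` first, every run, keeps the write step free of schema-creation logic and makes reruns safe.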

Pattern 5: Streaming Page/Frame Pipelines

High-volume templates stream per-item processing via pipe and yield:

    pipe page from page_images_res.renders
      do "Run OCR" with infer
        model = ocr_model
        input = { "type": "image_url", "image_url": page.url }
        -> ocr_result
      yield "Page OCR"
        data = { text: ocr_result.text, page_number: page.page_number }
      -> pipe_results

Notes:

  • use ->> on the upstream step when its tool response is stream-like and should feed pipe one item at a time
  • combine yield for observability with terminal emit for final summary
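
Python generators give a close analogy for per-item streaming with a final summary. In this sketch, `ocr_pages` plays the role of the pipe body (one result yielded per page) and `summarize` plays the role of a terminal emit; both function names and the `run_ocr` callable are hypothetical.

```python
from typing import Callable, Iterable, Iterator


def ocr_pages(pages: Iterable[dict],
              run_ocr: Callable[[str], str]) -> Iterator[dict]:
    """Process pages one at a time, yielding each result as it is ready."""
    for page in pages:
        text = run_ocr(page["url"])
        yield {"text": text, "page_number": page["page_number"]}


def summarize(results: Iterable[dict]) -> dict:
    """Consume the stream and produce one final summary record."""
    count = 0
    chars = 0
    for result in results:
        count += 1
        chars += len(result["text"])
    return {"pages": count, "characters": chars}
```

Because results are produced lazily, a consumer can log or display each page as it arrives without waiting for the whole document.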

Pattern 6: Confidence-Gated Fallback

Templates commonly implement deterministic quality gates before expensive fallback paths:

    decide "Confidence check" with rules
      when paddle_result.average_confidence >= min_ocr_confidence -> "Paddle Sufficient"
      otherwise -> "Vision Fallback"

      "Paddle Sufficient":
        let ocr_result = { text: paddle_result.text }
      "Vision Fallback":
        do "LLM vision fallback" with llm
          prompt = "Extract all text from: {resized.url}"
          -> ocr_result

Guideline: keep fallback criteria explicit and measurable (confidence, length, count) to reduce nondeterministic branching.
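
A minimal Python sketch of the same gate, assuming the cheap result carries `average_confidence` and `text` fields as in the snippet above. The `vision_fallback` callable is a hypothetical stand-in for the expensive path and is only invoked when the threshold is not met.

```python
from typing import Callable


def extract_text(paddle_result: dict, min_ocr_confidence: float,
                 vision_fallback: Callable[[], str]) -> dict:
    """Use the cheap OCR result when it clears the threshold, else fall back."""
    if paddle_result["average_confidence"] >= min_ocr_confidence:
        return {"text": paddle_result["text"]}  # "Paddle Sufficient"
    return {"text": vision_fallback()}          # "Vision Fallback"
```

The comparison is a plain numeric check, so the same inputs always take the same route, and both routes return the same `{"text": ...}` shape.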

Pattern 7: Idempotent vs Append Persistence

A recurring template decision is to set the write mode explicitly, according to intent:

  • mode = "merge" with stable merge_keys for idempotent reruns
  • mode = "append" for event/time-series capture (for example per-frame detections)

Document this in script descriptions so operators understand replay behavior.
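
The difference in replay behavior can be sketched in Python. This assumes that merge behaves like an upsert keyed on `merge_keys` (an existing row with the same key is replaced); that is an assumption for illustration, not a statement about how the warehouse tool implements it. The `write_rows` function is hypothetical.

```python
def write_rows(table: list, rows: list, mode: str,
               merge_keys: tuple = ()) -> None:
    """Write rows into an in-memory list using 'append' or 'merge' semantics."""
    if mode == "append":
        table.extend(rows)  # every run adds new rows
        return
    if mode != "merge":
        raise ValueError(f"unknown mode: {mode!r}")
    if not merge_keys:
        raise ValueError("merge mode requires at least one merge key")
    # Map each existing key tuple to its row position.
    index = {tuple(r[k] for k in merge_keys): i for i, r in enumerate(table)}
    for row in rows:
        key = tuple(row[k] for k in merge_keys)
        if key in index:
            table[index[key]] = row  # replace the existing row
        else:
            index[key] = len(table)
            table.append(row)
```

Running the same merge twice leaves the table unchanged after the first run, while running the same append twice doubles the rows. That is the replay behavior operators need to know about.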

Pattern 8: Defensive JSON Handling For LLM Outputs

Where LLM output is expected as JSON, templates often normalize before parse:

let parsed = parse_json(trim(replace(replace(classification.text, "```json", ""), "```", "")))

Rationale: fenced Markdown output appears in real responses even with strict prompts.
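
For comparison, the same strip-then-parse idea looks like this in Python with the standard `json` module. The `parse_llm_json` helper is hypothetical; the fence marker is built from single backticks only so the example stays readable inside a code block.

```python
import json

FENCE = "`" * 3  # the three-backtick Markdown fence marker


def parse_llm_json(text: str):
    """Strip Markdown code fences, trim whitespace, then parse as JSON."""
    cleaned = text.replace(FENCE + "json", "").replace(FENCE, "").strip()
    return json.loads(cleaned)
```

Like the one-liner above, this removes every fence marker in the text, so it suits responses whose JSON values are not expected to contain triple backticks themselves.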

Authoring Checklist From Templates

  1. Route and normalize inputs first (Resolve URL, type route, normalized shape).
  2. Bind every important intermediate with -> name for debuggability.
  3. Prefer decide ... with rules for deterministic control flow.
  4. Guard expensive fallbacks with measurable checks.
  5. Validate schemas before writes.
  6. Use yield for progressive telemetry in long pipelines.
  7. Make persistence mode (merge/append) explicit in inputs and docs.