Template Patterns And Recipes

Last validated: 2026-05-14

This page captures recurring Scriptum authoring patterns from production templates in dodil-scriptum/templates.

Validated template sources used for this page:

templates/core/classification.scriptum
templates/core/ocr_extraction.scriptum
templates/core/translation.scriptum
templates/embedding/text_embedding_index.scriptum
templates/vision/object_detection.scriptum

Pattern 1: Canonical Source Resolution

Most ingestion templates normalize source routing into one reusable branch structure:


decide "Resolve URL" with rules
  when url != "" -> "Public URL"
  when source.provider != "Public" -> "Transfer Private"
  otherwise -> "Public Source"

  "Public URL":
    let resolved_url = url

  "Transfer Private":
    do "Transfer to cache" with transfer_source
      source = source
      object = object
      stream = true
    -> transferred

    let resolved_url = transferred.content.cache_url

  "Public Source":
    let resolved_url = object.value

Why this is used:

isolates private-source transfer concerns
keeps downstream extraction blocks format-agnostic
gives a stable resolved_url contract for all later steps
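
The routing order can also be modeled outside Scriptum. The following is a minimal Python sketch of the same decision order, not Scriptum syntax. The names resolve_url, transfer, and fake_transfer are hypothetical stand-ins for the decide block and the transfer_source tool, and the "S3" provider value is invented for illustration.

```python
from typing import Callable


def resolve_url(
    url: str,
    provider: str,
    object_value: str,
    transfer: Callable[[str], str],
) -> str:
    """Return one resolved URL regardless of which route was taken."""
    if url != "":
        # "Public URL": an explicit URL wins over any source settings.
        return url
    if provider != "Public":
        # "Transfer Private": copy the object to a cache first (hypothetical callback).
        return transfer(object_value)
    # "Public Source": the object value is treated as an already reachable URL.
    return object_value


def fake_transfer(obj: str) -> str:
    # Stand-in for transfer_source; returns a made-up cache location.
    return "cache://" + obj


if __name__ == "__main__":
    print(resolve_url("https://example.com/a.pdf", "S3", "bucket/a.pdf", fake_transfer))
    print(resolve_url("", "S3", "bucket/a.pdf", fake_transfer))
    print(resolve_url("", "Public", "https://example.com/b.pdf", fake_transfer))
```

Whichever route runs, the caller receives a single string, which mirrors the resolved_url contract described above.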

Pattern 2: Shape Normalization After Branches

Branches that can produce different payload structures often normalize to one shape before downstream calls:


decide "Get text content" with rules
  when raw_text != "" -> "Has Text"
  when contains(content_type, "pdf") -> "PDF"
  otherwise -> "Read Source"

  "Has Text":
    do "Normalize raw" with normalize
      text = raw_text
      operation = "TrimAndLower"
    -> doc_text

  "Read Source":
    do "Read source" with read_source
      source = source
      object = { type: "Url", value: resolved_url }
      format = "String"
    -> raw_data

    do "Normalize fetched" with normalize
      text = raw_data.content
      operation = "TrimAndLower"
    -> doc_text

Guideline: ensure every route branch binds the same output variable (for example doc_text) so later steps need no branch-aware logic. The "PDF" branch is not shown in this excerpt; it must bind doc_text as well.
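
As an illustration of the guideline, here is a small Python model of two of the routes, not Scriptum syntax. It assumes that the "TrimAndLower" operation strips surrounding whitespace and lowercases the text. The names get_doc_text and fetch are hypothetical, and the "PDF" route is left out because the excerpt does not show it.

```python
from typing import Callable


def normalize(text: str) -> str:
    # Assumed meaning of "TrimAndLower": strip whitespace, then lowercase.
    return text.strip().lower()


def get_doc_text(raw_text: str, fetch: Callable[[], dict]) -> str:
    """Both routes end by binding the same name, doc_text."""
    if raw_text != "":
        # "Has Text": the caller supplied the text directly.
        doc_text = normalize(raw_text)
    else:
        # "Read Source": fetch first, then normalize the fetched content.
        raw_data = fetch()
        doc_text = normalize(raw_data["content"])
    return doc_text


def fake_fetch() -> dict:
    # Stand-in for read_source; returns a payload with a "content" field.
    return {"content": "  Fetched BODY \n"}


if __name__ == "__main__":
    print(get_doc_text("  Inline TEXT ", fake_fetch))
    print(get_doc_text("", fake_fetch))
```

Downstream code only ever sees a normalized string, so it never needs to know which route produced it.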

Pattern 3: Enum-Driven Configuration Contracts

Templates use enums to constrain runtime mode selections and model names:


enum OutputDest = "return" | "s3" | "warehouse" | "s3_and_warehouse"
enum DetectModel = "yolov8m"
enum EmbeddingType : text = "float" | "binary" | "float16" | "bfloat16"

Benefits:

keeps input contracts self-documenting
prevents typo-driven mode errors
simplifies API and CLI validation feedback
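
The same contract can be sketched in Python with the standard enum module. This is an analogy rather than a description of how Scriptum enforces enums; parse_output_dest is a hypothetical helper, and the member values are copied from the OutputDest declaration above.

```python
from enum import Enum


class OutputDest(str, Enum):
    RETURN = "return"
    S3 = "s3"
    WAREHOUSE = "warehouse"
    S3_AND_WAREHOUSE = "s3_and_warehouse"


def parse_output_dest(value: str) -> OutputDest:
    """Convert raw input to an OutputDest, listing valid choices on failure."""
    try:
        return OutputDest(value)
    except ValueError:
        allowed = ", ".join(d.value for d in OutputDest)
        raise ValueError(
            f"invalid output destination {value!r}; expected one of: {allowed}"
        ) from None


if __name__ == "__main__":
    print(parse_output_dest("warehouse"))
    try:
        parse_output_dest("wharehouse")  # typo is rejected up front
    except ValueError as err:
        print(err)
```

A misspelled mode fails at the input boundary with the allowed values in the message, instead of silently selecting no route later in the script.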

Pattern 4: Schema-First Writes

For warehouse and vector persistence, templates generally ensure the schema exists before any write operation:


do "Ensure Schema" with k3_table_ensure_schema
  bucket = bucket
  table_name = table_name
  org_id = org_id
  columns = { input_ref: { type: "text", nullable: false } }
  merge_keys = merge_keys
-> ensured

do "Write to warehouse" with k3_table_insert
  bucket = bucket
  table_name = table_name
  org_id = org_id
  rows = warehouse_rows
  mode = warehouse_mode
  match_columns = merge_keys

For vector pipelines, this appears as vector_store_ensure_schema before chunk/embed/insert.

Pattern 5: Streaming Page/Frame Pipelines

High-volume templates stream per-item processing via pipe and yield:


pipe page from page_images_res.renders
  do "Run OCR" with infer
    model = ocr_model
    input = { "type": "image_url", "image_url": page.url }
  -> ocr_result

  yield "Page OCR"
    data = { text: ocr_result.text, page_number: page.page_number }
-> pipe_results

Notes:

use ->> on the upstream tool call when its response is stream-like and should feed pipe one item at a time
combine yield for per-item observability with a terminal emit for the final summary
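
A Python generator gives a rough model of this flow. It is a sketch of the per-item idea only, not of Scriptum's runtime. The names ocr_pages, run_ocr, and fake_ocr are hypothetical, and the final summary dictionary stands in for a terminal emit.

```python
from typing import Callable, Iterable, Iterator


def ocr_pages(pages: Iterable[dict], run_ocr: Callable[[str], str]) -> Iterator[dict]:
    """Process pages lazily and hand back one result per page."""
    for page in pages:
        text = run_ocr(page["url"])
        # Comparable to yield "Page OCR": the result is visible before later pages run.
        yield {"text": text, "page_number": page["page_number"]}


def fake_ocr(url: str) -> str:
    # Stand-in for the infer tool call.
    return "text of " + url


pages = [
    {"url": "p1.png", "page_number": 1},
    {"url": "p2.png", "page_number": 2},
]

results = []
for item in ocr_pages(pages, fake_ocr):
    results.append(item)

# Comparable to a terminal emit: one summary after the stream is exhausted.
summary = {"pages": len(results)}
```

Because the generator is lazy, a consumer can report progress after each page instead of waiting for the whole document.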

Pattern 6: Confidence-Gated Fallback

Templates commonly implement deterministic quality gates before expensive fallback paths:


decide "Confidence check" with rules
  when paddle_result.average_confidence >= min_ocr_confidence -> "Paddle Sufficient"
  otherwise -> "Vision Fallback"

  "Paddle Sufficient":
    let ocr_result = { text: paddle_result.text }

  "Vision Fallback":
    do "LLM vision fallback" with llm
      prompt = "Extract all text from: {resized.url}"
    -> ocr_result

Guideline: keep fallback criteria explicit and measurable (confidence, length, count) to reduce nondeterministic branching.
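
The gate can be expressed as a plain threshold comparison. The Python sketch below models that logic under stated assumptions: choose_ocr_text and vision_fallback are hypothetical names, and the fallback is passed as a callable so the expensive path runs only when the check fails.

```python
from typing import Callable


def choose_ocr_text(
    paddle_result: dict,
    min_ocr_confidence: float,
    vision_fallback: Callable[[], str],
) -> dict:
    """Return {"text": ...} from the cheap path when it is confident enough."""
    if paddle_result["average_confidence"] >= min_ocr_confidence:
        # "Paddle Sufficient": reuse the existing result, no extra call.
        return {"text": paddle_result["text"]}
    # "Vision Fallback": only now pay for the expensive path.
    return {"text": vision_fallback()}


fallback_calls = []


def fake_vision_fallback() -> str:
    # Stand-in for the llm tool call; records that it was invoked.
    fallback_calls.append(1)
    return "text from vision model"


good = choose_ocr_text({"average_confidence": 0.93, "text": "clean scan"}, 0.8, fake_vision_fallback)
poor = choose_ocr_text({"average_confidence": 0.41, "text": "c1ean 5can"}, 0.8, fake_vision_fallback)
```

Both routes return the same shape, and the same inputs always take the same route, which is the property the guideline asks for.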

Pattern 7: Idempotent vs Append Persistence

Templates repeatedly make the write mode an explicit choice based on intent:

mode = "merge" with stable merge_keys for idempotent reruns
mode = "append" for event/time-series capture (for example per-frame detections)

Document the chosen mode in the script description so operators understand how a replay behaves.
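
The difference in replay behavior can be shown with an in-memory model. This Python sketch assumes that "merge" replaces any existing row with the same merge-key values and that "append" always adds rows; the actual warehouse semantics may differ. write_rows is a hypothetical helper, not a Scriptum tool.

```python
from typing import Iterable, List, Sequence


def write_rows(
    table: List[dict],
    rows: Iterable[dict],
    mode: str,
    merge_keys: Sequence[str] = (),
) -> List[dict]:
    """Apply rows to an in-memory table using append or merge semantics."""
    if mode == "append":
        table.extend(dict(row) for row in rows)
    elif mode == "merge":
        if not merge_keys:
            raise ValueError("merge mode requires at least one merge key")
        # Map each existing key tuple to its position in the table.
        index = {tuple(r[k] for k in merge_keys): i for i, r in enumerate(table)}
        for row in rows:
            key = tuple(row[k] for k in merge_keys)
            if key in index:
                table[index[key]] = dict(row)  # replace the matching row
            else:
                index[key] = len(table)
                table.append(dict(row))
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return table


batch = [
    {"input_ref": "doc-1", "label": "invoice"},
    {"input_ref": "doc-2", "label": "receipt"},
]

merged: List[dict] = []
appended: List[dict] = []
for _ in range(2):  # simulate running the same job twice
    write_rows(merged, batch, "merge", merge_keys=["input_ref"])
    write_rows(appended, batch, "append")
```

After two identical runs the merged table still holds two rows while the appended table holds four, which is why the mode needs to be stated up front.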

Pattern 8: Defensive JSON Handling For LLM Outputs

Where LLM output is expected as JSON, templates often strip Markdown code fences and surrounding whitespace before parsing:


let parsed = parse_json(trim(replace(replace(classification.text, "```json", ""), "```", "")))

Rationale: fenced Markdown output appears in real responses even with strict prompts.
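
For comparison, the same cleanup in Python uses only the standard json module. This is a sketch of the equivalent logic, not of Scriptum's parse_json; parse_llm_json is a hypothetical helper name.

```python
import json

# Built at runtime so this example does not contain a literal fence marker.
FENCE = "`" * 3


def parse_llm_json(text: str):
    """Strip Markdown code fences, trim whitespace, then parse as JSON."""
    # Remove the language-tagged opener first, then any remaining bare fences.
    cleaned = text.replace(FENCE + "json", "").replace(FENCE, "").strip()
    return json.loads(cleaned)


fenced_reply = FENCE + 'json\n{"label": "invoice", "score": 0.97}\n' + FENCE
plain_reply = '  {"label": "receipt"}  '

parsed_fenced = parse_llm_json(fenced_reply)
parsed_plain = parse_llm_json(plain_reply)
```

The order of the two replacements matters: removing bare fences first would leave a stray "json" token in front of the payload and the parse would fail.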

Authoring Checklist From Templates

Route and normalize inputs first (Resolve URL, type route, normalized shape).
Bind every important intermediate with -> name for debuggability.
Prefer decide ... with rules for deterministic control flow.
Guard expensive fallbacks with measurable checks.
Validate schemas before writes.
Use yield for progressive telemetry in long pipelines.
Make persistence mode (merge/append) explicit in inputs and docs.