Template Patterns And Recipes
Last validated: 2026-05-14
This page captures recurring Scriptum authoring patterns from production templates in dodil-scriptum/templates.
Validated template sources used for this page:
- templates/core/classification.scriptum
- templates/core/ocr_extraction.scriptum
- templates/core/translation.scriptum
- templates/embedding/text_embedding_index.scriptum
- templates/vision/object_detection.scriptum
Pattern 1: Canonical Source Resolution
Most ingestion templates normalize source routing into one reusable branch structure:

    decide "Resolve URL" with rules
      when url != "" -> "Public URL"
      when source.provider != "Public" -> "Transfer Private"
      otherwise -> "Public Source"
      "Public URL":
        let resolved_url = url
      "Transfer Private":
        do "Transfer to cache" with transfer_source
          source = source
          object = object
          stream = true
          -> transferred
        let resolved_url = transferred.content.cache_url
      "Public Source":
        let resolved_url = object.value

Why this is used:
- isolates private-source transfer concerns
- keeps downstream extraction blocks format-agnostic
- gives a stable resolved_url contract for all later steps
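
The routing above can be sketched outside Scriptum as a plain function. This is an illustrative Python analogue, not Scriptum code: the Source class, resolve_url, and fake_transfer are hypothetical names, and fake_transfer only stands in for the transfer_source tool.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Source:
    provider: str  # e.g. "Public" or a private provider name (assumed shape)


def resolve_url(url: str, source: Source, object_value: str,
                transfer_to_cache: Callable[[Source, str], str]) -> str:
    """Mirror the three-way "Resolve URL" decision; the first matching rule wins."""
    if url != "":
        return url  # "Public URL"
    if source.provider != "Public":
        return transfer_to_cache(source, object_value)  # "Transfer Private"
    return object_value  # "Public Source"


def fake_transfer(source: Source, object_value: str) -> str:
    # Hypothetical stand-in for the transfer step; returns a made-up cache URL.
    return f"cache://{source.provider}/{object_value}"
```

Whichever rule fires, callers receive a single string, which is the same contract resolved_url provides to later steps.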
Pattern 2: Shape Normalization After Branches
Branches that can produce different payload structures often normalize to one shape before downstream calls:

    decide "Get text content" with rules
      when raw_text != "" -> "Has Text"
      when contains(content_type, "pdf") -> "PDF"
      otherwise -> "Read Source"
      "Has Text":
        do "Normalize raw" with normalize
          text = raw_text
          operation = "TrimAndLower"
          -> doc_text
      "Read Source":
        do "Read source" with read_source
          source = source
          object = { type: "Url", value: resolved_url }
          format = "String"
          -> raw_data
        do "Normalize fetched" with normalize
          text = raw_data.content
          operation = "TrimAndLower"
          -> doc_text

The "PDF" branch is omitted from this excerpt.

Guideline: ensure all route branches bind to the same output variable (for example doc_text) so later steps do not need branch-aware logic.
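
As a rough illustration of the guideline, the Python sketch below binds the same doc_text name in every branch before returning. It is an analogue, not Scriptum: trim_and_lower is an assumed reading of the "TrimAndLower" operation, and read_source is a hypothetical callable returning a dict with a content key.

```python
from typing import Callable


def trim_and_lower(text: str) -> str:
    # Assumed meaning of "TrimAndLower": strip outer whitespace, then lowercase.
    return text.strip().lower()


def get_text_content(raw_text: str, read_source: Callable[[], dict]) -> str:
    """Every branch binds doc_text, so callers never check which branch ran."""
    if raw_text != "":
        doc_text = trim_and_lower(raw_text)  # "Has Text"
    else:
        raw_data = read_source()  # "Read Source"
        doc_text = trim_and_lower(raw_data["content"])
    return doc_text
```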
Pattern 3: Enum-Driven Configuration Contracts
Templates use enums to constrain runtime mode selections and model names:

    enum OutputDest = "return" | "s3" | "warehouse" | "s3_and_warehouse"
    enum DetectModel = "yolov8m"
    enum EmbeddingType : text = "float" | "binary" | "float16" | "bfloat16"

Benefits:
- keeps input contracts self-documenting
- prevents typo-driven mode errors
- simplifies API and CLI validation feedback
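
The validation benefit can be sketched with Python's standard enum module. This is only an analogue of the Scriptum enum shown above; parse_output_dest and the error wording are hypothetical.

```python
from enum import Enum


class OutputDest(str, Enum):
    RETURN = "return"
    S3 = "s3"
    WAREHOUSE = "warehouse"
    S3_AND_WAREHOUSE = "s3_and_warehouse"


def parse_output_dest(value: str) -> OutputDest:
    """Reject unknown modes early and list the allowed values in the error."""
    try:
        return OutputDest(value)
    except ValueError:
        allowed = ", ".join(d.value for d in OutputDest)
        raise ValueError(
            f"invalid output_dest {value!r}; expected one of: {allowed}"
        ) from None
```

A typo such as "s3-and-warehouse" fails at input validation with the full list of accepted values, rather than surfacing later as an unhandled mode.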
Pattern 4: Schema-First Writes
For warehouse and vector persistence, templates generally ensure schema before write operations:

    do "Ensure Schema" with k3_table_ensure_schema
      bucket = bucket
      table_name = table_name
      org_id = org_id
      columns = { input_ref: { type: "text", nullable: false } }
      merge_keys = merge_keys
      -> ensured
    do "Write to warehouse" with k3_table_insert
      bucket = bucket
      table_name = table_name
      org_id = org_id
      rows = warehouse_rows
      mode = warehouse_mode
      match_columns = merge_keys

For vector pipelines, this appears as vector_store_ensure_schema before chunk/embed/insert.
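
To show why the ordering matters, here is a minimal in-memory sketch in Python. InMemoryWarehouse is hypothetical and does not model the k3_table_* tools; it only illustrates an idempotent ensure step followed by a write that is validated against the schema.

```python
class InMemoryWarehouse:
    """Hypothetical stand-in for a warehouse; not the k3_table_* tools."""

    def __init__(self):
        self.schemas = {}
        self.rows = {}

    def ensure_schema(self, table, columns):
        # Idempotent: creates the table on first call, adds only missing columns later.
        schema = self.schemas.setdefault(table, {})
        for name, spec in columns.items():
            schema.setdefault(name, spec)
        self.rows.setdefault(table, [])
        return schema

    def insert(self, table, rows):
        if table not in self.schemas:
            raise RuntimeError(f"table {table!r} has no schema; call ensure_schema first")
        schema = self.schemas[table]
        for row in rows:
            unknown = set(row) - set(schema)
            if unknown:
                raise ValueError(f"unknown columns: {sorted(unknown)}")
            for name, spec in schema.items():
                if not spec.get("nullable", True) and row.get(name) is None:
                    raise ValueError(f"column {name!r} is not nullable")
        self.rows[table].extend(rows)
        return len(rows)
```

Because ensure_schema is safe to repeat, a template can call it on every run without first checking whether the table exists.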
Pattern 5: Streaming Page/Frame Pipelines
High-volume templates stream per-item processing via pipe and yield:

    pipe page from page_images_res.renders
      do "Run OCR" with infer
        model = ocr_model
        input = { "type": "image_url", "image_url": page.url }
        -> ocr_result
      yield "Page OCR"
        data = { text: ocr_result.text, page_number: page.page_number }
      -> pipe_results

Notes:
- use ->> upstream when tool responses are stream-like and should feed pipe iteratively
- combine yield for observability with terminal emit for final summary
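
The per-item flow resembles a Python generator feeding a terminal aggregation. The sketch below is an analogy only: ocr_pages, summarize, and the run_ocr callable are hypothetical, and Python's yield is not claimed to behave identically to Scriptum's yield.

```python
from typing import Callable, Iterable, Iterator


def ocr_pages(pages: Iterable[dict], run_ocr: Callable[[str], str]) -> Iterator[dict]:
    """Process one page at a time and hand each result downstream immediately."""
    for page in pages:
        text = run_ocr(page["url"])
        yield {"text": text, "page_number": page["page_number"]}


def summarize(results: Iterable[dict]) -> dict:
    """Terminal step: consume the stream and produce one final summary."""
    count = 0
    chars = 0
    for result in results:
        count += 1
        chars += len(result["text"])
    return {"pages": count, "characters": chars}
```

No page is processed until a consumer asks for it, so memory use stays proportional to one item rather than the whole document.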
Pattern 6: Confidence-Gated Fallback
Templates commonly implement deterministic quality gates before expensive fallback paths:

    decide "Confidence check" with rules
      when paddle_result.average_confidence >= min_ocr_confidence -> "Paddle Sufficient"
      otherwise -> "Vision Fallback"
      "Paddle Sufficient":
        let ocr_result = { text: paddle_result.text }
      "Vision Fallback":
        do "LLM vision fallback" with llm
          prompt = "Extract all text from: {resized.url}"
          -> ocr_result

Guideline: keep fallback criteria explicit and measurable (confidence, length, count) to reduce nondeterministic branching.
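
A minimal Python sketch of the same gate follows. choose_ocr_text and the vision_fallback callable are hypothetical; the point is that the expensive path is invoked only when the numeric threshold is missed, and both paths return the same shape.

```python
from typing import Callable


def choose_ocr_text(paddle_result: dict, min_ocr_confidence: float,
                    vision_fallback: Callable[[], dict]) -> dict:
    """Return {"text": ...} from the cheap path when confidence meets the threshold."""
    if paddle_result["average_confidence"] >= min_ocr_confidence:
        return {"text": paddle_result["text"]}  # "Paddle Sufficient"
    return vision_fallback()  # "Vision Fallback": only called on a miss
```

Passing the fallback as a callable keeps it lazy, so a confident first-pass result never triggers the costly call.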
Pattern 7: Idempotent vs Append Persistence
Templates repeatedly make the write mode explicit, chosen by intent:
- mode = "merge" with stable merge_keys for idempotent reruns
- mode = "append" for event/time-series capture (for example per-frame detections)
Document this in script descriptions so operators understand replay behavior.
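
The replay difference between the two modes can be illustrated in Python with a list standing in for a table. write_rows is a hypothetical helper, not a Scriptum tool, and replacing the matched row is one assumed interpretation of merge.

```python
def write_rows(table: list, rows: list, mode: str, merge_keys: tuple = ()) -> list:
    """Append always adds rows; merge replaces rows that share the merge key."""
    if mode == "append":
        table.extend(rows)
    elif mode == "merge":
        if not merge_keys:
            raise ValueError("merge mode requires merge_keys")
        # Map each existing key to its position so reruns overwrite in place.
        index = {tuple(r[k] for k in merge_keys): i for i, r in enumerate(table)}
        for row in rows:
            key = tuple(row[k] for k in merge_keys)
            if key in index:
                table[index[key]] = row
            else:
                index[key] = len(table)
                table.append(row)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return table
```

Replaying the same batch leaves a merged table the same size but doubles an appended one, which is the behavior operators need to know before rerunning a script.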
Pattern 8: Defensive JSON Handling For LLM Outputs
Where LLM output is expected as JSON, templates often normalize before parse:

    let parsed = parse_json(trim(replace(replace(classification.text, "```json", ""), "```", "")))

Rationale: fenced Markdown output appears in real responses even with strict prompts.
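
The same cleanup can be sketched in Python with the standard json module. parse_llm_json is a hypothetical helper name, and the fence marker is built indirectly only so that this example nests cleanly inside a code block.

```python
import json

FENCE = "`" * 3  # a Markdown code fence marker


def parse_llm_json(text: str):
    """Strip Markdown fences (with or without a json tag), trim, then parse."""
    cleaned = text.replace(FENCE + "json", "").replace(FENCE, "").strip()
    return json.loads(cleaned)
```

One caveat of this simple approach: it also removes fence markers that appear inside string values, so it suits short structured outputs such as classification labels better than free-form text fields.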
Authoring Checklist From Templates
- Route and normalize inputs first (Resolve URL, type route, normalized shape).
- Bind every important intermediate with -> name for debuggability.
- Prefer decide ... with rules for deterministic control flow.
- Guard expensive fallbacks with measurable checks.
- Validate schemas before writes.
- Use yield for progressive telemetry in long pipelines.
- Make persistence mode (merge/append) explicit in inputs and docs.