Task

When you submit an Ingest job, you can optionally set a Task in `embed_spec.task`.

A Task tells the system what the embedding will be used for. Different tasks may produce embeddings optimized for different use-cases (search, similarity, code search, etc.).

If you don’t provide a task, we default to a safe choice for building a searchable knowledge base (typically Index).

The Python snippets below assume you have already connected to VNG (see Connect to VNG) and have a vng client instance. They also import EmbedTask from the SDK for structured task selection.

Why Task matters

Think of embeddings like “coordinates” for meaning. The Task helps the model place your content in the best possible space for your goal:

Index tasks: optimize embeddings for storing and retrieving content later (building a knowledge base).
Query tasks: optimize embeddings for searching against indexed content.
Code tasks: optimize embeddings for code semantics (functions, classes, APIs).
Similarity tasks: optimize embeddings for comparing two texts directly.

Task types

Index (`EMBED_TASK_INDEX`)

What it’s for: building a knowledge base / search index.

Resembles: “I want to store this content so I can search it later.”

Typical examples:

Upload PDFs, docs, support articles, policies, tickets, and store embeddings in your vector database.
Ingest long content and chunk it, so each chunk becomes searchable.

Use when: you are embedding documents/content that will be retrieved later.

Python example:


from dodil import EmbedTask
result = vng.embed(
    inputs=[
        "Veriflow onboarding checklist...",
        "Password reset steps...",
    ],
    task=EmbedTask.INDEX,
)
 
# Number of vectors returned, and the dimensionality of the first vector.
print(len(result), len(result[0]))

Query (`EMBED_TASK_QUERY`)

What it’s for: searching a knowledge base built with Index.

Resembles: “This is the user’s search question.”

Typical examples:

A user types: “How do I reset my API key?”
You embed that query with Query and search against vectors created with Index.

Use when: you are embedding a search query (short text) to retrieve relevant indexed chunks.

Python example:


from dodil import EmbedTask
query_vec = vng.embed(
    inputs=["How do I reset my API key?"],
    task=EmbedTask.QUERY,
)[0]
 
# Send `query_vec` to VBase similarity search against your Index vectors.
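
To make the search step concrete, the ranking can be sketched client-side. This is a minimal sketch under stated assumptions, not a description of how VBase works: `normalize` and `top_k` are hypothetical helpers, cosine similarity is assumed as the metric, and the toy 3-dimensional vectors stand in for real Query and Index embeddings.

```python
import heapq
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def top_k(query_vec, index_vecs, k=3):
    """Return (position, score) pairs for the k index vectors closest to the query.

    Hypothetical helper: assumes cosine similarity as the ranking metric.
    """
    q = normalize(query_vec)
    scored = (
        (i, sum(a * b for a, b in zip(q, normalize(v))))
        for i, v in enumerate(index_vecs)
    )
    # Highest score first.
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy stand-ins for vectors produced with EmbedTask.INDEX (documents)
# and EmbedTask.QUERY (the user's question).
index_vecs = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]
query_vec = [0.9, 0.1, 0.0]

hits = top_k(query_vec, index_vecs, k=2)
print(hits)
```

In practice a vector database performs this ranking for you; the sketch only shows why the query vector and the indexed vectors need to live in compatible spaces.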

Code Index (`EMBED_TASK_CODE_INDEX`)

What it’s for: building a searchable index of code.

Resembles: “Store this code so I can search across repositories later.”

Typical examples:

Embed a repository’s source files for semantic code search.
Embed generated API docs, SDK code, or internal libraries.

Use when: your inputs are primarily code (not natural language documents).

Python example:


from dodil import EmbedTask
code_vecs = vng.embed(
    inputs=[
        "def verify_jwt(token): ...",
        "class GatewayAuthz: ...",
    ],
    task=EmbedTask.CODE_INDEX,
)

Code Query (`EMBED_TASK_CODE_QUERY`)

What it’s for: searching across indexed code.

Resembles: “This is the developer’s question about code.”

Typical examples:

“Where do we validate JWT scopes in the gateway?”
“Find the function that sets x-organization-id header.”

Use when: you are embedding a code search query to retrieve code chunks embedded with Code Index.

Python example:


from dodil import EmbedTask
code_query_vec = vng.embed(
    inputs=["Where do we validate JWT scopes in the gateway?"],
    task=EmbedTask.CODE_QUERY,
)[0]

Text Similarity (`EMBED_TASK_TEXT_SIMILARITY`)

What it’s for: comparing two texts directly (semantic similarity).

Resembles: “How similar are these two pieces of text?”

Typical examples:

Deduplication: detect near-duplicate customer tickets.
Clustering: group similar documents.
Matching: compare a candidate profile to a job description.

Use when: you are computing similarity, clustering, or deduplicating texts rather than running retrieval search.

Python example:


from dodil import EmbedTask
a, b = vng.embed(
    inputs=[
        "Customer cannot login after password reset",
        "User is locked out after changing credentials",
    ],
    task=EmbedTask.TEXT_SIMILARITY,
)
 
# Send `a` and `b` to VBase (or compute similarity client-side).
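
The client-side option mentioned in the comment above can be sketched in plain Python. The helper names `cosine_similarity` and `near_duplicates`, the 0.9 threshold, and the toy vectors are all illustrative assumptions; real Text Similarity embeddings have many more dimensions, and a suitable threshold depends on your data.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def near_duplicates(vectors, threshold=0.9):
    """Return index pairs (i, j), i < j, whose similarity meets the threshold.

    Hypothetical helper; the default threshold is an arbitrary example value.
    """
    return [
        (i, j)
        for (i, u), (j, v) in combinations(enumerate(vectors), 2)
        if cosine_similarity(u, v) >= threshold
    ]


# Toy stand-ins for vectors produced with EmbedTask.TEXT_SIMILARITY.
vectors = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.0, 0.1, 0.9],
]

print(near_duplicates(vectors))
```

The pairwise loop is quadratic in the number of texts, so it suits small batches; for larger sets a vector database is the more practical route.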

Recommended pairing

For best results, pair tasks consistently:

Knowledge base search: Index (documents) + Query (user questions)
Code search: Code Index (repo/codebase) + Code Query (developer questions)
Similarity workloads: Text Similarity (both sides)
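
One way to keep pairings consistent in application code is to encode the list above as a lookup table. This is only a sketch: the workload keys and the helpers `tasks_for` and `is_consistent_pair` are hypothetical and not part of the SDK. The task names are the identifiers used in this page's headings, written as plain strings so the example runs on its own.

```python
# (task for stored content, task for the lookup side) per workload.
# The workload keys are made up for this example.
PAIRINGS = {
    "knowledge_base_search": ("EMBED_TASK_INDEX", "EMBED_TASK_QUERY"),
    "code_search": ("EMBED_TASK_CODE_INDEX", "EMBED_TASK_CODE_QUERY"),
    "similarity": ("EMBED_TASK_TEXT_SIMILARITY", "EMBED_TASK_TEXT_SIMILARITY"),
}


def tasks_for(workload):
    """Return the (stored, lookup) task pair for a workload name."""
    try:
        return PAIRINGS[workload]
    except KeyError:
        raise ValueError(f"unknown workload: {workload!r}") from None


def is_consistent_pair(stored_task, lookup_task):
    """True if the two tasks match one of the recommended pairings."""
    return (stored_task, lookup_task) in PAIRINGS.values()


print(tasks_for("code_search"))
print(is_consistent_pair("EMBED_TASK_INDEX", "EMBED_TASK_CODE_QUERY"))
```

A check like `is_consistent_pair` could run before a search request is sent, so that a mismatched task is caught early rather than showing up as poor results.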

Notes

Task selection influences how the embedding space is shaped. Using Query vectors against Index vectors is a common and recommended pattern for semantic retrieval.
If you’re not sure, start with Index for content ingestion and Query for search.
