
---
title: Text Highlighter
description: Highlight matched terms in text fields so users can instantly see why a record appeared in search results.
---

Text Highlighter

When you run keyword or hybrid searches, it’s common to ask: “Why did this record match?”

The Text Highlighter answers that by returning a version of your text field with matched terms wrapped in tags you choose (for example, <mark>...</mark> or {...}). This makes search results easier to read, helps with debugging relevance, and enables rich UI rendering in search and RAG apps.

Highlighting is a post-processing step on the final result set. It does not change filtering, scoring, ranking, or which documents match.

VBase supports text highlighting (powered by Milvus under the hood) through three independent controls:

  1. Which terms to highlight
    • The terms used by keyword/BM25 search.
    • Additional query terms coming from text filters (for example TEXT_MATCH).
  2. How highlighted terms are rendered
    • Configure the tag(s) inserted before and after each match.
  3. How highlighted text is returned
    • Return the highlighted content as full text, or as fragments (snippets) around matches.

Prerequisites

  • Python 3.10+
  • A collection with a text field (e.g. text) and a search method that supports keyword/BM25 or hybrid retrieval.
  • Include the text field in output_fields so VBase can produce highlights for it.

Basic usage

Below is a minimal pattern: create a highlighter config and pass it to your search call.

```python
from dodil import Client
from dodil.vbase import VBaseConfig, TextHighlighter

# Authorize
c = Client(
    service_account_id="...",
    service_account_secret="...",
)

# Connect
vbase = c.vbase.connect(
    VBaseConfig(
        host="vbase-db-<id>.infra.dodil.cloud",
        port=443,
        scheme="https",
        db_name="db_<id>",
    )
)

highlighter = TextHighlighter(
    pre_tags=["{"],
    post_tags=["}"],
    highlight_search_text=True,
)

results = vbase.search(
    collection_name="docs",
    data=["BM25"],
    anns_field="sparse_vector",
    limit=5,
    output_fields=["text"],
    highlighter=highlighter,
)

for hit in results[0]:
    # `highlight` is a dedicated field that contains highlighted output per text field.
    print(hit.get("highlight", {}).get("text", []))
```

What you get back

When highlighting is enabled, each hit includes a highlight field. The value is typically:

  • a mapping of text field name → list of highlighted fragments

Example (single fragment):

['Milvus supports full text search. Use {BM25} for keyword relevance. ...']
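If your UI renders highlights itself (for example, as styled spans instead of raw markup), you can split each fragment back into plain and matched segments. The sketch below assumes the { and } tags used in the examples on this page and assumes those characters do not otherwise appear in your text; split_highlights is a hypothetical helper, not part of the VBase SDK.

```python
def split_highlights(fragment: str, pre: str = "{", post: str = "}") -> list[tuple[str, bool]]:
    """Split one highlighted fragment into (text, is_match) segments.

    Hypothetical helper: assumes `pre`/`post` are the tags passed to
    TextHighlighter and never occur in the original text.
    """
    segments: list[tuple[str, bool]] = []
    pos = 0
    while True:
        start = fragment.find(pre, pos)
        if start == -1:
            break
        end = fragment.find(post, start + len(pre))
        if end == -1:
            break  # unbalanced tag: keep the rest as plain text
        if start > pos:
            segments.append((fragment[pos:start], False))
        segments.append((fragment[start + len(pre):end], True))
        pos = end + len(post)
    if pos < len(fragment):
        segments.append((fragment[pos:], False))
    return segments
```

For example, split_highlights("Use {BM25} for keyword relevance.") yields the plain text "Use ", the match "BM25", and the plain text " for keyword relevance.", which a UI can style however it likes.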

Highlight terms from the search text

If you’re running keyword/BM25 search, you can highlight the exact terms used by the search query.

Set:

  • highlight_search_text=True

This tells VBase to use the search text itself (the query strings passed in data) as the source of highlight terms.
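To build intuition for what highlight_search_text does, here is a toy, pure-Python model: every word in the text that also appears in the search text gets wrapped in the tags. It is not VBase's implementation. Real term matching follows the field's analyzer (tokenization, casing, and so on), which this sketch only approximates with lowercased \w+ tokens; toy_highlight is a hypothetical name.

```python
import re


def toy_highlight(text: str, query: str, pre: str = "{", post: str = "}") -> str:
    """Wrap each word of `text` that also occurs in `query` with pre/post tags.

    Simplified model: lowercased \\w+ tokens stand in for the field's analyzer.
    """
    terms = {t.lower() for t in re.findall(r"\w+", query)}

    def wrap(match: re.Match[str]) -> str:
        word = match.group(0)
        # Only words that are query terms get tagged; everything else is kept as-is.
        return f"{pre}{word}{post}" if word.lower() in terms else word

    return re.sub(r"\w+", wrap, text)
```

With the query "BM25", toy_highlight("Use BM25 for keyword relevance.", "BM25") returns "Use {BM25} for keyword relevance.", mirroring the shape of the output shown earlier.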

Highlight terms from text filters (TEXT_MATCH)

Sometimes the user doesn’t search for a keyword directly, but you still want to highlight what matched inside a text filter.

For example, if your filter expression includes TEXT_MATCH(text, "my doc"), you can add those filter terms to the highlighter.

```python
highlighter = TextHighlighter(
    pre_tags=["{"],
    post_tags=["}"],
    highlight_search_text=True,
    highlight_query=[
        {"type": "TextMatch", "field": "text", "text": "my doc"},
    ],
)

results = vbase.search(
    collection_name="docs",
    data=["test"],
    anns_field="sparse_vector",
    limit=5,
    output_fields=["text"],
    filter='TEXT_MATCH(text, "my doc")',
    highlighter=highlighter,
)
```

Output example:

['{my} first {test} {doc}']
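Because the filter expression and the highlight_query entry must name the same field and text, it can help to build them together so they never drift apart. A minimal sketch, assuming the TEXT_MATCH(field, "text") filter syntax and the {"type": "TextMatch", ...} entry shape shown above, and joining several clauses with and; text_match_filters is a hypothetical helper, and it rejects quotes and backslashes instead of guessing at escaping rules.

```python
def text_match_filters(
    matches: list[tuple[str, str]],
) -> tuple[str, list[dict[str, str]]]:
    """Build a TEXT_MATCH filter string and the matching highlight_query list.

    Hypothetical helper: each (field, text) pair becomes one TEXT_MATCH clause
    and one {"type": "TextMatch", ...} highlight entry.
    """
    clauses: list[str] = []
    highlight_query: list[dict[str, str]] = []
    for field, text in matches:
        if '"' in text or "\\" in text:
            # Escaping rules are not covered here, so refuse rather than guess.
            raise ValueError(f"unsupported character in match text: {text!r}")
        clauses.append(f'TEXT_MATCH({field}, "{text}")')
        highlight_query.append({"type": "TextMatch", "field": field, "text": text})
    return " and ".join(clauses), highlight_query
```

Usage: filter_expr, highlight_query = text_match_filters([("text", "my doc")]), then pass filter=filter_expr to vbase.search and highlight_query=highlight_query to TextHighlighter.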

Return highlights as fragments (snippets)

For long text fields, returning the full highlighted text can be noisy. Fragments let you return short snippets around the match.

Use:

  • fragment_offset: keep up to N characters before the first highlighted span
  • fragment_size: approximate maximum length of each fragment
  • num_of_fragments: maximum number of fragments returned per text value

```python
highlighter = TextHighlighter(
    pre_tags=["{"],
    post_tags=["}"],
    highlight_search_text=True,
    fragment_offset=20,
    fragment_size=60,
    num_of_fragments=3,
)

results = vbase.search(
    collection_name="docs",
    data=["Milvus"],
    anns_field="sparse_vector",
    limit=3,
    output_fields=["text"],
    highlighter=highlighter,
)

for i, hit in enumerate(results[0], start=1):
    print(f"Doc {i}:", hit.get("highlight", {}).get("text", []))
```

Example fragment output:

Doc 1: ['... my first test doc. {Milvus} is an open-source vector database ...']
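The three fragment parameters interact roughly like this: a window opens fragment_offset characters before a match, runs for about fragment_size characters, and at most num_of_fragments windows come back. The toy model below illustrates that interaction on explicit match spans. It is a simplified sketch, not VBase's or Milvus's actual fragmenting logic, which may treat word boundaries, ellipses, and length accounting differently; toy_fragments and its spans input are hypothetical.

```python
def toy_fragments(
    text: str,
    spans: list[tuple[int, int]],
    fragment_offset: int,
    fragment_size: int,
    num_of_fragments: int,
    pre: str = "{",
    post: str = "}",
) -> list[str]:
    """Cut tagged snippets around non-overlapping match spans (start, end)."""
    spans = sorted(spans)
    fragments: list[str] = []
    i = 0
    while i < len(spans) and len(fragments) < num_of_fragments:
        first_start, first_end = spans[i]
        # Open the window fragment_offset chars before the match, about fragment_size long,
        # but always long enough to contain the first match.
        win_start = max(0, first_start - fragment_offset)
        win_end = min(len(text), max(win_start + fragment_size, first_end))
        # Pull in every later match that starts inside this window,
        # stretching the window so no match is cut in half.
        j = i
        while j < len(spans) and spans[j][0] < win_end:
            win_end = max(win_end, spans[j][1])
            j += 1
        # Rebuild the window text with tags around each included match.
        parts: list[str] = []
        pos = win_start
        for start, end in spans[i:j]:
            parts.append(text[pos:start])
            parts.append(f"{pre}{text[start:end]}{post}")
            pos = end
        parts.append(text[pos:win_end])
        fragments.append("".join(parts))
        i = j
    return fragments
```

Nearby matches share one fragment, which is why a single snippet can contain several tagged terms while distant matches each get their own snippet, up to the num_of_fragments cap.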

Multi-query highlighting

If you pass multiple query strings (for example data=["test", "Milvus"]), each query’s results are highlighted independently using the same highlighter configuration.
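With several queries, results[i] holds the hits for data[i] (the examples above read results[0] for a single query). A minimal sketch for pairing each query with its highlighted fragments, assuming that per-query ordering and the dict-like hits used earlier; highlights_by_query is a hypothetical helper. It returns a list of pairs rather than a dict so repeated query strings are not collapsed.

```python
from typing import Any, Iterable, Mapping, Sequence


def highlights_by_query(
    queries: Sequence[str],
    results: Sequence[Iterable[Mapping[str, Any]]],
    field: str = "text",
) -> list[tuple[str, list[list[str]]]]:
    """Pair each query string with the highlight fragments of each of its hits."""
    if len(queries) != len(results):
        raise ValueError("expected exactly one result list per query")
    grouped: list[tuple[str, list[list[str]]]] = []
    for query, hits in zip(queries, results):
        # Hits without a highlight entry contribute an empty fragment list.
        per_hit = [list((hit.get("highlight") or {}).get(field, [])) for hit in hits]
        grouped.append((query, per_hit))
    return grouped
```

Usage: after calling vbase.search with data=["test", "Milvus"], loop over highlights_by_query(["test", "Milvus"], results) to render one result group per query.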

Use HTML-safe tags

If you render results in a web UI, you can use HTML tags like <mark>.

```python
highlighter = TextHighlighter(
    pre_tags=["<mark>"],
    post_tags=["</mark>"],
    highlight_search_text=True,
)
```
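Tags like <mark> are only as safe as the text around them: if stored text can itself contain HTML and you render the highlighted string as raw HTML, that markup renders too. This page does not say whether VBase escapes field content, so one defensive pattern is to request non-HTML sentinel tags, for example TextHighlighter(pre_tags=["[[hl]]"], post_tags=["[[/hl]]"], highlight_search_text=True), escape the whole fragment, and only then swap the sentinels for <mark>. The sentinel strings and to_safe_html are hypothetical choices; pick sentinels that cannot occur in your data.

```python
import html

# Hypothetical sentinel tags, configured as pre_tags/post_tags instead of <mark>.
PRE, POST = "[[hl]]", "[[/hl]]"


def to_safe_html(fragment: str, pre: str = PRE, post: str = POST) -> str:
    """Escape a highlighted fragment, then turn the sentinels into <mark> tags."""
    escaped = html.escape(fragment)  # neutralizes &, <, >, and quotes in the stored text
    # Escape the sentinels too, in case they contain characters html.escape changes.
    return escaped.replace(html.escape(pre), "<mark>").replace(html.escape(post), "</mark>")
```

Any HTML in the original text ends up as inert entities, while the highlighted terms still render inside <mark>.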

Notes and limitations

  • Highlighting requires the text field to be included in output_fields.
  • Highlighting is applied after retrieval and ranking, so it won’t change relevance.
  • Support depends on the search mode you use (keyword/BM25, hybrid, and text filters such as TEXT_MATCH).

If you want to build a UI feature (like a search results page with highlighted snippets), this is usually the simplest and most user-friendly way to explain matches.
