Keyword match (Text Match)

Vector search is great at finding “things that feel similar.” But sometimes you also need a strict keyword constraint:

“Find documents like this but must mention refund.”
“Search for invoice and overdue, but exclude paid.”
“Only return results that include at least one of these terms: gpu h100 l40.”

In Dodil VBase, this is done using keyword match (Milvus calls it Text Match). It lets you filter on a text field using a TEXT_MATCH(...) filter expression.

When to use keyword match

Keyword match is useful when you want:

Hard constraints (must include / must not include specific terms)
Hybrid retrieval (vector similarity + keyword filtering)
Fast lookup on large text fields using an inverted index

Common examples:

Support search: “similar tickets, but must contain login or otp.”
E‑commerce: “similar products, but must include wireless and bluetooth.”
Observability: “similar incidents, but must include timeout and exclude resolved.”

Prerequisite: enable match on a text field

TEXT_MATCH only works if the text field is configured to support it.

Your collection schema must:

Use a VARCHAR field for the text.
Enable an analyzer on that field (enable_analyzer), which controls how text is split into terms.
Enable match on that field (enable_match).

Here is an example collection schema in JSON form. The settings that matter for keyword match are enable_analyzer, enable_match, and analyzer_params on the text field.


```json
{
  "autoId": true,
  "enabledDynamicField": false,
  "fields": [
    { "fieldName": "id", "dataType": "Int64", "isPrimary": true },
    {
      "fieldName": "text",
      "dataType": "VarChar",
      "elementTypeParams": {
        "max_length": 200,
        "enable_analyzer": true,
        "enable_match": true,
        "analyzer_params": { "type": "english" }
      }
    },
    {
      "fieldName": "embeddings",
      "dataType": "FloatVector",
      "elementTypeParams": { "dim": "1536" }
    }
  ]
}
```

Notes:

The analyzer affects what counts as a “term.” Choose the analyzer that matches your language/content.
Enabling match typically creates an inverted index under the hood, which uses extra storage.
Analyzer settings are effectively fixed once the collection is created. If you want a different analyzer later, you usually have to recreate the collection and re-insert your data.
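To make the inverted-index note concrete, here is a toy in-memory version in plain Python. It is an illustration only, not how VBase stores its index, and `tokenize` is a deliberately simple stand-in for a real analyzer (no stop words, no stemming). It shows why a term lookup avoids scanning every document, and where the extra storage goes: every distinct term keeps its own set of document ids. The set operations at the end correspond to the OR, AND, and exclusion patterns described in the following sections.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Toy analyzer: lowercase, then split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    # Map each term to the set of document ids that contain it (its postings).
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


docs = {
    1: "Machine learning basics",
    2: "Deep learning with GPUs",
    3: "Cooking pasta at home",
}
index = build_inverted_index(docs)


def lookup(term: str) -> set[int]:
    # One dictionary access per term instead of a scan over every document.
    return index.get(term, set())


either = lookup("machine") | lookup("deep")     # OR: union of postings
both = lookup("learning") & lookup("deep")      # AND: intersection
excluded = lookup("learning") - lookup("deep")  # exclusion: difference

print(either, both, excluded)  # {1, 2} {2} {1}
```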

Basic usage: `TEXT_MATCH(field, 'terms')`

The filter syntax is:


```
TEXT_MATCH(field_name, text)
```

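field_name: the VARCHAR field to search (for example, text). The field must have match enabled.
text: a single-quoted string containing one or more terms separated by spaces. The string is tokenized with the field's analyzer, so it is matched term by term rather than as a whole phrase.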
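Because the second argument is a single string, application code often builds it from a list of terms. The sketch below uses a hypothetical helper, `text_match`, which is not part of any VBase SDK. It rejects whitespace, quotes, and backslashes inside a term instead of guessing at the filter language's escaping rules, and joins the remaining terms with spaces.

```python
import re

# Hypothetical helper, not part of any VBase SDK.
_FIELD_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_BAD_TERM_RE = re.compile(r"[\s'\"\\]")


def text_match(field_name: str, terms: list[str]) -> str:
    """Build a TEXT_MATCH expression that matches any of `terms`."""
    if not _FIELD_RE.match(field_name):
        raise ValueError(f"invalid field name: {field_name!r}")
    if not terms:
        raise ValueError("at least one term is required")
    for term in terms:
        # Whitespace would silently split a term in two; quotes would break the literal.
        if not term or _BAD_TERM_RE.search(term):
            raise ValueError(f"unsupported term: {term!r}")
    joined = " ".join(terms)
    return f"TEXT_MATCH({field_name}, '{joined}')"


print(text_match("text", ["login", "otp"]))  # TEXT_MATCH(text, 'login otp')
```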

OR matching (default)

By default, TEXT_MATCH(text, 'term1 term2') matches documents that contain any of the terms (term1 OR term2).


```python
filter_expr = "TEXT_MATCH(text, 'machine deep')"  # machine OR deep
```

AND matching

To require both terms, combine multiple expressions with and:


```python
filter_expr = "TEXT_MATCH(text, 'machine') and TEXT_MATCH(text, 'deep')"
```

Excluding terms

Use not to exclude a term:


```python
filter_expr = "TEXT_MATCH(text, 'machine') and TEXT_MATCH(text, 'learning') and not TEXT_MATCH(text, 'deep')"
```
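The three patterns compose, so a search box can accept a small query syntax and turn it into one filter string. The sketch below assumes a convention that is not part of VBase: `+term` must appear, `-term` must not appear, and unprefixed terms form one OR group, at least one of which must appear. The function name `parse_keywords` is hypothetical.

```python
import re

# Hypothetical convention: +term = must include, -term = must exclude,
# bare terms = at least one of them must appear (one OR group).
_TERM_RE = re.compile(r"^[A-Za-z0-9_]+$")


def parse_keywords(query: str, field: str = "text") -> str:
    required, excluded, optional = [], [], []
    for token in query.split():
        if token[0] in "+-":
            prefix, term = token[0], token[1:]
        else:
            prefix, term = "", token
        if not _TERM_RE.match(term):
            raise ValueError(f"unsupported term: {token!r}")
        if prefix == "+":
            required.append(term)
        elif prefix == "-":
            excluded.append(term)
        else:
            optional.append(term)

    # Require at least one positive term so the filter is never purely negative.
    if not required and not optional:
        raise ValueError("need at least one term to include")

    clauses = [f"TEXT_MATCH({field}, '{t}')" for t in required]  # one clause per AND term
    if optional:
        joined = " ".join(optional)
        clauses.append(f"TEXT_MATCH({field}, '{joined}')")  # OR within one clause
    clauses += [f"not TEXT_MATCH({field}, '{t}')" for t in excluded]
    return " and ".join(clauses)


print(parse_keywords("+invoice +overdue -paid"))
# TEXT_MATCH(text, 'invoice') and TEXT_MATCH(text, 'overdue') and not TEXT_MATCH(text, 'paid')
```

With this convention, "gpu h100 l40" becomes TEXT_MATCH(text, 'gpu h100 l40'), and "+machine +learning -deep" reproduces the exclusion example above.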

Use keyword match with vector search

A common pattern is to combine vector similarity search with a keyword filter:


```python
# Assumes `vbase` is an initialized VBase client and `query_vector` is a
# 1536-dimensional embedding that matches the `embeddings` field.

# Build a keyword filter
filter_expr = "TEXT_MATCH(text, 'keyword1 keyword2')"  # keyword1 OR keyword2

# Run vector search with a filter
results = vbase.search(
    collection_name="my_collection",
    vector_field="embeddings",
    data=[query_vector],
    filter=filter_expr,
    limit=10,
    output_fields=["id", "text"],
)

print(results)
```

What this does:

Searches for the vectors nearest to query_vector, considering only entities whose text field matches the keyword expression.
Returns up to 10 hits (limit=10), each with its id and text fields.
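The sketch below reproduces that behavior in memory so the order of operations is visible. It assumes the filter narrows the candidate set before the top-`limit` cut (which is how Milvus describes filtered search), uses brute-force cosine similarity in place of an ANN index, and uses a toy tokenizer instead of the field's analyzer. `filtered_search` and the sample rows are illustrative only.

```python
import math
import re


def terms(text: str) -> set[str]:
    # Toy tokenizer: lowercase, split on non-alphanumerics.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def filtered_search(rows, query_vector, any_of, limit):
    wanted = {t.lower() for t in any_of}
    # 1. Filter: keep rows whose text contains at least one wanted term (OR).
    candidates = [r for r in rows if terms(r["text"]) & wanted]
    # 2. Rank the remaining rows by similarity and cut to `limit`.
    candidates.sort(key=lambda r: cosine(r["embeddings"], query_vector), reverse=True)
    return [{"id": r["id"], "text": r["text"]} for r in candidates[:limit]]


rows = [
    {"id": 1, "text": "Refund request for order", "embeddings": [0.9, 0.1]},
    {"id": 2, "text": "Login fails without OTP", "embeddings": [0.8, 0.2]},
    {"id": 3, "text": "Refund delayed again", "embeddings": [0.1, 0.9]},
]
hits = filtered_search(rows, query_vector=[1.0, 0.0], any_of=["refund"], limit=2)
print([h["id"] for h in hits])  # [1, 3]
```

Row 2 is closer to the query than row 3 but never makes the cut because it does not mention refund. Filtering first is what lets the search still return two hits; cutting to the top 2 and then filtering would have left only row 1.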

Use keyword match with scalar queries

If you want keyword matching without vector similarity (pure filtering), use a query.


```python
filter_expr = "TEXT_MATCH(text, 'keyword1') and TEXT_MATCH(text, 'keyword2')"

rows = vbase.query(
    collection_name="my_collection",
    filter=filter_expr,
    output_fields=["id", "text"],
)

print(rows)
```

Practical recommendations

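Prefer keyword match for constraints, not for ranking. TEXT_MATCH is a yes/no filter and does not score results; for relevance ranking, combine it with vector search.
Keep the text field focused on content you actually want to match, and pick an analyzer that fits its language. The inverted index grows with the amount of text indexed.
If you need strict exact match on whole strings (IDs, SKUs, emails), use a normal VARCHAR comparison such as == instead of TEXT_MATCH.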
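The reason is that TEXT_MATCH compares analyzer terms, not whole strings. The sketch below uses a toy tokenizer (real analyzers split differently in detail, but they still split) and three made-up addresses to show how term-based OR matching over-matches compared with a plain equality check. The `email` field in the final expression is hypothetical.

```python
import re


def toy_tokenize(text: str) -> set[str]:
    # Toy analyzer: lowercase, split on non-alphanumerics.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


stored = ["alice@example.com", "bob@example.com", "bob@example.org"]
query = "bob@example.com"

# Term-based OR matching: a row matches if it shares any term with the query.
term_hits = [s for s in stored if toy_tokenize(s) & toy_tokenize(query)]

# Whole-string comparison, which is what an equality filter does.
exact_hits = [s for s in stored if s == query]

print(term_hits)   # all three addresses match
print(exact_hits)  # ['bob@example.com']

# Equality filter for a hypothetical `email` VARCHAR field.
filter_expr = f'email == "{query}"'
print(filter_expr)  # email == "bob@example.com"
```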

Troubleshooting

`TEXT_MATCH` returns an error

This usually means the field was not configured with:

enable_analyzer: true
enable_match: true

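Recreate the collection with both flags set on that field. Also check that the field name in the expression is spelled correctly and refers to a VARCHAR field.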
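Before recreating anything, it can help to check the schema definition you are about to send. The sketch below is a hypothetical pre-flight check, not a VBase API: it reads a schema in the same JSON shape as the example earlier on this page and lists what would stop TEXT_MATCH from working on a given field. The sample schema deliberately omits `enable_match`.

```python
import json

SCHEMA_JSON = """
{
  "fields": [
    { "fieldName": "id", "dataType": "Int64", "isPrimary": true },
    { "fieldName": "text", "dataType": "VarChar",
      "elementTypeParams": { "max_length": 200, "enable_analyzer": true } }
  ]
}
"""


def text_match_problems(schema: dict, field_name: str) -> list[str]:
    """List reasons why TEXT_MATCH on `field_name` would fail (hypothetical check)."""
    field = next(
        (f for f in schema.get("fields", []) if f.get("fieldName") == field_name), None
    )
    if field is None:
        return [f"field {field_name!r} not found"]
    problems = []
    if field.get("dataType") != "VarChar":
        problems.append(f"dataType is {field.get('dataType')!r}, expected 'VarChar'")
    params = field.get("elementTypeParams", {})
    for flag in ("enable_analyzer", "enable_match"):
        # Both flags must be explicitly true; a missing flag counts as disabled.
        if params.get(flag) is not True:
            problems.append(f"{flag} is not true")
    return problems


print(text_match_problems(json.loads(SCHEMA_JSON), "text"))  # ['enable_match is not true']
```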

Results look unexpected

Check your analyzer. Tokenization, stop‑words, stemming, and language settings can change what “matches.”
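A quick way to reason about surprising matches is to look at the terms a piece of text turns into. The sketch below is a very rough, hypothetical stand-in for an English analyzer (lowercasing, a tiny stop-word list, crude plural stripping). The real `english` analyzer in the schema above does its own tokenization, stop-word removal, and stemming, so treat this only as a model of the effects, not of their exact output.

```python
import re

STOP_WORDS = {"a", "an", "and", "the", "of", "to", "is"}  # tiny sample list


def toy_english_analyzer(text: str) -> list[str]:
    """Rough stand-in: lowercase, split, drop stop words, crude plural stripping."""
    result = []
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        if word in STOP_WORDS:
            continue
        # Crude stemming; real stemmers (e.g. Snowball) are far more careful.
        if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
            word = word[:-1]
        result.append(word)
    return result


print(toy_english_analyzer("The Refunds of the ORDER"))  # ['refund', 'order']
print(toy_english_analyzer("Login codes"))               # ['login', 'code']
```

Under a model like this, a query for refunds and a document containing refund reduce to the same term and match, while a query made only of stop words reduces to no terms at all.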