Keyword match (Text Match)
Vector search is great at finding “things that feel similar.” But sometimes you also need a strict keyword constraint:
- “Find documents like this but must mention refund.”
- “Search for invoice and overdue, but exclude paid.”
- “Only return results that include at least one of these terms: gpu, h100, l40.”
In Dodil VBase, this is done using keyword match (Milvus calls it Text Match). It lets you filter on a text field using a TEXT_MATCH(...) filter expression.
When to use keyword match
Keyword match is useful when you want:
- Hard constraints (must include / must not include specific terms)
- Hybrid retrieval (vector similarity + keyword filtering)
- Fast lookup on large text fields using an inverted index
Common examples:
- Support search: “similar tickets, but must contain login or otp.”
- E‑commerce: “similar products, but must include wireless and bluetooth.”
- Observability: “similar incidents, but must include timeout and exclude resolved.”
Prerequisite: enable match on a text field
TEXT_MATCH only works if the text field is configured to support it.
Your collection schema must:
- Use a VARCHAR field for the text.
- Enable an analyzer (how text is tokenized).
- Enable match on that field.
Here is an example schema in JSON form (advanced), showing the key flags:
{
  "autoId": true,
  "enabledDynamicField": false,
  "fields": [
    { "fieldName": "id", "dataType": "Int64", "isPrimary": true },
    {
      "fieldName": "text",
      "dataType": "VarChar",
      "elementTypeParams": {
        "max_length": 200,
        "enable_analyzer": true,
        "enable_match": true,
        "analyzer_params": { "type": "english" }
      }
    },
    {
      "fieldName": "embeddings",
      "dataType": "FloatVector",
      "elementTypeParams": { "dim": "1536" }
    }
  ]
}
Notes:
- The analyzer affects what counts as a “term.” Choose the analyzer that matches your language/content.
- Enabling match typically creates an inverted index under the hood, which uses extra storage.
- Analyzer settings are effectively fixed per collection. To change them, you usually recreate the collection.
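To make the inverted-index note concrete, here is a toy sketch in plain Python. It only illustrates the idea (each term maps to the set of document IDs that contain it) and is not how VBase or Milvus implements it. The whitespace tokenizer, the sample documents, and the function name build_inverted_index are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical sample documents, keyed by ID.
docs = {
    1: "machine learning basics",
    2: "deep learning with gpus",
    3: "machine translation and deep models",
}

def build_inverted_index(documents):
    """Map each lowercase, whitespace-separated term to the IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

index = build_inverted_index(docs)

# Any-of is a union of ID sets, all-of is an intersection, exclusion is a difference.
machine_or_deep = index["machine"] | index["deep"]
machine_and_deep = index["machine"] & index["deep"]
machine_not_deep = index["machine"] - index["deep"]

print(sorted(machine_or_deep))   # [1, 2, 3]
print(sorted(machine_and_deep))  # [3]
print(sorted(machine_not_deep))  # [1]
```

Because each lookup is a set operation on precomputed ID sets, term constraints stay cheap even when the text fields are large. The cost is the extra storage for the index itself.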
Basic usage: TEXT_MATCH(field, 'terms')
The filter syntax is:
TEXT_MATCH(field_name, text)
- field_name: the VARCHAR field to search in (for example text).
- text: the terms to search for.
OR matching (default)
By default, TEXT_MATCH(text, 'a b') matches documents that contain any of the terms.
filter_expr = "TEXT_MATCH(text, 'machine deep')"  # machine OR deep
AND matching
To require both terms, combine multiple expressions with and:
filter_expr = "TEXT_MATCH(text, 'machine') and TEXT_MATCH(text, 'deep')"
Excluding terms
Use not to exclude a term:
filter_expr = "TEXT_MATCH(text, 'machine') and TEXT_MATCH(text, 'learning') and not TEXT_MATCH(text, 'deep')"
Use keyword match with vector search
A common pattern is: vector similarity search + keyword filter.
# Build a keyword filter
filter_expr = "TEXT_MATCH(text, 'keyword1 keyword2')" # keyword1 OR keyword2
# Run vector search with a filter
results = vbase.search(
    collection_name="my_collection",
    vector_field="embeddings",
    data=[query_vector],
    filter=filter_expr,
    limit=10,
    output_fields=["id", "text"],
)
print(results)
What this does:
- Finds the nearest vectors to query_vector
- Only keeps hits where text matches the keyword expression
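The OR, AND, and exclusion patterns can also be composed programmatically instead of by hand. The sketch below is a small string builder and nothing more: text_match and build_filter are hypothetical helpers, not part of the VBase client, and the sketch simply rejects terms containing single quotes rather than guessing at escaping rules.

```python
def text_match(field, terms):
    """Build one TEXT_MATCH expression; several terms in one call mean 'any of'."""
    joined = " ".join(terms)
    if "'" in joined:
        raise ValueError("terms containing single quotes are not handled in this sketch")
    return f"TEXT_MATCH({field}, '{joined}')"

def build_filter(field, all_of=(), any_of=(), none_of=()):
    """Combine required, any-of, and excluded terms into one filter string."""
    # One clause per required term, joined with 'and'.
    clauses = [text_match(field, [term]) for term in all_of]
    # A single clause holding all any-of terms (OR semantics).
    if any_of:
        clauses.append(text_match(field, list(any_of)))
    # One negated clause per excluded term.
    clauses += [f"not {text_match(field, [term])}" for term in none_of]
    if not clauses:
        raise ValueError("at least one term is required")
    return " and ".join(clauses)

filter_expr = build_filter("text", all_of=["machine", "learning"], none_of=["deep"])
print(filter_expr)
# TEXT_MATCH(text, 'machine') and TEXT_MATCH(text, 'learning') and not TEXT_MATCH(text, 'deep')
```

The resulting string can be passed as the filter argument in the same way as a handwritten expression.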
Use keyword match with scalar queries
If you want keyword matching without vector similarity (pure filtering), use a query.
filter_expr = "TEXT_MATCH(text, 'keyword1') and TEXT_MATCH(text, 'keyword2')"
rows = vbase.query(
    collection_name="my_collection",
    filter=filter_expr,
    output_fields=["id", "text"],
)
print(rows)
Practical recommendations
- Prefer keyword match for constraints, not for full ranking. For ranking by relevance, combine it with vector search.
- Keep your text field reasonably sized and pick the right analyzer.
- If you need strict exact match on whole strings (IDs, SKUs, emails), use a normal VARCHAR filter instead of TEXT_MATCH.
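As a rough illustration of the last point, the sketch below contrasts a whole-string equality filter with a term-level TEXT_MATCH filter. It assumes VBase accepts Milvus-style comparison expressions such as field == "value". The sku field and the exact_match helper are hypothetical names used only for this example.

```python
def exact_match(field, value):
    """Build an equality filter for whole-string matches (IDs, SKUs, emails)."""
    if '"' in value or "\\" in value:
        raise ValueError("quotes and backslashes are not handled in this sketch")
    return f'{field} == "{value}"'

# Whole-string comparison: the value must equal the stored string exactly.
sku_filter = exact_match("sku", "ABC-123")

# Term-level constraint on an analyzed text field.
keyword_filter = "TEXT_MATCH(text, 'wireless')"

# Both kinds of constraint can appear in one filter expression.
combined = f"{sku_filter} and {keyword_filter}"
print(combined)
# sku == "ABC-123" and TEXT_MATCH(text, 'wireless')
```

An identifier like ABC-123 may be split or normalized by an analyzer, which is why equality on the raw string is the safer choice for such values.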
Troubleshooting
TEXT_MATCH returns an error
This usually means the field was not configured with:
- enable_analyzer: true
- enable_match: true
Recreate the collection schema with match enabled.
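Before recreating anything, it can help to confirm which fields are actually missing the flags. The following is a minimal sketch that inspects a schema already loaded as a Python dict in the JSON shape shown earlier. How you obtain that dict from VBase is not covered here, and both fields_missing_match and the title field are hypothetical.

```python
def fields_missing_match(schema, field_names):
    """Return the names whose schema entry lacks enable_analyzer or enable_match."""
    by_name = {field["fieldName"]: field for field in schema.get("fields", [])}
    missing = []
    for name in field_names:
        params = by_name.get(name, {}).get("elementTypeParams", {})
        analyzer_on = params.get("enable_analyzer") is True
        match_on = params.get("enable_match") is True
        if not (analyzer_on and match_on):
            missing.append(name)
    return missing

# Schema in the same shape as the JSON example; "title" has no match flags.
schema = {
    "fields": [
        {"fieldName": "id", "dataType": "Int64", "isPrimary": True},
        {
            "fieldName": "text",
            "dataType": "VarChar",
            "elementTypeParams": {
                "max_length": 200,
                "enable_analyzer": True,
                "enable_match": True,
            },
        },
        {
            "fieldName": "title",
            "dataType": "VarChar",
            "elementTypeParams": {"max_length": 100},
        },
    ]
}

print(fields_missing_match(schema, ["text", "title"]))  # ['title']
```

A field name that does not exist in the schema is also reported as missing, which catches typos in the filter expression.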
Results look unexpected
Check your analyzer. Tokenization, stop‑words, stemming, and language settings can change what “matches.”
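To see why analyzer settings change what matches, here is a toy analyzer in plain Python. It is a deliberately crude stand-in (lowercasing, splitting on non-alphanumerics, a tiny stop-word list, and stripping a trailing "s") and does not reproduce the behavior of the english analyzer or any other real one. The function name toy_analyze and the sample text are assumptions.

```python
import re

# Tiny stop-word list for illustration only; real analyzers use larger lists.
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to"}

def toy_analyze(text):
    """Lowercase, split on non-alphanumerics, drop stop words, strip a trailing 's'."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    terms = []
    for token in tokens:
        if token in STOP_WORDS:
            continue
        if len(token) > 3 and token.endswith("s"):
            token = token[:-1]  # crude stand-in for stemming
        terms.append(token)
    return terms

doc_terms = toy_analyze("The login-timeouts of the OTP service")
print(doc_terms)  # ['login', 'timeout', 'otp', 'service']

# In this toy model the query is analyzed the same way as the document.
print(set(toy_analyze("Timeouts")) <= set(doc_terms))  # True
print(set(toy_analyze("time")) <= set(doc_terms))      # False
```

In this model, "Timeouts" matches because both sides reduce to the term timeout, while "time" does not because matching is on whole terms, not substrings. Stop words such as "the" never become terms at all, so a filter on them has nothing to match against.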