Parsers
BaseQueryParser
Bases: CommonParamsMixin
build(*args, **kwargs)
Serialize the parser configuration to Solr-compatible query parameters using Pydantic's model_dump.
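A minimal sketch, assuming a configured parser instance; since build uses Pydantic's model_dump, the result is a plain dict whose keys depend on the configuration:
>>> params = parser.build()
>>> isinstance(params, dict)  # Solr-compatible parameter mapping
True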
facet(*, queries=None, fields=None, prefix=None, contains=None, contains_ignore_case=None, matches=None, sort=None, limit=None, offset=None, mincount=None, missing=None, method=None, enum_cache_min_df=None, exists=None, exclude_terms=None, overrequest_count=None, overrequest_ratio=None, threads=None, range_field=None, range_start=None, range_end=None, range_gap=None, range_hardend=None, range_include=None, range_other=None, range_method=None, pivot_fields=None, pivot_mincount=None)
Enable faceting to categorize and count search results.
Faceting breaks down search results into categories with counts, enabling drill-down navigation and data analysis.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| queries | Optional[List[str]] | Arbitrary queries to generate facet counts for specific terms/expressions. | None |
| fields | Optional[List[str]] | Fields to be treated as facets. Common for categories, brands, tags. | None |
| prefix | Optional[str] | Limits facet terms to those starting with the given prefix. | None |
| contains | Optional[str] | Limits facet terms to those containing the given substring. | None |
| contains_ignore_case | Optional[bool] | If True, ignores case when matching the 'contains' parameter. | None |
| matches | Optional[str] | Only returns facets matching this regular expression. | None |
| sort | Optional[Literal['count', 'index']] | Ordering of facet field terms. Use 'count' (by frequency) or 'index' (alphabetically). | None |
| limit | Optional[int] | Number of facet counts to return. Set to -1 for all. | None |
| offset | Optional[int] | Offset into the facet list for paging. | None |
| mincount | Optional[int] | Minimum count for facets to be included in the response. Commonly set to 1 to hide empty facets. | None |
| missing | Optional[bool] | If True, include the count of results with no facet value. | None |
| method | Optional[Literal['enum', 'fc', 'fcs']] | Faceting algorithm. Use 'enum' (enumerate all terms; good for low cardinality), 'fc' (field cache; good for high cardinality), or 'fcs' (per-segment; good for frequently updated indexes). | None |
| enum_cache_min_df | Optional[int] | Minimum document frequency for filterCache usage with the enum method. | None |
| exists | Optional[bool] | Cap facet counts at 1 (only for non-trie fields). | None |
| exclude_terms | Optional[str] | Terms to remove from facet counts. | None |
| overrequest_count | Optional[int] | Extra facets to request from each shard for better accuracy in distributed environments. | None |
| overrequest_ratio | Optional[float] | Ratio for overrequesting facets from shards. | None |
| threads | Optional[int] | Number of threads for parallel facet loading. Useful for multiple facets on large datasets. | None |
| range_field | Optional[List[str]] | Fields for range faceting (e.g., price ranges, date ranges). | None |
| range_start | Optional[Dict[str, str]] | Lower bound of ranges per field. Dict mapping field name to start value. | None |
| range_end | Optional[Dict[str, str]] | Upper bound of ranges per field. Dict mapping field name to end value. | None |
| range_gap | Optional[Dict[str, str]] | Size of each range span per field, e.g., {'price': '100'} for $100 increments. | None |
| range_hardend | Optional[bool] | If True, uses the exact range_end as the upper bound even if it doesn't align with the gap. | None |
| range_include | Optional[List[Literal['lower', 'upper', 'edge', 'outer', 'all']]] | Range bounds to include in faceting: 'lower' (include lower bound), 'upper' (include upper bound), 'edge' (first/last ranges include edges), 'outer' (before/after ranges are inclusive), or 'all'. | None |
| range_other | Optional[List[Literal['before', 'after', 'between', 'none', 'all']]] | Additional range counts to compute: 'before' (below first range), 'after' (above last range), 'between' (within bounds), 'none', or 'all'. | None |
| range_method | Optional[Literal['filter', 'dv']] | Method for range faceting. Use 'filter' or 'dv' (for docValues). | None |
| pivot_fields | Optional[List[str]] | Fields for pivot (hierarchical) faceting, e.g., ['category,brand']. | None |
| pivot_mincount | Optional[int] | Minimum count for pivot facet inclusion. | None |
Returns:
| Type | Description |
|---|---|
| Self | A new parser instance with facet configuration applied. |
Examples:
Basic field faceting:
>>> parser.facet(fields=["genre", "director"], mincount=1, limit=10)
Range faceting for prices:
>>> parser.facet(
... range_field=["price"],
... range_start={"price": "0"},
... range_end={"price": "1000"},
... range_gap={"price": "100"}
... )
Filtered facets:
>>> parser.facet(fields=["color"], prefix="bl", mincount=5)
group(*, by=None, func=None, query=None, limit=1, offset=None, sort=None, format='grouped', main=None, ngroups=False, truncate=False, facet=False, cache_percent=0)
Enable result grouping to collapse results by common field values.
Result grouping (also known as field collapsing) combines documents that share a common field value. Useful for showing one result per author, category, or domain.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| by | Optional[Union[str, List[str]]] | Field(s) to group results by. Shows one representative doc per unique field value. Must be single-valued and indexed. Can specify multiple fields for separate groupings. | None |
| func | Optional[str] | Group by the result of a function query (e.g., 'floor(price)'). Not supported in SolrCloud/distributed searches. | None |
| query | Optional[Union[str, List[str]]] | Create custom groups using arbitrary queries. Each query defines one group. Example: ['price:[0 TO 50]', 'price:[50 TO 100]'] creates price range groups. | None |
| limit | int | Number of documents to return per group. Default: 1 (only the top doc per group). Set higher to see more examples from each group. | 1 |
| offset | Optional[int] | Skip the first N documents within each group. Useful for pagination within groups. | None |
| sort | Optional[str] | How to sort documents within each group (e.g., 'date desc' for newest first). If not specified, uses the main sort parameter. | None |
| format | str | Response structure format: 'grouped' (nested, default) or 'simple' (flat list). | 'grouped' |
| main | Optional[bool] | If True, returns the first field grouping as the main result list, flattening the response. | None |
| ngroups | bool | If True, include the total number of unique groups in the response. Useful for pagination. Requires co-location in SolrCloud. | False |
| truncate | bool | If True, base facet counts on one doc per group only. | False |
| facet | bool | If True, enable grouped faceting. Can be expensive on large result sets. Requires co-location in SolrCloud. | False |
| cache_percent | int | Enable the result grouping cache (0-100). Set to 0 to disable (default). Try 50-100 for complex queries to improve performance. | 0 |
Returns:
| Type | Description |
|---|---|
| Self | A new parser instance with group configuration applied. |
Examples:
Group by author:
>>> parser.group(by="author", limit=3, sort="date desc", ngroups=True)
Group by price range:
>>> parser.group(
... query=["price:[0 TO 50]", "price:[50 TO 100]", "price:[100 TO *]"],
... limit=5
... )
Multiple field groupings:
>>> parser.group(by=["author", "category"], limit=2)
highlight(*, method=None, fields=None, query=None, query_parser=None, require_field_match=None, query_field_pattern=None, use_phrase_highlighter=None, multiterm=None, snippets_per_field=None, fragment_size=None, encoder=None, max_analyzed_chars=None, tag_before=None, tag_after=None, offset_source=None, frag_align_ratio=None, fragsize_is_minimum=None, tag_ellipsis=None, default_summary=None, score_k1=None, score_b=None, score_pivot=None, bs_language=None, bs_country=None, bs_variant=None, bs_type=None, bs_separator=None, weight_matches=None, merge_contiguous=None, max_multivalued_to_examine=None, max_multivalued_to_match=None, alternate_field=None, max_alternate_field_length=None, alternate=None, formatter=None, simple_pre=None, simple_post=None, fragmenter=None, regex_slop=None, regex_pattern=None, regex_max_analyzed_chars=None, preserve_multi=None, payloads=None, frag_list_builder=None, fragments_builder=None, boundary_scanner=None, phrase_limit=None, multivalue_separator=None)
Enable highlighting to show snippets with query terms emphasized.
Highlighting shows snippets of text from documents with query terms emphasized (usually wrapped in tags). Common for search result snippets and previews.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| method | Optional[Literal['unified', 'original', 'fastVector']] | Highlighting implementation to use. Use 'unified' (most accurate, recommended), 'original' (legacy, widely compatible), or 'fastVector' (fast for large docs, requires term vectors). | None |
| fields | Optional[Union[str, List[str]]] | Fields to generate highlighted snippets for. Must be stored fields. Example: ['title', 'content']. Use '*' for all fields (not recommended). | None |
| query | Optional[str] | Custom query to use for highlighting (overrides the main query). | None |
| query_parser | Optional[str] | Query parser for the highlight query (e.g., 'edismax', 'lucene'). | None |
| require_field_match | Optional[bool] | If True, only highlight terms in the field they matched. Set to False to highlight query terms in all requested fields. | None |
| query_field_pattern | Optional[str] | Regular expression pattern for fields to consider for highlighting. | None |
| use_phrase_highlighter | Optional[bool] | If True, highlights complete phrases accurately. Default: True. | None |
| multiterm | Optional[bool] | Enable highlighting for wildcard, fuzzy, and range queries. Default: True. | None |
| snippets_per_field | Optional[int] | Maximum number of snippets to return per field. Default: 1. Set higher to show more context passages. | None |
| fragment_size | Optional[int] | Size of highlighted snippets in characters. Default: 100. Set to 0 to highlight the entire field (not recommended for large fields). | None |
| encoder | Optional[Literal['', 'html']] | Text encoder for snippets. Use '' (empty string, default) or 'html' to escape HTML and prevent XSS. | None |
| max_analyzed_chars | Optional[int] | Maximum characters to analyze per field. Default: 51200 (50KB). For large documents, limits analysis to improve performance. | None |
| tag_before | Optional[str] | Text/tag to insert before each highlighted term. Default: '<em>'. | None |
| tag_after | Optional[str] | Text/tag to insert after each highlighted term. Default: '</em>'. | None |

Unified Highlighter specific (most accurate, recommended):

| Name | Type | Description | Default |
|---|---|---|---|
| offset_source | Optional[str] | How offsets are obtained. Options: 'ANALYSIS', 'POSTINGS', 'POSTINGS_WITH_TERM_VECTORS', 'TERM_VECTORS'. Usually auto-detected. | None |
| frag_align_ratio | Optional[float] | Where to position the first match in the snippet (0.0-1.0). Default: 0.33 (left third, shows context before the match). | None |
| fragsize_is_minimum | Optional[bool] | If True, treat fragment_size as a minimum. Default: True. | None |
| tag_ellipsis | Optional[str] | Text between multiple snippets (e.g., '...' or ' [...] '). | None |
| default_summary | Optional[bool] | If True, return leading text when no matches are found. | None |
| score_k1 | Optional[float] | BM25 term frequency normalization. Default: 1.2. | None |
| score_b | Optional[float] | BM25 length normalization. Default: 0.75. | None |
| score_pivot | Optional[int] | BM25 average passage length in characters. Default: 87. | None |
| bs_language | Optional[str] | BreakIterator language for text segmentation (e.g., 'en', 'ja'). | None |
| bs_country | Optional[str] | BreakIterator country code (e.g., 'US', 'GB'). | None |
| bs_variant | Optional[str] | BreakIterator variant for specialized locale rules. | None |
| bs_type | Optional[Literal['SEPARATOR', 'SENTENCE', 'WORD', 'CHARACTER', 'LINE', 'WHOLE']] | How to segment text. Use 'SENTENCE' (default, recommended), 'WORD', 'CHARACTER', 'LINE', 'SEPARATOR', or 'WHOLE'. | None |
| bs_separator | Optional[str] | Custom separator character when bs_type=SEPARATOR. | None |
| weight_matches | Optional[bool] | Use Lucene's Weight Matches API for the most accurate highlighting. | None |

Original Highlighter specific (legacy):

| Name | Type | Description | Default |
|---|---|---|---|
| merge_contiguous | Optional[bool] | Merge adjacent fragments. | None |
| max_multivalued_to_examine | Optional[int] | Maximum entries to examine in a multivalued field. | None |
| max_multivalued_to_match | Optional[int] | Maximum matches in a multivalued field. | None |
| alternate_field | Optional[str] | Backup field for the summary when no highlights are found. | None |
| max_alternate_field_length | Optional[int] | Maximum length of the alternate field. | None |
| alternate | Optional[bool] | Highlight the alternate field. | None |
| formatter | Optional[Literal['simple']] | Formatter for highlighted output. Use 'simple'. | None |
| simple_pre | Optional[str] | Text before each term (simple formatter). | None |
| simple_post | Optional[str] | Text after each term (simple formatter). | None |
| fragmenter | Optional[Literal['gap', 'regex']] | Text snippet generator type. Use 'gap' or 'regex'. | None |
| regex_slop | Optional[float] | Deviation factor for the regex fragmenter. | None |
| regex_pattern | Optional[str] | Pattern for the regex fragmenter. | None |
| regex_max_analyzed_chars | Optional[int] | Character limit for the regex fragmenter. | None |
| preserve_multi | Optional[bool] | Preserve order in multivalued fields. | None |
| payloads | Optional[bool] | Include payloads in highlighting. | None |

FastVector Highlighter specific (requires term vectors):

| Name | Type | Description | Default |
|---|---|---|---|
| frag_list_builder | Optional[Literal['simple', 'weighted', 'single']] | Snippet fragmenting algorithm. Use 'simple', 'weighted', or 'single'. | None |
| fragments_builder | Optional[Literal['default', 'colored']] | Fragment formatting implementation. Use 'default' or 'colored'. | None |
| boundary_scanner | Optional[str] | Boundary scanner implementation. | None |
| phrase_limit | Optional[int] | Maximum phrases to analyze for scoring. | None |
| multivalue_separator | Optional[str] | Separator for multivalued fields. | None |
Returns:
| Type | Description |
|---|---|
| Self | A new parser instance with highlight configuration applied. |
Examples:
Basic highlighting:
>>> parser.highlight(fields=["title", "content"], snippets_per_field=3, fragment_size=150)
Custom HTML tags:
>>> parser.highlight(
... fields=["title"],
... tag_before='<mark class="highlight">',
... tag_after='</mark>',
... encoder="html"
... )
Unified highlighter with sentence breaks:
>>> parser.highlight(
... fields=["content"],
... method="unified",
... bs_type="SENTENCE",
... fragment_size=200
... )
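FastVector highlighter, a sketch using the fastVector-specific parameters listed above (requires term vectors on the highlighted fields):
>>> parser.highlight(
...     fields=["content"],
...     method="fastVector",
...     frag_list_builder="weighted",
...     phrase_limit=256
... )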
more_like_this(*, fields=None, min_term_freq=None, min_doc_freq=None, max_doc_freq=None, max_doc_freq_pct=None, min_word_len=None, max_word_len=None, max_query_terms=None, max_num_tokens_parsed=None, boost=None, query_fields=None, interesting_terms=None, match_include=None, match_offset=None)
Enable MoreLikeThis to find documents similar to a given document.
MoreLikeThis finds documents similar to a given document by analyzing the terms that make it unique. Common for "related articles", recommendations, and content discovery.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| fields | Optional[Union[str, List[str]]] | Fields to analyze for similarity. Use fields with meaningful content (title, description, body). Can be a single field or a list of fields. For best performance, enable term vectors on these fields. | None |
| min_term_freq | Optional[int] | Minimum times a term must appear in the source document to be considered. Default: 2. Lower values include more terms but may add noise. | None |
| min_doc_freq | Optional[int] | Minimum number of documents a term must appear in across the index. Default: 5. Filters out very rare terms that aren't useful for finding similar documents. | None |
| max_doc_freq | Optional[int] | Maximum number of documents a term can appear in. Filters out very common terms (like 'the', 'and'). Use this OR max_doc_freq_pct, not both. | None |
| max_doc_freq_pct | Optional[int] | Maximum document frequency as a percentage (0-100). Example: 75 means ignore terms appearing in more than 75% of documents. Use instead of max_doc_freq for relative filtering. | None |
| min_word_len | Optional[int] | Minimum word length in characters. Words shorter than this are ignored. Example: 4 to skip 'the', 'and', 'or'. | None |
| max_word_len | Optional[int] | Maximum word length in characters. Words longer than this are ignored. Useful to filter out long tokens or URLs. | None |
| max_query_terms | Optional[int] | Maximum number of interesting terms to use in the MLT query. Default: 25. Higher values are more comprehensive but slower. Start with the default and adjust as needed. | None |
| max_num_tokens_parsed | Optional[int] | Maximum tokens to analyze per field (for fields without term vectors). Default: 5000. Set lower for better performance on large documents. | None |
| boost | Optional[bool] | If True, boost the query by each term's relevance/importance. Default: False. Enable for better relevance ranking of similar documents. | None |
| query_fields | Optional[str] | Query fields with optional boosts, like 'title^2.0 content^1.0'. Fields must also be in the 'fields' parameter. Use to emphasize certain fields in similarity matching. | None |
| interesting_terms | Optional[str] | Controls what info about matched terms is returned. Options: 'none' (default), 'list' (term names), 'details' (terms with boost values). Use 'details' for debugging to see which terms were selected. | None |
| match_include | Optional[bool] | If True, includes the source document in results (useful for comparison). Default varies: true for the MLT handler; depends on configuration for the component. | None |
| match_offset | Optional[int] | When used with a query, specifies which result doc to use for similarity. 0 = first result, 1 = second, etc. Default: 0. | None |
Returns:
| Type | Description |
|---|---|
| Self | A new parser instance with more_like_this configuration applied. |
Examples:
Basic similarity search:
>>> parser.more_like_this(
... fields=["title", "content"],
... min_term_freq=2,
... min_doc_freq=5,
... max_query_terms=25
... )
Advanced with filtering:
>>> parser.more_like_this(
... fields=["content"],
... min_term_freq=1,
... min_doc_freq=3,
... min_word_len=4,
... max_doc_freq_pct=80,
... interesting_terms="details"
... )
Boosted fields:
>>> parser.more_like_this(
... fields=["title", "content"],
... query_fields="title^2.0 content^1.0",
... boost=True
... )
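Choosing the seed document, a sketch using the documented match parameters (seed from the first query result, excluded from the results):
>>> parser.more_like_this(
...     fields=["content"],
...     match_offset=0,
...     match_include=False
... )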
serialize_configs(params)
Serialize ParamsConfig objects as top-level params.
DisMaxQueryParser
Bases: BaseQueryParser
DisMax (Disjunction Max) Query Parser for Apache Solr.
The DisMax query parser is designed for user-friendly queries, providing an experience similar to popular search engines like Google. It handles queries gracefully even when they contain errors, making it ideal for end-user facing applications. DisMax distributes terms across multiple fields with individual boosts and combines results using disjunction max scoring.
Solr Reference
https://solr.apache.org/guide/solr/latest/query-guide/dismax-query-parser.html
Key Features
- Simplified query syntax (no need for field names)
- Error-tolerant parsing
- Multi-field search with individual field boosts (qf)
- Phrase boosting for proximity matches (pf)
- Minimum should match logic (mm)
- Tie-breaker for scoring across fields
- Boost queries and functions for result tuning
How DisMax Scoring Works
The "tie" parameter controls how field scores are combined: - tie=0.0 (default): Only the highest scoring field contributes - tie=1.0: All field scores are summed - tie=0.1 (typical): Highest score + 0.1 * sum of other scores
Examples:
>>> # Basic multi-field search
>>> parser = DisMaxQueryParser(
... query="ipod",
... query_fields={"name": 2.0, "features": 1.0, "text": 0.5}
... )
>>> # With phrase boosting and minimum match
>>> parser = DisMaxQueryParser(
... query="belkin ipod",
... query_fields={"name": 5.0, "text": 2.0},
... phrase_fields={"name": 10.0, "text": 3.0},
... phrase_slop=2,
... min_match="75%"
... )
>>> # With boost queries
>>> parser = DisMaxQueryParser(
... query="video",
... query_fields={"features": 20.0, "text": 0.3},
... boost_queries="cat:electronics^5.0"
... )
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | | Main query string (the user's search terms) | required |
| alternate_query | | Fallback query if q is not specified | required |
| query_fields | | Fields to search with boosts, e.g., {'title': 2.0, 'body': 1.0} | required |
| query_slop | | Phrase slop for explicit phrase queries in user input | required |
| min_match | | Minimum should match specification (e.g., '75%', '2<-25% 9<-3') | required |
| phrase_fields | | Fields for phrase boosting with boosts | required |
| phrase_slop | | Maximum position distance for phrase queries | required |
| tie_breaker | | Tie-breaker value (0.0 to 1.0) for multi-field scoring | required |
| boost_queries | | Additional queries to boost matching documents (additive) | required |
| boost_functons | | Function queries to boost scores (additive) | required |
Note
For multiplicative boosting (more predictable), use ExtendedDisMaxQueryParser with the boost parameter instead of bq/bf.
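A hedged sketch of the multiplicative alternative (the boost argument is assumed from the eDisMax notes below; the function query is illustrative):
>>> parser = ExtendedDisMaxQueryParser(
...     query="video",
...     query_fields={"features": 20.0, "text": 0.3},
...     boost="log(popularity)"  # assumed parameter: multiplies the score rather than adding to it
... )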
See Also
- ExtendedDisMaxQueryParser: Enhanced version with additional features
- StandardParser: For more precise Lucene syntax queries
build, facet, group, highlight, more_like_this, serialize_configs
Inherited from BaseQueryParser; see the full method documentation above.
ExtendedDisMaxQueryParser
Bases: DisMaxQueryParser
Extended DisMax (eDisMax) Query Parser for Apache Solr.
The Extended DisMax (eDisMax) query parser is an improved version of DisMax that handles full Lucene query syntax while maintaining error tolerance. It's the most flexible parser for user-facing search applications, combining the precision of the Standard parser with the user-friendliness of DisMax.
Solr Reference
https://solr.apache.org/guide/solr/latest/query-guide/edismax-query-parser.html
Key Enhancements over DisMax
- Full Lucene query syntax support (field names, boolean operators, wildcards)
- Automatic mm (minimum match) adjustment when stopwords are removed
- Lowercase operator support ("and", "or" as operators)
- Bigram (pf2) and trigram (pf3) phrase boosting
- Multiplicative boosting via boost parameter
- Fine-grained control over stopword handling
- User field restrictions (uf) for security
Advanced Phrase Boosting
- pf: Standard phrase fields (all query terms)
- pf2: Bigram phrase fields (word pairs)
- pf3: Trigram phrase fields (word triplets)
Each has independent slop control (ps, ps2, ps3).
Examples:
>>> # Basic eDisMax query with Lucene syntax
>>> parser = ExtendedDisMaxQueryParser(
... query="title:solr OR (content:search AND type:guide)",
... query_fields={"title": 2.0, "content": 1.0}
... )
>>> # Multi-level phrase boosting
>>> parser = ExtendedDisMaxQueryParser(
... query="apache solr search",
... query_fields={"title": 5.0, "body": 1.0},
... phrase_fields={"title": 50.0}, # All 3 words
... phrase_fields_bigram={"title": 20.0}, # 2-word phrases
... phrase_fields_trigram={"title": 30.0}, # 3-word phrases
... phrase_slop=2,
... phrase_slop_bigram=1
... )
>>> # With field aliasing and restrictions
>>> parser = ExtendedDisMaxQueryParser(
... query="name:Mike sysadmin",
... query_fields={"title": 1.0, "text": 1.0},
... user_fields=["title", "text", "last_name", "first_name"]
... )
>>> # With automatic mm relaxation
>>> parser = ExtendedDisMaxQueryParser(
... query="the quick brown fox",
... query_fields={"content": 1.0, "title": 2.0},
... min_match="75%",
... min_match_auto_relax=True # Adjust if stopwords removed
... )
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| split_on_whitespace | | If True, analyze each term separately; if False (default), analyze term sequences for multi-word synonyms | required |
| min_match_auto_relax | | Auto-relax mm when stopwords are removed unevenly | required |
| lowercase_operators | | Treat lowercase 'and'/'or' as boolean operators | required |
| phrase_fields_bigram | | Fields for bigram phrase boosting (pf2) | required |
| phrase_slop_bigram | | Slop for bigram phrases (ps2) | required |
| phrase_fields_trigram | | Fields for trigram phrase boosting (pf3) | required |
| phrase_slop_trigram | | Slop for trigram phrases (ps3) | required |
| stopwords | | If False, ignore StopFilterFactory in the query analyzer | required |
| user_fields | | Whitelist of fields users can explicitly query (the uf parameter) | required |
Inherits from DisMaxQueryParser
query, alternate_query, query_fields, min_match, phrase_fields, phrase_slop, tie_breaker, boost_queries, boost_functons
Note
The eDisMax default mm behavior differs from DisMax:
- mm=0% if the query contains explicit operators (-, +, OR, NOT) or q.op=OR
- mm=100% if q.op=AND and the query uses only AND operators
See Also
- DisMaxQueryParser: Simpler version without Lucene syntax support
- StandardParser: For pure Lucene syntax without DisMax features
build(*args, **kwargs)
Serialize the parser configuration to Solr-compatible query parameters using Pydantic's model_dump.
facet(*, queries=None, fields=None, prefix=None, contains=None, contains_ignore_case=None, matches=None, sort=None, limit=None, offset=None, mincount=None, missing=None, method=None, enum_cache_min_df=None, exists=None, exclude_terms=None, overrequest_count=None, overrequest_ratio=None, threads=None, range_field=None, range_start=None, range_end=None, range_gap=None, range_hardend=None, range_include=None, range_other=None, range_method=None, pivot_fields=None, pivot_mincount=None)
Enable faceting to categorize and count search results.
Faceting breaks down search results into categories with counts, enabling drill-down navigation and data analysis.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
queries
|
Optional[List[str]]
|
Arbitrary queries to generate facet counts for specific terms/expressions. |
None
|
fields
|
Optional[List[str]]
|
Fields to be treated as facets. Common for categories, brands, tags. |
None
|
prefix
|
Optional[str]
|
Limits facet terms to those starting with the given prefix. |
None
|
contains
|
Optional[str]
|
Limits facet terms to those containing the given substring. |
None
|
contains_ignore_case
|
Optional[bool]
|
If True, ignores case when matching the 'contains' parameter. |
None
|
matches
|
Optional[str]
|
Only returns facets matching this regular expression. |
None
|
sort
|
Optional[Literal['count', 'index']]
|
Ordering of facet field terms. Use 'count' (by frequency) or 'index' (alphabetically). |
None
|
limit
|
Optional[int]
|
Number of facet counts to return. Set to -1 for all. |
None
|
offset
|
Optional[int]
|
Offset into the facet list for paging. |
None
|
mincount
|
Optional[int]
|
Minimum count for facets to be included in response. Common to set to 1 to hide empty facets. |
None
|
missing
|
Optional[bool]
|
If True, include count of results with no facet value. |
None
|
method
|
Optional[Literal['enum', 'fc', 'fcs']]
|
Algorithm to use for faceting. Use 'enum' (enumerate all terms, good for low-cardinality), 'fc' (field cache, good for high-cardinality), or 'fcs' (per-segment, good for frequent updates). |
None
|
enum_cache_min_df
|
Optional[int]
|
Minimum document frequency for filterCache usage with enum method. |
None
|
exists
|
Optional[bool]
|
Cap facet counts by 1 (only for non-trie fields). |
None
|
exclude_terms
|
Optional[str]
|
Terms to remove from facet counts. |
None
|
overrequest_count
|
Optional[int]
|
Extra facets to request from each shard for better accuracy in distributed environments. |
None
|
overrequest_ratio
|
Optional[float]
|
Ratio for overrequesting facets from shards. |
None
|
threads
|
Optional[int]
|
Number of threads for parallel facet loading. Useful for multiple facets on large datasets. |
None
|
range_field
|
Optional[List[str]]
|
Fields for range faceting (e.g., price ranges, date ranges). |
None
|
range_start
|
Optional[Dict[str, str]]
|
Lower bound of ranges per field. Dict mapping field name to start value. |
None
|
range_end
|
Optional[Dict[str, str]]
|
Upper bound of ranges per field. Dict mapping field name to end value. |
None
|
range_gap
|
Optional[Dict[str, str]]
|
Size of each range span per field. E.g., {'price': '100'} for $100 increments. |
None
|
range_hardend
|
Optional[bool]
|
If True, uses exact range_end as upper bound even if it doesn't align with gap. |
None
|
range_include
|
Optional[List[Literal['lower', 'upper', 'edge', 'outer', 'all']]]
|
Range bounds to include in faceting. List of: 'lower' (include lower bound), 'upper' (include upper bound), 'edge' (first/last include edges), 'outer' (before/after inclusive), 'all'. |
None
|
range_other
|
Optional[List[Literal['before', 'after', 'between', 'none', 'all']]]
|
Additional range counts to compute. List of: 'before' (below first range), 'after' (above last range), 'between' (within bounds), 'none', or 'all'. |
None
|
range_method
|
Optional[Literal['filter', 'dv']]
|
Method to use for range faceting. Use 'filter' or 'dv' (for docValues). |
None
|
pivot_fields
|
Optional[List[str]]
|
Fields to use for pivot (hierarchical) faceting. E.g., ['category,brand']. |
None
|
pivot_mincount
|
Optional[int]
|
Minimum count for pivot facet inclusion. |
None
|
Returns:
| Type | Description |
|---|---|
Self
|
A new parser instance with facet configuration applied. |
Examples:
Basic field faceting:
>>> parser.facet(fields=["genre", "director"], mincount=1, limit=10)
Range faceting for prices:
>>> parser.facet(
... range_field=["price"],
... range_start={"price": "0"},
... range_end={"price": "1000"},
... range_gap={"price": "100"}
... )
Filtered facets:
>>> parser.facet(fields=["color"], prefix="bl", mincount=5)
group(*, by=None, func=None, query=None, limit=1, offset=None, sort=None, format='grouped', main=None, ngroups=False, truncate=False, facet=False, cache_percent=0)
Enable result grouping to collapse results by common field values.
Result grouping (also known as field collapsing) combines documents that share a common field value. Useful for showing one result per author, category, or domain.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
by
|
Optional[Union[str, List[str]]]
|
Field(s) to group results by. Shows one representative doc per unique field value. Must be single-valued and indexed. Can specify multiple fields for separate groupings. |
None
|
func
|
Optional[str]
|
Group by the result of a function query (e.g., 'floor(price)'). Not supported in SolrCloud/distributed searches. |
None
|
query
|
Optional[Union[str, List[str]]]
|
Create custom groups using arbitrary queries. Each query defines one group. Example: ['price:[0 TO 50]', 'price:[50 TO 100]'] creates price range groups. |
None
|
limit
|
int
|
Number of documents to return per group. Default: 1 (only top doc per group). Set higher to see more examples from each group. |
1
|
offset
|
Optional[int]
|
Skip the first N documents within each group. Useful for pagination within groups. |
None
|
sort
|
Optional[str]
|
How to sort documents within each group (e.g., 'date desc' for newest first). If not specified, uses the main sort parameter. |
None
|
format
|
str
|
Response structure format. 'grouped' (nested, default) or 'simple' (flat list). |
'grouped'
|
main
|
Optional[bool]
|
If True, returns first field grouping as main result list, flattening the response. |
None
|
ngroups
|
bool
|
If True, include total number of unique groups in response. Useful for pagination. Requires co-location in SolrCloud. |
False
|
truncate
|
bool
|
If True, base facet counts on one doc per group only. |
False
|
facet
|
bool
|
If True, enable grouped faceting. Can be expensive on large result sets. Requires co-location in SolrCloud. |
False
|
cache_percent
|
int
|
Enable result grouping cache (0-100). Set to 0 to disable (default). Try 50-100 for complex queries to improve performance. |
0
|
Returns:
| Type | Description |
|---|---|
Self
|
A new parser instance with group configuration applied. |
Examples:
Group by author:
>>> parser.group(by="author", limit=3, sort="date desc", ngroups=True)
Group by price range:
>>> parser.group(
... query=["price:[0 TO 50]", "price:[50 TO 100]", "price:[100 TO *]"],
... limit=5
... )
Multiple field groupings:
>>> parser.group(by=["author", "category"], limit=2)
highlight(*, method=None, fields=None, query=None, query_parser=None, require_field_match=None, query_field_pattern=None, use_phrase_highlighter=None, multiterm=None, snippets_per_field=None, fragment_size=None, encoder=None, max_analyzed_chars=None, tag_before=None, tag_after=None, offset_source=None, frag_align_ratio=None, fragsize_is_minimum=None, tag_ellipsis=None, default_summary=None, score_k1=None, score_b=None, score_pivot=None, bs_language=None, bs_country=None, bs_variant=None, bs_type=None, bs_separator=None, weight_matches=None, merge_contiguous=None, max_multivalued_to_examine=None, max_multivalued_to_match=None, alternate_field=None, max_alternate_field_length=None, alternate=None, formatter=None, simple_pre=None, simple_post=None, fragmenter=None, regex_slop=None, regex_pattern=None, regex_max_analyzed_chars=None, preserve_multi=None, payloads=None, frag_list_builder=None, fragments_builder=None, boundary_scanner=None, phrase_limit=None, multivalue_separator=None)
Enable highlighting to show snippets with query terms emphasized.
Highlighting shows snippets of text from documents with query terms emphasized (usually wrapped in tags). Common for search result snippets and previews.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
method
|
Optional[Literal['unified', 'original', 'fastVector']]
|
Highlighting implementation to use. Use 'unified' (most accurate, recommended), 'original' (legacy, widely compatible), or 'fastVector' (fast for large docs, requires term vectors). |
None
|
fields
|
Optional[Union[str, List[str]]]
|
Fields to generate highlighted snippets for. Must be stored fields. Example: ['title', 'content']. Use '*' for all fields (not recommended). |
None
|
query
|
Optional[str]
|
Custom query to use for highlighting (overrides main query). |
None
|
query_parser
|
Optional[str]
|
Query parser for the highlight query (e.g., 'edismax', 'lucene'). |
None
|
require_field_match
|
Optional[bool]
|
If True, only highlight terms in the field they matched. Set to False to highlight query terms in all requested fields. |
None
|
query_field_pattern
|
Optional[str]
|
Regular expression pattern for fields to consider for highlighting. |
None
|
use_phrase_highlighter
|
Optional[bool]
|
If True, highlights complete phrases accurately. Default: True. |
None
|
multiterm
|
Optional[bool]
|
Enable highlighting for wildcard, fuzzy, and range queries. Default: True. |
None
|
snippets_per_field
|
Optional[int]
|
Maximum number of snippets to return per field. Default: 1. Set higher to show more context passages. |
None
|
fragment_size
|
Optional[int]
|
Size of highlighted snippets in characters. Default: 100. Set to 0 to highlight entire field (not recommended for large fields). |
None
|
encoder
|
Optional[Literal['', 'html']]
|
Text encoder for snippets. Use '' (empty string, default) or 'html' to escape HTML and prevent XSS. |
None
|
max_analyzed_chars
|
Optional[int]
|
Maximum characters to analyze per field. Default: 51200 (50KB). For large documents, limits analysis to improve performance. |
None
|
tag_before
|
Optional[str]
|
Text/tag to insert before each highlighted term. Default: ''. Example: ''. |
None
|
tag_after
|
Optional[str]
|
Text/tag to insert after each highlighted term. Default: ''. |
None
|
# Unified Highlighter specific
|
most accurate, recommended
|
|
required |
offset_source
|
Optional[str]
|
How offsets are obtained. Options: 'ANALYSIS', 'POSTINGS', 'POSTINGS_WITH_TERM_VECTORS', 'TERM_VECTORS'. Usually auto-detected. |
None
|
frag_align_ratio
|
Optional[float]
|
Where to position first match in snippet (0.0-1.0). Default: 0.33 (left third, shows context before match). |
None
|
fragsize_is_minimum
|
Optional[bool]
|
If True, treat fragment_size as minimum. Default: True. |
None
|
tag_ellipsis
|
Optional[str]
|
Text between multiple snippets (e.g., '...' or ' [...] '). |
None
|
default_summary
|
Optional[bool]
|
If True, return leading text when no matches found. |
None
|
score_k1
|
Optional[float]
|
BM25 term frequency normalization. Default: 1.2. |
None
|
score_b
|
Optional[float]
|
BM25 length normalization. Default: 0.75. |
None
|
score_pivot
|
Optional[int]
|
BM25 average passage length in characters. Default: 87. |
None
|
bs_language
|
Optional[str]
|
BreakIterator language for text segmentation (e.g., 'en', 'ja'). |
None
|
bs_country
|
Optional[str]
|
BreakIterator country code (e.g., 'US', 'GB'). |
None
|
bs_variant
|
Optional[str]
|
BreakIterator variant for specialized locale rules. |
None
|
bs_type
|
Optional[Literal['SEPARATOR', 'SENTENCE', 'WORD', 'CHARACTER', 'LINE', 'WHOLE']]
|
How to segment text. Use 'SENTENCE' (default, recommended), 'WORD', 'CHARACTER', 'LINE', 'SEPARATOR', or 'WHOLE'. |
None
|
bs_separator
|
Optional[str]
|
Custom separator character when bs_type=SEPARATOR. |
None
|
weight_matches
|
Optional[bool]
|
Use Lucene's Weight Matches API for most accurate highlighting. |
None
|
# Original Highlighter specific
|
legacy
|
|
required |
merge_contiguous
|
Optional[bool]
|
Merge adjacent fragments. |
None
|
max_multivalued_to_examine
|
Optional[int]
|
Max entries to examine in multivalued field. |
None
|
max_multivalued_to_match
|
Optional[int]
|
Max matches in multivalued field. |
None
|
alternate_field
|
Optional[str]
|
Backup field for summary when no highlights found. |
None
|
max_alternate_field_length
|
Optional[int]
|
Max length of alternate field. |
None
|
alternate
|
Optional[bool]
|
Highlight alternate field. |
None
|
formatter
|
Optional[Literal['simple']]
|
Formatter for highlighted output. Use 'simple'. |
None
|
simple_pre
|
Optional[str]
|
Text before term (simple formatter). |
None
|
simple_post
|
Optional[str]
|
Text after term (simple formatter). |
None
|
fragmenter
|
Optional[Literal['gap', 'regex']]
|
Text snippet generator type. Use 'gap' or 'regex'. |
None
|
regex_slop
|
Optional[float]
|
Deviation factor for regex fragmenter. |
None
|
regex_pattern
|
Optional[str]
|
Pattern for regex fragmenter. |
None
|
regex_max_analyzed_chars
|
Optional[int]
|
Char limit for regex fragmenter. |
None
|
preserve_multi
|
Optional[bool]
|
Preserve order in multivalued fields. |
None
|
payloads
|
Optional[bool]
|
Include payloads in highlighting. |
None
|
FastVector Highlighter specific (requires term vectors):
frag_list_builder
|
Optional[Literal['simple', 'weighted', 'single']]
|
Snippet fragmenting algorithm. Use 'simple', 'weighted', or 'single'. |
None
|
fragments_builder
|
Optional[Literal['default', 'colored']]
|
Fragment formatting implementation. Use 'default' or 'colored'. |
None
|
boundary_scanner
|
Optional[str]
|
Boundary scanner implementation. |
None
|
phrase_limit
|
Optional[int]
|
Max phrases to analyze for scoring. |
None
|
multivalue_separator
|
Optional[str]
|
Separator for multivalued fields. |
None
|
Returns:
| Type | Description |
|---|---|
Self
|
A new parser instance with highlight configuration applied. |
Examples:
Basic highlighting:
>>> parser.highlight(fields=["title", "content"], snippets_per_field=3, fragment_size=150)
Custom HTML tags:
>>> parser.highlight(
... fields=["title"],
... tag_before='<mark class="highlight">',
... tag_after='</mark>',
... encoder="html"
... )
Unified highlighter with sentence breaks:
>>> parser.highlight(
... fields=["content"],
... method="unified",
... bs_type="SENTENCE",
... fragment_size=200
... )
more_like_this(*, fields=None, min_term_freq=None, min_doc_freq=None, max_doc_freq=None, max_doc_freq_pct=None, min_word_len=None, max_word_len=None, max_query_terms=None, max_num_tokens_parsed=None, boost=None, query_fields=None, interesting_terms=None, match_include=None, match_offset=None)
Enable MoreLikeThis to find documents similar to a given document.
MoreLikeThis finds documents similar to a given document by analyzing the terms that make it unique. Common for "related articles", recommendations, and content discovery.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
fields
|
Optional[Union[str, List[str]]]
|
Fields to analyze for similarity. Use fields with meaningful content (title, description, body). Can be a single field or list of fields. For best performance, enable term vectors on these fields. |
None
|
min_term_freq
|
Optional[int]
|
Minimum times a term must appear in the source document to be considered. Default: 2. Lower values include more terms but may add noise. |
None
|
min_doc_freq
|
Optional[int]
|
Minimum number of documents a term must appear in across the index. Default: 5. Filters out very rare terms that aren't useful for finding similar documents. |
None
|
max_doc_freq
|
Optional[int]
|
Maximum number of documents a term can appear in. Filters out very common terms (like 'the', 'and'). Use this OR max_doc_freq_pct, not both. |
None
|
max_doc_freq_pct
|
Optional[int]
|
Maximum document frequency as percentage (0-100). Example: 75 means ignore terms appearing in more than 75% of documents. Use instead of max_doc_freq for relative filtering. |
None
|
min_word_len
|
Optional[int]
|
Minimum word length in characters. Words shorter than this are ignored. Example: 4 to skip 'the', 'and', 'or'. |
None
|
max_word_len
|
Optional[int]
|
Maximum word length in characters. Words longer than this are ignored. Useful to filter out long tokens or URLs. |
None
|
max_query_terms
|
Optional[int]
|
Maximum number of interesting terms to use in the MLT query. Default: 25. Higher values = more comprehensive but slower. Start with default and adjust as needed. |
None
|
max_num_tokens_parsed
|
Optional[int]
|
Maximum tokens to analyze per field (for fields without term vectors). Default: 5000. Set lower for better performance on large documents. |
None
|
boost
|
Optional[bool]
|
If True, boost the query by each term's relevance/importance. Default: False. Enable for better relevance ranking of similar documents. |
None
|
query_fields
|
Optional[str]
|
Query fields with optional boosts, like 'title^2.0 content^1.0'. Fields must also be in 'fields' parameter. Use to emphasize certain fields in similarity matching. |
None
|
interesting_terms
|
Optional[str]
|
Controls what info about matched terms is returned. Options: 'none' (default), 'list' (term names), 'details' (terms with boost values). Use 'details' for debugging to see which terms were selected. |
None
|
match_include
|
Optional[bool]
|
If True, includes the source document in results (useful to compare). Default varies: true for MLT handler, depends on configuration for component. |
None
|
match_offset
|
Optional[int]
|
When using with a query, specifies which result doc to use for similarity. 0 = first result, 1 = second, etc. Default: 0. |
None
|
Returns:
| Type | Description |
|---|---|
Self
|
A new parser instance with more_like_this configuration applied. |
Examples:
Basic similarity search:
>>> parser.more_like_this(
... fields=["title", "content"],
... min_term_freq=2,
... min_doc_freq=5,
... max_query_terms=25
... )
Advanced with filtering:
>>> parser.more_like_this(
... fields=["content"],
... min_term_freq=1,
... min_doc_freq=3,
... min_word_len=4,
... max_doc_freq_pct=80,
... interesting_terms="details"
... )
Boosted fields:
>>> parser.more_like_this(
... fields=["title", "content"],
... query_fields="title^2.0 content^1.0",
... boost=True
... )
serialize_configs(params)
Serialize ParamsConfig objects as top-level params.
GeoFilterQueryParser
Bases: SpatialQueryParser
Geospatial Filter Query Parsers (geofilt and bbox) for Apache Solr.
The geofilt and bbox parsers enable location-based filtering in Solr, allowing you to find documents within a certain distance from a point. Geofilt uses a circular radius (precise), while bbox uses a rectangular bounding box (faster but less precise).
Solr Reference
https://solr.apache.org/guide/solr/latest/query-guide/spatial-search.html
Key Features
- Circular radius search (geofilt) or bounding box search (bbox)
- Distance-based filtering and scoring
- Configurable distance units (kilometers, miles, degrees)
- Cache control for performance optimization
- Compatible with LatLonPointSpatialField and RPT fields
How Geospatial Filtering Works
geofilt:
- Creates a circular search area around a center point
- Precise: only includes documents within the exact radius
- Slightly slower but more accurate
bbox:
- Creates a rectangular bounding box around a center point
- Faster: uses simpler rectangular calculations
- May include points outside the circular radius
Distance Scoring
Use the score parameter to return distance as the relevance score:
- none: fixed score of 1.0 (default)
- kilometers: distance in km
- miles: distance in miles
- degrees: distance in degrees
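For orientation, a rough sketch of the raw Solr parameters these options map to (field name and values are assumptions here, not output of this library):
>>> # Hypothetical raw filter queries; the parser builds these for you
>>> geofilt_fq = "{!geofilt sfield=store_location pt=45.15,-93.85 d=5}"
>>> bbox_fq = "{!bbox sfield=store_location pt=45.15,-93.85 d=5}"
>>> params = {
...     "q": "*:*",
...     "fq": geofilt_fq,  # circular, precise
...     "sort": "geodist(store_location,45.15,-93.85) asc",  # nearest first
... }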
Schema Requirements
Spatial field must be indexed with appropriate field type:
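A minimal sketch of registering a suitable field via Solr's Schema API (collection and field names are assumptions):
>>> import requests
>>> SCHEMA = "http://localhost:8983/solr/my_collection/schema"
>>> # LatLonPointSpatialField with docValues supports filtering and distance sorting
>>> requests.post(SCHEMA, json={"add-field-type": {
...     "name": "location",
...     "class": "solr.LatLonPointSpatialField",
...     "docValues": True,
... }})
>>> requests.post(SCHEMA, json={"add-field": {
...     "name": "store_location", "type": "location", "indexed": True, "stored": True,
... }})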
Examples:
>>> # Circular geospatial filter (precise)
>>> parser = GeoFilterQueryParser(
... spatial_field="store_location",
... center_point=[45.15, -93.85],
... distance=5, # 5 km radius
... filter_type="geofilt"
... )
>>> # Bounding box filter (faster)
>>> parser = GeoFilterQueryParser(
... spatial_field="restaurant_coords",
... center_point=[37.7749, -122.4194], # San Francisco
... distance=10, # 10 km
... filter_type="bbox"
... )
>>> # With distance scoring
>>> parser = GeoFilterQueryParser(
... spatial_field="hotel_location",
... center_point=[51.5074, -0.1278], # London
... distance=2,
... filter_type="geofilt",
... score="kilometers" # Return distance as score
... )
>>> # Disable caching for dynamic queries
>>> parser = GeoFilterQueryParser(
... spatial_field="user_location",
... center_point=[40.7128, -74.0060], # NYC
... distance=1,
... filter_type="geofilt",
... cache=False # Don't cache this filter
... )
>>> # Filter with sorting by distance
>>> # Combine with geodist() function query for sorting
>>> parser = GeoFilterQueryParser(
... spatial_field="store",
... center_point=[45.15, -93.85],
... distance=50
... )
>>> # Add: &sort=geodist() asc to request
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
filter_type
|
'geofilt' for circular (precise) or 'bbox' for bounding box (faster) |
required | |
spatial_field
|
Name of the spatial indexed field (inherited from base, required) |
required | |
center_point
|
[lat, lon] or [x, y] coordinates of search center (inherited, required) |
required | |
distance
|
Radial distance from center point (inherited, required) |
required | |
score
|
Scoring mode (none, kilometers, miles, degrees) (inherited from base) |
required | |
cache
|
Whether to cache the filter query (inherited from base) |
required |
Returns:
| Type | Description |
|---|---|
|
Filter query (fq) matching documents within the specified distance |
Performance Tips
- Use bbox for large radius searches where precision isn't critical
- Set cache=false for highly variable queries (e.g., user location)
- Use geofilt for small radius searches requiring precision
- Consider using docValues for better spatial query performance
See Also
- BBoxQueryParser: For querying indexed bounding boxes with spatial predicates
- geodist() function: For distance calculations and sorting
- Solr Spatial Search Guide: https://solr.apache.org/guide/solr/latest/query-guide/spatial-search.html
build(*args, **kwargs)
Build query parameters, excluding mixin keys.
facet(*, queries=None, fields=None, prefix=None, contains=None, contains_ignore_case=None, matches=None, sort=None, limit=None, offset=None, mincount=None, missing=None, method=None, enum_cache_min_df=None, exists=None, exclude_terms=None, overrequest_count=None, overrequest_ratio=None, threads=None, range_field=None, range_start=None, range_end=None, range_gap=None, range_hardend=None, range_include=None, range_other=None, range_method=None, pivot_fields=None, pivot_mincount=None)
Enable faceting to categorize and count search results.
Faceting breaks down search results into categories with counts, enabling drill-down navigation and data analysis.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
queries
|
Optional[List[str]]
|
Arbitrary queries to generate facet counts for specific terms/expressions. |
None
|
fields
|
Optional[List[str]]
|
Fields to be treated as facets. Common for categories, brands, tags. |
None
|
prefix
|
Optional[str]
|
Limits facet terms to those starting with the given prefix. |
None
|
contains
|
Optional[str]
|
Limits facet terms to those containing the given substring. |
None
|
contains_ignore_case
|
Optional[bool]
|
If True, ignores case when matching the 'contains' parameter. |
None
|
matches
|
Optional[str]
|
Only returns facets matching this regular expression. |
None
|
sort
|
Optional[Literal['count', 'index']]
|
Ordering of facet field terms. Use 'count' (by frequency) or 'index' (alphabetically). |
None
|
limit
|
Optional[int]
|
Number of facet counts to return. Set to -1 for all. |
None
|
offset
|
Optional[int]
|
Offset into the facet list for paging. |
None
|
mincount
|
Optional[int]
|
Minimum count for facets to be included in response. Common to set to 1 to hide empty facets. |
None
|
missing
|
Optional[bool]
|
If True, include count of results with no facet value. |
None
|
method
|
Optional[Literal['enum', 'fc', 'fcs']]
|
Algorithm to use for faceting. Use 'enum' (enumerate all terms, good for low-cardinality), 'fc' (field cache, good for high-cardinality), or 'fcs' (per-segment, good for frequent updates). |
None
|
enum_cache_min_df
|
Optional[int]
|
Minimum document frequency for filterCache usage with enum method. |
None
|
exists
|
Optional[bool]
|
Cap facet counts by 1 (only for non-trie fields). |
None
|
exclude_terms
|
Optional[str]
|
Terms to remove from facet counts. |
None
|
overrequest_count
|
Optional[int]
|
Extra facets to request from each shard for better accuracy in distributed environments. |
None
|
overrequest_ratio
|
Optional[float]
|
Ratio for overrequesting facets from shards. |
None
|
threads
|
Optional[int]
|
Number of threads for parallel facet loading. Useful for multiple facets on large datasets. |
None
|
range_field
|
Optional[List[str]]
|
Fields for range faceting (e.g., price ranges, date ranges). |
None
|
range_start
|
Optional[Dict[str, str]]
|
Lower bound of ranges per field. Dict mapping field name to start value. |
None
|
range_end
|
Optional[Dict[str, str]]
|
Upper bound of ranges per field. Dict mapping field name to end value. |
None
|
range_gap
|
Optional[Dict[str, str]]
|
Size of each range span per field. E.g., {'price': '100'} for $100 increments. |
None
|
range_hardend
|
Optional[bool]
|
If True, uses exact range_end as upper bound even if it doesn't align with gap. |
None
|
range_include
|
Optional[List[Literal['lower', 'upper', 'edge', 'outer', 'all']]]
|
Range bounds to include in faceting. List of: 'lower' (include lower bound), 'upper' (include upper bound), 'edge' (first/last include edges), 'outer' (before/after inclusive), 'all'. |
None
|
range_other
|
Optional[List[Literal['before', 'after', 'between', 'none', 'all']]]
|
Additional range counts to compute. List of: 'before' (below first range), 'after' (above last range), 'between' (within bounds), 'none', or 'all'. |
None
|
range_method
|
Optional[Literal['filter', 'dv']]
|
Method to use for range faceting. Use 'filter' or 'dv' (for docValues). |
None
|
pivot_fields
|
Optional[List[str]]
|
Fields to use for pivot (hierarchical) faceting. E.g., ['category,brand']. |
None
|
pivot_mincount
|
Optional[int]
|
Minimum count for pivot facet inclusion. |
None
|
Returns:
| Type | Description |
|---|---|
Self
|
A new parser instance with facet configuration applied. |
Examples:
Basic field faceting:
>>> parser.facet(fields=["genre", "director"], mincount=1, limit=10)
Range faceting for prices:
>>> parser.facet(
... range_field=["price"],
... range_start={"price": "0"},
... range_end={"price": "1000"},
... range_gap={"price": "100"}
... )
Filtered facets:
>>> parser.facet(fields=["color"], prefix="bl", mincount=5)
group(*, by=None, func=None, query=None, limit=1, offset=None, sort=None, format='grouped', main=None, ngroups=False, truncate=False, facet=False, cache_percent=0)
Enable result grouping to collapse results by common field values.
Result grouping (also known as field collapsing) combines documents that share a common field value. Useful for showing one result per author, category, or domain.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
by
|
Optional[Union[str, List[str]]]
|
Field(s) to group results by. Shows one representative doc per unique field value. Must be single-valued and indexed. Can specify multiple fields for separate groupings. |
None
|
func
|
Optional[str]
|
Group by the result of a function query (e.g., 'floor(price)'). Not supported in SolrCloud/distributed searches. |
None
|
query
|
Optional[Union[str, List[str]]]
|
Create custom groups using arbitrary queries. Each query defines one group. Example: ['price:[0 TO 50]', 'price:[50 TO 100]'] creates price range groups. |
None
|
limit
|
int
|
Number of documents to return per group. Default: 1 (only top doc per group). Set higher to see more examples from each group. |
1
|
offset
|
Optional[int]
|
Skip the first N documents within each group. Useful for pagination within groups. |
None
|
sort
|
Optional[str]
|
How to sort documents within each group (e.g., 'date desc' for newest first). If not specified, uses the main sort parameter. |
None
|
format
|
str
|
Response structure format. 'grouped' (nested, default) or 'simple' (flat list). |
'grouped'
|
main
|
Optional[bool]
|
If True, returns first field grouping as main result list, flattening the response. |
None
|
ngroups
|
bool
|
If True, include total number of unique groups in response. Useful for pagination. Requires co-location in SolrCloud. |
False
|
truncate
|
bool
|
If True, base facet counts on one doc per group only. |
False
|
facet
|
bool
|
If True, enable grouped faceting. Can be expensive on large result sets. Requires co-location in SolrCloud. |
False
|
cache_percent
|
int
|
Enable result grouping cache (0-100). Set to 0 to disable (default). Try 50-100 for complex queries to improve performance. |
0
|
Returns:
| Type | Description |
|---|---|
Self
|
A new parser instance with group configuration applied. |
Examples:
Group by author:
>>> parser.group(by="author", limit=3, sort="date desc", ngroups=True)
Group by price range:
>>> parser.group(
... query=["price:[0 TO 50]", "price:[50 TO 100]", "price:[100 TO *]"],
... limit=5
... )
Multiple field groupings:
>>> parser.group(by=["author", "category"], limit=2)
highlight(*, method=None, fields=None, query=None, query_parser=None, require_field_match=None, query_field_pattern=None, use_phrase_highlighter=None, multiterm=None, snippets_per_field=None, fragment_size=None, encoder=None, max_analyzed_chars=None, tag_before=None, tag_after=None, offset_source=None, frag_align_ratio=None, fragsize_is_minimum=None, tag_ellipsis=None, default_summary=None, score_k1=None, score_b=None, score_pivot=None, bs_language=None, bs_country=None, bs_variant=None, bs_type=None, bs_separator=None, weight_matches=None, merge_contiguous=None, max_multivalued_to_examine=None, max_multivalued_to_match=None, alternate_field=None, max_alternate_field_length=None, alternate=None, formatter=None, simple_pre=None, simple_post=None, fragmenter=None, regex_slop=None, regex_pattern=None, regex_max_analyzed_chars=None, preserve_multi=None, payloads=None, frag_list_builder=None, fragments_builder=None, boundary_scanner=None, phrase_limit=None, multivalue_separator=None)
Enable highlighting to show snippets with query terms emphasized.
Highlighting shows snippets of text from documents with query terms emphasized (usually wrapped in tags). Common for search result snippets and previews.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
method
|
Optional[Literal['unified', 'original', 'fastVector']]
|
Highlighting implementation to use. Use 'unified' (most accurate, recommended), 'original' (legacy, widely compatible), or 'fastVector' (fast for large docs, requires term vectors). |
None
|
fields
|
Optional[Union[str, List[str]]]
|
Fields to generate highlighted snippets for. Must be stored fields. Example: ['title', 'content']. Use '*' for all fields (not recommended). |
None
|
query
|
Optional[str]
|
Custom query to use for highlighting (overrides main query). |
None
|
query_parser
|
Optional[str]
|
Query parser for the highlight query (e.g., 'edismax', 'lucene'). |
None
|
require_field_match
|
Optional[bool]
|
If True, only highlight terms in the field they matched. Set to False to highlight query terms in all requested fields. |
None
|
query_field_pattern
|
Optional[str]
|
Regular expression pattern for fields to consider for highlighting. |
None
|
use_phrase_highlighter
|
Optional[bool]
|
If True, highlights complete phrases accurately. Default: True. |
None
|
multiterm
|
Optional[bool]
|
Enable highlighting for wildcard, fuzzy, and range queries. Default: True. |
None
|
snippets_per_field
|
Optional[int]
|
Maximum number of snippets to return per field. Default: 1. Set higher to show more context passages. |
None
|
fragment_size
|
Optional[int]
|
Size of highlighted snippets in characters. Default: 100. Set to 0 to highlight entire field (not recommended for large fields). |
None
|
encoder
|
Optional[Literal['', 'html']]
|
Text encoder for snippets. Use '' (empty string, default) or 'html' to escape HTML and prevent XSS. |
None
|
max_analyzed_chars
|
Optional[int]
|
Maximum characters to analyze per field. Default: 51200 (50KB). For large documents, limits analysis to improve performance. |
None
|
tag_before
|
Optional[str]
|
Text/tag to insert before each highlighted term. Default: '<em>'. |
None
|
tag_after
|
Optional[str]
|
Text/tag to insert after each highlighted term. Default: '</em>'. |
None
|
Unified Highlighter specific (most accurate, recommended):
offset_source
|
Optional[str]
|
How offsets are obtained. Options: 'ANALYSIS', 'POSTINGS', 'POSTINGS_WITH_TERM_VECTORS', 'TERM_VECTORS'. Usually auto-detected. |
None
|
frag_align_ratio
|
Optional[float]
|
Where to position first match in snippet (0.0-1.0). Default: 0.33 (left third, shows context before match). |
None
|
fragsize_is_minimum
|
Optional[bool]
|
If True, treat fragment_size as minimum. Default: True. |
None
|
tag_ellipsis
|
Optional[str]
|
Text between multiple snippets (e.g., '...' or ' [...] '). |
None
|
default_summary
|
Optional[bool]
|
If True, return leading text when no matches found. |
None
|
score_k1
|
Optional[float]
|
BM25 term frequency normalization. Default: 1.2. |
None
|
score_b
|
Optional[float]
|
BM25 length normalization. Default: 0.75. |
None
|
score_pivot
|
Optional[int]
|
BM25 average passage length in characters. Default: 87. |
None
|
bs_language
|
Optional[str]
|
BreakIterator language for text segmentation (e.g., 'en', 'ja'). |
None
|
bs_country
|
Optional[str]
|
BreakIterator country code (e.g., 'US', 'GB'). |
None
|
bs_variant
|
Optional[str]
|
BreakIterator variant for specialized locale rules. |
None
|
bs_type
|
Optional[Literal['SEPARATOR', 'SENTENCE', 'WORD', 'CHARACTER', 'LINE', 'WHOLE']]
|
How to segment text. Use 'SENTENCE' (default, recommended), 'WORD', 'CHARACTER', 'LINE', 'SEPARATOR', or 'WHOLE'. |
None
|
bs_separator
|
Optional[str]
|
Custom separator character when bs_type=SEPARATOR. |
None
|
weight_matches
|
Optional[bool]
|
Use Lucene's Weight Matches API for most accurate highlighting. |
None
|
Original Highlighter specific (legacy):
merge_contiguous
|
Optional[bool]
|
Merge adjacent fragments. |
None
|
max_multivalued_to_examine
|
Optional[int]
|
Max entries to examine in multivalued field. |
None
|
max_multivalued_to_match
|
Optional[int]
|
Max matches in multivalued field. |
None
|
alternate_field
|
Optional[str]
|
Backup field for summary when no highlights found. |
None
|
max_alternate_field_length
|
Optional[int]
|
Max length of alternate field. |
None
|
alternate
|
Optional[bool]
|
Highlight alternate field. |
None
|
formatter
|
Optional[Literal['simple']]
|
Formatter for highlighted output. Use 'simple'. |
None
|
simple_pre
|
Optional[str]
|
Text before term (simple formatter). |
None
|
simple_post
|
Optional[str]
|
Text after term (simple formatter). |
None
|
fragmenter
|
Optional[Literal['gap', 'regex']]
|
Text snippet generator type. Use 'gap' or 'regex'. |
None
|
regex_slop
|
Optional[float]
|
Deviation factor for regex fragmenter. |
None
|
regex_pattern
|
Optional[str]
|
Pattern for regex fragmenter. |
None
|
regex_max_analyzed_chars
|
Optional[int]
|
Char limit for regex fragmenter. |
None
|
preserve_multi
|
Optional[bool]
|
Preserve order in multivalued fields. |
None
|
payloads
|
Optional[bool]
|
Include payloads in highlighting. |
None
|
FastVector Highlighter specific (requires term vectors):
frag_list_builder
|
Optional[Literal['simple', 'weighted', 'single']]
|
Snippet fragmenting algorithm. Use 'simple', 'weighted', or 'single'. |
None
|
fragments_builder
|
Optional[Literal['default', 'colored']]
|
Fragment formatting implementation. Use 'default' or 'colored'. |
None
|
boundary_scanner
|
Optional[str]
|
Boundary scanner implementation. |
None
|
phrase_limit
|
Optional[int]
|
Max phrases to analyze for scoring. |
None
|
multivalue_separator
|
Optional[str]
|
Separator for multivalued fields. |
None
|
Returns:
| Type | Description |
|---|---|
Self
|
A new parser instance with highlight configuration applied. |
Examples:
Basic highlighting:
>>> parser.highlight(fields=["title", "content"], snippets_per_field=3, fragment_size=150)
Custom HTML tags:
>>> parser.highlight(
... fields=["title"],
... tag_before='<mark class="highlight">',
... tag_after='</mark>',
... encoder="html"
... )
Unified highlighter with sentence breaks:
>>> parser.highlight(
... fields=["content"],
... method="unified",
... bs_type="SENTENCE",
... fragment_size=200
... )
more_like_this(*, fields=None, min_term_freq=None, min_doc_freq=None, max_doc_freq=None, max_doc_freq_pct=None, min_word_len=None, max_word_len=None, max_query_terms=None, max_num_tokens_parsed=None, boost=None, query_fields=None, interesting_terms=None, match_include=None, match_offset=None)
Enable MoreLikeThis to find documents similar to a given document.
MoreLikeThis finds documents similar to a given document by analyzing the terms that make it unique. Common for "related articles", recommendations, and content discovery.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
fields
|
Optional[Union[str, List[str]]]
|
Fields to analyze for similarity. Use fields with meaningful content (title, description, body). Can be a single field or list of fields. For best performance, enable term vectors on these fields. |
None
|
min_term_freq
|
Optional[int]
|
Minimum times a term must appear in the source document to be considered. Default: 2. Lower values include more terms but may add noise. |
None
|
min_doc_freq
|
Optional[int]
|
Minimum number of documents a term must appear in across the index. Default: 5. Filters out very rare terms that aren't useful for finding similar documents. |
None
|
max_doc_freq
|
Optional[int]
|
Maximum number of documents a term can appear in. Filters out very common terms (like 'the', 'and'). Use this OR max_doc_freq_pct, not both. |
None
|
max_doc_freq_pct
|
Optional[int]
|
Maximum document frequency as percentage (0-100). Example: 75 means ignore terms appearing in more than 75% of documents. Use instead of max_doc_freq for relative filtering. |
None
|
min_word_len
|
Optional[int]
|
Minimum word length in characters. Words shorter than this are ignored. Example: 4 to skip 'the', 'and', 'or'. |
None
|
max_word_len
|
Optional[int]
|
Maximum word length in characters. Words longer than this are ignored. Useful to filter out long tokens or URLs. |
None
|
max_query_terms
|
Optional[int]
|
Maximum number of interesting terms to use in the MLT query. Default: 25. Higher values = more comprehensive but slower. Start with default and adjust as needed. |
None
|
max_num_tokens_parsed
|
Optional[int]
|
Maximum tokens to analyze per field (for fields without term vectors). Default: 5000. Set lower for better performance on large documents. |
None
|
boost
|
Optional[bool]
|
If True, boost the query by each term's relevance/importance. Default: False. Enable for better relevance ranking of similar documents. |
None
|
query_fields
|
Optional[str]
|
Query fields with optional boosts, like 'title^2.0 content^1.0'. Fields must also be in 'fields' parameter. Use to emphasize certain fields in similarity matching. |
None
|
interesting_terms
|
Optional[str]
|
Controls what info about matched terms is returned. Options: 'none' (default), 'list' (term names), 'details' (terms with boost values). Use 'details' for debugging to see which terms were selected. |
None
|
match_include
|
Optional[bool]
|
If True, includes the source document in results (useful to compare). Default varies: true for MLT handler, depends on configuration for component. |
None
|
match_offset
|
Optional[int]
|
When using with a query, specifies which result doc to use for similarity. 0 = first result, 1 = second, etc. Default: 0. |
None
|
Returns:
| Type | Description |
|---|---|
Self
|
A new parser instance with more_like_this configuration applied. |
Examples:
Basic similarity search:
>>> parser.more_like_this(
... fields=["title", "content"],
... min_term_freq=2,
... min_doc_freq=5,
... max_query_terms=25
... )
Advanced with filtering:
>>> parser.more_like_this(
... fields=["content"],
... min_term_freq=1,
... min_doc_freq=3,
... min_word_len=4,
... max_doc_freq_pct=80,
... interesting_terms="details"
... )
Boosted fields:
>>> parser.more_like_this(
... fields=["title", "content"],
... query_fields="title^2.0 content^1.0",
... boost=True
... )
serialize_configs(params)
Serialize ParamsConfig objects as top-level params.
spatial_params()
Build the spatial search parameters string for use in filter queries.
KNNQueryParser
Bases: DenseVectorSearchQueryParser
K-Nearest Neighbors (KNN) Query Parser for Apache Solr Dense Vector Search.
The KNN query parser enables efficient similarity searches on dense vector fields using the k-nearest neighbors algorithm. It finds the topK documents whose vectors are most similar to the query vector according to the configured similarity function (cosine, dot product, or euclidean).
Solr Reference
https://solr.apache.org/guide/solr/latest/query-guide/dense-vector-search.html
Key Features
- Efficient vector similarity search using HNSW algorithm
- Configurable k (topK) for number of results
- Pre-filtering support (explicit or implicit)
- Re-ranking capability for hybrid search
- Multiple similarity functions: cosine, dot_product, euclidean
How KNN Search Works
- Query vector is compared against indexed vectors
- HNSW (Hierarchical Navigable Small World) algorithm efficiently finds neighbors
- Top k most similar vectors are returned
- Similarity score is used for ranking
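For reference, a sketch of the raw local-params query this corresponds to (field name assumed; the parser assembles it from vector_field, vector, and top_k):
>>> vector = [0.1, 0.2, 0.3, 0.4, 0.5]
>>> q = "{!knn f=film_vector topK=10}" + str(vector)
>>> params = {"q": q, "fl": "id,score"}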
Pre-Filtering
- Implicit: All fq filters (except post filters) automatically pre-filter when knn is main query
- Explicit: Use preFilter parameter to specify filtering criteria
- Tagged: Use includeTags/excludeTags to control which fq filters apply
Schema Requirements
Field must be DenseVectorField with matching vector dimension:
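A minimal sketch of defining such a field via the Schema API (names and dimension are assumptions; vectorDimension must match the length of your query vectors):
>>> import requests
>>> SCHEMA = "http://localhost:8983/solr/my_collection/schema"
>>> requests.post(SCHEMA, json={"add-field-type": {
...     "name": "knn_vector",
...     "class": "solr.DenseVectorField",
...     "vectorDimension": 5,
...     "similarityFunction": "cosine",  # or dot_product / euclidean
... }})
>>> requests.post(SCHEMA, json={"add-field": {
...     "name": "film_vector", "type": "knn_vector", "indexed": True, "stored": True,
... }})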
Examples:
>>> # Basic KNN search
>>> parser = KNNQueryParser(
... vector_field="film_vector",
... vector=[0.1, 0.2, 0.3, 0.4, 0.5],
... top_k=10
... )
>>> # With explicit pre-filtering
>>> parser = KNNQueryParser(
... vector_field="product_vector",
... vector=[1.0, 2.0, 3.0, 4.0],
... top_k=20,
... pre_filter=["category:electronics", "inStock:true"]
... )
>>> # With tagged filtering
>>> parser = KNNQueryParser(
... vector_field="doc_vector",
... vector=[0.5, 0.5, 0.5, 0.5],
... top_k=50,
... include_tags=["for_knn"]
... )
>>> # For re-ranking (use as rq parameter)
>>> parser = KNNQueryParser(
... vector_field="content_vector",
... vector=[0.2, 0.3, 0.4, 0.5],
... top_k=100 # Searches whole index in re-ranking context
... )
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
vector
|
Query vector as list of floats (required, must match field dimension) |
required | |
top_k
|
Number of nearest neighbors to return (default: 10) |
required | |
vector_field
|
Name of the DenseVectorField to search (inherited from base) |
required | |
pre_filter
|
Explicit pre-filter query strings (inherited from base) |
required | |
include_tags
|
Only use fq filters with these tags for implicit pre-filtering (inherited) |
required | |
exclude_tags
|
Exclude fq filters with these tags from implicit pre-filtering (inherited) |
required |
Returns:
| Type | Description |
|---|---|
|
Query results ranked by vector similarity score |
Note
When used in re-ranking (rq parameter), topK refers to k-nearest neighbors in the whole index, not just the initial result set.
See Also
- VectorSimilarityQueryParser: For threshold-based vector search
- KNNTextToVectorQueryParser: For text-to-vector conversion with KNN search
facet(*, queries=None, fields=None, prefix=None, contains=None, contains_ignore_case=None, matches=None, sort=None, limit=None, offset=None, mincount=None, missing=None, method=None, enum_cache_min_df=None, exists=None, exclude_terms=None, overrequest_count=None, overrequest_ratio=None, threads=None, range_field=None, range_start=None, range_end=None, range_gap=None, range_hardend=None, range_include=None, range_other=None, range_method=None, pivot_fields=None, pivot_mincount=None)
Enable faceting to categorize and count search results.
Faceting breaks down search results into categories with counts, enabling drill-down navigation and data analysis.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
queries
|
Optional[List[str]]
|
Arbitrary queries to generate facet counts for specific terms/expressions. |
None
|
fields
|
Optional[List[str]]
|
Fields to be treated as facets. Common for categories, brands, tags. |
None
|
prefix
|
Optional[str]
|
Limits facet terms to those starting with the given prefix. |
None
|
contains
|
Optional[str]
|
Limits facet terms to those containing the given substring. |
None
|
contains_ignore_case
|
Optional[bool]
|
If True, ignores case when matching the 'contains' parameter. |
None
|
matches
|
Optional[str]
|
Only returns facets matching this regular expression. |
None
|
sort
|
Optional[Literal['count', 'index']]
|
Ordering of facet field terms. Use 'count' (by frequency) or 'index' (alphabetically). |
None
|
limit
|
Optional[int]
|
Number of facet counts to return. Set to -1 for all. |
None
|
offset
|
Optional[int]
|
Offset into the facet list for paging. |
None
|
mincount
|
Optional[int]
|
Minimum count for facets to be included in response. Common to set to 1 to hide empty facets. |
None
|
missing
|
Optional[bool]
|
If True, include count of results with no facet value. |
None
|
method
|
Optional[Literal['enum', 'fc', 'fcs']]
|
Algorithm to use for faceting. Use 'enum' (enumerate all terms, good for low-cardinality), 'fc' (field cache, good for high-cardinality), or 'fcs' (per-segment, good for frequent updates). |
None
|
enum_cache_min_df
|
Optional[int]
|
Minimum document frequency for filterCache usage with enum method. |
None
|
exists
|
Optional[bool]
|
Cap facet counts by 1 (only for non-trie fields). |
None
|
exclude_terms
|
Optional[str]
|
Terms to remove from facet counts. |
None
|
overrequest_count
|
Optional[int]
|
Extra facets to request from each shard for better accuracy in distributed environments. |
None
|
overrequest_ratio
|
Optional[float]
|
Ratio for overrequesting facets from shards. |
None
|
threads
|
Optional[int]
|
Number of threads for parallel facet loading. Useful for multiple facets on large datasets. |
None
|
range_field
|
Optional[List[str]]
|
Fields for range faceting (e.g., price ranges, date ranges). |
None
|
range_start
|
Optional[Dict[str, str]]
|
Lower bound of ranges per field. Dict mapping field name to start value. |
None
|
range_end
|
Optional[Dict[str, str]]
|
Upper bound of ranges per field. Dict mapping field name to end value. |
None
|
range_gap
|
Optional[Dict[str, str]]
|
Size of each range span per field. E.g., {'price': '100'} for $100 increments. |
None
|
range_hardend
|
Optional[bool]
|
If True, uses exact range_end as upper bound even if it doesn't align with gap. |
None
|
range_include
|
Optional[List[Literal['lower', 'upper', 'edge', 'outer', 'all']]]
|
Range bounds to include in faceting. List of: 'lower' (include lower bound), 'upper' (include upper bound), 'edge' (first/last include edges), 'outer' (before/after inclusive), 'all'. |
None
|
range_other
|
Optional[List[Literal['before', 'after', 'between', 'none', 'all']]]
|
Additional range counts to compute. List of: 'before' (below first range), 'after' (above last range), 'between' (within bounds), 'none', or 'all'. |
None
|
range_method
|
Optional[Literal['filter', 'dv']]
|
Method to use for range faceting. Use 'filter' or 'dv' (for docValues). |
None
|
pivot_fields
|
Optional[List[str]]
|
Fields to use for pivot (hierarchical) faceting. E.g., ['category,brand']. |
None
|
pivot_mincount
|
Optional[int]
|
Minimum count for pivot facet inclusion. |
None
|
Returns:
| Type | Description |
|---|---|
Self
|
A new parser instance with facet configuration applied. |
Examples:
Basic field faceting:
>>> parser.facet(fields=["genre", "director"], mincount=1, limit=10)
Range faceting for prices:
>>> parser.facet(
... range_field=["price"],
... range_start={"price": "0"},
... range_end={"price": "1000"},
... range_gap={"price": "100"}
... )
Filtered facets:
>>> parser.facet(fields=["color"], prefix="bl", mincount=5)
group(*, by=None, func=None, query=None, limit=1, offset=None, sort=None, format='grouped', main=None, ngroups=False, truncate=False, facet=False, cache_percent=0)
Enable result grouping to collapse results by common field values.
Result grouping (also known as field collapsing) combines documents that share a common field value. Useful for showing one result per author, category, or domain.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
by
|
Optional[Union[str, List[str]]]
|
Field(s) to group results by. Shows one representative doc per unique field value. Must be single-valued and indexed. Can specify multiple fields for separate groupings. |
None
|
func
|
Optional[str]
|
Group by the result of a function query (e.g., 'floor(price)'). Not supported in SolrCloud/distributed searches. |
None
|
query
|
Optional[Union[str, List[str]]]
|
Create custom groups using arbitrary queries. Each query defines one group. Example: ['price:[0 TO 50]', 'price:[50 TO 100]'] creates price range groups. |
None
|
limit
|
int
|
Number of documents to return per group. Default: 1 (only top doc per group). Set higher to see more examples from each group. |
1
|
offset
|
Optional[int]
|
Skip the first N documents within each group. Useful for pagination within groups. |
None
|
sort
|
Optional[str]
|
How to sort documents within each group (e.g., 'date desc' for newest first). If not specified, uses the main sort parameter. |
None
|
format
|
str
|
Response structure format. 'grouped' (nested, default) or 'simple' (flat list). |
'grouped'
|
main
|
Optional[bool]
|
If True, returns first field grouping as main result list, flattening the response. |
None
|
ngroups
|
bool
|
If True, include total number of unique groups in response. Useful for pagination. Requires co-location in SolrCloud. |
False
|
truncate
|
bool
|
If True, base facet counts on one doc per group only. |
False
|
facet
|
bool
|
If True, enable grouped faceting. Can be expensive on large result sets. Requires co-location in SolrCloud. |
False
|
cache_percent
|
int
|
Enable result grouping cache (0-100). Set to 0 to disable (default). Try 50-100 for complex queries to improve performance. |
0
|
Returns:
| Type | Description |
|---|---|
Self
|
A new parser instance with group configuration applied. |
Examples:
Group by author:
>>> parser.group(by="author", limit=3, sort="date desc", ngroups=True)
Group by price range:
>>> parser.group(
... query=["price:[0 TO 50]", "price:[50 TO 100]", "price:[100 TO *]"],
... limit=5
... )
Multiple field groupings:
>>> parser.group(by=["author", "category"], limit=2)
highlight(*, method=None, fields=None, query=None, query_parser=None, require_field_match=None, query_field_pattern=None, use_phrase_highlighter=None, multiterm=None, snippets_per_field=None, fragment_size=None, encoder=None, max_analyzed_chars=None, tag_before=None, tag_after=None, offset_source=None, frag_align_ratio=None, fragsize_is_minimum=None, tag_ellipsis=None, default_summary=None, score_k1=None, score_b=None, score_pivot=None, bs_language=None, bs_country=None, bs_variant=None, bs_type=None, bs_separator=None, weight_matches=None, merge_contiguous=None, max_multivalued_to_examine=None, max_multivalued_to_match=None, alternate_field=None, max_alternate_field_length=None, alternate=None, formatter=None, simple_pre=None, simple_post=None, fragmenter=None, regex_slop=None, regex_pattern=None, regex_max_analyzed_chars=None, preserve_multi=None, payloads=None, frag_list_builder=None, fragments_builder=None, boundary_scanner=None, phrase_limit=None, multivalue_separator=None)
Enable highlighting to show snippets with query terms emphasized.
Highlighting shows snippets of text from documents with query terms emphasized (usually wrapped in tags). Common for search result snippets and previews.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
method
|
Optional[Literal['unified', 'original', 'fastVector']]
|
Highlighting implementation to use. Use 'unified' (most accurate, recommended), 'original' (legacy, widely compatible), or 'fastVector' (fast for large docs, requires term vectors). |
None
|
fields
|
Optional[Union[str, List[str]]]
|
Fields to generate highlighted snippets for. Must be stored fields. Example: ['title', 'content']. Use '*' for all fields (not recommended). |
None
|
query
|
Optional[str]
|
Custom query to use for highlighting (overrides main query). |
None
|
query_parser
|
Optional[str]
|
Query parser for the highlight query (e.g., 'edismax', 'lucene'). |
None
|
require_field_match
|
Optional[bool]
|
If True, only highlight terms in the field they matched. Set to False to highlight query terms in all requested fields. |
None
|
query_field_pattern
|
Optional[str]
|
Regular expression pattern for fields to consider for highlighting. |
None
|
use_phrase_highlighter
|
Optional[bool]
|
If True, highlights complete phrases accurately. Default: True. |
None
|
multiterm
|
Optional[bool]
|
Enable highlighting for wildcard, fuzzy, and range queries. Default: True. |
None
|
snippets_per_field
|
Optional[int]
|
Maximum number of snippets to return per field. Default: 1. Set higher to show more context passages. |
None
|
fragment_size
|
Optional[int]
|
Size of highlighted snippets in characters. Default: 100. Set to 0 to highlight entire field (not recommended for large fields). |
None
|
encoder
|
Optional[Literal['', 'html']]
|
Text encoder for snippets. Use '' (empty string, default) or 'html' to escape HTML and prevent XSS. |
None
|
max_analyzed_chars
|
Optional[int]
|
Maximum characters to analyze per field. Default: 51200 (50KB). For large documents, limits analysis to improve performance. |
None
|
tag_before
|
Optional[str]
|
Text/tag to insert before each highlighted term. Default: '<em>'. |
None
|
tag_after
|
Optional[str]
|
Text/tag to insert after each highlighted term. Default: '</em>'. |
None
|
Unified Highlighter specific (most accurate, recommended):
offset_source
|
Optional[str]
|
How offsets are obtained. Options: 'ANALYSIS', 'POSTINGS', 'POSTINGS_WITH_TERM_VECTORS', 'TERM_VECTORS'. Usually auto-detected. |
None
|
frag_align_ratio
|
Optional[float]
|
Where to position first match in snippet (0.0-1.0). Default: 0.33 (left third, shows context before match). |
None
|
fragsize_is_minimum
|
Optional[bool]
|
If True, treat fragment_size as minimum. Default: True. |
None
|
tag_ellipsis
|
Optional[str]
|
Text between multiple snippets (e.g., '...' or ' [...] '). |
None
|
default_summary
|
Optional[bool]
|
If True, return leading text when no matches found. |
None
|
score_k1
|
Optional[float]
|
BM25 term frequency normalization. Default: 1.2. |
None
|
score_b
|
Optional[float]
|
BM25 length normalization. Default: 0.75. |
None
|
score_pivot
|
Optional[int]
|
BM25 average passage length in characters. Default: 87. |
None
|
bs_language
|
Optional[str]
|
BreakIterator language for text segmentation (e.g., 'en', 'ja'). |
None
|
bs_country
|
Optional[str]
|
BreakIterator country code (e.g., 'US', 'GB'). |
None
|
bs_variant
|
Optional[str]
|
BreakIterator variant for specialized locale rules. |
None
|
bs_type
|
Optional[Literal['SEPARATOR', 'SENTENCE', 'WORD', 'CHARACTER', 'LINE', 'WHOLE']]
|
How to segment text. Use 'SENTENCE' (default, recommended), 'WORD', 'CHARACTER', 'LINE', 'SEPARATOR', or 'WHOLE'. |
None
|
bs_separator
|
Optional[str]
|
Custom separator character when bs_type=SEPARATOR. |
None
|
weight_matches
|
Optional[bool]
|
Use Lucene's Weight Matches API for most accurate highlighting. |
None
|
Original Highlighter specific (legacy):
merge_contiguous
|
Optional[bool]
|
Merge adjacent fragments. |
None
|
max_multivalued_to_examine
|
Optional[int]
|
Max entries to examine in multivalued field. |
None
|
max_multivalued_to_match
|
Optional[int]
|
Max matches in multivalued field. |
None
|
alternate_field
|
Optional[str]
|
Backup field for summary when no highlights found. |
None
|
max_alternate_field_length
|
Optional[int]
|
Max length of alternate field. |
None
|
alternate
|
Optional[bool]
|
Highlight alternate field. |
None
|
formatter
|
Optional[Literal['simple']]
|
Formatter for highlighted output. Use 'simple'. |
None
|
simple_pre
|
Optional[str]
|
Text before term (simple formatter). |
None
|
simple_post
|
Optional[str]
|
Text after term (simple formatter). |
None
|
fragmenter
|
Optional[Literal['gap', 'regex']]
|
Text snippet generator type. Use 'gap' or 'regex'. |
None
|
regex_slop
|
Optional[float]
|
Deviation factor for regex fragmenter. |
None
|
regex_pattern
|
Optional[str]
|
Pattern for regex fragmenter. |
None
|
regex_max_analyzed_chars
|
Optional[int]
|
Char limit for regex fragmenter. |
None
|
preserve_multi
|
Optional[bool]
|
Preserve order in multivalued fields. |
None
|
payloads
|
Optional[bool]
|
Include payloads in highlighting. |
None
|
FastVector Highlighter specific (requires term vectors):
frag_list_builder
|
Optional[Literal['simple', 'weighted', 'single']]
|
Snippet fragmenting algorithm. Use 'simple', 'weighted', or 'single'. |
None
|
fragments_builder
|
Optional[Literal['default', 'colored']]
|
Fragment formatting implementation. Use 'default' or 'colored'. |
None
|
boundary_scanner
|
Optional[str]
|
Boundary scanner implementation. |
None
|
phrase_limit
|
Optional[int]
|
Max phrases to analyze for scoring. |
None
|
multivalue_separator
|
Optional[str]
|
Separator for multivalued fields. |
None
|
Returns:
| Type | Description |
|---|---|
Self
|
A new parser instance with highlight configuration applied. |
Examples:
Basic highlighting:
>>> parser.highlight(fields=["title", "content"], snippets_per_field=3, fragment_size=150)
Custom HTML tags:
>>> parser.highlight(
... fields=["title"],
... tag_before='<mark class="highlight">',
... tag_after='</mark>',
... encoder="html"
... )
Unified highlighter with sentence breaks:
>>> parser.highlight(
... fields=["content"],
... method="unified",
... bs_type="SENTENCE",
... fragment_size=200
... )
more_like_this(*, fields=None, min_term_freq=None, min_doc_freq=None, max_doc_freq=None, max_doc_freq_pct=None, min_word_len=None, max_word_len=None, max_query_terms=None, max_num_tokens_parsed=None, boost=None, query_fields=None, interesting_terms=None, match_include=None, match_offset=None)
Enable MoreLikeThis to find documents similar to a given document.
MoreLikeThis finds documents similar to a given document by analyzing the terms that make it unique. Common for "related articles", recommendations, and content discovery.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
fields
|
Optional[Union[str, List[str]]]
|
Fields to analyze for similarity. Use fields with meaningful content (title, description, body). Can be a single field or list of fields. For best performance, enable term vectors on these fields. |
None
|
min_term_freq
|
Optional[int]
|
Minimum times a term must appear in the source document to be considered. Default: 2. Lower values include more terms but may add noise. |
None
|
min_doc_freq
|
Optional[int]
|
Minimum number of documents a term must appear in across the index. Default: 5. Filters out very rare terms that aren't useful for finding similar documents. |
None
|
max_doc_freq
|
Optional[int]
|
Maximum number of documents a term can appear in. Filters out very common terms (like 'the', 'and'). Use this OR max_doc_freq_pct, not both. |
None
|
max_doc_freq_pct
|
Optional[int]
|
Maximum document frequency as percentage (0-100). Example: 75 means ignore terms appearing in more than 75% of documents. Use instead of max_doc_freq for relative filtering. |
None
|
min_word_len
|
Optional[int]
|
Minimum word length in characters. Words shorter than this are ignored. Example: 4 to skip 'the', 'and', 'or'. |
None
|
max_word_len
|
Optional[int]
|
Maximum word length in characters. Words longer than this are ignored. Useful to filter out long tokens or URLs. |
None
|
max_query_terms
|
Optional[int]
|
Maximum number of interesting terms to use in the MLT query. Default: 25. Higher values = more comprehensive but slower. Start with default and adjust as needed. |
None
|
max_num_tokens_parsed
|
Optional[int]
|
Maximum tokens to analyze per field (for fields without term vectors). Default: 5000. Set lower for better performance on large documents. |
None
|
boost
|
Optional[bool]
|
If True, boost the query by each term's relevance/importance. Default: False. Enable for better relevance ranking of similar documents. |
None
|
query_fields
|
Optional[str]
|
Query fields with optional boosts, like 'title^2.0 content^1.0'. Fields must also be in 'fields' parameter. Use to emphasize certain fields in similarity matching. |
None
|
interesting_terms
|
Optional[str]
|
Controls what info about matched terms is returned. Options: 'none' (default), 'list' (term names), 'details' (terms with boost values). Use 'details' for debugging to see which terms were selected. |
None
|
match_include
|
Optional[bool]
|
If True, includes the source document in results (useful to compare). Default varies: true for MLT handler, depends on configuration for component. |
None
|
match_offset
|
Optional[int]
|
When using with a query, specifies which result doc to use for similarity. 0 = first result, 1 = second, etc. Default: 0. |
None
|
Returns:
| Type | Description |
|---|---|
Self
|
A new parser instance with more_like_this configuration applied. |
Examples:
Basic similarity search:
>>> parser.more_like_this(
... fields=["title", "content"],
... min_term_freq=2,
... min_doc_freq=5,
... max_query_terms=25
... )
Advanced with filtering:
>>> parser.more_like_this(
... fields=["content"],
... min_term_freq=1,
... min_doc_freq=3,
... min_word_len=4,
... max_doc_freq_pct=80,
... interesting_terms="details"
... )
Boosted fields:
>>> parser.more_like_this(
... fields=["title", "content"],
... query_fields="title^2.0 content^1.0",
... boost=True
... )
serialize_configs(params)
Serialize ParamsConfig objects as top-level params.
KNNTextToVectorQueryParser
Bases: DenseVectorSearchQueryParser
KNN Text-to-Vector Query Parser for Apache Solr Dense Vector Search.
The knn_text_to_vector parser combines text encoding with k-nearest neighbors search, allowing you to search for similar documents using natural language queries instead of pre-computed vectors. It uses a language model to convert query text into a vector, then performs KNN search on that vector.
Solr Reference
https://solr.apache.org/guide/solr/latest/query-guide/dense-vector-search.html
Key Features
- Automatic text-to-vector encoding using language models
- Eliminates need for pre-computing query vectors
- Supports various embedding models (OpenAI, Hugging Face, etc.)
- Combines semantic search with KNN efficiency
- Configurable k (topK) for number of results
How Text-to-Vector KNN Works
- Query text is sent to the configured language model
- Model encodes text into a dense vector
- KNN search is performed using the generated vector
- Top k most similar documents are returned
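A sketch of the raw query this corresponds to (field and model names assumed):
>>> params = {
...     "q": "{!knn_text_to_vector model=openai-embeddings f=content_vector topK=10}"
...          "machine learning algorithms",
...     "fl": "id,score",
... }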
Model Requirements
The model must be loaded into Solr's text-to-vector model store:
- Configure the model in the schema via the REST API
- Supported: OpenAI, Hugging Face, Cohere, etc.
- The model must produce vectors matching the field dimension
Example Model Configuration (OpenAI):
{
  "class": "dev.langchain4j.model.openai.OpenAiEmbeddingModel",
  "name": "openai-embeddings",
  "params": {
    "apiKey": "YOUR_API_KEY",
    "modelName": "text-embedding-ada-002"
  }
}
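A hedged sketch of uploading this configuration to the model store endpoint noted further below (collection name is an assumption):
>>> import requests
>>> STORE = "http://localhost:8983/solr/my_collection/schema/text-to-vector-model-store"
>>> model_config = {
...     "class": "dev.langchain4j.model.openai.OpenAiEmbeddingModel",
...     "name": "openai-embeddings",
...     "params": {"apiKey": "YOUR_API_KEY", "modelName": "text-embedding-ada-002"},
... }
>>> requests.put(STORE, json=model_config)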
Schema Requirements
Field must be DenseVectorField, defined as shown in the KNNQueryParser schema sketch above.
Examples:
>>> # Basic text-to-vector KNN search
>>> parser = KNNTextToVectorQueryParser(
... vector_field="content_vector",
... text="machine learning algorithms",
... model="openai-embeddings",
... top_k=10
... )
>>> # Semantic search with pre-filtering
>>> parser = KNNTextToVectorQueryParser(
... vector_field="article_embedding",
... text="neural networks and deep learning",
... model="huggingface-embedder",
... top_k=20,
... pre_filter=["category:AI", "published:[2020 TO *]"]
... )
>>> # Multi-lingual semantic search
>>> parser = KNNTextToVectorQueryParser(
... vector_field="multilingual_vector",
... text="apprentissage automatique", # French
... model="multilingual-embedder",
... top_k=15
... )
>>> # With tagged filtering
>>> parser = KNNTextToVectorQueryParser(
... vector_field="doc_vector",
... text="search query optimization",
... model="sentence-transformer",
... top_k=50,
... include_tags=["semantic_search"]
... )
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
text
|
Natural language query text to encode (required) |
required | |
model
|
Name of the model in text-to-vector model store (required) |
required | |
top_k
|
Number of nearest neighbors to return (default: 10) |
required | |
vector_field
|
Name of the DenseVectorField to search (inherited from base) |
required | |
pre_filter
|
Explicit pre-filter query strings (inherited from base) |
required | |
include_tags
|
Only use fq filters with these tags for implicit pre-filtering (inherited) |
required | |
exclude_tags
|
Exclude fq filters with these tags from implicit pre-filtering (inherited) |
required |
Returns:
| Type | Description |
|---|---|
|
Query results ranked by semantic similarity to the input text |
Note
The model name must reference an existing model loaded into the /schema/text-to-vector-model-store endpoint.
See Also
- KNNQueryParser: For search with pre-computed vectors
- VectorSimilarityQueryParser: For threshold-based vector search
- Solr Text-to-Vector Models Guide: https://solr.apache.org/guide/solr/latest/query-guide/text-to-vector.html
facet(*, queries=None, fields=None, prefix=None, contains=None, contains_ignore_case=None, matches=None, sort=None, limit=None, offset=None, mincount=None, missing=None, method=None, enum_cache_min_df=None, exists=None, exclude_terms=None, overrequest_count=None, overrequest_ratio=None, threads=None, range_field=None, range_start=None, range_end=None, range_gap=None, range_hardend=None, range_include=None, range_other=None, range_method=None, pivot_fields=None, pivot_mincount=None)
Enable faceting to categorize and count search results.
Faceting breaks down search results into categories with counts, enabling drill-down navigation and data analysis.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| queries | Optional[List[str]] | Arbitrary queries to generate facet counts for specific terms/expressions. | None |
| fields | Optional[List[str]] | Fields to be treated as facets. Common for categories, brands, tags. | None |
| prefix | Optional[str] | Limits facet terms to those starting with the given prefix. | None |
| contains | Optional[str] | Limits facet terms to those containing the given substring. | None |
| contains_ignore_case | Optional[bool] | If True, ignores case when matching the 'contains' parameter. | None |
| matches | Optional[str] | Only returns facets matching this regular expression. | None |
| sort | Optional[Literal['count', 'index']] | Ordering of facet field terms. Use 'count' (by frequency) or 'index' (alphabetically). | None |
| limit | Optional[int] | Number of facet counts to return. Set to -1 for all. | None |
| offset | Optional[int] | Offset into the facet list for paging. | None |
| mincount | Optional[int] | Minimum count for facets to be included in the response. Commonly set to 1 to hide empty facets. | None |
| missing | Optional[bool] | If True, include count of results with no facet value. | None |
| method | Optional[Literal['enum', 'fc', 'fcs']] | Algorithm to use for faceting. Use 'enum' (enumerate all terms, good for low cardinality), 'fc' (field cache, good for high cardinality), or 'fcs' (per-segment, good for frequent updates). | None |
| enum_cache_min_df | Optional[int] | Minimum document frequency for filterCache usage with the enum method. | None |
| exists | Optional[bool] | Cap facet counts by 1 (only for non-trie fields). | None |
| exclude_terms | Optional[str] | Terms to remove from facet counts. | None |
| overrequest_count | Optional[int] | Extra facets to request from each shard for better accuracy in distributed environments. | None |
| overrequest_ratio | Optional[float] | Ratio for overrequesting facets from shards. | None |
| threads | Optional[int] | Number of threads for parallel facet loading. Useful for multiple facets on large datasets. | None |
| range_field | Optional[List[str]] | Fields for range faceting (e.g., price ranges, date ranges). | None |
| range_start | Optional[Dict[str, str]] | Lower bound of ranges per field. Dict mapping field name to start value. | None |
| range_end | Optional[Dict[str, str]] | Upper bound of ranges per field. Dict mapping field name to end value. | None |
| range_gap | Optional[Dict[str, str]] | Size of each range span per field. E.g., {'price': '100'} for $100 increments. | None |
| range_hardend | Optional[bool] | If True, uses exact range_end as upper bound even if it doesn't align with gap. | None |
| range_include | Optional[List[Literal['lower', 'upper', 'edge', 'outer', 'all']]] | Range bounds to include in faceting. List of: 'lower' (include lower bound), 'upper' (include upper bound), 'edge' (first/last include edges), 'outer' (before/after inclusive), 'all'. | None |
| range_other | Optional[List[Literal['before', 'after', 'between', 'none', 'all']]] | Additional range counts to compute. List of: 'before' (below first range), 'after' (above last range), 'between' (within bounds), 'none', or 'all'. | None |
| range_method | Optional[Literal['filter', 'dv']] | Method to use for range faceting. Use 'filter' or 'dv' (for docValues). | None |
| pivot_fields | Optional[List[str]] | Fields to use for pivot (hierarchical) faceting. E.g., ['category,brand']. | None |
| pivot_mincount | Optional[int] | Minimum count for pivot facet inclusion. | None |
Returns:
| Type | Description |
|---|---|
| Self | A new parser instance with facet configuration applied. |
Examples:
Basic field faceting:
>>> parser.facet(fields=["genre", "director"], mincount=1, limit=10)
Range faceting for prices:
>>> parser.facet(
... range_field=["price"],
... range_start={"price": "0"},
... range_end={"price": "1000"},
... range_gap={"price": "100"}
... )
Filtered facets:
>>> parser.facet(fields=["color"], prefix="bl", mincount=5)
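Serialized output (illustrative; facet settings map onto Solr's facet.* parameter names when build() is called):
>>> params = parser.facet(fields=["genre"], mincount=1).build()
>>> # Expect Solr-style keys such as 'facet': 'true',
>>> # 'facet.field': ['genre'], and 'facet.mincount': 1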
group(*, by=None, func=None, query=None, limit=1, offset=None, sort=None, format='grouped', main=None, ngroups=False, truncate=False, facet=False, cache_percent=0)
Enable result grouping to collapse results by common field values.
Result grouping (also known as field collapsing) combines documents that share a common field value. Useful for showing one result per author, category, or domain.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| by | Optional[Union[str, List[str]]] | Field(s) to group results by. Shows one representative doc per unique field value. Must be single-valued and indexed. Can specify multiple fields for separate groupings. | None |
| func | Optional[str] | Group by the result of a function query (e.g., 'floor(price)'). Not supported in SolrCloud/distributed searches. | None |
| query | Optional[Union[str, List[str]]] | Create custom groups using arbitrary queries. Each query defines one group. Example: ['price:[0 TO 50]', 'price:[50 TO 100]'] creates price range groups. | None |
| limit | int | Number of documents to return per group. Default: 1 (only top doc per group). Set higher to see more examples from each group. | 1 |
| offset | Optional[int] | Skip the first N documents within each group. Useful for pagination within groups. | None |
| sort | Optional[str] | How to sort documents within each group (e.g., 'date desc' for newest first). If not specified, uses the main sort parameter. | None |
| format | str | Response structure format. 'grouped' (nested, default) or 'simple' (flat list). | 'grouped' |
| main | Optional[bool] | If True, returns first field grouping as main result list, flattening the response. | None |
| ngroups | bool | If True, include total number of unique groups in response. Useful for pagination. Requires co-location in SolrCloud. | False |
| truncate | bool | If True, base facet counts on one doc per group only. | False |
| facet | bool | If True, enable grouped faceting. Can be expensive on large result sets. Requires co-location in SolrCloud. | False |
| cache_percent | int | Enable result grouping cache (0-100). Set to 0 to disable (default). Try 50-100 for complex queries to improve performance. | 0 |
Returns:
| Type | Description |
|---|---|
| Self | A new parser instance with group configuration applied. |
Examples:
Group by author:
>>> parser.group(by="author", limit=3, sort="date desc", ngroups=True)
Group by price range:
>>> parser.group(
... query=["price:[0 TO 50]", "price:[50 TO 100]", "price:[100 TO *]"],
... limit=5
... )
Multiple field groupings:
>>> parser.group(by=["author", "category"], limit=2)
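Response shape (a sketch of Solr's standard grouped response, for orientation; field and values are illustrative):
>>> # With format='grouped' (default), Solr nests results roughly as:
>>> # {'grouped': {'author': {'matches': 42, 'ngroups': 7, 'groups': [
>>> #     {'groupValue': 'smith', 'doclist': {'numFound': 12, 'docs': [...]}}]}}}
>>> # 'ngroups' appears only when ngroups=True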
highlight(*, method=None, fields=None, query=None, query_parser=None, require_field_match=None, query_field_pattern=None, use_phrase_highlighter=None, multiterm=None, snippets_per_field=None, fragment_size=None, encoder=None, max_analyzed_chars=None, tag_before=None, tag_after=None, offset_source=None, frag_align_ratio=None, fragsize_is_minimum=None, tag_ellipsis=None, default_summary=None, score_k1=None, score_b=None, score_pivot=None, bs_language=None, bs_country=None, bs_variant=None, bs_type=None, bs_separator=None, weight_matches=None, merge_contiguous=None, max_multivalued_to_examine=None, max_multivalued_to_match=None, alternate_field=None, max_alternate_field_length=None, alternate=None, formatter=None, simple_pre=None, simple_post=None, fragmenter=None, regex_slop=None, regex_pattern=None, regex_max_analyzed_chars=None, preserve_multi=None, payloads=None, frag_list_builder=None, fragments_builder=None, boundary_scanner=None, phrase_limit=None, multivalue_separator=None)
Enable highlighting to show snippets with query terms emphasized.
Highlighting shows snippets of text from documents with query terms emphasized (usually wrapped in tags). Common for search result snippets and previews.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| method | Optional[Literal['unified', 'original', 'fastVector']] | Highlighting implementation to use. Use 'unified' (most accurate, recommended), 'original' (legacy, widely compatible), or 'fastVector' (fast for large docs, requires term vectors). | None |
| fields | Optional[Union[str, List[str]]] | Fields to generate highlighted snippets for. Must be stored fields. Example: ['title', 'content']. Use '*' for all fields (not recommended). | None |
| query | Optional[str] | Custom query to use for highlighting (overrides main query). | None |
| query_parser | Optional[str] | Query parser for the highlight query (e.g., 'edismax', 'lucene'). | None |
| require_field_match | Optional[bool] | If True, only highlight terms in the field they matched. Set to False to highlight query terms in all requested fields. | None |
| query_field_pattern | Optional[str] | Regular expression pattern for fields to consider for highlighting. | None |
| use_phrase_highlighter | Optional[bool] | If True, highlights complete phrases accurately. Default: True. | None |
| multiterm | Optional[bool] | Enable highlighting for wildcard, fuzzy, and range queries. Default: True. | None |
| snippets_per_field | Optional[int] | Maximum number of snippets to return per field. Default: 1. Set higher to show more context passages. | None |
| fragment_size | Optional[int] | Size of highlighted snippets in characters. Default: 100. Set to 0 to highlight the entire field (not recommended for large fields). | None |
| encoder | Optional[Literal['', 'html']] | Text encoder for snippets. Use '' (empty string, default) or 'html' to escape HTML and prevent XSS. | None |
| max_analyzed_chars | Optional[int] | Maximum characters to analyze per field. Default: 51200 (50KB). For large documents, limits analysis to improve performance. | None |
| tag_before | Optional[str] | Text/tag to insert before each highlighted term. Default: '<em>'. | None |
| tag_after | Optional[str] | Text/tag to insert after each highlighted term. Default: '</em>'. | None |
| **Unified highlighter options** (most accurate, recommended) | | | |
| offset_source | Optional[str] | How offsets are obtained. Options: 'ANALYSIS', 'POSTINGS', 'POSTINGS_WITH_TERM_VECTORS', 'TERM_VECTORS'. Usually auto-detected. | None |
| frag_align_ratio | Optional[float] | Where to position the first match in a snippet (0.0-1.0). Default: 0.33 (left third, shows context before the match). | None |
| fragsize_is_minimum | Optional[bool] | If True, treat fragment_size as a minimum. Default: True. | None |
| tag_ellipsis | Optional[str] | Text between multiple snippets (e.g., '...' or ' [...] '). | None |
| default_summary | Optional[bool] | If True, return leading text when no matches are found. | None |
| score_k1 | Optional[float] | BM25 term frequency normalization. Default: 1.2. | None |
| score_b | Optional[float] | BM25 length normalization. Default: 0.75. | None |
| score_pivot | Optional[int] | BM25 average passage length in characters. Default: 87. | None |
| bs_language | Optional[str] | BreakIterator language for text segmentation (e.g., 'en', 'ja'). | None |
| bs_country | Optional[str] | BreakIterator country code (e.g., 'US', 'GB'). | None |
| bs_variant | Optional[str] | BreakIterator variant for specialized locale rules. | None |
| bs_type | Optional[Literal['SEPARATOR', 'SENTENCE', 'WORD', 'CHARACTER', 'LINE', 'WHOLE']] | How to segment text. Use 'SENTENCE' (default, recommended), 'WORD', 'CHARACTER', 'LINE', 'SEPARATOR', or 'WHOLE'. | None |
| bs_separator | Optional[str] | Custom separator character when bs_type=SEPARATOR. | None |
| weight_matches | Optional[bool] | Use Lucene's Weight Matches API for the most accurate highlighting. | None |
| **Original highlighter options** (legacy) | | | |
| merge_contiguous | Optional[bool] | Merge adjacent fragments. | None |
| max_multivalued_to_examine | Optional[int] | Max entries to examine in a multivalued field. | None |
| max_multivalued_to_match | Optional[int] | Max matches in a multivalued field. | None |
| alternate_field | Optional[str] | Backup field for the summary when no highlights are found. | None |
| max_alternate_field_length | Optional[int] | Max length of the alternate field. | None |
| alternate | Optional[bool] | Highlight the alternate field. | None |
| formatter | Optional[Literal['simple']] | Formatter for highlighted output. Use 'simple'. | None |
| simple_pre | Optional[str] | Text before term (simple formatter). | None |
| simple_post | Optional[str] | Text after term (simple formatter). | None |
| fragmenter | Optional[Literal['gap', 'regex']] | Text snippet generator type. Use 'gap' or 'regex'. | None |
| regex_slop | Optional[float] | Deviation factor for the regex fragmenter. | None |
| regex_pattern | Optional[str] | Pattern for the regex fragmenter. | None |
| regex_max_analyzed_chars | Optional[int] | Char limit for the regex fragmenter. | None |
| preserve_multi | Optional[bool] | Preserve order in multivalued fields. | None |
| payloads | Optional[bool] | Include payloads in highlighting. | None |
| **FastVector highlighter options** (requires term vectors) | | | |
| frag_list_builder | Optional[Literal['simple', 'weighted', 'single']] | Snippet fragmenting algorithm. Use 'simple', 'weighted', or 'single'. | None |
| fragments_builder | Optional[Literal['default', 'colored']] | Fragment formatting implementation. Use 'default' or 'colored'. | None |
| boundary_scanner | Optional[str] | Boundary scanner implementation. | None |
| phrase_limit | Optional[int] | Max phrases to analyze for scoring. | None |
| multivalue_separator | Optional[str] | Separator for multivalued fields. | None |
Returns:
| Type | Description |
|---|---|
| Self | A new parser instance with highlight configuration applied. |
Examples:
Basic highlighting:
>>> parser.highlight(fields=["title", "content"], snippets_per_field=3, fragment_size=150)
Custom HTML tags:
>>> parser.highlight(
... fields=["title"],
... tag_before='<mark class="highlight">',
... tag_after='</mark>',
... encoder="html"
... )
Unified highlighter with sentence breaks:
>>> parser.highlight(
... fields=["content"],
... method="unified",
... bs_type="SENTENCE",
... fragment_size=200
... )
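Reading highlights (a sketch of Solr's standard highlighting response; the field and document id are illustrative):
>>> # Snippets are returned in a top-level 'highlighting' section keyed by document id:
>>> # response['highlighting']['doc-1']['content']
>>> # ['A snippet with the <em>query</em> term emphasized ...']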
more_like_this(*, fields=None, min_term_freq=None, min_doc_freq=None, max_doc_freq=None, max_doc_freq_pct=None, min_word_len=None, max_word_len=None, max_query_terms=None, max_num_tokens_parsed=None, boost=None, query_fields=None, interesting_terms=None, match_include=None, match_offset=None)
Enable MoreLikeThis to find documents similar to a given document.
MoreLikeThis finds documents similar to a given document by analyzing the terms that make it unique. Common for "related articles", recommendations, and content discovery.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| fields | Optional[Union[str, List[str]]] | Fields to analyze for similarity. Use fields with meaningful content (title, description, body). Can be a single field or list of fields. For best performance, enable term vectors on these fields. | None |
| min_term_freq | Optional[int] | Minimum times a term must appear in the source document to be considered. Default: 2. Lower values include more terms but may add noise. | None |
| min_doc_freq | Optional[int] | Minimum number of documents a term must appear in across the index. Default: 5. Filters out very rare terms that aren't useful for finding similar documents. | None |
| max_doc_freq | Optional[int] | Maximum number of documents a term can appear in. Filters out very common terms (like 'the', 'and'). Use this OR max_doc_freq_pct, not both. | None |
| max_doc_freq_pct | Optional[int] | Maximum document frequency as a percentage (0-100). Example: 75 means ignore terms appearing in more than 75% of documents. Use instead of max_doc_freq for relative filtering. | None |
| min_word_len | Optional[int] | Minimum word length in characters. Words shorter than this are ignored. Example: 4 to skip 'the', 'and', 'or'. | None |
| max_word_len | Optional[int] | Maximum word length in characters. Words longer than this are ignored. Useful to filter out long tokens or URLs. | None |
| max_query_terms | Optional[int] | Maximum number of interesting terms to use in the MLT query. Default: 25. Higher values = more comprehensive but slower. Start with the default and adjust as needed. | None |
| max_num_tokens_parsed | Optional[int] | Maximum tokens to analyze per field (for fields without term vectors). Default: 5000. Set lower for better performance on large documents. | None |
| boost | Optional[bool] | If True, boost the query by each term's relevance/importance. Default: False. Enable for better relevance ranking of similar documents. | None |
| query_fields | Optional[str] | Query fields with optional boosts, like 'title^2.0 content^1.0'. Fields must also be in the 'fields' parameter. Use to emphasize certain fields in similarity matching. | None |
| interesting_terms | Optional[str] | Controls what info about matched terms is returned. Options: 'none' (default), 'list' (term names), 'details' (terms with boost values). Use 'details' for debugging to see which terms were selected. | None |
| match_include | Optional[bool] | If True, includes the source document in results (useful to compare). Default varies: true for the MLT handler, configuration-dependent for the component. | None |
| match_offset | Optional[int] | When used with a query, specifies which result doc to use for similarity. 0 = first result, 1 = second, etc. Default: 0. | None |
Returns:
| Type | Description |
|---|---|
| Self | A new parser instance with more_like_this configuration applied. |
Examples:
Basic similarity search:
>>> parser.more_like_this(
... fields=["title", "content"],
... min_term_freq=2,
... min_doc_freq=5,
... max_query_terms=25
... )
Advanced with filtering:
>>> parser.more_like_this(
... fields=["content"],
... min_term_freq=1,
... min_doc_freq=3,
... min_word_len=4,
... max_doc_freq_pct=80,
... interesting_terms="details"
... )
Boosted fields:
>>> parser.more_like_this(
... fields=["title", "content"],
... query_fields="title^2.0 content^1.0",
... boost=True
... )
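Serialized output (illustrative; MoreLikeThis settings map onto Solr's mlt.* parameter names when build() is called):
>>> params = parser.more_like_this(fields=["title", "content"], min_term_freq=2).build()
>>> # Expect keys such as 'mlt': 'true', 'mlt.fl': 'title,content',
>>> # and 'mlt.mintf': 2 alongside the main query parameters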
serialize_configs(params)
Serialize ParamsConfig objects as top level params.
StandardParser
Bases: BaseQueryParser
Standard Query Parser (Lucene syntax) for Apache Solr.
The Standard Query Parser is Solr's default query parser, supporting full Lucene query syntax including field-specific searches, boolean operators, wildcards, proximity searches, range queries, boosting, and fuzzy searches. It offers greater precision but requires more exact syntax.
Solr Reference
https://solr.apache.org/guide/solr/latest/query-guide/standard-query-parser.html
Features
- Field-specific queries: title:"The Right Way" AND text:go
- Boolean operators: AND, OR, NOT, +, -
- Wildcards: te?t, test*, *esting
- Proximity searches: "jakarta apache"~10
- Range queries: [1 TO 5], {A TO Z}
- Boosting: jakarta^4 apache
- Fuzzy searches: roam~0.8
- Grouping with parentheses: (jakarta OR apache) AND website
- Constant score queries: description:blue^=1.0
Examples:
>>> # Basic field search
>>> parser = StandardParser(query="title:Solr AND content:search")
>>> # Range query
>>> parser = StandardParser(query="price:[10 TO 100]")
>>> # Proximity search
>>> parser = StandardParser(query='"apache solr"~5')
>>> # With default field and operator
>>> parser = StandardParser(
... query="apache solr",
... default_field="content",
... query_operator="AND"
... )
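Building query params (illustrative output; the exact keys depend on which fields are set):
>>> parser = StandardParser(
...     query="apache solr",
...     default_field="content",
...     query_operator="AND"
... )
>>> params = parser.build()
>>> # Expect standard Solr keys such as 'q': 'apache solr',
>>> # 'df': 'content', and 'q.op': 'AND'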
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | | Query string using Lucene syntax (required) | required |
| query_operator | | Default operator ("AND" or "OR"). Determines how multiple terms are combined. | required |
| default_field | | Default field to search when no field is specified. | required |
| split_on_whitespace | | If True, analyze each term separately; if False (default), analyze term sequences together for multi-word synonyms and shingles. | required |
See Also
- DisMaxQueryParser: For user-friendly queries with error tolerance
- ExtendedDisMaxQueryParser: For advanced user queries combining Lucene syntax with DisMax features
build(*args, **kwargs)
Serialize the parser configuration to Solr-compatible query parameters using Pydantic's model_dump.
facet(*, queries=None, fields=None, prefix=None, contains=None, contains_ignore_case=None, matches=None, sort=None, limit=None, offset=None, mincount=None, missing=None, method=None, enum_cache_min_df=None, exists=None, exclude_terms=None, overrequest_count=None, overrequest_ratio=None, threads=None, range_field=None, range_start=None, range_end=None, range_gap=None, range_hardend=None, range_include=None, range_other=None, range_method=None, pivot_fields=None, pivot_mincount=None)
Enable faceting to categorize and count search results.
Faceting breaks down search results into categories with counts, enabling drill-down navigation and data analysis.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| queries | Optional[List[str]] | Arbitrary queries to generate facet counts for specific terms/expressions. | None |
| fields | Optional[List[str]] | Fields to be treated as facets. Common for categories, brands, tags. | None |
| prefix | Optional[str] | Limits facet terms to those starting with the given prefix. | None |
| contains | Optional[str] | Limits facet terms to those containing the given substring. | None |
| contains_ignore_case | Optional[bool] | If True, ignores case when matching the 'contains' parameter. | None |
| matches | Optional[str] | Only returns facets matching this regular expression. | None |
| sort | Optional[Literal['count', 'index']] | Ordering of facet field terms. Use 'count' (by frequency) or 'index' (alphabetically). | None |
| limit | Optional[int] | Number of facet counts to return. Set to -1 for all. | None |
| offset | Optional[int] | Offset into the facet list for paging. | None |
| mincount | Optional[int] | Minimum count for facets to be included in the response. Commonly set to 1 to hide empty facets. | None |
| missing | Optional[bool] | If True, include count of results with no facet value. | None |
| method | Optional[Literal['enum', 'fc', 'fcs']] | Algorithm to use for faceting. Use 'enum' (enumerate all terms, good for low cardinality), 'fc' (field cache, good for high cardinality), or 'fcs' (per-segment, good for frequent updates). | None |
| enum_cache_min_df | Optional[int] | Minimum document frequency for filterCache usage with the enum method. | None |
| exists | Optional[bool] | Cap facet counts by 1 (only for non-trie fields). | None |
| exclude_terms | Optional[str] | Terms to remove from facet counts. | None |
| overrequest_count | Optional[int] | Extra facets to request from each shard for better accuracy in distributed environments. | None |
| overrequest_ratio | Optional[float] | Ratio for overrequesting facets from shards. | None |
| threads | Optional[int] | Number of threads for parallel facet loading. Useful for multiple facets on large datasets. | None |
| range_field | Optional[List[str]] | Fields for range faceting (e.g., price ranges, date ranges). | None |
| range_start | Optional[Dict[str, str]] | Lower bound of ranges per field. Dict mapping field name to start value. | None |
| range_end | Optional[Dict[str, str]] | Upper bound of ranges per field. Dict mapping field name to end value. | None |
| range_gap | Optional[Dict[str, str]] | Size of each range span per field. E.g., {'price': '100'} for $100 increments. | None |
| range_hardend | Optional[bool] | If True, uses exact range_end as upper bound even if it doesn't align with gap. | None |
| range_include | Optional[List[Literal['lower', 'upper', 'edge', 'outer', 'all']]] | Range bounds to include in faceting. List of: 'lower' (include lower bound), 'upper' (include upper bound), 'edge' (first/last include edges), 'outer' (before/after inclusive), 'all'. | None |
| range_other | Optional[List[Literal['before', 'after', 'between', 'none', 'all']]] | Additional range counts to compute. List of: 'before' (below first range), 'after' (above last range), 'between' (within bounds), 'none', or 'all'. | None |
| range_method | Optional[Literal['filter', 'dv']] | Method to use for range faceting. Use 'filter' or 'dv' (for docValues). | None |
| pivot_fields | Optional[List[str]] | Fields to use for pivot (hierarchical) faceting. E.g., ['category,brand']. | None |
| pivot_mincount | Optional[int] | Minimum count for pivot facet inclusion. | None |
Returns:
| Type | Description |
|---|---|
| Self | A new parser instance with facet configuration applied. |
Examples:
Basic field faceting:
>>> parser.facet(fields=["genre", "director"], mincount=1, limit=10)
Range faceting for prices:
>>> parser.facet(
... range_field=["price"],
... range_start={"price": "0"},
... range_end={"price": "1000"},
... range_gap={"price": "100"}
... )
Filtered facets:
>>> parser.facet(fields=["color"], prefix="bl", mincount=5)
group(*, by=None, func=None, query=None, limit=1, offset=None, sort=None, format='grouped', main=None, ngroups=False, truncate=False, facet=False, cache_percent=0)
Enable result grouping to collapse results by common field values.
Result grouping (also known as field collapsing) combines documents that share a common field value. Useful for showing one result per author, category, or domain.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| by | Optional[Union[str, List[str]]] | Field(s) to group results by. Shows one representative doc per unique field value. Must be single-valued and indexed. Can specify multiple fields for separate groupings. | None |
| func | Optional[str] | Group by the result of a function query (e.g., 'floor(price)'). Not supported in SolrCloud/distributed searches. | None |
| query | Optional[Union[str, List[str]]] | Create custom groups using arbitrary queries. Each query defines one group. Example: ['price:[0 TO 50]', 'price:[50 TO 100]'] creates price range groups. | None |
| limit | int | Number of documents to return per group. Default: 1 (only top doc per group). Set higher to see more examples from each group. | 1 |
| offset | Optional[int] | Skip the first N documents within each group. Useful for pagination within groups. | None |
| sort | Optional[str] | How to sort documents within each group (e.g., 'date desc' for newest first). If not specified, uses the main sort parameter. | None |
| format | str | Response structure format. 'grouped' (nested, default) or 'simple' (flat list). | 'grouped' |
| main | Optional[bool] | If True, returns first field grouping as main result list, flattening the response. | None |
| ngroups | bool | If True, include total number of unique groups in response. Useful for pagination. Requires co-location in SolrCloud. | False |
| truncate | bool | If True, base facet counts on one doc per group only. | False |
| facet | bool | If True, enable grouped faceting. Can be expensive on large result sets. Requires co-location in SolrCloud. | False |
| cache_percent | int | Enable result grouping cache (0-100). Set to 0 to disable (default). Try 50-100 for complex queries to improve performance. | 0 |
Returns:
| Type | Description |
|---|---|
| Self | A new parser instance with group configuration applied. |
Examples:
Group by author:
>>> parser.group(by="author", limit=3, sort="date desc", ngroups=True)
Group by price range:
>>> parser.group(
... query=["price:[0 TO 50]", "price:[50 TO 100]", "price:[100 TO *]"],
... limit=5
... )
Multiple field groupings:
>>> parser.group(by=["author", "category"], limit=2)
highlight(*, method=None, fields=None, query=None, query_parser=None, require_field_match=None, query_field_pattern=None, use_phrase_highlighter=None, multiterm=None, snippets_per_field=None, fragment_size=None, encoder=None, max_analyzed_chars=None, tag_before=None, tag_after=None, offset_source=None, frag_align_ratio=None, fragsize_is_minimum=None, tag_ellipsis=None, default_summary=None, score_k1=None, score_b=None, score_pivot=None, bs_language=None, bs_country=None, bs_variant=None, bs_type=None, bs_separator=None, weight_matches=None, merge_contiguous=None, max_multivalued_to_examine=None, max_multivalued_to_match=None, alternate_field=None, max_alternate_field_length=None, alternate=None, formatter=None, simple_pre=None, simple_post=None, fragmenter=None, regex_slop=None, regex_pattern=None, regex_max_analyzed_chars=None, preserve_multi=None, payloads=None, frag_list_builder=None, fragments_builder=None, boundary_scanner=None, phrase_limit=None, multivalue_separator=None)
Enable highlighting to show snippets with query terms emphasized.
Highlighting shows snippets of text from documents with query terms emphasized (usually wrapped in tags). Common for search result snippets and previews.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| method | Optional[Literal['unified', 'original', 'fastVector']] | Highlighting implementation to use. Use 'unified' (most accurate, recommended), 'original' (legacy, widely compatible), or 'fastVector' (fast for large docs, requires term vectors). | None |
| fields | Optional[Union[str, List[str]]] | Fields to generate highlighted snippets for. Must be stored fields. Example: ['title', 'content']. Use '*' for all fields (not recommended). | None |
| query | Optional[str] | Custom query to use for highlighting (overrides main query). | None |
| query_parser | Optional[str] | Query parser for the highlight query (e.g., 'edismax', 'lucene'). | None |
| require_field_match | Optional[bool] | If True, only highlight terms in the field they matched. Set to False to highlight query terms in all requested fields. | None |
| query_field_pattern | Optional[str] | Regular expression pattern for fields to consider for highlighting. | None |
| use_phrase_highlighter | Optional[bool] | If True, highlights complete phrases accurately. Default: True. | None |
| multiterm | Optional[bool] | Enable highlighting for wildcard, fuzzy, and range queries. Default: True. | None |
| snippets_per_field | Optional[int] | Maximum number of snippets to return per field. Default: 1. Set higher to show more context passages. | None |
| fragment_size | Optional[int] | Size of highlighted snippets in characters. Default: 100. Set to 0 to highlight the entire field (not recommended for large fields). | None |
| encoder | Optional[Literal['', 'html']] | Text encoder for snippets. Use '' (empty string, default) or 'html' to escape HTML and prevent XSS. | None |
| max_analyzed_chars | Optional[int] | Maximum characters to analyze per field. Default: 51200 (50KB). For large documents, limits analysis to improve performance. | None |
| tag_before | Optional[str] | Text/tag to insert before each highlighted term. Default: '<em>'. | None |
| tag_after | Optional[str] | Text/tag to insert after each highlighted term. Default: '</em>'. | None |
| **Unified highlighter options** (most accurate, recommended) | | | |
| offset_source | Optional[str] | How offsets are obtained. Options: 'ANALYSIS', 'POSTINGS', 'POSTINGS_WITH_TERM_VECTORS', 'TERM_VECTORS'. Usually auto-detected. | None |
| frag_align_ratio | Optional[float] | Where to position the first match in a snippet (0.0-1.0). Default: 0.33 (left third, shows context before the match). | None |
| fragsize_is_minimum | Optional[bool] | If True, treat fragment_size as a minimum. Default: True. | None |
| tag_ellipsis | Optional[str] | Text between multiple snippets (e.g., '...' or ' [...] '). | None |
| default_summary | Optional[bool] | If True, return leading text when no matches are found. | None |
| score_k1 | Optional[float] | BM25 term frequency normalization. Default: 1.2. | None |
| score_b | Optional[float] | BM25 length normalization. Default: 0.75. | None |
| score_pivot | Optional[int] | BM25 average passage length in characters. Default: 87. | None |
| bs_language | Optional[str] | BreakIterator language for text segmentation (e.g., 'en', 'ja'). | None |
| bs_country | Optional[str] | BreakIterator country code (e.g., 'US', 'GB'). | None |
| bs_variant | Optional[str] | BreakIterator variant for specialized locale rules. | None |
| bs_type | Optional[Literal['SEPARATOR', 'SENTENCE', 'WORD', 'CHARACTER', 'LINE', 'WHOLE']] | How to segment text. Use 'SENTENCE' (default, recommended), 'WORD', 'CHARACTER', 'LINE', 'SEPARATOR', or 'WHOLE'. | None |
| bs_separator | Optional[str] | Custom separator character when bs_type=SEPARATOR. | None |
| weight_matches | Optional[bool] | Use Lucene's Weight Matches API for the most accurate highlighting. | None |
| **Original highlighter options** (legacy) | | | |
| merge_contiguous | Optional[bool] | Merge adjacent fragments. | None |
| max_multivalued_to_examine | Optional[int] | Max entries to examine in a multivalued field. | None |
| max_multivalued_to_match | Optional[int] | Max matches in a multivalued field. | None |
| alternate_field | Optional[str] | Backup field for the summary when no highlights are found. | None |
| max_alternate_field_length | Optional[int] | Max length of the alternate field. | None |
| alternate | Optional[bool] | Highlight the alternate field. | None |
| formatter | Optional[Literal['simple']] | Formatter for highlighted output. Use 'simple'. | None |
| simple_pre | Optional[str] | Text before term (simple formatter). | None |
| simple_post | Optional[str] | Text after term (simple formatter). | None |
| fragmenter | Optional[Literal['gap', 'regex']] | Text snippet generator type. Use 'gap' or 'regex'. | None |
| regex_slop | Optional[float] | Deviation factor for the regex fragmenter. | None |
| regex_pattern | Optional[str] | Pattern for the regex fragmenter. | None |
| regex_max_analyzed_chars | Optional[int] | Char limit for the regex fragmenter. | None |
| preserve_multi | Optional[bool] | Preserve order in multivalued fields. | None |
| payloads | Optional[bool] | Include payloads in highlighting. | None |
| **FastVector highlighter options** (requires term vectors) | | | |
| frag_list_builder | Optional[Literal['simple', 'weighted', 'single']] | Snippet fragmenting algorithm. Use 'simple', 'weighted', or 'single'. | None |
| fragments_builder | Optional[Literal['default', 'colored']] | Fragment formatting implementation. Use 'default' or 'colored'. | None |
| boundary_scanner | Optional[str] | Boundary scanner implementation. | None |
| phrase_limit | Optional[int] | Max phrases to analyze for scoring. | None |
| multivalue_separator | Optional[str] | Separator for multivalued fields. | None |
Returns:
| Type | Description |
|---|---|
| Self | A new parser instance with highlight configuration applied. |
Examples:
Basic highlighting:
>>> parser.highlight(fields=["title", "content"], snippets_per_field=3, fragment_size=150)
Custom HTML tags:
>>> parser.highlight(
... fields=["title"],
... tag_before='<mark class="highlight">',
... tag_after='</mark>',
... encoder="html"
... )
Unified highlighter with sentence breaks:
>>> parser.highlight(
... fields=["content"],
... method="unified",
... bs_type="SENTENCE",
... fragment_size=200
... )
more_like_this(*, fields=None, min_term_freq=None, min_doc_freq=None, max_doc_freq=None, max_doc_freq_pct=None, min_word_len=None, max_word_len=None, max_query_terms=None, max_num_tokens_parsed=None, boost=None, query_fields=None, interesting_terms=None, match_include=None, match_offset=None)
Enable MoreLikeThis to find documents similar to a given document.
MoreLikeThis finds documents similar to a given document by analyzing the terms that make it unique. Common for "related articles", recommendations, and content discovery.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| fields | Optional[Union[str, List[str]]] | Fields to analyze for similarity. Use fields with meaningful content (title, description, body). Can be a single field or list of fields. For best performance, enable term vectors on these fields. | None |
| min_term_freq | Optional[int] | Minimum times a term must appear in the source document to be considered. Default: 2. Lower values include more terms but may add noise. | None |
| min_doc_freq | Optional[int] | Minimum number of documents a term must appear in across the index. Default: 5. Filters out very rare terms that aren't useful for finding similar documents. | None |
| max_doc_freq | Optional[int] | Maximum number of documents a term can appear in. Filters out very common terms (like 'the', 'and'). Use this OR max_doc_freq_pct, not both. | None |
| max_doc_freq_pct | Optional[int] | Maximum document frequency as a percentage (0-100). Example: 75 means ignore terms appearing in more than 75% of documents. Use instead of max_doc_freq for relative filtering. | None |
| min_word_len | Optional[int] | Minimum word length in characters. Words shorter than this are ignored. Example: 4 to skip 'the', 'and', 'or'. | None |
| max_word_len | Optional[int] | Maximum word length in characters. Words longer than this are ignored. Useful to filter out long tokens or URLs. | None |
| max_query_terms | Optional[int] | Maximum number of interesting terms to use in the MLT query. Default: 25. Higher values = more comprehensive but slower. Start with the default and adjust as needed. | None |
| max_num_tokens_parsed | Optional[int] | Maximum tokens to analyze per field (for fields without term vectors). Default: 5000. Set lower for better performance on large documents. | None |
| boost | Optional[bool] | If True, boost the query by each term's relevance/importance. Default: False. Enable for better relevance ranking of similar documents. | None |
| query_fields | Optional[str] | Query fields with optional boosts, like 'title^2.0 content^1.0'. Fields must also be in the 'fields' parameter. Use to emphasize certain fields in similarity matching. | None |
| interesting_terms | Optional[str] | Controls what info about matched terms is returned. Options: 'none' (default), 'list' (term names), 'details' (terms with boost values). Use 'details' for debugging to see which terms were selected. | None |
| match_include | Optional[bool] | If True, includes the source document in results (useful to compare). Default varies: true for the MLT handler, configuration-dependent for the component. | None |
| match_offset | Optional[int] | When used with a query, specifies which result doc to use for similarity. 0 = first result, 1 = second, etc. Default: 0. | None |
Returns:
| Type | Description |
|---|---|
| Self | A new parser instance with more_like_this configuration applied. |
Examples:
Basic similarity search:
>>> parser.more_like_this(
... fields=["title", "content"],
... min_term_freq=2,
... min_doc_freq=5,
... max_query_terms=25
... )
Advanced with filtering:
>>> parser.more_like_this(
... fields=["content"],
... min_term_freq=1,
... min_doc_freq=3,
... min_word_len=4,
... max_doc_freq_pct=80,
... interesting_terms="details"
... )
Boosted fields:
>>> parser.more_like_this(
... fields=["title", "content"],
... query_fields="title^2.0 content^1.0",
... boost=True
... )
serialize_configs(params)
Serialize ParamsConfig objects as top level params.
TermsQueryParser
Bases: BaseQueryParser
Terms Query Parser for Apache Solr.
The Terms Query Parser generates a query from multiple comma-separated values, matching documents where the specified field contains any of the provided terms. It's optimized for efficiently searching for multiple discrete values in a field, particularly useful for filtering by IDs, tags, or categories.
Solr Reference
https://solr.apache.org/guide/solr/latest/query-guide/other-parsers.html#terms-query-parser
Key Features
- Efficient multi-value term matching
- Configurable separator for term parsing
- Multiple query implementation methods with different performance characteristics
- Optimized for large numbers of terms
- Works with both regular and docValues fields
Query Implementation Methods
- termsFilter (default): Uses BooleanQuery or TermInSetQuery based on term count. Scales well with index size and moderately with number of terms.
- booleanQuery: Creates a BooleanQuery. Scales well with index size but poorly with many terms.
- automaton: Uses automaton-based matching. Good for certain use cases.
- docValuesTermsFilter: For docValues fields. Automatically chooses between per-segment or top-level implementation.
- docValuesTermsFilterPerSegment: Per-segment docValues filtering.
- docValuesTermsFilterTopLevel: Top-level docValues filtering.
Performance Considerations
- Use termsFilter (default) for general cases
- Use booleanQuery for small term sets with large indices
- Use docValues methods only on fields with docValues enabled
- Term count affects which internal implementation is chosen
Examples:
>>> # Basic usage - search for multiple tags (as filter)
>>> parser = TermsQueryParser(
... field="tags",
... terms=["software", "apache", "solr", "lucene"]
... )
>>> # With custom query field
>>> parser = TermsQueryParser(
... field="tags",
... terms=["python", "java", "rust"],
... query="status:active"
... )
>>> # Using space separator with category IDs
>>> parser = TermsQueryParser(
... field="categoryId",
... terms=["8", "6", "7", "5309"],
... separator=" ",
... method="booleanQuery"
... )
>>> # Filtering by product IDs
>>> parser = TermsQueryParser(
... field="product_id",
... terms=["P123", "P456", "P789", "P012"]
... )
>>> # Using with docValues field
>>> parser = TermsQueryParser(
... field="author_id",
... terms=["author1", "author2", "author3"],
... method="docValuesTermsFilter"
... )
>>> # Building query params for use with any Solr client
>>> params = parser.build()
>>> # {'q': '*:*', 'fq': '{!terms f=tags}software,apache,solr,lucene'}
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| field | | The field name to search (required) | required |
| terms | | List of terms to match (required) | required |
| query | | Optional main query string (default: '*:*'). The terms filter is applied as fq. | required |
| separator | | Character(s) to use for joining terms (default: ','). Use ' ' (single space) if you want space-separated terms. | required |
| method | | Query implementation method. Options: termsFilter (default, automatic choice between implementations); booleanQuery (boolean query approach); automaton (automaton-based matching); docValuesTermsFilter (auto-select docValues approach); docValuesTermsFilterPerSegment (per-segment docValues); docValuesTermsFilterTopLevel (top-level docValues). | required |
Returns:
| Type | Description |
|---|---|
| | Documents where the specified field contains any of the provided terms |
Note
When using docValues methods, ensure the target field has docValues enabled in the schema. The cache parameter defaults to false for docValues methods.
See Also
- StandardParser: For Lucene syntax queries with field specifications
- DisMaxQueryParser: For multi-field user-friendly queries
facet(*, queries=None, fields=None, prefix=None, contains=None, contains_ignore_case=None, matches=None, sort=None, limit=None, offset=None, mincount=None, missing=None, method=None, enum_cache_min_df=None, exists=None, exclude_terms=None, overrequest_count=None, overrequest_ratio=None, threads=None, range_field=None, range_start=None, range_end=None, range_gap=None, range_hardend=None, range_include=None, range_other=None, range_method=None, pivot_fields=None, pivot_mincount=None)
Enable faceting to categorize and count search results.
Faceting breaks down search results into categories with counts, enabling drill-down navigation and data analysis.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| queries | Optional[List[str]] | Arbitrary queries to generate facet counts for specific terms/expressions. | None |
| fields | Optional[List[str]] | Fields to be treated as facets. Common for categories, brands, tags. | None |
| prefix | Optional[str] | Limits facet terms to those starting with the given prefix. | None |
| contains | Optional[str] | Limits facet terms to those containing the given substring. | None |
| contains_ignore_case | Optional[bool] | If True, ignores case when matching the 'contains' parameter. | None |
| matches | Optional[str] | Only returns facets matching this regular expression. | None |
| sort | Optional[Literal['count', 'index']] | Ordering of facet field terms. Use 'count' (by frequency) or 'index' (alphabetically). | None |
| limit | Optional[int] | Number of facet counts to return. Set to -1 for all. | None |
| offset | Optional[int] | Offset into the facet list for paging. | None |
| mincount | Optional[int] | Minimum count for facets to be included in the response. Commonly set to 1 to hide empty facets. | None |
| missing | Optional[bool] | If True, include count of results with no facet value. | None |
| method | Optional[Literal['enum', 'fc', 'fcs']] | Algorithm to use for faceting. Use 'enum' (enumerate all terms, good for low cardinality), 'fc' (field cache, good for high cardinality), or 'fcs' (per-segment, good for frequent updates). | None |
| enum_cache_min_df | Optional[int] | Minimum document frequency for filterCache usage with the enum method. | None |
| exists | Optional[bool] | Cap facet counts by 1 (only for non-trie fields). | None |
| exclude_terms | Optional[str] | Terms to remove from facet counts. | None |
| overrequest_count | Optional[int] | Extra facets to request from each shard for better accuracy in distributed environments. | None |
| overrequest_ratio | Optional[float] | Ratio for overrequesting facets from shards. | None |
| threads | Optional[int] | Number of threads for parallel facet loading. Useful for multiple facets on large datasets. | None |
| range_field | Optional[List[str]] | Fields for range faceting (e.g., price ranges, date ranges). | None |
| range_start | Optional[Dict[str, str]] | Lower bound of ranges per field. Dict mapping field name to start value. | None |
| range_end | Optional[Dict[str, str]] | Upper bound of ranges per field. Dict mapping field name to end value. | None |
| range_gap | Optional[Dict[str, str]] | Size of each range span per field. E.g., {'price': '100'} for $100 increments. | None |
| range_hardend | Optional[bool] | If True, uses exact range_end as upper bound even if it doesn't align with gap. | None |
| range_include | Optional[List[Literal['lower', 'upper', 'edge', 'outer', 'all']]] | Range bounds to include in faceting. List of: 'lower' (include lower bound), 'upper' (include upper bound), 'edge' (first/last include edges), 'outer' (before/after inclusive), 'all'. | None |
| range_other | Optional[List[Literal['before', 'after', 'between', 'none', 'all']]] | Additional range counts to compute. List of: 'before' (below first range), 'after' (above last range), 'between' (within bounds), 'none', or 'all'. | None |
| range_method | Optional[Literal['filter', 'dv']] | Method to use for range faceting. Use 'filter' or 'dv' (for docValues). | None |
| pivot_fields | Optional[List[str]] | Fields to use for pivot (hierarchical) faceting. E.g., ['category,brand']. | None |
| pivot_mincount | Optional[int] | Minimum count for pivot facet inclusion. | None |
Returns:
| Type | Description |
|---|---|
| Self | A new parser instance with facet configuration applied. |
Examples:
Basic field faceting:
>>> parser.facet(fields=["genre", "director"], mincount=1, limit=10)
Range faceting for prices:
>>> parser.facet(
... range_field=["price"],
... range_start={"price": "0"},
... range_end={"price": "1000"},
... range_gap={"price": "100"}
... )
Filtered facets:
>>> parser.facet(fields=["color"], prefix="bl", mincount=5)
group(*, by=None, func=None, query=None, limit=1, offset=None, sort=None, format='grouped', main=None, ngroups=False, truncate=False, facet=False, cache_percent=0)
Enable result grouping to collapse results by common field values.
Result grouping (also known as field collapsing) combines documents that share a common field value. Useful for showing one result per author, category, or domain.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| by | Optional[Union[str, List[str]]] | Field(s) to group results by. Shows one representative doc per unique field value. Must be single-valued and indexed. Can specify multiple fields for separate groupings. | None |
| func | Optional[str] | Group by the result of a function query (e.g., 'floor(price)'). Not supported in SolrCloud/distributed searches. | None |
| query | Optional[Union[str, List[str]]] | Create custom groups using arbitrary queries. Each query defines one group. Example: ['price:[0 TO 50]', 'price:[50 TO 100]'] creates price range groups. | None |
| limit | int | Number of documents to return per group. Default: 1 (only top doc per group). Set higher to see more examples from each group. | 1 |
| offset | Optional[int] | Skip the first N documents within each group. Useful for pagination within groups. | None |
| sort | Optional[str] | How to sort documents within each group (e.g., 'date desc' for newest first). If not specified, uses the main sort parameter. | None |
| format | str | Response structure format. 'grouped' (nested, default) or 'simple' (flat list). | 'grouped' |
| main | Optional[bool] | If True, returns first field grouping as main result list, flattening the response. | None |
| ngroups | bool | If True, include total number of unique groups in response. Useful for pagination. Requires co-location in SolrCloud. | False |
| truncate | bool | If True, base facet counts on one doc per group only. | False |
| facet | bool | If True, enable grouped faceting. Can be expensive on large result sets. Requires co-location in SolrCloud. | False |
| cache_percent | int | Enable result grouping cache (0-100). Set to 0 to disable (default). Try 50-100 for complex queries to improve performance. | 0 |
Returns:
| Type | Description |
|---|---|
| Self | A new parser instance with group configuration applied. |
Examples:
Group by author:
>>> parser.group(by="author", limit=3, sort="date desc", ngroups=True)
Group by price range:
>>> parser.group(
... query=["price:[0 TO 50]", "price:[50 TO 100]", "price:[100 TO *]"],
... limit=5
... )
Multiple field groupings:
>>> parser.group(by=["author", "category"], limit=2)
highlight(*, method=None, fields=None, query=None, query_parser=None, require_field_match=None, query_field_pattern=None, use_phrase_highlighter=None, multiterm=None, snippets_per_field=None, fragment_size=None, encoder=None, max_analyzed_chars=None, tag_before=None, tag_after=None, offset_source=None, frag_align_ratio=None, fragsize_is_minimum=None, tag_ellipsis=None, default_summary=None, score_k1=None, score_b=None, score_pivot=None, bs_language=None, bs_country=None, bs_variant=None, bs_type=None, bs_separator=None, weight_matches=None, merge_contiguous=None, max_multivalued_to_examine=None, max_multivalued_to_match=None, alternate_field=None, max_alternate_field_length=None, alternate=None, formatter=None, simple_pre=None, simple_post=None, fragmenter=None, regex_slop=None, regex_pattern=None, regex_max_analyzed_chars=None, preserve_multi=None, payloads=None, frag_list_builder=None, fragments_builder=None, boundary_scanner=None, phrase_limit=None, multivalue_separator=None)
Enable highlighting to show snippets with query terms emphasized.
Highlighting shows snippets of text from documents with query terms emphasized (usually wrapped in tags). Common for search result snippets and previews.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| method | Optional[Literal['unified', 'original', 'fastVector']] | Highlighting implementation to use. Use 'unified' (most accurate, recommended), 'original' (legacy, widely compatible), or 'fastVector' (fast for large docs, requires term vectors). | None |
| fields | Optional[Union[str, List[str]]] | Fields to generate highlighted snippets for. Must be stored fields. Example: ['title', 'content']. Use '*' for all fields (not recommended). | None |
| query | Optional[str] | Custom query to use for highlighting (overrides main query). | None |
| query_parser | Optional[str] | Query parser for the highlight query (e.g., 'edismax', 'lucene'). | None |
| require_field_match | Optional[bool] | If True, only highlight terms in the field they matched. Set to False to highlight query terms in all requested fields. | None |
| query_field_pattern | Optional[str] | Regular expression pattern for fields to consider for highlighting. | None |
| use_phrase_highlighter | Optional[bool] | If True, highlights complete phrases accurately. Default: True. | None |
| multiterm | Optional[bool] | Enable highlighting for wildcard, fuzzy, and range queries. Default: True. | None |
| snippets_per_field | Optional[int] | Maximum number of snippets to return per field. Default: 1. Set higher to show more context passages. | None |
| fragment_size | Optional[int] | Size of highlighted snippets in characters. Default: 100. Set to 0 to highlight the entire field (not recommended for large fields). | None |
| encoder | Optional[Literal['', 'html']] | Text encoder for snippets. Use '' (empty string, default) or 'html' to escape HTML and prevent XSS. | None |
| max_analyzed_chars | Optional[int] | Maximum characters to analyze per field. Default: 51200 (50KB). For large documents, limits analysis to improve performance. | None |
| tag_before | Optional[str] | Text/tag to insert before each highlighted term. Default: '<em>'. | None |
| tag_after | Optional[str] | Text/tag to insert after each highlighted term. Default: '</em>'. | None |
| **Unified highlighter options** (most accurate, recommended) | | | |
| offset_source | Optional[str] | How offsets are obtained. Options: 'ANALYSIS', 'POSTINGS', 'POSTINGS_WITH_TERM_VECTORS', 'TERM_VECTORS'. Usually auto-detected. | None |
| frag_align_ratio | Optional[float] | Where to position the first match in a snippet (0.0-1.0). Default: 0.33 (left third, shows context before the match). | None |
| fragsize_is_minimum | Optional[bool] | If True, treat fragment_size as a minimum. Default: True. | None |
| tag_ellipsis | Optional[str] | Text between multiple snippets (e.g., '...' or ' [...] '). | None |
| default_summary | Optional[bool] | If True, return leading text when no matches are found. | None |
| score_k1 | Optional[float] | BM25 term frequency normalization. Default: 1.2. | None |
| score_b | Optional[float] | BM25 length normalization. Default: 0.75. | None |
| score_pivot | Optional[int] | BM25 average passage length in characters. Default: 87. | None |
| bs_language | Optional[str] | BreakIterator language for text segmentation (e.g., 'en', 'ja'). | None |
| bs_country | Optional[str] | BreakIterator country code (e.g., 'US', 'GB'). | None |
| bs_variant | Optional[str] | BreakIterator variant for specialized locale rules. | None |
| bs_type | Optional[Literal['SEPARATOR', 'SENTENCE', 'WORD', 'CHARACTER', 'LINE', 'WHOLE']] | How to segment text. Use 'SENTENCE' (default, recommended), 'WORD', 'CHARACTER', 'LINE', 'SEPARATOR', or 'WHOLE'. | None |
| bs_separator | Optional[str] | Custom separator character when bs_type=SEPARATOR. | None |
| weight_matches | Optional[bool] | Use Lucene's Weight Matches API for the most accurate highlighting. | None |
| **Original highlighter options** (legacy) | | | |
| merge_contiguous | Optional[bool] | Merge adjacent fragments. | None |
| max_multivalued_to_examine | Optional[int] | Max entries to examine in a multivalued field. | None |
| max_multivalued_to_match | Optional[int] | Max matches in a multivalued field. | None |
| alternate_field | Optional[str] | Backup field for the summary when no highlights are found. | None |
| max_alternate_field_length | Optional[int] | Max length of the alternate field. | None |
| alternate | Optional[bool] | Highlight the alternate field. | None |
| formatter | Optional[Literal['simple']] | Formatter for highlighted output. Use 'simple'. | None |
| simple_pre | Optional[str] | Text before term (simple formatter). | None |
| simple_post | Optional[str] | Text after term (simple formatter). | None |
| fragmenter | Optional[Literal['gap', 'regex']] | Text snippet generator type. Use 'gap' or 'regex'. | None |
| regex_slop | Optional[float] | Deviation factor for the regex fragmenter. | None |
| regex_pattern | Optional[str] | Pattern for the regex fragmenter. | None |
| regex_max_analyzed_chars | Optional[int] | Char limit for the regex fragmenter. | None |
| preserve_multi | Optional[bool] | Preserve order in multivalued fields. | None |
| payloads | Optional[bool] | Include payloads in highlighting. | None |
| **FastVector highlighter options** (requires term vectors) | | | |
| frag_list_builder | Optional[Literal['simple', 'weighted', 'single']] | Snippet fragmenting algorithm. Use 'simple', 'weighted', or 'single'. | None |
| fragments_builder | Optional[Literal['default', 'colored']] | Fragment formatting implementation. Use 'default' or 'colored'. | None |
| boundary_scanner | Optional[str] | Boundary scanner implementation. | None |
| phrase_limit | Optional[int] | Max phrases to analyze for scoring. | None |
| multivalue_separator | Optional[str] | Separator for multivalued fields. | None |
Returns:
| Type | Description |
|---|---|
| Self | A new parser instance with highlight configuration applied. |
Examples:
Basic highlighting:
>>> parser.highlight(fields=["title", "content"], snippets_per_field=3, fragment_size=150)
Custom HTML tags:
>>> parser.highlight(
... fields=["title"],
... tag_before='<mark class="highlight">',
... tag_after='</mark>',
... encoder="html"
... )
Unified highlighter with sentence breaks:
>>> parser.highlight(
... fields=["content"],
... method="unified",
... bs_type="SENTENCE",
... fragment_size=200
... )
more_like_this(*, fields=None, min_term_freq=None, min_doc_freq=None, max_doc_freq=None, max_doc_freq_pct=None, min_word_len=None, max_word_len=None, max_query_terms=None, max_num_tokens_parsed=None, boost=None, query_fields=None, interesting_terms=None, match_include=None, match_offset=None)
Enable MoreLikeThis to find documents similar to a given document.
MoreLikeThis finds documents similar to a given document by analyzing the terms that make it unique. Common for "related articles", recommendations, and content discovery.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| fields | Optional[Union[str, List[str]]] | Fields to analyze for similarity. Use fields with meaningful content (title, description, body). Can be a single field or list of fields. For best performance, enable term vectors on these fields. | None |
| min_term_freq | Optional[int] | Minimum times a term must appear in the source document to be considered. Default: 2. Lower values include more terms but may add noise. | None |
| min_doc_freq | Optional[int] | Minimum number of documents a term must appear in across the index. Default: 5. Filters out very rare terms that aren't useful for finding similar documents. | None |
| max_doc_freq | Optional[int] | Maximum number of documents a term can appear in. Filters out very common terms (like 'the', 'and'). Use this OR max_doc_freq_pct, not both. | None |
| max_doc_freq_pct | Optional[int] | Maximum document frequency as a percentage (0-100). Example: 75 means ignore terms appearing in more than 75% of documents. Use instead of max_doc_freq for relative filtering. | None |
| min_word_len | Optional[int] | Minimum word length in characters. Words shorter than this are ignored. Example: 4 to skip 'the', 'and', 'or'. | None |
| max_word_len | Optional[int] | Maximum word length in characters. Words longer than this are ignored. Useful to filter out long tokens or URLs. | None |
| max_query_terms | Optional[int] | Maximum number of interesting terms to use in the MLT query. Default: 25. Higher values = more comprehensive but slower. Start with the default and adjust as needed. | None |
| max_num_tokens_parsed | Optional[int] | Maximum tokens to analyze per field (for fields without term vectors). Default: 5000. Set lower for better performance on large documents. | None |
| boost | Optional[bool] | If True, boost the query by each term's relevance/importance. Default: False. Enable for better relevance ranking of similar documents. | None |
| query_fields | Optional[str] | Query fields with optional boosts, like 'title^2.0 content^1.0'. Fields must also be in the 'fields' parameter. Use to emphasize certain fields in similarity matching. | None |
| interesting_terms | Optional[str] | Controls what info about matched terms is returned. Options: 'none' (default), 'list' (term names), 'details' (terms with boost values). Use 'details' for debugging to see which terms were selected. | None |
| match_include | Optional[bool] | If True, includes the source document in results (useful to compare). Default varies: true for the MLT handler, configuration-dependent for the component. | None |
| match_offset | Optional[int] | When used with a query, specifies which result doc to use for similarity. 0 = first result, 1 = second, etc. Default: 0. | None |
Returns:
| Type | Description |
|---|---|
Self
|
A new parser instance with more_like_this configuration applied. |
Examples:
Basic similarity search:
>>> parser.more_like_this(
... fields=["title", "content"],
... min_term_freq=2,
... min_doc_freq=5,
... max_query_terms=25
... )
Advanced with filtering:
>>> parser.more_like_this(
... fields=["content"],
... min_term_freq=1,
... min_doc_freq=3,
... min_word_len=4,
... max_doc_freq_pct=80,
... interesting_terms="details"
... )
Boosted fields:
>>> parser.more_like_this(
... fields=["title", "content"],
... query_fields="title^2.0 content^1.0",
... boost=True
... )
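For orientation, the basic similarity example above corresponds roughly to the following raw Solr MoreLikeThis parameters. This is a hedged sketch using the documented names mlt.fl, mlt.mintf, mlt.mindf, and mlt.maxqt; the library's exact serialized output may differ:
>>> # Assumed raw Solr parameters for the basic example (illustrative only)
>>> expected = {
...     "mlt": "true",
...     "mlt.fl": "title,content",
...     "mlt.mintf": 2,
...     "mlt.mindf": 5,
...     "mlt.maxqt": 25,
... }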
serialize_configs(params)
Serialize ParamsConfig objects as top-level params.
VectorSimilarityQueryParser
Bases: DenseVectorSearchQueryParser
Vector Similarity Query Parser for Apache Solr Dense Vector Search.
The vectorSimilarity parser matches documents whose vector similarity to the query vector exceeds a minimum threshold. Unlike KNN which returns a fixed number of top results, this parser returns all documents meeting the similarity criteria, making it suitable for threshold-based retrieval.
Solr Reference
https://solr.apache.org/guide/solr/latest/query-guide/dense-vector-search.html
Key Features
- Threshold-based vector matching (minReturn)
- Graph traversal control (minTraverse)
- Pre-filtering support (explicit or implicit)
- Returns all documents above similarity threshold
- Useful for minimum quality requirements
How Vector Similarity Works
- Query vector is compared against indexed vectors
- Documents with similarity >= minReturn are returned
- Graph traversal continues for nodes with similarity >= minTraverse
- Results are ranked by similarity score
Similarity vs KNN
- KNN: Returns exactly k results (top k most similar)
- VectorSimilarity: Returns all results above threshold (0 to unlimited)
- KNN: Best for "find similar items"
- VectorSimilarity: Best for "find items similar enough"
Schema Requirements
The field must be a DenseVectorField whose vectorDimension matches the query vector's length, for example:
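The snippet below is illustrative; the fieldType name, field name, dimension, and similarity function are placeholders to adapt to your schema:
<!-- Illustrative schema snippet; vectorDimension must equal the query vector length -->
<fieldType name="knn_vector" class="solr.DenseVectorField" vectorDimension="4" similarityFunction="cosine"/>
<field name="product_vector" type="knn_vector" indexed="true" stored="true"/>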
Examples:
>>> # Basic similarity search with threshold
>>> parser = VectorSimilarityQueryParser(
... vector_field="product_vector",
... vector=[1.0, 2.0, 3.0, 4.0],
... min_return=0.7 # Only return docs with similarity >= 0.7
... )
>>> # With traversal control
>>> parser = VectorSimilarityQueryParser(
... vector_field="doc_vector",
... vector=[0.5, 0.5, 0.5, 0.5],
... min_return=0.8, # Return threshold
... min_traverse=0.6 # Continue graph traversal threshold
... )
>>> # With explicit pre-filtering
>>> parser = VectorSimilarityQueryParser(
... vector_field="content_vector",
... vector=[0.2, 0.3, 0.4, 0.5],
... min_return=0.75,
... pre_filter=["inStock:true", "price:[* TO 100]"]
... )
>>> # As filter query for hybrid search
>>> # Use with q=*:* to get all docs above similarity threshold
>>> parser = VectorSimilarityQueryParser(
... vector_field="embedding",
... vector=[1.5, 2.5, 3.5, 4.5],
... min_return=0.85
... )
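For orientation, the last example above corresponds roughly to Solr's documented local-params syntax for this parser (a hedged sketch; the library's exact serialization may differ):
>>> # Assumed raw Solr query string (illustrative only)
>>> # q={!vectorSimilarity f=embedding minReturn=0.85}[1.5, 2.5, 3.5, 4.5]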
Parameters:
| Name | Description | Default |
|---|---|---|
| vector | Query vector as a list of floats (required; must match the field dimension) | required |
| min_return | Minimum similarity threshold for returned documents (required) | required |
| min_traverse | Minimum similarity to continue graph traversal (default: -Infinity) | required |
| vector_field | Name of the DenseVectorField to search (inherited from base) | required |
| pre_filter | Explicit pre-filter query strings (inherited from base) | required |
| include_tags | Only use fq filters with these tags for implicit pre-filtering (inherited) | required |
| exclude_tags | Exclude fq filters with these tags from implicit pre-filtering (inherited) | required |

Returns:
All documents with vector similarity >= minReturn, ranked by similarity score.
Note
Setting minTraverse lower than minReturn allows exploring more of the graph to find potential matches, at the cost of more computation.
See Also
- KNNQueryParser: For top-k nearest neighbor retrieval
- KNNTextToVectorQueryParser: For text-based vector similarity search
facet(*, queries=None, fields=None, prefix=None, contains=None, contains_ignore_case=None, matches=None, sort=None, limit=None, offset=None, mincount=None, missing=None, method=None, enum_cache_min_df=None, exists=None, exclude_terms=None, overrequest_count=None, overrequest_ratio=None, threads=None, range_field=None, range_start=None, range_end=None, range_gap=None, range_hardend=None, range_include=None, range_other=None, range_method=None, pivot_fields=None, pivot_mincount=None)
Enable faceting to categorize and count search results.
Faceting breaks down search results into categories with counts, enabling drill-down navigation and data analysis.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| queries | Optional[List[str]] | Arbitrary queries to generate facet counts for specific terms/expressions. | None |
| fields | Optional[List[str]] | Fields to be treated as facets. Common for categories, brands, tags. | None |
| prefix | Optional[str] | Limits facet terms to those starting with the given prefix. | None |
| contains | Optional[str] | Limits facet terms to those containing the given substring. | None |
| contains_ignore_case | Optional[bool] | If True, ignores case when matching the 'contains' parameter. | None |
| matches | Optional[str] | Only returns facets matching this regular expression. | None |
| sort | Optional[Literal['count', 'index']] | Ordering of facet field terms. Use 'count' (by frequency) or 'index' (alphabetically). | None |
| limit | Optional[int] | Number of facet counts to return. Set to -1 for all. | None |
| offset | Optional[int] | Offset into the facet list for paging. | None |
| mincount | Optional[int] | Minimum count for facets to be included in the response. Commonly set to 1 to hide empty facets. | None |
| missing | Optional[bool] | If True, include the count of results with no facet value. | None |
| method | Optional[Literal['enum', 'fc', 'fcs']] | Algorithm to use for faceting. Use 'enum' (enumerate all terms, good for low cardinality), 'fc' (field cache, good for high cardinality), or 'fcs' (per-segment, good for frequent updates). | None |
| enum_cache_min_df | Optional[int] | Minimum document frequency for filterCache usage with the enum method. | None |
| exists | Optional[bool] | Cap facet counts at 1 (only for non-trie fields). | None |
| exclude_terms | Optional[str] | Terms to remove from facet counts. | None |
| overrequest_count | Optional[int] | Extra facets to request from each shard for better accuracy in distributed environments. | None |
| overrequest_ratio | Optional[float] | Ratio for overrequesting facets from shards. | None |
| threads | Optional[int] | Number of threads for parallel facet loading. Useful for multiple facets on large datasets. | None |
| range_field | Optional[List[str]] | Fields for range faceting (e.g., price ranges, date ranges). | None |
| range_start | Optional[Dict[str, str]] | Lower bound of ranges per field. Dict mapping field name to start value. | None |
| range_end | Optional[Dict[str, str]] | Upper bound of ranges per field. Dict mapping field name to end value. | None |
| range_gap | Optional[Dict[str, str]] | Size of each range span per field. E.g., {'price': '100'} for $100 increments. | None |
| range_hardend | Optional[bool] | If True, uses the exact range_end as the upper bound even if it doesn't align with the gap. | None |
| range_include | Optional[List[Literal['lower', 'upper', 'edge', 'outer', 'all']]] | Range bounds to include in faceting. List of: 'lower' (include lower bound), 'upper' (include upper bound), 'edge' (first/last ranges include edges), 'outer' (before/after inclusive), or 'all'. | None |
| range_other | Optional[List[Literal['before', 'after', 'between', 'none', 'all']]] | Additional range counts to compute. List of: 'before' (below first range), 'after' (above last range), 'between' (within bounds), 'none', or 'all'. | None |
| range_method | Optional[Literal['filter', 'dv']] | Method to use for range faceting. Use 'filter' or 'dv' (for docValues). | None |
| pivot_fields | Optional[List[str]] | Fields to use for pivot (hierarchical) faceting. E.g., ['category,brand']. | None |
| pivot_mincount | Optional[int] | Minimum count for pivot facet inclusion. | None |

Returns:
| Type | Description |
|---|---|
| Self | A new parser instance with facet configuration applied. |
Examples:
Basic field faceting:
>>> parser.facet(fields=["genre", "director"], mincount=1, limit=10)
Range faceting for prices:
>>> parser.facet(
... range_field=["price"],
... range_start={"price": "0"},
... range_end={"price": "1000"},
... range_gap={"price": "100"}
... )
Filtered facets:
>>> parser.facet(fields=["color"], prefix="bl", mincount=5)
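For orientation, the basic field-faceting example above corresponds roughly to the following raw Solr facet parameters. This is a hedged sketch using the documented names facet, facet.field, facet.mincount, and facet.limit (multi-valued parameters shown as lists); the library's exact serialized output may differ:
>>> # Assumed raw Solr parameters for the basic example (illustrative only)
>>> expected = {
...     "facet": "true",
...     "facet.field": ["genre", "director"],
...     "facet.mincount": 1,
...     "facet.limit": 10,
... }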
group(*, by=None, func=None, query=None, limit=1, offset=None, sort=None, format='grouped', main=None, ngroups=False, truncate=False, facet=False, cache_percent=0)
Enable result grouping to collapse results by common field values.
Result grouping (also known as field collapsing) combines documents that share a common field value. Useful for showing one result per author, category, or domain.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| by | Optional[Union[str, List[str]]] | Field(s) to group results by. Shows one representative doc per unique field value. Must be single-valued and indexed. Can specify multiple fields for separate groupings. | None |
| func | Optional[str] | Group by the result of a function query (e.g., 'floor(price)'). Not supported in SolrCloud/distributed searches. | None |
| query | Optional[Union[str, List[str]]] | Create custom groups using arbitrary queries. Each query defines one group. Example: ['price:[0 TO 50]', 'price:[50 TO 100]'] creates price range groups. | None |
| limit | int | Number of documents to return per group. Default: 1 (only the top doc per group). Set higher to see more examples from each group. | 1 |
| offset | Optional[int] | Skip the first N documents within each group. Useful for pagination within groups. | None |
| sort | Optional[str] | How to sort documents within each group (e.g., 'date desc' for newest first). If not specified, uses the main sort parameter. | None |
| format | str | Response structure format. 'grouped' (nested, default) or 'simple' (flat list). | 'grouped' |
| main | Optional[bool] | If True, returns the first field grouping as the main result list, flattening the response. | None |
| ngroups | bool | If True, include the total number of unique groups in the response. Useful for pagination. Requires co-location in SolrCloud. | False |
| truncate | bool | If True, base facet counts on one doc per group only. | False |
| facet | bool | If True, enable grouped faceting. Can be expensive on large result sets. Requires co-location in SolrCloud. | False |
| cache_percent | int | Enable the result grouping cache (0-100). Set to 0 to disable (default). Try 50-100 for complex queries to improve performance. | 0 |

Returns:
| Type | Description |
|---|---|
| Self | A new parser instance with group configuration applied. |
Examples:
Group by author:
>>> parser.group(by="author", limit=3, sort="date desc", ngroups=True)
Group by price range:
>>> parser.group(
... query=["price:[0 TO 50]", "price:[50 TO 100]", "price:[100 TO *]"],
... limit=5
... )
Multiple field groupings:
>>> parser.group(by=["author", "category"], limit=2)
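For orientation, the author-grouping example above corresponds roughly to the following raw Solr grouping parameters (a hedged sketch using the documented names group, group.field, group.limit, group.sort, and group.ngroups; the library's exact serialized output may differ). In the default 'grouped' response format, Solr returns one entry per group containing a groupValue and a doclist of that group's documents:
>>> # Assumed raw Solr parameters for the author-grouping example (illustrative only)
>>> expected = {
...     "group": "true",
...     "group.field": "author",
...     "group.limit": 3,
...     "group.sort": "date desc",
...     "group.ngroups": "true",
... }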
highlight(*, method=None, fields=None, query=None, query_parser=None, require_field_match=None, query_field_pattern=None, use_phrase_highlighter=None, multiterm=None, snippets_per_field=None, fragment_size=None, encoder=None, max_analyzed_chars=None, tag_before=None, tag_after=None, offset_source=None, frag_align_ratio=None, fragsize_is_minimum=None, tag_ellipsis=None, default_summary=None, score_k1=None, score_b=None, score_pivot=None, bs_language=None, bs_country=None, bs_variant=None, bs_type=None, bs_separator=None, weight_matches=None, merge_contiguous=None, max_multivalued_to_examine=None, max_multivalued_to_match=None, alternate_field=None, max_alternate_field_length=None, alternate=None, formatter=None, simple_pre=None, simple_post=None, fragmenter=None, regex_slop=None, regex_pattern=None, regex_max_analyzed_chars=None, preserve_multi=None, payloads=None, frag_list_builder=None, fragments_builder=None, boundary_scanner=None, phrase_limit=None, multivalue_separator=None)
Enable highlighting to show snippets with query terms emphasized.
Highlighting shows snippets of text from documents with query terms emphasized (usually wrapped in tags). Common for search result snippets and previews.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| method | Optional[Literal['unified', 'original', 'fastVector']] | Highlighting implementation to use. Use 'unified' (most accurate, recommended), 'original' (legacy, widely compatible), or 'fastVector' (fast for large docs, requires term vectors). | None |
| fields | Optional[Union[str, List[str]]] | Fields to generate highlighted snippets for. Must be stored fields. Example: ['title', 'content']. Use '*' for all fields (not recommended). | None |
| query | Optional[str] | Custom query to use for highlighting (overrides the main query). | None |
| query_parser | Optional[str] | Query parser for the highlight query (e.g., 'edismax', 'lucene'). | None |
| require_field_match | Optional[bool] | If True, only highlight terms in the field they matched. Set to False to highlight query terms in all requested fields. | None |
| query_field_pattern | Optional[str] | Regular expression pattern for fields to consider for highlighting. | None |
| use_phrase_highlighter | Optional[bool] | If True, highlights complete phrases accurately. Default: True. | None |
| multiterm | Optional[bool] | Enable highlighting for wildcard, fuzzy, and range queries. Default: True. | None |
| snippets_per_field | Optional[int] | Maximum number of snippets to return per field. Default: 1. Set higher to show more context passages. | None |
| fragment_size | Optional[int] | Size of highlighted snippets in characters. Default: 100. Set to 0 to highlight the entire field (not recommended for large fields). | None |
| encoder | Optional[Literal['', 'html']] | Text encoder for snippets. Use '' (empty string, default) or 'html' to escape HTML and prevent XSS. | None |
| max_analyzed_chars | Optional[int] | Maximum characters to analyze per field. Default: 51200 (50KB). For large documents, limits analysis to improve performance. | None |
| tag_before | Optional[str] | Text/tag to insert before each highlighted term. Default: '<em>'. | None |
| tag_after | Optional[str] | Text/tag to insert after each highlighted term. Default: '</em>'. | None |

Unified highlighter parameters (most accurate, recommended):
| Name | Type | Description | Default |
|---|---|---|---|
| offset_source | Optional[str] | How offsets are obtained. Options: 'ANALYSIS', 'POSTINGS', 'POSTINGS_WITH_TERM_VECTORS', 'TERM_VECTORS'. Usually auto-detected. | None |
| frag_align_ratio | Optional[float] | Where to position the first match in the snippet (0.0-1.0). Default: 0.33 (left third, shows context before the match). | None |
| fragsize_is_minimum | Optional[bool] | If True, treat fragment_size as a minimum. Default: True. | None |
| tag_ellipsis | Optional[str] | Text between multiple snippets (e.g., '...' or ' [...] '). | None |
| default_summary | Optional[bool] | If True, return leading text when no matches are found. | None |
| score_k1 | Optional[float] | BM25 term frequency normalization. Default: 1.2. | None |
| score_b | Optional[float] | BM25 length normalization. Default: 0.75. | None |
| score_pivot | Optional[int] | BM25 average passage length in characters. Default: 87. | None |
| bs_language | Optional[str] | BreakIterator language for text segmentation (e.g., 'en', 'ja'). | None |
| bs_country | Optional[str] | BreakIterator country code (e.g., 'US', 'GB'). | None |
| bs_variant | Optional[str] | BreakIterator variant for specialized locale rules. | None |
| bs_type | Optional[Literal['SEPARATOR', 'SENTENCE', 'WORD', 'CHARACTER', 'LINE', 'WHOLE']] | How to segment text. Use 'SENTENCE' (default, recommended), 'WORD', 'CHARACTER', 'LINE', 'SEPARATOR', or 'WHOLE'. | None |
| bs_separator | Optional[str] | Custom separator character when bs_type='SEPARATOR'. | None |
| weight_matches | Optional[bool] | Use Lucene's Weight Matches API for the most accurate highlighting. | None |

Original highlighter parameters (legacy):
| Name | Type | Description | Default |
|---|---|---|---|
| merge_contiguous | Optional[bool] | Merge adjacent fragments. | None |
| max_multivalued_to_examine | Optional[int] | Maximum entries to examine in a multivalued field. | None |
| max_multivalued_to_match | Optional[int] | Maximum matches in a multivalued field. | None |
| alternate_field | Optional[str] | Backup field for the summary when no highlights are found. | None |
| max_alternate_field_length | Optional[int] | Maximum length of the alternate field. | None |
| alternate | Optional[bool] | Highlight the alternate field. | None |
| formatter | Optional[Literal['simple']] | Formatter for highlighted output. Use 'simple'. | None |
| simple_pre | Optional[str] | Text before each term (simple formatter). | None |
| simple_post | Optional[str] | Text after each term (simple formatter). | None |
| fragmenter | Optional[Literal['gap', 'regex']] | Text snippet generator type. Use 'gap' or 'regex'. | None |
| regex_slop | Optional[float] | Deviation factor for the regex fragmenter. | None |
| regex_pattern | Optional[str] | Pattern for the regex fragmenter. | None |
| regex_max_analyzed_chars | Optional[int] | Character limit for the regex fragmenter. | None |
| preserve_multi | Optional[bool] | Preserve order in multivalued fields. | None |
| payloads | Optional[bool] | Include payloads in highlighting. | None |

FastVector highlighter parameters (requires term vectors):
| Name | Type | Description | Default |
|---|---|---|---|
| frag_list_builder | Optional[Literal['simple', 'weighted', 'single']] | Snippet fragmenting algorithm. Use 'simple', 'weighted', or 'single'. | None |
| fragments_builder | Optional[Literal['default', 'colored']] | Fragment formatting implementation. Use 'default' or 'colored'. | None |
| boundary_scanner | Optional[str] | Boundary scanner implementation. | None |
| phrase_limit | Optional[int] | Maximum phrases to analyze for scoring. | None |
| multivalue_separator | Optional[str] | Separator for multivalued fields. | None |

Returns:
| Type | Description |
|---|---|
| Self | A new parser instance with highlight configuration applied. |
Examples:
Basic highlighting:
>>> parser.highlight(fields=["title", "content"], snippets_per_field=3, fragment_size=150)
Custom HTML tags:
>>> parser.highlight(
... fields=["title"],
... tag_before='<mark class="highlight">',
... tag_after='</mark>',
... encoder="html"
... )
Unified highlighter with sentence breaks:
>>> parser.highlight(
... fields=["content"],
... method="unified",
... bs_type="SENTENCE",
... fragment_size=200
... )
more_like_this(*, fields=None, min_term_freq=None, min_doc_freq=None, max_doc_freq=None, max_doc_freq_pct=None, min_word_len=None, max_word_len=None, max_query_terms=None, max_num_tokens_parsed=None, boost=None, query_fields=None, interesting_terms=None, match_include=None, match_offset=None)
Enable MoreLikeThis to find documents similar to a given document.
MoreLikeThis analyzes the terms that make a source document distinctive and retrieves other documents that share them. Common for "related articles", recommendations, and content discovery.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| fields | Optional[Union[str, List[str]]] | Fields to analyze for similarity. Use fields with meaningful content (title, description, body). Can be a single field or a list of fields. For best performance, enable term vectors on these fields. | None |
| min_term_freq | Optional[int] | Minimum times a term must appear in the source document to be considered. Default: 2. Lower values include more terms but may add noise. | None |
| min_doc_freq | Optional[int] | Minimum number of documents a term must appear in across the index. Default: 5. Filters out very rare terms that aren't useful for finding similar documents. | None |
| max_doc_freq | Optional[int] | Maximum number of documents a term can appear in. Filters out very common terms (like 'the', 'and'). Use this or max_doc_freq_pct, not both. | None |
| max_doc_freq_pct | Optional[int] | Maximum document frequency as a percentage (0-100). Example: 75 means ignore terms appearing in more than 75% of documents. Use instead of max_doc_freq for relative filtering. | None |
| min_word_len | Optional[int] | Minimum word length in characters. Words shorter than this are ignored. Example: 4 to skip 'the', 'and', 'or'. | None |
| max_word_len | Optional[int] | Maximum word length in characters. Words longer than this are ignored. Useful to filter out long tokens or URLs. | None |
| max_query_terms | Optional[int] | Maximum number of interesting terms to use in the MLT query. Default: 25. Higher values are more comprehensive but slower. Start with the default and adjust as needed. | None |
| max_num_tokens_parsed | Optional[int] | Maximum tokens to analyze per field (for fields without term vectors). Default: 5000. Set lower for better performance on large documents. | None |
| boost | Optional[bool] | If True, boost the query by each term's relevance/importance. Default: False. Enable for better relevance ranking of similar documents. | None |
| query_fields | Optional[str] | Query fields with optional boosts, like 'title^2.0 content^1.0'. Fields must also be in the 'fields' parameter. Use to emphasize certain fields in similarity matching. | None |
| interesting_terms | Optional[str] | Controls what info about matched terms is returned. Options: 'none' (default), 'list' (term names), 'details' (terms with boost values). Use 'details' for debugging to see which terms were selected. | None |
| match_include | Optional[bool] | If True, includes the source document in results (useful for comparison). Default varies: true for the MLT handler; depends on configuration for the component. | None |
| match_offset | Optional[int] | When used with a query, specifies which result doc to use for similarity. 0 = first result, 1 = second, etc. Default: 0. | None |

Returns:
| Type | Description |
|---|---|
| Self | A new parser instance with more_like_this configuration applied. |
Examples:
Basic similarity search:
>>> parser.more_like_this(
... fields=["title", "content"],
... min_term_freq=2,
... min_doc_freq=5,
... max_query_terms=25
... )
Advanced with filtering:
>>> parser.more_like_this(
... fields=["content"],
... min_term_freq=1,
... min_doc_freq=3,
... min_word_len=4,
... max_doc_freq_pct=80,
... interesting_terms="details"
... )
Boosted fields:
>>> parser.more_like_this(
... fields=["title", "content"],
... query_fields="title^2.0 content^1.0",
... boost=True
... )
serialize_configs(params)
Serialize ParamsConfig objects as top-level params.