Parsers

BaseQueryParser

Bases: CommonParamsMixin

build(*args, **kwargs)

Serialize the parser configuration to Solr-compatible query parameters using Pydantic's model_dump.
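A minimal usage sketch (hypothetical field names; build() is typically called on one of the concrete parsers documented below, and the exact keys in the returned mapping depend on which options were set):

>>> parser = DisMaxQueryParser(query="ipod", query_fields={"name": 2.0})
>>> params = parser.build()
>>> isinstance(params, dict)  # model_dump produces a plain dict of Solr params
True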

facet(*, queries=None, fields=None, prefix=None, contains=None, contains_ignore_case=None, matches=None, sort=None, limit=None, offset=None, mincount=None, missing=None, method=None, enum_cache_min_df=None, exists=None, exclude_terms=None, overrequest_count=None, overrequest_ratio=None, threads=None, range_field=None, range_start=None, range_end=None, range_gap=None, range_hardend=None, range_include=None, range_other=None, range_method=None, pivot_fields=None, pivot_mincount=None)

Enable faceting to categorize and count search results.

Faceting breaks down search results into categories with counts, enabling drill-down navigation and data analysis.

Parameters:

Name Type Description Default
queries Optional[List[str]]

Arbitrary queries to generate facet counts for specific terms/expressions.

None
fields Optional[List[str]]

Fields to be treated as facets. Common for categories, brands, tags.

None
prefix Optional[str]

Limits facet terms to those starting with the given prefix.

None
contains Optional[str]

Limits facet terms to those containing the given substring.

None
contains_ignore_case Optional[bool]

If True, ignores case when matching the 'contains' parameter.

None
matches Optional[str]

Only returns facets matching this regular expression.

None
sort Optional[Literal['count', 'index']]

Ordering of facet field terms. Use 'count' (by frequency) or 'index' (alphabetically).

None
limit Optional[int]

Number of facet counts to return. Set to -1 for all.

None
offset Optional[int]

Offset into the facet list for paging.

None
mincount Optional[int]

Minimum count for facets to be included in response. Common to set to 1 to hide empty facets.

None
missing Optional[bool]

If True, include count of results with no facet value.

None
method Optional[Literal['enum', 'fc', 'fcs']]

Algorithm to use for faceting. Use 'enum' (enumerate all terms, good for low-cardinality), 'fc' (field cache, good for high-cardinality), or 'fcs' (per-segment, good for frequent updates).

None
enum_cache_min_df Optional[int]

Minimum document frequency for filterCache usage with enum method.

None
exists Optional[bool]

Caps facet counts at 1 (only for non-trie fields).

None
exclude_terms Optional[str]

Terms to remove from facet counts.

None
overrequest_count Optional[int]

Extra facets to request from each shard for better accuracy in distributed environments.

None
overrequest_ratio Optional[float]

Ratio for overrequesting facets from shards.

None
threads Optional[int]

Number of threads for parallel facet loading. Useful for multiple facets on large datasets.

None
range_field Optional[List[str]]

Fields for range faceting (e.g., price ranges, date ranges).

None
range_start Optional[Dict[str, str]]

Lower bound of ranges per field. Dict mapping field name to start value.

None
range_end Optional[Dict[str, str]]

Upper bound of ranges per field. Dict mapping field name to end value.

None
range_gap Optional[Dict[str, str]]

Size of each range span per field. E.g., {'price': '100'} for $100 increments.

None
range_hardend Optional[bool]

If True, uses exact range_end as upper bound even if it doesn't align with gap.

None
range_include Optional[List[Literal['lower', 'upper', 'edge', 'outer', 'all']]]

Range bounds to include in faceting. List of: 'lower' (include lower bound), 'upper' (include upper bound), 'edge' (first/last include edges), 'outer' (before/after inclusive), 'all'.

None
range_other Optional[List[Literal['before', 'after', 'between', 'none', 'all']]]

Additional range counts to compute. List of: 'before' (below first range), 'after' (above last range), 'between' (within bounds), 'none', or 'all'.

None
range_method Optional[Literal['filter', 'dv']]

Method to use for range faceting. Use 'filter' or 'dv' (for docValues).

None
pivot_fields Optional[List[str]]

Fields to use for pivot (hierarchical) faceting. E.g., ['category,brand'].

None
pivot_mincount Optional[int]

Minimum count for pivot facet inclusion.

None

Returns:

Type Description
Self

A new parser instance with facet configuration applied.

Examples:

Basic field faceting:

>>> parser.facet(fields=["genre", "director"], mincount=1, limit=10)

Range faceting for prices:

>>> parser.facet(
...     range_field=["price"],
...     range_start={"price": "0"},
...     range_end={"price": "1000"},
...     range_gap={"price": "100"}
... )

Filtered facets:

>>> parser.facet(fields=["color"], prefix="bl", mincount=5)
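
Because facet() returns a new parser instance rather than mutating in place (see Returns above), configuration can be layered; a small sketch:

>>> faceted = parser.facet(fields=["brand"], mincount=1)
>>> faceted is parser  # the original parser is left unchanged
False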

group(*, by=None, func=None, query=None, limit=1, offset=None, sort=None, format='grouped', main=None, ngroups=False, truncate=False, facet=False, cache_percent=0)

Enable result grouping to collapse results by common field values.

Result grouping (also known as field collapsing) combines documents that share a common field value. Useful for showing one result per author, category, or domain.

Parameters:

Name Type Description Default
by Optional[Union[str, List[str]]]

Field(s) to group results by. Shows one representative doc per unique field value. Must be single-valued and indexed. Can specify multiple fields for separate groupings.

None
func Optional[str]

Group by the result of a function query (e.g., 'floor(price)'). Not supported in SolrCloud/distributed searches.

None
query Optional[Union[str, List[str]]]

Create custom groups using arbitrary queries. Each query defines one group. Example: ['price:[0 TO 50]', 'price:[50 TO 100]'] creates price range groups.

None
limit int

Number of documents to return per group. Default: 1 (only top doc per group). Set higher to see more examples from each group.

1
offset Optional[int]

Skip the first N documents within each group. Useful for pagination within groups.

None
sort Optional[str]

How to sort documents within each group (e.g., 'date desc' for newest first). If not specified, uses the main sort parameter.

None
format str

Response structure format. 'grouped' (nested, default) or 'simple' (flat list).

'grouped'
main Optional[bool]

If True, returns first field grouping as main result list, flattening the response.

None
ngroups bool

If True, include total number of unique groups in response. Useful for pagination. Requires co-location in SolrCloud.

False
truncate bool

If True, base facet counts on one doc per group only.

False
facet bool

If True, enable grouped faceting. Can be expensive on large result sets. Requires co-location in SolrCloud.

False
cache_percent int

Enable result grouping cache (0-100). Set to 0 to disable (default). Try 50-100 for complex queries to improve performance.

0

Returns:

Type Description
Self

A new parser instance with group configuration applied.

Examples:

Group by author:

>>> parser.group(by="author", limit=3, sort="date desc", ngroups=True)

Group by price range:

>>> parser.group(
...     query=["price:[0 TO 50]", "price:[50 TO 100]", "price:[100 TO *]"],
...     limit=5
... )

Multiple field groupings:

>>> parser.group(by=["author", "category"], limit=2)
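
Flattened response (a sketch combining the format and main parameters described above to get a flat result list):

>>> parser.group(by="domain", format="simple", main=True)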

highlight(*, method=None, fields=None, query=None, query_parser=None, require_field_match=None, query_field_pattern=None, use_phrase_highlighter=None, multiterm=None, snippets_per_field=None, fragment_size=None, encoder=None, max_analyzed_chars=None, tag_before=None, tag_after=None, offset_source=None, frag_align_ratio=None, fragsize_is_minimum=None, tag_ellipsis=None, default_summary=None, score_k1=None, score_b=None, score_pivot=None, bs_language=None, bs_country=None, bs_variant=None, bs_type=None, bs_separator=None, weight_matches=None, merge_contiguous=None, max_multivalued_to_examine=None, max_multivalued_to_match=None, alternate_field=None, max_alternate_field_length=None, alternate=None, formatter=None, simple_pre=None, simple_post=None, fragmenter=None, regex_slop=None, regex_pattern=None, regex_max_analyzed_chars=None, preserve_multi=None, payloads=None, frag_list_builder=None, fragments_builder=None, boundary_scanner=None, phrase_limit=None, multivalue_separator=None)

Enable highlighting to show snippets with query terms emphasized.

Highlighting shows snippets of text from documents with query terms emphasized (usually wrapped in tags). Common for search result snippets and previews.

Parameters:

Name Type Description Default
method Optional[Literal['unified', 'original', 'fastVector']]

Highlighting implementation to use. Use 'unified' (most accurate, recommended), 'original' (legacy, widely compatible), or 'fastVector' (fast for large docs, requires term vectors).

None
fields Optional[Union[str, List[str]]]

Fields to generate highlighted snippets for. Must be stored fields. Example: ['title', 'content']. Use '*' for all fields (not recommended).

None
query Optional[str]

Custom query to use for highlighting (overrides main query).

None
query_parser Optional[str]

Query parser for the highlight query (e.g., 'edismax', 'lucene').

None
require_field_match Optional[bool]

If True, only highlight terms in the field they matched. Set to False to highlight query terms in all requested fields.

None
query_field_pattern Optional[str]

Regular expression pattern for fields to consider for highlighting.

None
use_phrase_highlighter Optional[bool]

If True, highlights complete phrases accurately. Default: True.

None
multiterm Optional[bool]

Enable highlighting for wildcard, fuzzy, and range queries. Default: True.

None
snippets_per_field Optional[int]

Maximum number of snippets to return per field. Default: 1. Set higher to show more context passages.

None
fragment_size Optional[int]

Size of highlighted snippets in characters. Default: 100. Set to 0 to highlight entire field (not recommended for large fields).

None
encoder Optional[Literal['', 'html']]

Text encoder for snippets. Use '' (empty string, default) or 'html' to escape HTML and prevent XSS.

None
max_analyzed_chars Optional[int]

Maximum characters to analyze per field. Default: 51200 (50KB). For large documents, limits analysis to improve performance.

None
tag_before Optional[str]

Text/tag to insert before each highlighted term. Default: '<em>'. Example: '<mark class="highlight">'.

None
tag_after Optional[str]

Text/tag to insert after each highlighted term. Default: '</em>'. Example: '</mark>'.

None
Unified Highlighter parameters (most accurate, recommended):
offset_source Optional[str]

How offsets are obtained. Options: 'ANALYSIS', 'POSTINGS', 'POSTINGS_WITH_TERM_VECTORS', 'TERM_VECTORS'. Usually auto-detected.

None
frag_align_ratio Optional[float]

Where to position first match in snippet (0.0-1.0). Default: 0.33 (left third, shows context before match).

None
fragsize_is_minimum Optional[bool]

If True, treat fragment_size as minimum. Default: True.

None
tag_ellipsis Optional[str]

Text between multiple snippets (e.g., '...' or ' [...] ').

None
default_summary Optional[bool]

If True, return leading text when no matches found.

None
score_k1 Optional[float]

BM25 term frequency normalization. Default: 1.2.

None
score_b Optional[float]

BM25 length normalization. Default: 0.75.

None
score_pivot Optional[int]

BM25 average passage length in characters. Default: 87.

None
bs_language Optional[str]

BreakIterator language for text segmentation (e.g., 'en', 'ja').

None
bs_country Optional[str]

BreakIterator country code (e.g., 'US', 'GB').

None
bs_variant Optional[str]

BreakIterator variant for specialized locale rules.

None
bs_type Optional[Literal['SEPARATOR', 'SENTENCE', 'WORD', 'CHARACTER', 'LINE', 'WHOLE']]

How to segment text. Use 'SENTENCE' (default, recommended), 'WORD', 'CHARACTER', 'LINE', 'SEPARATOR', or 'WHOLE'.

None
bs_separator Optional[str]

Custom separator character when bs_type=SEPARATOR.

None
weight_matches Optional[bool]

Use Lucene's Weight Matches API for most accurate highlighting.

None
Original Highlighter parameters (legacy):
merge_contiguous Optional[bool]

Merge adjacent fragments.

None
max_multivalued_to_examine Optional[int]

Max entries to examine in multivalued field.

None
max_multivalued_to_match Optional[int]

Max matches in multivalued field.

None
alternate_field Optional[str]

Backup field for summary when no highlights found.

None
max_alternate_field_length Optional[int]

Max length of alternate field.

None
alternate Optional[bool]

Highlight alternate field.

None
formatter Optional[Literal['simple']]

Formatter for highlighted output. Use 'simple'.

None
simple_pre Optional[str]

Text before term (simple formatter).

None
simple_post Optional[str]

Text after term (simple formatter).

None
fragmenter Optional[Literal['gap', 'regex']]

Text snippet generator type. Use 'gap' or 'regex'.

None
regex_slop Optional[float]

Deviation factor for regex fragmenter.

None
regex_pattern Optional[str]

Pattern for regex fragmenter.

None
regex_max_analyzed_chars Optional[int]

Char limit for regex fragmenter.

None
preserve_multi Optional[bool]

Preserve order in multivalued fields.

None
payloads Optional[bool]

Include payloads in highlighting.

None
FastVector Highlighter parameters (requires term vectors):
frag_list_builder Optional[Literal['simple', 'weighted', 'single']]

Snippet fragmenting algorithm. Use 'simple', 'weighted', or 'single'.

None
fragments_builder Optional[Literal['default', 'colored']]

Fragment formatting implementation. Use 'default' or 'colored'.

None
boundary_scanner Optional[str]

Boundary scanner implementation.

None
phrase_limit Optional[int]

Max phrases to analyze for scoring.

None
multivalue_separator Optional[str]

Separator for multivalued fields.

None

Returns:

Type Description
Self

A new parser instance with highlight configuration applied.

Examples:

Basic highlighting:

>>> parser.highlight(fields=["title", "content"], snippets_per_field=3, fragment_size=150)

Custom HTML tags:

>>> parser.highlight(
...     fields=["title"],
...     tag_before='<mark class="highlight">',
...     tag_after='</mark>',
...     encoder="html"
... )

Unified highlighter with sentence breaks:

>>> parser.highlight(
...     fields=["content"],
...     method="unified",
...     bs_type="SENTENCE",
...     fragment_size=200
... )
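
Fallback summaries with the original highlighter (a sketch using the alternate-field parameters above to show leading text from another field when nothing matches):

>>> parser.highlight(
...     fields=["content"],
...     method="original",
...     alternate_field="title",
...     max_alternate_field_length=120
... )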

more_like_this(*, fields=None, min_term_freq=None, min_doc_freq=None, max_doc_freq=None, max_doc_freq_pct=None, min_word_len=None, max_word_len=None, max_query_terms=None, max_num_tokens_parsed=None, boost=None, query_fields=None, interesting_terms=None, match_include=None, match_offset=None)

Enable MoreLikeThis to find documents similar to a given document.

MoreLikeThis finds documents similar to a given document by analyzing the terms that make it unique. Common for "related articles", recommendations, and content discovery.

Parameters:

Name Type Description Default
fields Optional[Union[str, List[str]]]

Fields to analyze for similarity. Use fields with meaningful content (title, description, body). Can be a single field or list of fields. For best performance, enable term vectors on these fields.

None
min_term_freq Optional[int]

Minimum times a term must appear in the source document to be considered. Default: 2. Lower values include more terms but may add noise.

None
min_doc_freq Optional[int]

Minimum number of documents a term must appear in across the index. Default: 5. Filters out very rare terms that aren't useful for finding similar documents.

None
max_doc_freq Optional[int]

Maximum number of documents a term can appear in. Filters out very common terms (like 'the', 'and'). Use this OR max_doc_freq_pct, not both.

None
max_doc_freq_pct Optional[int]

Maximum document frequency as percentage (0-100). Example: 75 means ignore terms appearing in more than 75% of documents. Use instead of max_doc_freq for relative filtering.

None
min_word_len Optional[int]

Minimum word length in characters. Words shorter than this are ignored. Example: 4 to skip 'the', 'and', 'or'.

None
max_word_len Optional[int]

Maximum word length in characters. Words longer than this are ignored. Useful to filter out long tokens or URLs.

None
max_query_terms Optional[int]

Maximum number of interesting terms to use in the MLT query. Default: 25. Higher values = more comprehensive but slower. Start with default and adjust as needed.

None
max_num_tokens_parsed Optional[int]

Maximum tokens to analyze per field (for fields without term vectors). Default: 5000. Set lower for better performance on large documents.

None
boost Optional[bool]

If True, boost the query by each term's relevance/importance. Default: False. Enable for better relevance ranking of similar documents.

None
query_fields Optional[str]

Query fields with optional boosts, like 'title^2.0 content^1.0'. Fields must also be in 'fields' parameter. Use to emphasize certain fields in similarity matching.

None
interesting_terms Optional[str]

Controls what info about matched terms is returned. Options: 'none' (default), 'list' (term names), 'details' (terms with boost values). Use 'details' for debugging to see which terms were selected.

None
match_include Optional[bool]

If True, includes the source document in results (useful to compare). Default varies: true for MLT handler, depends on configuration for component.

None
match_offset Optional[int]

When using with a query, specifies which result doc to use for similarity. 0 = first result, 1 = second, etc. Default: 0.

None

Returns:

Type Description
Self

A new parser instance with more_like_this configuration applied.

Examples:

Basic similarity search:

>>> parser.more_like_this(
...     fields=["title", "content"],
...     min_term_freq=2,
...     min_doc_freq=5,
...     max_query_terms=25
... )

Advanced with filtering:

>>> parser.more_like_this(
...     fields=["content"],
...     min_term_freq=1,
...     min_doc_freq=3,
...     min_word_len=4,
...     max_doc_freq_pct=80,
...     interesting_terms="details"
... )

Boosted fields:

>>> parser.more_like_this(
...     fields=["title", "content"],
...     query_fields="title^2.0 content^1.0",
...     boost=True
... )

serialize_configs(params)

Serialize ParamsConfig objects as top level params.

DisMaxQueryParser

Bases: BaseQueryParser

DisMax (Disjunction Max) Query Parser for Apache Solr.

The DisMax query parser is designed for user-friendly queries, providing an experience similar to popular search engines like Google. It handles queries gracefully even when they contain errors, making it ideal for end-user facing applications. DisMax distributes terms across multiple fields with individual boosts and combines results using disjunction max scoring.

Solr Reference

https://solr.apache.org/guide/solr/latest/query-guide/dismax-query-parser.html

Key Features
  • Simplified query syntax (no need for field names)
  • Error-tolerant parsing
  • Multi-field search with individual field boosts (qf)
  • Phrase boosting for proximity matches (pf)
  • Minimum should match logic (mm)
  • Tie-breaker for scoring across fields
  • Boost queries and functions for result tuning
How DisMax Scoring Works

The "tie" parameter controls how field scores are combined: - tie=0.0 (default): Only the highest scoring field contributes - tie=1.0: All field scores are summed - tie=0.1 (typical): Highest score + 0.1 * sum of other scores

Examples:

>>> # Basic multi-field search
>>> parser = DisMaxQueryParser(
...     query="ipod",
...     query_fields={"name": 2.0, "features": 1.0, "text": 0.5}
... )
>>> # With phrase boosting and minimum match
>>> parser = DisMaxQueryParser(
...     query="belkin ipod",
...     query_fields={"name": 5.0, "text": 2.0},
...     phrase_fields={"name": 10.0, "text": 3.0},
...     phrase_slop=2,
...     min_match="75%"
... )
>>> # With boost queries
>>> parser = DisMaxQueryParser(
...     query="video",
...     query_fields={"features": 20.0, "text": 0.3},
...     boost_queries="cat:electronics^5.0"
... )

Parameters:

Name Type Description Default
query

Main query string (user's search terms)

required
alternate_query

Fallback query if q is not specified

required
query_fields

Fields to search with boosts, e.g., {'title': 2.0, 'body': 1.0}

required
query_slop

Phrase slop for explicit phrase queries in user input

required
min_match

Minimum should match specification (e.g., '75%', '2<-25% 9<-3')

required
phrase_fields

Fields for phrase boosting with boosts

required
phrase_slop

Maximum position distance for phrase queries

required
tie_breaker

Tie-breaker value (0.0 to 1.0) for multi-field scoring

required
boost_queries

Additional queries to boost matching documents (additive)

required
boost_functions

Function queries to boost scores (additive)

required
Note

For multiplicative boosting (more predictable), use ExtendedDisMaxQueryParser with the boost parameter instead of bq/bf.

See Also
  • ExtendedDisMaxQueryParser: Enhanced version with additional features
  • StandardParser: For more precise Lucene syntax queries
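
A combined sketch (hypothetical field names) chaining the shared helpers documented under BaseQueryParser:

>>> params = DisMaxQueryParser(
...     query="ipod",
...     query_fields={"name": 2.0, "text": 0.5}
... ).facet(fields=["brand"], mincount=1).build()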

Inherited members

build(), facet(), group(), highlight(), more_like_this(), and serialize_configs() are inherited from BaseQueryParser; see the full parameter documentation above.

ExtendedDisMaxQueryParser

Bases: DisMaxQueryParser

Extended DisMax (eDisMax) Query Parser for Apache Solr.

The Extended DisMax (eDisMax) query parser is an improved version of DisMax that handles full Lucene query syntax while maintaining error tolerance. It's the most flexible parser for user-facing search applications, combining the precision of the Standard parser with the user-friendliness of DisMax.

Solr Reference

https://solr.apache.org/guide/solr/latest/query-guide/edismax-query-parser.html

Key Enhancements over DisMax
  • Full Lucene query syntax support (field names, boolean operators, wildcards)
  • Automatic mm (minimum match) adjustment when stopwords are removed
  • Lowercase operator support ("and", "or" as operators)
  • Bigram (pf2) and trigram (pf3) phrase boosting
  • Multiplicative boosting via boost parameter
  • Fine-grained control over stopword handling
  • User field restrictions (uf) for security
Advanced Phrase Boosting
  • pf: Standard phrase fields (all query terms)
  • pf2: Bigram phrase fields (word pairs)
  • pf3: Trigram phrase fields (word triplets)

Each has independent slop control (ps, ps2, ps3).

Examples:

>>> # Basic eDisMax query with Lucene syntax
>>> parser = ExtendedDisMaxQueryParser(
...     query="title:solr OR (content:search AND type:guide)",
...     query_fields={"title": 2.0, "content": 1.0}
... )
>>> # Multi-level phrase boosting
>>> parser = ExtendedDisMaxQueryParser(
...     query="apache solr search",
...     query_fields={"title": 5.0, "body": 1.0},
...     phrase_fields={"title": 50.0},  # All 3 words
...     phrase_fields_bigram={"title": 20.0},  # 2-word phrases
...     phrase_fields_trigram={"title": 30.0},  # 3-word phrases
...     phrase_slop=2,
...     phrase_slop_bigram=1
... )
>>> # With field aliasing and restrictions
>>> parser = ExtendedDisMaxQueryParser(
...     query="name:Mike sysadmin",
...     query_fields={"title": 1.0, "text": 1.0},
...     user_fields=["title", "text", "last_name", "first_name"]
... )
>>> # With automatic mm relaxation
>>> parser = ExtendedDisMaxQueryParser(
...     query="the quick brown fox",
...     query_fields={"content": 1.0, "title": 2.0},
...     min_match="75%",
...     min_match_auto_relax=True  # Adjust if stopwords removed
... )

Parameters:

Name Type Description Default
split_on_whitespace

If True, analyze each term separately; if False (default), analyze term sequences for multi-word synonyms

required
min_match_auto_relax

Auto-relax mm when stopwords are removed unevenly

required
lowercase_operators

Treat lowercase 'and'/'or' as boolean operators

required
phrase_fields_bigram

Fields for bigram phrase boosting (pf2)

required
phrase_slop_bigram

Slop for bigram phrases (ps2)

required
phrase_fields_trigram

Fields for trigram phrase boosting (pf3)

required
phrase_slop_trigram

Slop for trigram phrases (ps3)

required
stopwords

If False, ignore StopFilterFactory in query analyzer

required
user_fields

Whitelist of fields users can explicitly query (uf parameter)

required
Inherits from DisMaxQueryParser

query, alternate_query, query_fields, query_slop, min_match, phrase_fields, phrase_slop, tie_breaker, boost_queries, boost_functions

Note

The eDisMax default mm behavior differs from DisMax:
  • mm=0% if the query contains explicit operators (-, +, OR, NOT) or q.op=OR
  • mm=100% if q.op=AND and the query uses only AND operators

See Also
  • DisMaxQueryParser: Simpler version without Lucene syntax support
  • StandardParser: For pure Lucene syntax without DisMax features
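
A combined sketch (hypothetical field names) pairing lowercase operator support with the shared highlight() helper:

>>> params = ExtendedDisMaxQueryParser(
...     query="apache and solr",
...     query_fields={"title": 2.0, "content": 1.0},
...     lowercase_operators=True
... ).highlight(fields=["content"], method="unified").build()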

build(*args, **kwargs)

Serialize the parser configuration to Solr-compatible query parameters using Pydantic's model_dump.

facet(*, queries=None, fields=None, prefix=None, contains=None, contains_ignore_case=None, matches=None, sort=None, limit=None, offset=None, mincount=None, missing=None, method=None, enum_cache_min_df=None, exists=None, exclude_terms=None, overrequest_count=None, overrequest_ratio=None, threads=None, range_field=None, range_start=None, range_end=None, range_gap=None, range_hardend=None, range_include=None, range_other=None, range_method=None, pivot_fields=None, pivot_mincount=None)

Enable faceting to categorize and count search results.

Faceting breaks down search results into categories with counts, enabling drill-down navigation and data analysis.

Parameters:

Name Type Description Default
queries Optional[List[str]]

Arbitrary queries to generate facet counts for specific terms/expressions.

None
fields Optional[List[str]]

Fields to be treated as facets. Common for categories, brands, tags.

None
prefix Optional[str]

Limits facet terms to those starting with the given prefix.

None
contains Optional[str]

Limits facet terms to those containing the given substring.

None
contains_ignore_case Optional[bool]

If True, ignores case when matching the 'contains' parameter.

None
matches Optional[str]

Only returns facets matching this regular expression.

None
sort Optional[Literal['count', 'index']]

Ordering of facet field terms. Use 'count' (by frequency) or 'index' (alphabetically).

None
limit Optional[int]

Number of facet counts to return. Set to -1 for all.

None
offset Optional[int]

Offset into the facet list for paging.

None
mincount Optional[int]

Minimum count for facets to be included in response. Common to set to 1 to hide empty facets.

None
missing Optional[bool]

If True, include count of results with no facet value.

None
method Optional[Literal['enum', 'fc', 'fcs']]

Algorithm to use for faceting. Use 'enum' (enumerate all terms, good for low-cardinality), 'fc' (field cache, good for high-cardinality), or 'fcs' (per-segment, good for frequent updates).

None
enum_cache_min_df Optional[int]

Minimum document frequency for filterCache usage with enum method.

None
exists Optional[bool]

Cap facet counts by 1 (only for non-trie fields).

None
exclude_terms Optional[str]

Terms to remove from facet counts.

None
overrequest_count Optional[int]

Extra facets to request from each shard for better accuracy in distributed environments.

None
overrequest_ratio Optional[float]

Ratio for overrequesting facets from shards.

None
threads Optional[int]

Number of threads for parallel facet loading. Useful for multiple facets on large datasets.

None
range_field Optional[List[str]]

Fields for range faceting (e.g., price ranges, date ranges).

None
range_start Optional[Dict[str, str]]

Lower bound of ranges per field. Dict mapping field name to start value.

None
range_end Optional[Dict[str, str]]

Upper bound of ranges per field. Dict mapping field name to end value.

None
range_gap Optional[Dict[str, str]]

Size of each range span per field. E.g., {'price': '100'} for $100 increments.

None
range_hardend Optional[bool]

If True, uses exact range_end as upper bound even if it doesn't align with gap.

None
range_include Optional[List[Literal['lower', 'upper', 'edge', 'outer', 'all']]]

Range bounds to include in faceting. List of: 'lower' (include lower bound), 'upper' (include upper bound), 'edge' (first/last include edges), 'outer' (before/after inclusive), 'all'.

None
range_other Optional[List[Literal['before', 'after', 'between', 'none', 'all']]]

Additional range counts to compute. List of: 'before' (below first range), 'after' (above last range), 'between' (within bounds), 'none', or 'all'.

None
range_method Optional[Literal['filter', 'dv']]

Method to use for range faceting. Use 'filter' or 'dv' (for docValues).

None
pivot_fields Optional[List[str]]

Fields to use for pivot (hierarchical) faceting. E.g., ['category,brand'].

None
pivot_mincount Optional[int]

Minimum count for pivot facet inclusion.

None

Returns:

Type Description
Self

A new parser instance with facet configuration applied.

Examples:

Basic field faceting:

>>> parser.facet(fields=["genre", "director"], mincount=1, limit=10)

Range faceting for prices:

>>> parser.facet(
...     range_field=["price"],
...     range_start={"price": "0"},
...     range_end={"price": "1000"},
...     range_gap={"price": "100"}
... )

Filtered facets:

>>> parser.facet(fields=["color"], prefix="bl", mincount=5)

group(*, by=None, func=None, query=None, limit=1, offset=None, sort=None, format='grouped', main=None, ngroups=False, truncate=False, facet=False, cache_percent=0)

Enable result grouping to collapse results by common field values.

Result grouping (also known as field collapsing) combines documents that share a common field value. Useful for showing one result per author, category, or domain.

Parameters:

Name Type Description Default
by Optional[Union[str, List[str]]]

Field(s) to group results by. Shows one representative doc per unique field value. Must be single-valued and indexed. Can specify multiple fields for separate groupings.

None
func Optional[str]

Group by the result of a function query (e.g., 'floor(price)'). Not supported in SolrCloud/distributed searches.

None
query Optional[Union[str, List[str]]]

Create custom groups using arbitrary queries. Each query defines one group. Example: ['price:[0 TO 50]', 'price:[50 TO 100]'] creates price range groups.

None
limit int

Number of documents to return per group. Default: 1 (only top doc per group). Set higher to see more examples from each group.

1
offset Optional[int]

Skip the first N documents within each group. Useful for pagination within groups.

None
sort Optional[str]

How to sort documents within each group (e.g., 'date desc' for newest first). If not specified, uses the main sort parameter.

None
format str

Response structure format. 'grouped' (nested, default) or 'simple' (flat list).

'grouped'
main Optional[bool]

If True, returns first field grouping as main result list, flattening the response.

None
ngroups bool

If True, include total number of unique groups in response. Useful for pagination. Requires co-location in SolrCloud.

False
truncate bool

If True, base facet counts on one doc per group only.

False
facet bool

If True, enable grouped faceting. Can be expensive on large result sets. Requires co-location in SolrCloud.

False
cache_percent int

Enable result grouping cache (0-100). Set to 0 to disable (default). Try 50-100 for complex queries to improve performance.

0

Returns:

Type Description
Self

A new parser instance with group configuration applied.

Examples:

Group by author:

>>> parser.group(by="author", limit=3, sort="date desc", ngroups=True)

Group by price range:

>>> parser.group(
...     query=["price:[0 TO 50]", "price:[50 TO 100]", "price:[100 TO *]"],
...     limit=5
... )

Multiple field groupings:

>>> parser.group(by=["author", "category"], limit=2)

highlight(*, method=None, fields=None, query=None, query_parser=None, require_field_match=None, query_field_pattern=None, use_phrase_highlighter=None, multiterm=None, snippets_per_field=None, fragment_size=None, encoder=None, max_analyzed_chars=None, tag_before=None, tag_after=None, offset_source=None, frag_align_ratio=None, fragsize_is_minimum=None, tag_ellipsis=None, default_summary=None, score_k1=None, score_b=None, score_pivot=None, bs_language=None, bs_country=None, bs_variant=None, bs_type=None, bs_separator=None, weight_matches=None, merge_contiguous=None, max_multivalued_to_examine=None, max_multivalued_to_match=None, alternate_field=None, max_alternate_field_length=None, alternate=None, formatter=None, simple_pre=None, simple_post=None, fragmenter=None, regex_slop=None, regex_pattern=None, regex_max_analyzed_chars=None, preserve_multi=None, payloads=None, frag_list_builder=None, fragments_builder=None, boundary_scanner=None, phrase_limit=None, multivalue_separator=None)

Enable highlighting to show snippets with query terms emphasized.

Highlighting shows snippets of text from documents with query terms emphasized (usually wrapped in tags). Common for search result snippets and previews.

Parameters:

Name Type Description Default
method Optional[Literal['unified', 'original', 'fastVector']]

Highlighting implementation to use. Use 'unified' (most accurate, recommended), 'original' (legacy, widely compatible), or 'fastVector' (fast for large docs, requires term vectors).

None
fields Optional[Union[str, List[str]]]

Fields to generate highlighted snippets for. Must be stored fields. Example: ['title', 'content']. Use '*' for all fields (not recommended).

None
query Optional[str]

Custom query to use for highlighting (overrides main query).

None
query_parser Optional[str]

Query parser for the highlight query (e.g., 'edismax', 'lucene').

None
require_field_match Optional[bool]

If True, only highlight terms in the field they matched. Set to False to highlight query terms in all requested fields.

None
query_field_pattern Optional[str]

Regular expression pattern for fields to consider for highlighting.

None
use_phrase_highlighter Optional[bool]

If True, highlights complete phrases accurately. Default: True.

None
multiterm Optional[bool]

Enable highlighting for wildcard, fuzzy, and range queries. Default: True.

None
snippets_per_field Optional[int]

Maximum number of snippets to return per field. Default: 1. Set higher to show more context passages.

None
fragment_size Optional[int]

Size of highlighted snippets in characters. Default: 100. Set to 0 to highlight entire field (not recommended for large fields).

None
encoder Optional[Literal['', 'html']]

Text encoder for snippets. Use '' (empty string, default) or 'html' to escape HTML and prevent XSS.

None
max_analyzed_chars Optional[int]

Maximum characters to analyze per field. Default: 51200 (50KB). For large documents, limits analysis to improve performance.

None
tag_before Optional[str]

Text/tag to insert before each highlighted term. Default: ''. Example: ''.

None
tag_after Optional[str]

Text/tag to insert after each highlighted term. Default: ''.

None
# Unified Highlighter specific most accurate, recommended
required
offset_source Optional[str]

How offsets are obtained. Options: 'ANALYSIS', 'POSTINGS', 'POSTINGS_WITH_TERM_VECTORS', 'TERM_VECTORS'. Usually auto-detected.

None
frag_align_ratio Optional[float]

Where to position first match in snippet (0.0-1.0). Default: 0.33 (left third, shows context before match).

None
fragsize_is_minimum Optional[bool]

If True, treat fragment_size as minimum. Default: True.

None
tag_ellipsis Optional[str]

Text between multiple snippets (e.g., '...' or ' [...] ').

None
default_summary Optional[bool]

If True, return leading text when no matches found.

None
score_k1 Optional[float]

BM25 term frequency normalization. Default: 1.2.

None
score_b Optional[float]

BM25 length normalization. Default: 0.75.

None
score_pivot Optional[int]

BM25 average passage length in characters. Default: 87.

None
bs_language Optional[str]

BreakIterator language for text segmentation (e.g., 'en', 'ja').

None
bs_country Optional[str]

BreakIterator country code (e.g., 'US', 'GB').

None
bs_variant Optional[str]

BreakIterator variant for specialized locale rules.

None
bs_type Optional[Literal['SEPARATOR', 'SENTENCE', 'WORD', 'CHARACTER', 'LINE', 'WHOLE']]

How to segment text. Use 'SENTENCE' (default, recommended), 'WORD', 'CHARACTER', 'LINE', 'SEPARATOR', or 'WHOLE'.

None
bs_separator Optional[str]

Custom separator character when bs_type=SEPARATOR.

None
weight_matches Optional[bool]

Use Lucene's Weight Matches API for most accurate highlighting.

None
# Original Highlighter specific legacy
required
merge_contiguous Optional[bool]

Merge adjacent fragments.

None
max_multivalued_to_examine Optional[int]

Max entries to examine in multivalued field.

None
max_multivalued_to_match Optional[int]

Max matches in multivalued field.

None
alternate_field Optional[str]

Backup field for summary when no highlights found.

None
max_alternate_field_length Optional[int]

Max length of alternate field.

None
alternate Optional[bool]

Highlight alternate field.

None
formatter Optional[Literal['simple']]

Formatter for highlighted output. Use 'simple'.

None
simple_pre Optional[str]

Text before term (simple formatter).

None
simple_post Optional[str]

Text after term (simple formatter).

None
fragmenter Optional[Literal['gap', 'regex']]

Text snippet generator type. Use 'gap' or 'regex'.

None
regex_slop Optional[float]

Deviation factor for regex fragmenter.

None
regex_pattern Optional[str]

Pattern for regex fragmenter.

None
regex_max_analyzed_chars Optional[int]

Char limit for regex fragmenter.

None
preserve_multi Optional[bool]

Preserve order in multivalued fields.

None
payloads Optional[bool]

Include payloads in highlighting.

None
# FastVector Highlighter specific requires term vectors
required
frag_list_builder Optional[Literal['simple', 'weighted', 'single']]

Snippet fragmenting algorithm. Use 'simple', 'weighted', or 'single'.

None
fragments_builder Optional[Literal['default', 'colored']]

Fragment formatting implementation. Use 'default' or 'colored'.

None
boundary_scanner Optional[str]

Boundary scanner implementation.

None
phrase_limit Optional[int]

Max phrases to analyze for scoring.

None
multivalue_separator Optional[str]

Separator for multivalued fields.

None

Returns:

Type Description
Self

A new parser instance with highlight configuration applied.

Examples:

Basic highlighting:

>>> parser.highlight(fields=["title", "content"], snippets_per_field=3, fragment_size=150)

Custom HTML tags:

>>> parser.highlight(
...     fields=["title"],
...     tag_before='<mark class="highlight">',
...     tag_after='</mark>',
...     encoder="html"
... )

Unified highlighter with sentence breaks:

>>> parser.highlight(
...     fields=["content"],
...     method="unified",
...     bs_type="SENTENCE",
...     fragment_size=200
... )

more_like_this(*, fields=None, min_term_freq=None, min_doc_freq=None, max_doc_freq=None, max_doc_freq_pct=None, min_word_len=None, max_word_len=None, max_query_terms=None, max_num_tokens_parsed=None, boost=None, query_fields=None, interesting_terms=None, match_include=None, match_offset=None)

Enable MoreLikeThis to find documents similar to a given document.

MoreLikeThis finds documents similar to a given document by analyzing the terms that make it unique. Common for "related articles", recommendations, and content discovery.

Parameters:

Name Type Description Default
fields Optional[Union[str, List[str]]]

Fields to analyze for similarity. Use fields with meaningful content (title, description, body). Can be a single field or list of fields. For best performance, enable term vectors on these fields.

None
min_term_freq Optional[int]

Minimum times a term must appear in the source document to be considered. Default: 2. Lower values include more terms but may add noise.

None
min_doc_freq Optional[int]

Minimum number of documents a term must appear in across the index. Default: 5. Filters out very rare terms that aren't useful for finding similar documents.

None
max_doc_freq Optional[int]

Maximum number of documents a term can appear in. Filters out very common terms (like 'the', 'and'). Use this OR max_doc_freq_pct, not both.

None
max_doc_freq_pct Optional[int]

Maximum document frequency as percentage (0-100). Example: 75 means ignore terms appearing in more than 75% of documents. Use instead of max_doc_freq for relative filtering.

None
min_word_len Optional[int]

Minimum word length in characters. Words shorter than this are ignored. Example: 4 to skip 'the', 'and', 'or'.

None
max_word_len Optional[int]

Maximum word length in characters. Words longer than this are ignored. Useful to filter out long tokens or URLs.

None
max_query_terms Optional[int]

Maximum number of interesting terms to use in the MLT query. Default: 25. Higher values = more comprehensive but slower. Start with default and adjust as needed.

None
max_num_tokens_parsed Optional[int]

Maximum tokens to analyze per field (for fields without term vectors). Default: 5000. Set lower for better performance on large documents.

None
boost Optional[bool]

If True, boost the query by each term's relevance/importance. Default: False. Enable for better relevance ranking of similar documents.

None
query_fields Optional[str]

Query fields with optional boosts, like 'title^2.0 content^1.0'. Fields must also be in 'fields' parameter. Use to emphasize certain fields in similarity matching.

None
interesting_terms Optional[str]

Controls what info about matched terms is returned. Options: 'none' (default), 'list' (term names), 'details' (terms with boost values). Use 'details' for debugging to see which terms were selected.

None
match_include Optional[bool]

If True, includes the source document in the results (useful for comparison). Default varies: True for the MLT handler; for the MLT component it depends on configuration.

None
match_offset Optional[int]

When using with a query, specifies which result doc to use for similarity. 0 = first result, 1 = second, etc. Default: 0.

None

Returns:

Type Description
Self

A new parser instance with more_like_this configuration applied.

Examples:

Basic similarity search:

>>> parser.more_like_this(
...     fields=["title", "content"],
...     min_term_freq=2,
...     min_doc_freq=5,
...     max_query_terms=25
... )

Advanced with filtering:

>>> parser.more_like_this(
...     fields=["content"],
...     min_term_freq=1,
...     min_doc_freq=3,
...     min_word_len=4,
...     max_doc_freq_pct=80,
...     interesting_terms="details"
... )

Boosted fields:

>>> parser.more_like_this(
...     fields=["title", "content"],
...     query_fields="title^2.0 content^1.0",
...     boost=True
... )

serialize_configs(params)

Serialize ParamsConfig objects as top level params.

GeoFilterQueryParser

Bases: SpatialQueryParser

Geospatial Filter Query Parsers (geofilt and bbox) for Apache Solr.

The geofilt and bbox parsers enable location-based filtering in Solr, allowing you to find documents within a certain distance from a point. Geofilt uses a circular radius (precise), while bbox uses a rectangular bounding box (faster but less precise).

Solr Reference

https://solr.apache.org/guide/solr/latest/query-guide/spatial-search.html

Key Features
  • Circular radius search (geofilt) or bounding box search (bbox)
  • Distance-based filtering and scoring
  • Configurable distance units (kilometers, miles, degrees)
  • Cache control for performance optimization
  • Compatible with LatLonPointSpatialField and RPT fields
How Geospatial Filtering Works

geofilt:
  • Creates a circular search area around a center point
  • Precise: only includes documents within the exact radius
  • Slightly slower but more accurate

bbox:
  • Creates a rectangular bounding box around a center point
  • Faster: uses simpler rectangular calculations
  • May include points outside the circular radius

Distance Scoring

Use the score parameter to return the distance as the relevance score:
  • none: fixed score of 1.0 (default)
  • kilometers: distance in km
  • miles: distance in miles
  • degrees: distance in degrees
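
For reference, these filtering and scoring options map onto Solr's standard local-params syntax roughly as follows (a sketch based on the Solr spatial guide; the exact string produced by build() may differ):

>>> # Raw Solr filter queries corresponding to the two filter types.
>>> # sfield = spatial field, pt = "lat,lon" center, d = distance in km.
>>> geofilt_fq = "{!geofilt sfield=store_location pt=45.15,-93.85 d=5}"
>>> bbox_fq = "{!bbox sfield=store_location pt=45.15,-93.85 d=5}"
>>> # With distance scoring, the score local param returns distance as the score:
>>> scored_fq = "{!geofilt sfield=hotel_location pt=51.5074,-0.1278 d=2 score=kilometers}"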

Schema Requirements

The spatial field must be indexed with an appropriate field type, such as solr.LatLonPointSpatialField (recommended for points) or an RPT field (SpatialRecursivePrefixTreeFieldType) for shapes.
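
For instance, a suitable field can be added with the Schema API (a hedged sketch assuming the default configset's 'location' field type, which is backed by LatLonPointSpatialField; the collection name and URL are placeholders):

>>> import requests
>>> payload = {
...     "add-field": {
...         "name": "store_location",
...         "type": "location",  # LatLonPointSpatialField in default configsets
...         "indexed": True,
...         "stored": True,
...     }
... }
>>> resp = requests.post("http://localhost:8983/solr/stores/schema", json=payload)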

Examples:

>>> # Circular geospatial filter (precise)
>>> parser = GeoFilterQueryParser(
...     spatial_field="store_location",
...     center_point=[45.15, -93.85],
...     distance=5,  # 5 km radius
...     filter_type="geofilt"
... )
>>> # Bounding box filter (faster)
>>> parser = GeoFilterQueryParser(
...     spatial_field="restaurant_coords",
...     center_point=[37.7749, -122.4194],  # San Francisco
...     distance=10,  # 10 km
...     filter_type="bbox"
... )
>>> # With distance scoring
>>> parser = GeoFilterQueryParser(
...     spatial_field="hotel_location",
...     center_point=[51.5074, -0.1278],  # London
...     distance=2,
...     filter_type="geofilt",
...     score="kilometers"  # Return distance as score
... )
>>> # Disable caching for dynamic queries
>>> parser = GeoFilterQueryParser(
...     spatial_field="user_location",
...     center_point=[40.7128, -74.0060],  # NYC
...     distance=1,
...     filter_type="geofilt",
...     cache=False  # Don't cache this filter
... )
>>> # Filter with sorting by distance
>>> # Combine with geodist() function query for sorting
>>> parser = GeoFilterQueryParser(
...     spatial_field="store",
...     center_point=[45.15, -93.85],
...     distance=50
... )
>>> # Add: &sort=geodist() asc to request
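
Sending that last example as a complete request might look like this (a hedged sketch using the plain select endpoint and the requests library; geodist() reads sfield, pt, and d from the request, and the URL and collection are placeholders):

>>> import requests
>>> params = {
...     "q": "*:*",
...     "fq": "{!geofilt}",       # filter using the spatial params below
...     "sfield": "store",        # spatial field
...     "pt": "45.15,-93.85",     # center point as "lat,lon"
...     "d": "50",                # distance in km
...     "sort": "geodist() asc",  # nearest first
... }
>>> response = requests.get("http://localhost:8983/solr/stores/select", params=params)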

Parameters:

Name Type Description Default
filter_type

'geofilt' for circular (precise) or 'bbox' for bounding box (faster)

required
spatial_field

Name of the spatial indexed field (inherited from base, required)

required
center_point

[lat, lon] or [x, y] coordinates of search center (inherited, required)

required
distance

Radial distance from center point (inherited, required)

required
score

Scoring mode (none, kilometers, miles, degrees) (inherited from base)

required
cache

Whether to cache the filter query (inherited from base)

required

Returns:

Type Description

Filter query (fq) matching documents within the specified distance

Performance Tips
  • Use bbox for large radius searches where precision isn't critical
  • Set cache=false for highly variable queries (e.g., user location)
  • Use geofilt for small radius searches requiring precision
  • Consider using docValues for better spatial query performance
See Also
  • BBoxQueryParser: For querying indexed bounding boxes with spatial predicates
  • geodist() function: For distance calculations and sorting
  • Solr Spatial Search Guide: https://solr.apache.org/guide/solr/latest/query-guide/spatial-search.html

build(*args, **kwargs)

Build query parameters, excluding mixin keys.

facet(*, queries=None, fields=None, prefix=None, contains=None, contains_ignore_case=None, matches=None, sort=None, limit=None, offset=None, mincount=None, missing=None, method=None, enum_cache_min_df=None, exists=None, exclude_terms=None, overrequest_count=None, overrequest_ratio=None, threads=None, range_field=None, range_start=None, range_end=None, range_gap=None, range_hardend=None, range_include=None, range_other=None, range_method=None, pivot_fields=None, pivot_mincount=None)

Enable faceting to categorize and count search results.

Faceting breaks down search results into categories with counts, enabling drill-down navigation and data analysis.

Parameters:

Name Type Description Default
queries Optional[List[str]]

Arbitrary queries to generate facet counts for specific terms/expressions.

None
fields Optional[List[str]]

Fields to be treated as facets. Common for categories, brands, tags.

None
prefix Optional[str]

Limits facet terms to those starting with the given prefix.

None
contains Optional[str]

Limits facet terms to those containing the given substring.

None
contains_ignore_case Optional[bool]

If True, ignores case when matching the 'contains' parameter.

None
matches Optional[str]

Only returns facets matching this regular expression.

None
sort Optional[Literal['count', 'index']]

Ordering of facet field terms. Use 'count' (by frequency) or 'index' (alphabetically).

None
limit Optional[int]

Number of facet counts to return. Set to -1 for all.

None
offset Optional[int]

Offset into the facet list for paging.

None
mincount Optional[int]

Minimum count for facets to be included in response. Common to set to 1 to hide empty facets.

None
missing Optional[bool]

If True, include count of results with no facet value.

None
method Optional[Literal['enum', 'fc', 'fcs']]

Algorithm to use for faceting. Use 'enum' (enumerate all terms, good for low-cardinality), 'fc' (field cache, good for high-cardinality), or 'fcs' (per-segment, good for frequent updates).

None
enum_cache_min_df Optional[int]

Minimum document frequency for filterCache usage with enum method.

None
exists Optional[bool]

Cap facet counts by 1 (only for non-trie fields).

None
exclude_terms Optional[str]

Terms to remove from facet counts.

None
overrequest_count Optional[int]

Extra facets to request from each shard for better accuracy in distributed environments.

None
overrequest_ratio Optional[float]

Ratio for overrequesting facets from shards.

None
threads Optional[int]

Number of threads for parallel facet loading. Useful for multiple facets on large datasets.

None
range_field Optional[List[str]]

Fields for range faceting (e.g., price ranges, date ranges).

None
range_start Optional[Dict[str, str]]

Lower bound of ranges per field. Dict mapping field name to start value.

None
range_end Optional[Dict[str, str]]

Upper bound of ranges per field. Dict mapping field name to end value.

None
range_gap Optional[Dict[str, str]]

Size of each range span per field. E.g., {'price': '100'} for $100 increments.

None
range_hardend Optional[bool]

If True, uses exact range_end as upper bound even if it doesn't align with gap.

None
range_include Optional[List[Literal['lower', 'upper', 'edge', 'outer', 'all']]]

Range bounds to include in faceting. List of: 'lower' (include lower bound), 'upper' (include upper bound), 'edge' (first/last include edges), 'outer' (before/after inclusive), 'all'.

None
range_other Optional[List[Literal['before', 'after', 'between', 'none', 'all']]]

Additional range counts to compute. List of: 'before' (below first range), 'after' (above last range), 'between' (within bounds), 'none', or 'all'.

None
range_method Optional[Literal['filter', 'dv']]

Method to use for range faceting. Use 'filter' or 'dv' (for docValues).

None
pivot_fields Optional[List[str]]

Fields to use for pivot (hierarchical) faceting. E.g., ['category,brand'].

None
pivot_mincount Optional[int]

Minimum count for pivot facet inclusion.

None

Returns:

Type Description
Self

A new parser instance with facet configuration applied.

Examples:

Basic field faceting:

>>> parser.facet(fields=["genre", "director"], mincount=1, limit=10)

Range faceting for prices:

>>> parser.facet(
...     range_field=["price"],
...     range_start={"price": "0"},
...     range_end={"price": "1000"},
...     range_gap={"price": "100"}
... )

Filtered facets:

>>> parser.facet(fields=["color"], prefix="bl", mincount=5)

group(*, by=None, func=None, query=None, limit=1, offset=None, sort=None, format='grouped', main=None, ngroups=False, truncate=False, facet=False, cache_percent=0)

Enable result grouping to collapse results by common field values.

Result grouping (also known as field collapsing) combines documents that share a common field value. Useful for showing one result per author, category, or domain.

Parameters:

Name Type Description Default
by Optional[Union[str, List[str]]]

Field(s) to group results by. Shows one representative doc per unique field value. Must be single-valued and indexed. Can specify multiple fields for separate groupings.

None
func Optional[str]

Group by the result of a function query (e.g., 'floor(price)'). Not supported in SolrCloud/distributed searches.

None
query Optional[Union[str, List[str]]]

Create custom groups using arbitrary queries. Each query defines one group. Example: ['price:[0 TO 50]', 'price:[50 TO 100]'] creates price range groups.

None
limit int

Number of documents to return per group. Default: 1 (only top doc per group). Set higher to see more examples from each group.

1
offset Optional[int]

Skip the first N documents within each group. Useful for pagination within groups.

None
sort Optional[str]

How to sort documents within each group (e.g., 'date desc' for newest first). If not specified, uses the main sort parameter.

None
format str

Response structure format. 'grouped' (nested, default) or 'simple' (flat list).

'grouped'
main Optional[bool]

If True, returns first field grouping as main result list, flattening the response.

None
ngroups bool

If True, include total number of unique groups in response. Useful for pagination. Requires co-location in SolrCloud.

False
truncate bool

If True, base facet counts on one doc per group only.

False
facet bool

If True, enable grouped faceting. Can be expensive on large result sets. Requires co-location in SolrCloud.

False
cache_percent int

Enable result grouping cache (0-100). Set to 0 to disable (default). Try 50-100 for complex queries to improve performance.

0

Returns:

Type Description
Self

A new parser instance with group configuration applied.

Examples:

Group by author:

>>> parser.group(by="author", limit=3, sort="date desc", ngroups=True)

Group by price range:

>>> parser.group(
...     query=["price:[0 TO 50]", "price:[50 TO 100]", "price:[100 TO *]"],
...     limit=5
... )

Multiple field groupings:

>>> parser.group(by=["author", "category"], limit=2)

highlight(*, method=None, fields=None, query=None, query_parser=None, require_field_match=None, query_field_pattern=None, use_phrase_highlighter=None, multiterm=None, snippets_per_field=None, fragment_size=None, encoder=None, max_analyzed_chars=None, tag_before=None, tag_after=None, offset_source=None, frag_align_ratio=None, fragsize_is_minimum=None, tag_ellipsis=None, default_summary=None, score_k1=None, score_b=None, score_pivot=None, bs_language=None, bs_country=None, bs_variant=None, bs_type=None, bs_separator=None, weight_matches=None, merge_contiguous=None, max_multivalued_to_examine=None, max_multivalued_to_match=None, alternate_field=None, max_alternate_field_length=None, alternate=None, formatter=None, simple_pre=None, simple_post=None, fragmenter=None, regex_slop=None, regex_pattern=None, regex_max_analyzed_chars=None, preserve_multi=None, payloads=None, frag_list_builder=None, fragments_builder=None, boundary_scanner=None, phrase_limit=None, multivalue_separator=None)

Enable highlighting to show snippets with query terms emphasized.

Highlighting shows snippets of text from documents with query terms emphasized (usually wrapped in tags). Common for search result snippets and previews.

Parameters:

Name Type Description Default
method Optional[Literal['unified', 'original', 'fastVector']]

Highlighting implementation to use. Use 'unified' (most accurate, recommended), 'original' (legacy, widely compatible), or 'fastVector' (fast for large docs, requires term vectors).

None
fields Optional[Union[str, List[str]]]

Fields to generate highlighted snippets for. Must be stored fields. Example: ['title', 'content']. Use '*' for all fields (not recommended).

None
query Optional[str]

Custom query to use for highlighting (overrides main query).

None
query_parser Optional[str]

Query parser for the highlight query (e.g., 'edismax', 'lucene').

None
require_field_match Optional[bool]

If True, only highlight terms in the field they matched. Set to False to highlight query terms in all requested fields.

None
query_field_pattern Optional[str]

Regular expression pattern for fields to consider for highlighting.

None
use_phrase_highlighter Optional[bool]

If True, highlights complete phrases accurately. Default: True.

None
multiterm Optional[bool]

Enable highlighting for wildcard, fuzzy, and range queries. Default: True.

None
snippets_per_field Optional[int]

Maximum number of snippets to return per field. Default: 1. Set higher to show more context passages.

None
fragment_size Optional[int]

Size of highlighted snippets in characters. Default: 100. Set to 0 to highlight entire field (not recommended for large fields).

None
encoder Optional[Literal['', 'html']]

Text encoder for snippets. Use '' (empty string, default) or 'html' to escape HTML and prevent XSS.

None
max_analyzed_chars Optional[int]

Maximum characters to analyze per field. Default: 51200 (50KB). For large documents, limits analysis to improve performance.

None
tag_before Optional[str]

Text/tag to insert before each highlighted term. Default: '<em>'.

None
tag_after Optional[str]

Text/tag to insert after each highlighted term. Default: '</em>'.

None
Unified Highlighter parameters (most accurate, recommended):

offset_source Optional[str]

How offsets are obtained. Options: 'ANALYSIS', 'POSTINGS', 'POSTINGS_WITH_TERM_VECTORS', 'TERM_VECTORS'. Usually auto-detected.

None
frag_align_ratio Optional[float]

Where to position first match in snippet (0.0-1.0). Default: 0.33 (left third, shows context before match).

None
fragsize_is_minimum Optional[bool]

If True, treat fragment_size as minimum. Default: True.

None
tag_ellipsis Optional[str]

Text between multiple snippets (e.g., '...' or ' [...] ').

None
default_summary Optional[bool]

If True, return leading text when no matches found.

None
score_k1 Optional[float]

BM25 term frequency normalization. Default: 1.2.

None
score_b Optional[float]

BM25 length normalization. Default: 0.75.

None
score_pivot Optional[int]

BM25 average passage length in characters. Default: 87.

None
bs_language Optional[str]

BreakIterator language for text segmentation (e.g., 'en', 'ja').

None
bs_country Optional[str]

BreakIterator country code (e.g., 'US', 'GB').

None
bs_variant Optional[str]

BreakIterator variant for specialized locale rules.

None
bs_type Optional[Literal['SEPARATOR', 'SENTENCE', 'WORD', 'CHARACTER', 'LINE', 'WHOLE']]

How to segment text. Use 'SENTENCE' (default, recommended), 'WORD', 'CHARACTER', 'LINE', 'SEPARATOR', or 'WHOLE'.

None
bs_separator Optional[str]

Custom separator character when bs_type=SEPARATOR.

None
weight_matches Optional[bool]

Use Lucene's Weight Matches API for most accurate highlighting.

None
Original Highlighter parameters (legacy):

merge_contiguous Optional[bool]

Merge adjacent fragments.

None
max_multivalued_to_examine Optional[int]

Max entries to examine in multivalued field.

None
max_multivalued_to_match Optional[int]

Max matches in multivalued field.

None
alternate_field Optional[str]

Backup field for summary when no highlights found.

None
max_alternate_field_length Optional[int]

Max length of alternate field.

None
alternate Optional[bool]

Highlight alternate field.

None
formatter Optional[Literal['simple']]

Formatter for highlighted output. Use 'simple'.

None
simple_pre Optional[str]

Text before term (simple formatter).

None
simple_post Optional[str]

Text after term (simple formatter).

None
fragmenter Optional[Literal['gap', 'regex']]

Text snippet generator type. Use 'gap' or 'regex'.

None
regex_slop Optional[float]

Deviation factor for regex fragmenter.

None
regex_pattern Optional[str]

Pattern for regex fragmenter.

None
regex_max_analyzed_chars Optional[int]

Char limit for regex fragmenter.

None
preserve_multi Optional[bool]

Preserve order in multivalued fields.

None
payloads Optional[bool]

Include payloads in highlighting.

None
FastVector Highlighter parameters (requires term vectors):

frag_list_builder Optional[Literal['simple', 'weighted', 'single']]

Snippet fragmenting algorithm. Use 'simple', 'weighted', or 'single'.

None
fragments_builder Optional[Literal['default', 'colored']]

Fragment formatting implementation. Use 'default' or 'colored'.

None
boundary_scanner Optional[str]

Boundary scanner implementation.

None
phrase_limit Optional[int]

Max phrases to analyze for scoring.

None
multivalue_separator Optional[str]

Separator for multivalued fields.

None

Returns:

Type Description
Self

A new parser instance with highlight configuration applied.

Examples:

Basic highlighting:

>>> parser.highlight(fields=["title", "content"], snippets_per_field=3, fragment_size=150)

Custom HTML tags:

>>> parser.highlight(
...     fields=["title"],
...     tag_before='<mark class="highlight">',
...     tag_after='</mark>',
...     encoder="html"
... )

Unified highlighter with sentence breaks:

>>> parser.highlight(
...     fields=["content"],
...     method="unified",
...     bs_type="SENTENCE",
...     fragment_size=200
... )

more_like_this(*, fields=None, min_term_freq=None, min_doc_freq=None, max_doc_freq=None, max_doc_freq_pct=None, min_word_len=None, max_word_len=None, max_query_terms=None, max_num_tokens_parsed=None, boost=None, query_fields=None, interesting_terms=None, match_include=None, match_offset=None)

Enable MoreLikeThis to find documents similar to a given document.

MoreLikeThis finds documents similar to a given document by analyzing the terms that make it unique. Common for "related articles", recommendations, and content discovery.

Parameters:

Name Type Description Default
fields Optional[Union[str, List[str]]]

Fields to analyze for similarity. Use fields with meaningful content (title, description, body). Can be a single field or list of fields. For best performance, enable term vectors on these fields.

None
min_term_freq Optional[int]

Minimum times a term must appear in the source document to be considered. Default: 2. Lower values include more terms but may add noise.

None
min_doc_freq Optional[int]

Minimum number of documents a term must appear in across the index. Default: 5. Filters out very rare terms that aren't useful for finding similar documents.

None
max_doc_freq Optional[int]

Maximum number of documents a term can appear in. Filters out very common terms (like 'the', 'and'). Use this OR max_doc_freq_pct, not both.

None
max_doc_freq_pct Optional[int]

Maximum document frequency as percentage (0-100). Example: 75 means ignore terms appearing in more than 75% of documents. Use instead of max_doc_freq for relative filtering.

None
min_word_len Optional[int]

Minimum word length in characters. Words shorter than this are ignored. Example: 4 to skip 'the', 'and', 'or'.

None
max_word_len Optional[int]

Maximum word length in characters. Words longer than this are ignored. Useful to filter out long tokens or URLs.

None
max_query_terms Optional[int]

Maximum number of interesting terms to use in the MLT query. Default: 25. Higher values = more comprehensive but slower. Start with default and adjust as needed.

None
max_num_tokens_parsed Optional[int]

Maximum tokens to analyze per field (for fields without term vectors). Default: 5000. Set lower for better performance on large documents.

None
boost Optional[bool]

If True, boost the query by each term's relevance/importance. Default: False. Enable for better relevance ranking of similar documents.

None
query_fields Optional[str]

Query fields with optional boosts, like 'title^2.0 content^1.0'. Fields must also be in 'fields' parameter. Use to emphasize certain fields in similarity matching.

None
interesting_terms Optional[str]

Controls what info about matched terms is returned. Options: 'none' (default), 'list' (term names), 'details' (terms with boost values). Use 'details' for debugging to see which terms were selected.

None
match_include Optional[bool]

If True, includes the source document in the results (useful for comparison). Default varies: True for the MLT handler; for the MLT component it depends on configuration.

None
match_offset Optional[int]

When using with a query, specifies which result doc to use for similarity. 0 = first result, 1 = second, etc. Default: 0.

None

Returns:

Type Description
Self

A new parser instance with more_like_this configuration applied.

Examples:

Basic similarity search:

>>> parser.more_like_this(
...     fields=["title", "content"],
...     min_term_freq=2,
...     min_doc_freq=5,
...     max_query_terms=25
... )

Advanced with filtering:

>>> parser.more_like_this(
...     fields=["content"],
...     min_term_freq=1,
...     min_doc_freq=3,
...     min_word_len=4,
...     max_doc_freq_pct=80,
...     interesting_terms="details"
... )

Boosted fields:

>>> parser.more_like_this(
...     fields=["title", "content"],
...     query_fields="title^2.0 content^1.0",
...     boost=True
... )

serialize_configs(params)

Serialize ParamsConfig objects as top level params.

spatial_params()

Build the spatial search parameters string for use in filter queries.

KNNQueryParser

Bases: DenseVectorSearchQueryParser

K-Nearest Neighbors (KNN) Query Parser for Apache Solr Dense Vector Search.

The KNN query parser enables efficient similarity searches on dense vector fields using the k-nearest neighbors algorithm. It finds the topK documents whose vectors are most similar to the query vector according to the configured similarity function (cosine, dot product, or euclidean).

Solr Reference

https://solr.apache.org/guide/solr/latest/query-guide/dense-vector-search.html

Key Features
  • Efficient vector similarity search using HNSW algorithm
  • Configurable k (topK) for number of results
  • Pre-filtering support (explicit or implicit)
  • Re-ranking capability for hybrid search
  • Multiple similarity functions: cosine, dot_product, euclidean
How KNN Search Works
  1. Query vector is compared against indexed vectors
  2. HNSW (Hierarchical Navigable Small World) algorithm efficiently finds neighbors
  3. Top k most similar vectors are returned
  4. Similarity score is used for ranking
Pre-Filtering
  • Implicit: All fq filters (except post filters) automatically pre-filter when knn is the main query
  • Explicit: Use the preFilter parameter to specify filtering criteria
  • Tagged: Use includeTags/excludeTags to control which fq filters apply (all three modes are sketched below)
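
The sketch below follows the Solr dense vector guide and shows the three modes as raw request parameters (not this library's serialized output; field names and vectors are placeholders):

>>> # Implicit: fq filters pre-filter automatically when knn is the main query.
>>> implicit = {
...     "q": "{!knn f=product_vector topK=20}[1.0,2.0,3.0,4.0]",
...     "fq": "inStock:true",
... }
>>> # Explicit: preFilter overrides the implicit behavior.
>>> explicit = {
...     "q": "{!knn f=product_vector topK=20 preFilter='category:electronics'}[1.0,2.0,3.0,4.0]",
... }
>>> # Tagged: includeTags/excludeTags select which tagged fq filters apply.
>>> tagged = {
...     "q": "{!knn f=doc_vector topK=50 includeTags=for_knn}[0.5,0.5,0.5,0.5]",
...     "fq": "{!tag=for_knn}category:AI",
... }
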
Schema Requirements

The field must be a DenseVectorField whose vectorDimension matches the length of the query vector.
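
For example, a matching field type and field can be created with the Schema API (a hedged sketch; 'knn_vector_5' is a made-up type name, and vectorDimension/similarityFunction must match your data):

>>> import requests
>>> payload = {
...     "add-field-type": {
...         "name": "knn_vector_5",
...         "class": "solr.DenseVectorField",
...         "vectorDimension": 5,            # must equal len(vector) at query time
...         "similarityFunction": "cosine",  # or dot_product / euclidean
...     },
...     "add-field": {"name": "film_vector", "type": "knn_vector_5", "indexed": True, "stored": True},
... }
>>> resp = requests.post("http://localhost:8983/solr/films/schema", json=payload)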

Examples:

>>> # Basic KNN search
>>> parser = KNNQueryParser(
...     vector_field="film_vector",
...     vector=[0.1, 0.2, 0.3, 0.4, 0.5],
...     top_k=10
... )
>>> # With explicit pre-filtering
>>> parser = KNNQueryParser(
...     vector_field="product_vector",
...     vector=[1.0, 2.0, 3.0, 4.0],
...     top_k=20,
...     pre_filter=["category:electronics", "inStock:true"]
... )
>>> # With tagged filtering
>>> parser = KNNQueryParser(
...     vector_field="doc_vector",
...     vector=[0.5, 0.5, 0.5, 0.5],
...     top_k=50,
...     include_tags=["for_knn"]
... )
>>> # For re-ranking (use as rq parameter)
>>> parser = KNNQueryParser(
...     vector_field="content_vector",
...     vector=[0.2, 0.3, 0.4, 0.5],
...     top_k=100  # Searches whole index in re-ranking context
... )

Parameters:

Name Type Description Default
vector

Query vector as list of floats (required, must match field dimension)

required
top_k

Number of nearest neighbors to return (default: 10)

required
vector_field

Name of the DenseVectorField to search (inherited from base)

required
pre_filter

Explicit pre-filter query strings (inherited from base)

required
include_tags

Only use fq filters with these tags for implicit pre-filtering (inherited)

required
exclude_tags

Exclude fq filters with these tags from implicit pre-filtering (inherited)

required

Returns:

Type Description

Query results ranked by vector similarity score

Note

When used in re-ranking (rq parameter), topK refers to k-nearest neighbors in the whole index, not just the initial result set.
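
The Solr dense vector guide expresses this re-ranking setup with the rerank query parser (a sketch of raw request parameters; the lexical query, field name, and vector are placeholders):

>>> params = {
...     "q": "title:(machine learning)",  # initial lexical query
...     "rq": "{!rerank reRankQuery=$rqq reRankDocs=100 reRankWeight=1}",
...     "rqq": "{!knn f=content_vector topK=100}[0.2,0.3,0.4,0.5]",
... }
>>> # topK counts neighbors over the whole index here, as described above.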

See Also
  • VectorSimilarityQueryParser: For threshold-based vector search
  • KNNTextToVectorQueryParser: For text-to-vector conversion with KNN search

facet(*, queries=None, fields=None, prefix=None, contains=None, contains_ignore_case=None, matches=None, sort=None, limit=None, offset=None, mincount=None, missing=None, method=None, enum_cache_min_df=None, exists=None, exclude_terms=None, overrequest_count=None, overrequest_ratio=None, threads=None, range_field=None, range_start=None, range_end=None, range_gap=None, range_hardend=None, range_include=None, range_other=None, range_method=None, pivot_fields=None, pivot_mincount=None)

Enable faceting to categorize and count search results.

Faceting breaks down search results into categories with counts, enabling drill-down navigation and data analysis.

Parameters:

Name Type Description Default
queries Optional[List[str]]

Arbitrary queries to generate facet counts for specific terms/expressions.

None
fields Optional[List[str]]

Fields to be treated as facets. Common for categories, brands, tags.

None
prefix Optional[str]

Limits facet terms to those starting with the given prefix.

None
contains Optional[str]

Limits facet terms to those containing the given substring.

None
contains_ignore_case Optional[bool]

If True, ignores case when matching the 'contains' parameter.

None
matches Optional[str]

Only returns facets matching this regular expression.

None
sort Optional[Literal['count', 'index']]

Ordering of facet field terms. Use 'count' (by frequency) or 'index' (alphabetically).

None
limit Optional[int]

Number of facet counts to return. Set to -1 for all.

None
offset Optional[int]

Offset into the facet list for paging.

None
mincount Optional[int]

Minimum count for facets to be included in response. Common to set to 1 to hide empty facets.

None
missing Optional[bool]

If True, include count of results with no facet value.

None
method Optional[Literal['enum', 'fc', 'fcs']]

Algorithm to use for faceting. Use 'enum' (enumerate all terms, good for low-cardinality), 'fc' (field cache, good for high-cardinality), or 'fcs' (per-segment, good for frequent updates).

None
enum_cache_min_df Optional[int]

Minimum document frequency for filterCache usage with enum method.

None
exists Optional[bool]

Cap facet counts by 1 (only for non-trie fields).

None
exclude_terms Optional[str]

Terms to remove from facet counts.

None
overrequest_count Optional[int]

Extra facets to request from each shard for better accuracy in distributed environments.

None
overrequest_ratio Optional[float]

Ratio for overrequesting facets from shards.

None
threads Optional[int]

Number of threads for parallel facet loading. Useful for multiple facets on large datasets.

None
range_field Optional[List[str]]

Fields for range faceting (e.g., price ranges, date ranges).

None
range_start Optional[Dict[str, str]]

Lower bound of ranges per field. Dict mapping field name to start value.

None
range_end Optional[Dict[str, str]]

Upper bound of ranges per field. Dict mapping field name to end value.

None
range_gap Optional[Dict[str, str]]

Size of each range span per field. E.g., {'price': '100'} for $100 increments.

None
range_hardend Optional[bool]

If True, uses exact range_end as upper bound even if it doesn't align with gap.

None
range_include Optional[List[Literal['lower', 'upper', 'edge', 'outer', 'all']]]

Range bounds to include in faceting. List of: 'lower' (include lower bound), 'upper' (include upper bound), 'edge' (first/last include edges), 'outer' (before/after inclusive), 'all'.

None
range_other Optional[List[Literal['before', 'after', 'between', 'none', 'all']]]

Additional range counts to compute. List of: 'before' (below first range), 'after' (above last range), 'between' (within bounds), 'none', or 'all'.

None
range_method Optional[Literal['filter', 'dv']]

Method to use for range faceting. Use 'filter' or 'dv' (for docValues).

None
pivot_fields Optional[List[str]]

Fields to use for pivot (hierarchical) faceting. E.g., ['category,brand'].

None
pivot_mincount Optional[int]

Minimum count for pivot facet inclusion.

None

Returns:

Type Description
Self

A new parser instance with facet configuration applied.

Examples:

Basic field faceting:

>>> parser.facet(fields=["genre", "director"], mincount=1, limit=10)

Range faceting for prices:

>>> parser.facet(
...     range_field=["price"],
...     range_start={"price": "0"},
...     range_end={"price": "1000"},
...     range_gap={"price": "100"}
... )

Filtered facets:

>>> parser.facet(fields=["color"], prefix="bl", mincount=5)

group(*, by=None, func=None, query=None, limit=1, offset=None, sort=None, format='grouped', main=None, ngroups=False, truncate=False, facet=False, cache_percent=0)

Enable result grouping to collapse results by common field values.

Result grouping (also known as field collapsing) combines documents that share a common field value. Useful for showing one result per author, category, or domain.

Parameters:

Name Type Description Default
by Optional[Union[str, List[str]]]

Field(s) to group results by. Shows one representative doc per unique field value. Must be single-valued and indexed. Can specify multiple fields for separate groupings.

None
func Optional[str]

Group by the result of a function query (e.g., 'floor(price)'). Not supported in SolrCloud/distributed searches.

None
query Optional[Union[str, List[str]]]

Create custom groups using arbitrary queries. Each query defines one group. Example: ['price:[0 TO 50]', 'price:[50 TO 100]'] creates price range groups.

None
limit int

Number of documents to return per group. Default: 1 (only top doc per group). Set higher to see more examples from each group.

1
offset Optional[int]

Skip the first N documents within each group. Useful for pagination within groups.

None
sort Optional[str]

How to sort documents within each group (e.g., 'date desc' for newest first). If not specified, uses the main sort parameter.

None
format str

Response structure format. 'grouped' (nested, default) or 'simple' (flat list).

'grouped'
main Optional[bool]

If True, returns first field grouping as main result list, flattening the response.

None
ngroups bool

If True, include total number of unique groups in response. Useful for pagination. Requires co-location in SolrCloud.

False
truncate bool

If True, base facet counts on one doc per group only.

False
facet bool

If True, enable grouped faceting. Can be expensive on large result sets. Requires co-location in SolrCloud.

False
cache_percent int

Enable result grouping cache (0-100). Set to 0 to disable (default). Try 50-100 for complex queries to improve performance.

0

Returns:

Type Description
Self

A new parser instance with group configuration applied.

Examples:

Group by author:

>>> parser.group(by="author", limit=3, sort="date desc", ngroups=True)

Group by price range:

>>> parser.group(
...     query=["price:[0 TO 50]", "price:[50 TO 100]", "price:[100 TO *]"],
...     limit=5
... )

Multiple field groupings:

>>> parser.group(by=["author", "category"], limit=2)

highlight(*, method=None, fields=None, query=None, query_parser=None, require_field_match=None, query_field_pattern=None, use_phrase_highlighter=None, multiterm=None, snippets_per_field=None, fragment_size=None, encoder=None, max_analyzed_chars=None, tag_before=None, tag_after=None, offset_source=None, frag_align_ratio=None, fragsize_is_minimum=None, tag_ellipsis=None, default_summary=None, score_k1=None, score_b=None, score_pivot=None, bs_language=None, bs_country=None, bs_variant=None, bs_type=None, bs_separator=None, weight_matches=None, merge_contiguous=None, max_multivalued_to_examine=None, max_multivalued_to_match=None, alternate_field=None, max_alternate_field_length=None, alternate=None, formatter=None, simple_pre=None, simple_post=None, fragmenter=None, regex_slop=None, regex_pattern=None, regex_max_analyzed_chars=None, preserve_multi=None, payloads=None, frag_list_builder=None, fragments_builder=None, boundary_scanner=None, phrase_limit=None, multivalue_separator=None)

Enable highlighting to show snippets with query terms emphasized.

Highlighting shows snippets of text from documents with query terms emphasized (usually wrapped in tags). Common for search result snippets and previews.

Parameters:

Name Type Description Default
method Optional[Literal['unified', 'original', 'fastVector']]

Highlighting implementation to use. Use 'unified' (most accurate, recommended), 'original' (legacy, widely compatible), or 'fastVector' (fast for large docs, requires term vectors).

None
fields Optional[Union[str, List[str]]]

Fields to generate highlighted snippets for. Must be stored fields. Example: ['title', 'content']. Use '*' for all fields (not recommended).

None
query Optional[str]

Custom query to use for highlighting (overrides main query).

None
query_parser Optional[str]

Query parser for the highlight query (e.g., 'edismax', 'lucene').

None
require_field_match Optional[bool]

If True, only highlight terms in the field they matched. Set to False to highlight query terms in all requested fields.

None
query_field_pattern Optional[str]

Regular expression pattern for fields to consider for highlighting.

None
use_phrase_highlighter Optional[bool]

If True, highlights complete phrases accurately. Default: True.

None
multiterm Optional[bool]

Enable highlighting for wildcard, fuzzy, and range queries. Default: True.

None
snippets_per_field Optional[int]

Maximum number of snippets to return per field. Default: 1. Set higher to show more context passages.

None
fragment_size Optional[int]

Size of highlighted snippets in characters. Default: 100. Set to 0 to highlight entire field (not recommended for large fields).

None
encoder Optional[Literal['', 'html']]

Text encoder for snippets. Use '' (empty string, default) or 'html' to escape HTML and prevent XSS.

None
max_analyzed_chars Optional[int]

Maximum characters to analyze per field. Default: 51200 (50KB). For large documents, limits analysis to improve performance.

None
tag_before Optional[str]

Text/tag to insert before each highlighted term. Default: '<em>'.

None
tag_after Optional[str]

Text/tag to insert after each highlighted term. Default: '</em>'.

None
Unified Highlighter parameters (most accurate, recommended):

offset_source Optional[str]

How offsets are obtained. Options: 'ANALYSIS', 'POSTINGS', 'POSTINGS_WITH_TERM_VECTORS', 'TERM_VECTORS'. Usually auto-detected.

None
frag_align_ratio Optional[float]

Where to position first match in snippet (0.0-1.0). Default: 0.33 (left third, shows context before match).

None
fragsize_is_minimum Optional[bool]

If True, treat fragment_size as minimum. Default: True.

None
tag_ellipsis Optional[str]

Text between multiple snippets (e.g., '...' or ' [...] ').

None
default_summary Optional[bool]

If True, return leading text when no matches found.

None
score_k1 Optional[float]

BM25 term frequency normalization. Default: 1.2.

None
score_b Optional[float]

BM25 length normalization. Default: 0.75.

None
score_pivot Optional[int]

BM25 average passage length in characters. Default: 87.

None
bs_language Optional[str]

BreakIterator language for text segmentation (e.g., 'en', 'ja').

None
bs_country Optional[str]

BreakIterator country code (e.g., 'US', 'GB').

None
bs_variant Optional[str]

BreakIterator variant for specialized locale rules.

None
bs_type Optional[Literal['SEPARATOR', 'SENTENCE', 'WORD', 'CHARACTER', 'LINE', 'WHOLE']]

How to segment text. Use 'SENTENCE' (default, recommended), 'WORD', 'CHARACTER', 'LINE', 'SEPARATOR', or 'WHOLE'.

None
bs_separator Optional[str]

Custom separator character when bs_type=SEPARATOR.

None
weight_matches Optional[bool]

Use Lucene's Weight Matches API for most accurate highlighting.

None
Original Highlighter parameters (legacy):

merge_contiguous Optional[bool]

Merge adjacent fragments.

None
max_multivalued_to_examine Optional[int]

Max entries to examine in multivalued field.

None
max_multivalued_to_match Optional[int]

Max matches in multivalued field.

None
alternate_field Optional[str]

Backup field for summary when no highlights found.

None
max_alternate_field_length Optional[int]

Max length of alternate field.

None
alternate Optional[bool]

Highlight alternate field.

None
formatter Optional[Literal['simple']]

Formatter for highlighted output. Use 'simple'.

None
simple_pre Optional[str]

Text before term (simple formatter).

None
simple_post Optional[str]

Text after term (simple formatter).

None
fragmenter Optional[Literal['gap', 'regex']]

Text snippet generator type. Use 'gap' or 'regex'.

None
regex_slop Optional[float]

Deviation factor for regex fragmenter.

None
regex_pattern Optional[str]

Pattern for regex fragmenter.

None
regex_max_analyzed_chars Optional[int]

Char limit for regex fragmenter.

None
preserve_multi Optional[bool]

Preserve order in multivalued fields.

None
payloads Optional[bool]

Include payloads in highlighting.

None
FastVector Highlighter parameters (requires term vectors):

frag_list_builder Optional[Literal['simple', 'weighted', 'single']]

Snippet fragmenting algorithm. Use 'simple', 'weighted', or 'single'.

None
fragments_builder Optional[Literal['default', 'colored']]

Fragment formatting implementation. Use 'default' or 'colored'.

None
boundary_scanner Optional[str]

Boundary scanner implementation.

None
phrase_limit Optional[int]

Max phrases to analyze for scoring.

None
multivalue_separator Optional[str]

Separator for multivalued fields.

None

Returns:

Type Description
Self

A new parser instance with highlight configuration applied.

Examples:

Basic highlighting:

>>> parser.highlight(fields=["title", "content"], snippets_per_field=3, fragment_size=150)

Custom HTML tags:

>>> parser.highlight(
...     fields=["title"],
...     tag_before='<mark class="highlight">',
...     tag_after='</mark>',
...     encoder="html"
... )

Unified highlighter with sentence breaks:

>>> parser.highlight(
...     fields=["content"],
...     method="unified",
...     bs_type="SENTENCE",
...     fragment_size=200
... )

more_like_this(*, fields=None, min_term_freq=None, min_doc_freq=None, max_doc_freq=None, max_doc_freq_pct=None, min_word_len=None, max_word_len=None, max_query_terms=None, max_num_tokens_parsed=None, boost=None, query_fields=None, interesting_terms=None, match_include=None, match_offset=None)

Enable MoreLikeThis to find documents similar to a given document.

MoreLikeThis finds documents similar to a given document by analyzing the terms that make it unique. Common for "related articles", recommendations, and content discovery.

Parameters:

Name Type Description Default
fields Optional[Union[str, List[str]]]

Fields to analyze for similarity. Use fields with meaningful content (title, description, body). Can be a single field or list of fields. For best performance, enable term vectors on these fields.

None
min_term_freq Optional[int]

Minimum times a term must appear in the source document to be considered. Default: 2. Lower values include more terms but may add noise.

None
min_doc_freq Optional[int]

Minimum number of documents a term must appear in across the index. Default: 5. Filters out very rare terms that aren't useful for finding similar documents.

None
max_doc_freq Optional[int]

Maximum number of documents a term can appear in. Filters out very common terms (like 'the', 'and'). Use this OR max_doc_freq_pct, not both.

None
max_doc_freq_pct Optional[int]

Maximum document frequency as percentage (0-100). Example: 75 means ignore terms appearing in more than 75% of documents. Use instead of max_doc_freq for relative filtering.

None
min_word_len Optional[int]

Minimum word length in characters. Words shorter than this are ignored. Example: 4 to skip 'the', 'and', 'or'.

None
max_word_len Optional[int]

Maximum word length in characters. Words longer than this are ignored. Useful to filter out long tokens or URLs.

None
max_query_terms Optional[int]

Maximum number of interesting terms to use in the MLT query. Default: 25. Higher values = more comprehensive but slower. Start with default and adjust as needed.

None
max_num_tokens_parsed Optional[int]

Maximum tokens to analyze per field (for fields without term vectors). Default: 5000. Set lower for better performance on large documents.

None
boost Optional[bool]

If True, boost the query by each term's relevance/importance. Default: False. Enable for better relevance ranking of similar documents.

None
query_fields Optional[str]

Query fields with optional boosts, like 'title^2.0 content^1.0'. Fields must also be in 'fields' parameter. Use to emphasize certain fields in similarity matching.

None
interesting_terms Optional[str]

Controls what info about matched terms is returned. Options: 'none' (default), 'list' (term names), 'details' (terms with boost values). Use 'details' for debugging to see which terms were selected.

None
match_include Optional[bool]

If True, includes the source document in the results (useful for comparison). Default varies: True for the MLT handler; for the MLT component it depends on configuration.

None
match_offset Optional[int]

When using with a query, specifies which result doc to use for similarity. 0 = first result, 1 = second, etc. Default: 0.

None

Returns:

Type Description
Self

A new parser instance with more_like_this configuration applied.

Examples:

Basic similarity search:

>>> parser.more_like_this(
...     fields=["title", "content"],
...     min_term_freq=2,
...     min_doc_freq=5,
...     max_query_terms=25
... )

Advanced with filtering:

>>> parser.more_like_this(
...     fields=["content"],
...     min_term_freq=1,
...     min_doc_freq=3,
...     min_word_len=4,
...     max_doc_freq_pct=80,
...     interesting_terms="details"
... )

Boosted fields:

>>> parser.more_like_this(
...     fields=["title", "content"],
...     query_fields="title^2.0 content^1.0",
...     boost=True
... )

serialize_configs(params)

Serialize ParamsConfig objects as top level params.

KNNTextToVectorQueryParser

Bases: DenseVectorSearchQueryParser

KNN Text-to-Vector Query Parser for Apache Solr Dense Vector Search.

The knn_text_to_vector parser combines text encoding with k-nearest neighbors search, allowing you to search for similar documents using natural language queries instead of pre-computed vectors. It uses a language model to convert query text into a vector, then performs KNN search on that vector.

Solr Reference

https://solr.apache.org/guide/solr/latest/query-guide/dense-vector-search.html

Key Features
  • Automatic text-to-vector encoding using language models
  • Eliminates need for pre-computing query vectors
  • Supports various embedding models (OpenAI, Hugging Face, etc.)
  • Combines semantic search with KNN efficiency
  • Configurable k (topK) for number of results
How Text-to-Vector KNN Works
  1. Query text is sent to the configured language model
  2. Model encodes text into a dense vector
  3. KNN search is performed using the generated vector
  4. Top k most similar documents are returned
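
In raw Solr syntax this flow corresponds to the knn_text_to_vector query parser (a sketch based on the Solr text-to-vector guide; the field and model names are placeholders):

>>> params = {
...     "q": "{!knn_text_to_vector f=content_vector model=openai-embeddings topK=10}machine learning algorithms",
... }
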
Model Requirements

The model must be loaded into Solr's text-to-vector model store:
  • Configure the model via the Schema REST API
  • Supported providers: OpenAI, Hugging Face, Cohere, etc.
  • The model must produce vectors matching the field dimension

Example Model Configuration (OpenAI):

{
    "class": "dev.langchain4j.model.openai.OpenAiEmbeddingModel",
    "name": "openai-embeddings",
    "params": {
        "apiKey": "YOUR_API_KEY",
        "modelName": "text-embedding-ada-002"
    }
}

Schema Requirements

The field must be a DenseVectorField whose vectorDimension matches the output dimension of the configured model.

Examples:

>>> # Basic text-to-vector KNN search
>>> parser = KNNTextToVectorQueryParser(
...     vector_field="content_vector",
...     text="machine learning algorithms",
...     model="openai-embeddings",
...     top_k=10
... )
>>> # Semantic search with pre-filtering
>>> parser = KNNTextToVectorQueryParser(
...     vector_field="article_embedding",
...     text="neural networks and deep learning",
...     model="huggingface-embedder",
...     top_k=20,
...     pre_filter=["category:AI", "published:[2020 TO *]"]
... )
>>> # Multi-lingual semantic search
>>> parser = KNNTextToVectorQueryParser(
...     vector_field="multilingual_vector",
...     text="apprentissage automatique",  # French
...     model="multilingual-embedder",
...     top_k=15
... )
>>> # With tagged filtering
>>> parser = KNNTextToVectorQueryParser(
...     vector_field="doc_vector",
...     text="search query optimization",
...     model="sentence-transformer",
...     top_k=50,
...     include_tags=["semantic_search"]
... )

Parameters:

Name Type Description Default
text

Natural language query text to encode (required)

required
model

Name of the model in text-to-vector model store (required)

required
top_k

Number of nearest neighbors to return (default: 10)

required
vector_field

Name of the DenseVectorField to search (inherited from base)

required
pre_filter

Explicit pre-filter query strings (inherited from base)

required
include_tags

Only use fq filters with these tags for implicit pre-filtering (inherited)

required
exclude_tags

Exclude fq filters with these tags from implicit pre-filtering (inherited)

required

Returns:

Type Description

Query results ranked by semantic similarity to the input text

Note

The model name must reference an existing model loaded into the /schema/text-to-vector-model-store endpoint.
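
Loading a model might look like this (a hedged sketch: the endpoint name comes from the note above, while the HTTP verb and payload shape are assumptions modeled on Solr's similar model stores; the collection and API key are placeholders):

>>> import requests
>>> model = {
...     "class": "dev.langchain4j.model.openai.OpenAiEmbeddingModel",
...     "name": "openai-embeddings",
...     "params": {"apiKey": "YOUR_API_KEY", "modelName": "text-embedding-ada-002"},
... }
>>> # PUT is an assumption; consult the Solr text-to-vector guide for the exact call.
>>> resp = requests.put(
...     "http://localhost:8983/solr/films/schema/text-to-vector-model-store",
...     json=model,
... )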

See Also
  • KNNQueryParser: For search with pre-computed vectors
  • VectorSimilarityQueryParser: For threshold-based vector search
  • Solr Text-to-Vector Models Guide: https://solr.apache.org/guide/solr/latest/query-guide/text-to-vector.html

facet(*, queries=None, fields=None, prefix=None, contains=None, contains_ignore_case=None, matches=None, sort=None, limit=None, offset=None, mincount=None, missing=None, method=None, enum_cache_min_df=None, exists=None, exclude_terms=None, overrequest_count=None, overrequest_ratio=None, threads=None, range_field=None, range_start=None, range_end=None, range_gap=None, range_hardend=None, range_include=None, range_other=None, range_method=None, pivot_fields=None, pivot_mincount=None)

Enable faceting to categorize and count search results.

Faceting breaks down search results into categories with counts, enabling drill-down navigation and data analysis.

Parameters:

Name Type Description Default
queries Optional[List[str]]

Arbitrary queries to generate facet counts for specific terms/expressions.

None
fields Optional[List[str]]

Fields to be treated as facets. Common for categories, brands, tags.

None
prefix Optional[str]

Limits facet terms to those starting with the given prefix.

None
contains Optional[str]

Limits facet terms to those containing the given substring.

None
contains_ignore_case Optional[bool]

If True, ignores case when matching the 'contains' parameter.

None
matches Optional[str]

Only returns facets matching this regular expression.

None
sort Optional[Literal['count', 'index']]

Ordering of facet field terms. Use 'count' (by frequency) or 'index' (alphabetically).

None
limit Optional[int]

Number of facet counts to return. Set to -1 for all.

None
offset Optional[int]

Offset into the facet list for paging.

None
mincount Optional[int]

Minimum count for facets to be included in response. Common to set to 1 to hide empty facets.

None
missing Optional[bool]

If True, include count of results with no facet value.

None
method Optional[Literal['enum', 'fc', 'fcs']]

Algorithm to use for faceting. Use 'enum' (enumerate all terms, good for low-cardinality), 'fc' (field cache, good for high-cardinality), or 'fcs' (per-segment, good for frequent updates).

None
enum_cache_min_df Optional[int]

Minimum document frequency for filterCache usage with enum method.

None
exists Optional[bool]

Cap facet counts by 1 (only for non-trie fields).

None
exclude_terms Optional[str]

Terms to remove from facet counts.

None
overrequest_count Optional[int]

Extra facets to request from each shard for better accuracy in distributed environments.

None
overrequest_ratio Optional[float]

Ratio for overrequesting facets from shards.

None
threads Optional[int]

Number of threads for parallel facet loading. Useful for multiple facets on large datasets.

None
range_field Optional[List[str]]

Fields for range faceting (e.g., price ranges, date ranges).

None
range_start Optional[Dict[str, str]]

Lower bound of ranges per field. Dict mapping field name to start value.

None
range_end Optional[Dict[str, str]]

Upper bound of ranges per field. Dict mapping field name to end value.

None
range_gap Optional[Dict[str, str]]

Size of each range span per field. E.g., {'price': '100'} for $100 increments.

None
range_hardend Optional[bool]

If True, uses exact range_end as upper bound even if it doesn't align with gap.

None
range_include Optional[List[Literal['lower', 'upper', 'edge', 'outer', 'all']]]

Range bounds to include in faceting. List of: 'lower' (include lower bound), 'upper' (include upper bound), 'edge' (first/last include edges), 'outer' (before/after inclusive), 'all'.

None
range_other Optional[List[Literal['before', 'after', 'between', 'none', 'all']]]

Additional range counts to compute. List of: 'before' (below first range), 'after' (above last range), 'between' (within bounds), 'none', or 'all'.

None
range_method Optional[Literal['filter', 'dv']]

Method to use for range faceting. Use 'filter' or 'dv' (for docValues).

None
pivot_fields Optional[List[str]]

Fields to use for pivot (hierarchical) faceting. E.g., ['category,brand'].

None
pivot_mincount Optional[int]

Minimum count for pivot facet inclusion.

None

Returns:

Type Description
Self

A new parser instance with facet configuration applied.

Examples:

Basic field faceting:

>>> parser.facet(fields=["genre", "director"], mincount=1, limit=10)

Range faceting for prices:

>>> parser.facet(
...     range_field=["price"],
...     range_start={"price": "0"},
...     range_end={"price": "1000"},
...     range_gap={"price": "100"}
... )

Filtered facets:

>>> parser.facet(fields=["color"], prefix="bl", mincount=5)

group(*, by=None, func=None, query=None, limit=1, offset=None, sort=None, format='grouped', main=None, ngroups=False, truncate=False, facet=False, cache_percent=0)

Enable result grouping to collapse results by common field values.

Result grouping (also known as field collapsing) combines documents that share a common field value. Useful for showing one result per author, category, or domain.

Parameters:

Name Type Description Default
by Optional[Union[str, List[str]]]

Field(s) to group results by. Shows one representative doc per unique field value. Must be single-valued and indexed. Can specify multiple fields for separate groupings.

None
func Optional[str]

Group by the result of a function query (e.g., 'floor(price)'). Not supported in SolrCloud/distributed searches.

None
query Optional[Union[str, List[str]]]

Create custom groups using arbitrary queries. Each query defines one group. Example: ['price:[0 TO 50]', 'price:[50 TO 100]'] creates price range groups.

None
limit int

Number of documents to return per group. Default: 1 (only top doc per group). Set higher to see more examples from each group.

1
offset Optional[int]

Skip the first N documents within each group. Useful for pagination within groups.

None
sort Optional[str]

How to sort documents within each group (e.g., 'date desc' for newest first). If not specified, uses the main sort parameter.

None
format str

Response structure format. 'grouped' (nested, default) or 'simple' (flat list).

'grouped'
main Optional[bool]

If True, returns first field grouping as main result list, flattening the response.

None
ngroups bool

If True, include total number of unique groups in response. Useful for pagination. Requires co-location in SolrCloud.

False
truncate bool

If True, base facet counts on one doc per group only.

False
facet bool

If True, enable grouped faceting. Can be expensive on large result sets. Requires co-location in SolrCloud.

False
cache_percent int

Enable result grouping cache (0-100). Set to 0 to disable (default). Try 50-100 for complex queries to improve performance.

0

Returns:

Type Description
Self

A new parser instance with group configuration applied.

Examples:

Group by author:

>>> parser.group(by="author", limit=3, sort="date desc", ngroups=True)

Group by price range:

>>> parser.group(
...     query=["price:[0 TO 50]", "price:[50 TO 100]", "price:[100 TO *]"],
...     limit=5
... )

Multiple field groupings:

>>> parser.group(by=["author", "category"], limit=2)
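Grouped faceting based on one document per group, a sketch combining the truncate and facet flags documented above (facet=True can be expensive on large result sets):

>>> parser.group(by="domain", ngroups=True, truncate=True, facet=True)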

highlight(*, method=None, fields=None, query=None, query_parser=None, require_field_match=None, query_field_pattern=None, use_phrase_highlighter=None, multiterm=None, snippets_per_field=None, fragment_size=None, encoder=None, max_analyzed_chars=None, tag_before=None, tag_after=None, offset_source=None, frag_align_ratio=None, fragsize_is_minimum=None, tag_ellipsis=None, default_summary=None, score_k1=None, score_b=None, score_pivot=None, bs_language=None, bs_country=None, bs_variant=None, bs_type=None, bs_separator=None, weight_matches=None, merge_contiguous=None, max_multivalued_to_examine=None, max_multivalued_to_match=None, alternate_field=None, max_alternate_field_length=None, alternate=None, formatter=None, simple_pre=None, simple_post=None, fragmenter=None, regex_slop=None, regex_pattern=None, regex_max_analyzed_chars=None, preserve_multi=None, payloads=None, frag_list_builder=None, fragments_builder=None, boundary_scanner=None, phrase_limit=None, multivalue_separator=None)

Enable highlighting to show snippets with query terms emphasized.

Highlighting shows snippets of text from documents with query terms emphasized (usually wrapped in tags). Common for search result snippets and previews.

Parameters:

Name Type Description Default
method Optional[Literal['unified', 'original', 'fastVector']]

Highlighting implementation to use. Use 'unified' (most accurate, recommended), 'original' (legacy, widely compatible), or 'fastVector' (fast for large docs, requires term vectors).

None
fields Optional[Union[str, List[str]]]

Fields to generate highlighted snippets for. Must be stored fields. Example: ['title', 'content']. Use '*' for all fields (not recommended).

None
query Optional[str]

Custom query to use for highlighting (overrides main query).

None
query_parser Optional[str]

Query parser for the highlight query (e.g., 'edismax', 'lucene').

None
require_field_match Optional[bool]

If True, only highlight terms in the field they matched. Set to False to highlight query terms in all requested fields.

None
query_field_pattern Optional[str]

Regular expression pattern for fields to consider for highlighting.

None
use_phrase_highlighter Optional[bool]

If True, highlights complete phrases accurately. Default: True.

None
multiterm Optional[bool]

Enable highlighting for wildcard, fuzzy, and range queries. Default: True.

None
snippets_per_field Optional[int]

Maximum number of snippets to return per field. Default: 1. Set higher to show more context passages.

None
fragment_size Optional[int]

Size of highlighted snippets in characters. Default: 100. Set to 0 to highlight entire field (not recommended for large fields).

None
encoder Optional[Literal['', 'html']]

Text encoder for snippets. Use '' (empty string, default) or 'html' to escape HTML and prevent XSS.

None
max_analyzed_chars Optional[int]

Maximum characters to analyze per field. Default: 51200 (50KB). For large documents, limits analysis to improve performance.

None
tag_before Optional[str]

Text/tag to insert before each highlighted term. Default: '<em>'. Example: '<mark class="highlight">'.

None
tag_after Optional[str]

Text/tag to insert after each highlighted term. Default: '</em>'.

None
Unified Highlighter parameters (most accurate, recommended):
offset_source Optional[str]

How offsets are obtained. Options: 'ANALYSIS', 'POSTINGS', 'POSTINGS_WITH_TERM_VECTORS', 'TERM_VECTORS'. Usually auto-detected.

None
frag_align_ratio Optional[float]

Where to position first match in snippet (0.0-1.0). Default: 0.33 (left third, shows context before match).

None
fragsize_is_minimum Optional[bool]

If True, treat fragment_size as minimum. Default: True.

None
tag_ellipsis Optional[str]

Text between multiple snippets (e.g., '...' or ' [...] ').

None
default_summary Optional[bool]

If True, return leading text when no matches found.

None
score_k1 Optional[float]

BM25 term frequency normalization. Default: 1.2.

None
score_b Optional[float]

BM25 length normalization. Default: 0.75.

None
score_pivot Optional[int]

BM25 average passage length in characters. Default: 87.

None
bs_language Optional[str]

BreakIterator language for text segmentation (e.g., 'en', 'ja').

None
bs_country Optional[str]

BreakIterator country code (e.g., 'US', 'GB').

None
bs_variant Optional[str]

BreakIterator variant for specialized locale rules.

None
bs_type Optional[Literal['SEPARATOR', 'SENTENCE', 'WORD', 'CHARACTER', 'LINE', 'WHOLE']]

How to segment text. Use 'SENTENCE' (default, recommended), 'WORD', 'CHARACTER', 'LINE', 'SEPARATOR', or 'WHOLE'.

None
bs_separator Optional[str]

Custom separator character when bs_type=SEPARATOR.

None
weight_matches Optional[bool]

Use Lucene's Weight Matches API for most accurate highlighting.

None
Original Highlighter parameters (legacy):
merge_contiguous Optional[bool]

Merge adjacent fragments.

None
max_multivalued_to_examine Optional[int]

Max entries to examine in multivalued field.

None
max_multivalued_to_match Optional[int]

Max matches in multivalued field.

None
alternate_field Optional[str]

Backup field for summary when no highlights found.

None
max_alternate_field_length Optional[int]

Max length of alternate field.

None
alternate Optional[bool]

Highlight alternate field.

None
formatter Optional[Literal['simple']]

Formatter for highlighted output. Use 'simple'.

None
simple_pre Optional[str]

Text before term (simple formatter).

None
simple_post Optional[str]

Text after term (simple formatter).

None
fragmenter Optional[Literal['gap', 'regex']]

Text snippet generator type. Use 'gap' or 'regex'.

None
regex_slop Optional[float]

Deviation factor for regex fragmenter.

None
regex_pattern Optional[str]

Pattern for regex fragmenter.

None
regex_max_analyzed_chars Optional[int]

Char limit for regex fragmenter.

None
preserve_multi Optional[bool]

Preserve order in multivalued fields.

None
payloads Optional[bool]

Include payloads in highlighting.

None
FastVector Highlighter parameters (requires term vectors):
frag_list_builder Optional[Literal['simple', 'weighted', 'single']]

Snippet fragmenting algorithm. Use 'simple', 'weighted', or 'single'.

None
fragments_builder Optional[Literal['default', 'colored']]

Fragment formatting implementation. Use 'default' or 'colored'.

None
boundary_scanner Optional[str]

Boundary scanner implementation.

None
phrase_limit Optional[int]

Max phrases to analyze for scoring.

None
multivalue_separator Optional[str]

Separator for multivalued fields.

None

Returns:

Type Description
Self

A new parser instance with highlight configuration applied.

Examples:

Basic highlighting:

>>> parser.highlight(fields=["title", "content"], snippets_per_field=3, fragment_size=150)

Custom HTML tags:

>>> parser.highlight(
...     fields=["title"],
...     tag_before='<mark class="highlight">',
...     tag_after='</mark>',
...     encoder="html"
... )

Unified highlighter with sentence breaks:

>>> parser.highlight(
...     fields=["content"],
...     method="unified",
...     bs_type="SENTENCE",
...     fragment_size=200
... )
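FastVector highlighter, a sketch assuming the target field has term vectors enabled in the schema:

>>> parser.highlight(
...     fields=["content"],
...     method="fastVector",
...     frag_list_builder="weighted"
... )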

more_like_this(*, fields=None, min_term_freq=None, min_doc_freq=None, max_doc_freq=None, max_doc_freq_pct=None, min_word_len=None, max_word_len=None, max_query_terms=None, max_num_tokens_parsed=None, boost=None, query_fields=None, interesting_terms=None, match_include=None, match_offset=None)

Enable MoreLikeThis to find documents similar to a given document.

MoreLikeThis finds documents similar to a given document by analyzing the terms that make it unique. Common for "related articles", recommendations, and content discovery.

Parameters:

Name Type Description Default
fields Optional[Union[str, List[str]]]

Fields to analyze for similarity. Use fields with meaningful content (title, description, body). Can be a single field or list of fields. For best performance, enable term vectors on these fields.

None
min_term_freq Optional[int]

Minimum times a term must appear in the source document to be considered. Default: 2. Lower values include more terms but may add noise.

None
min_doc_freq Optional[int]

Minimum number of documents a term must appear in across the index. Default: 5. Filters out very rare terms that aren't useful for finding similar documents.

None
max_doc_freq Optional[int]

Maximum number of documents a term can appear in. Filters out very common terms (like 'the', 'and'). Use this OR max_doc_freq_pct, not both.

None
max_doc_freq_pct Optional[int]

Maximum document frequency as percentage (0-100). Example: 75 means ignore terms appearing in more than 75% of documents. Use instead of max_doc_freq for relative filtering.

None
min_word_len Optional[int]

Minimum word length in characters. Words shorter than this are ignored. Example: 4 to skip 'the', 'and', 'or'.

None
max_word_len Optional[int]

Maximum word length in characters. Words longer than this are ignored. Useful to filter out long tokens or URLs.

None
max_query_terms Optional[int]

Maximum number of interesting terms to use in the MLT query. Default: 25. Higher values = more comprehensive but slower. Start with default and adjust as needed.

None
max_num_tokens_parsed Optional[int]

Maximum tokens to analyze per field (for fields without term vectors). Default: 5000. Set lower for better performance on large documents.

None
boost Optional[bool]

If True, boost the query by each term's relevance/importance. Default: False. Enable for better relevance ranking of similar documents.

None
query_fields Optional[str]

Query fields with optional boosts, like 'title^2.0 content^1.0'. Fields must also be in 'fields' parameter. Use to emphasize certain fields in similarity matching.

None
interesting_terms Optional[str]

Controls what info about matched terms is returned. Options: 'none' (default), 'list' (term names), 'details' (terms with boost values). Use 'details' for debugging to see which terms were selected.

None
match_include Optional[bool]

If True, includes the source document in results (useful to compare). Default varies: true for MLT handler, depends on configuration for component.

None
match_offset Optional[int]

When using with a query, specifies which result doc to use for similarity. 0 = first result, 1 = second, etc. Default: 0.

None

Returns:

Type Description
Self

A new parser instance with more_like_this configuration applied.

Examples:

Basic similarity search:

>>> parser.more_like_this(
...     fields=["title", "content"],
...     min_term_freq=2,
...     min_doc_freq=5,
...     max_query_terms=25
... )

Advanced with filtering:

>>> parser.more_like_this(
...     fields=["content"],
...     min_term_freq=1,
...     min_doc_freq=3,
...     min_word_len=4,
...     max_doc_freq_pct=80,
...     interesting_terms="details"
... )

Boosted fields:

>>> parser.more_like_this(
...     fields=["title", "content"],
...     query_fields="title^2.0 content^1.0",
...     boost=True
... )
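Selecting the match document when running against a query, a sketch using the match parameters documented above:

>>> parser.more_like_this(
...     fields=["content"],
...     match_offset=0,
...     match_include=False
... )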

serialize_configs(params)

Serialize ParamsConfig objects as top level params.

StandardParser

Bases: BaseQueryParser

Standard Query Parser (Lucene syntax) for Apache Solr.

The Standard Query Parser is Solr's default query parser, supporting full Lucene query syntax including field-specific searches, boolean operators, wildcards, proximity searches, range queries, boosting, and fuzzy searches. It offers greater precision but requires more exact syntax.

Solr Reference

https://solr.apache.org/guide/solr/latest/query-guide/standard-query-parser.html

Features
  • Field-specific queries: title:"The Right Way" AND text:go
  • Boolean operators: AND, OR, NOT, +, -
  • Wildcards: te?t, test*, *esting
  • Proximity searches: "jakarta apache"~10
  • Range queries: [1 TO 5], {A TO Z}
  • Boosting: jakarta^4 apache
  • Fuzzy searches: roam~0.8
  • Grouping with parentheses: (jakarta OR apache) AND website
  • Constant score queries: description:blue^=1.0

Examples:

>>> # Basic field search
>>> parser = StandardParser(query="title:Solr AND content:search")
>>> # Range query
>>> parser = StandardParser(query="price:[10 TO 100]")
>>> # Proximity search
>>> parser = StandardParser(query='"apache solr"~5')
>>> # With default field and operator
>>> parser = StandardParser(
...     query="apache solr",
...     default_field="content",
...     query_operator="AND"
... )

Parameters:

Name Type Description Default
query

Query string using Lucene syntax (required)

required
query_operator

Default operator ("AND" or "OR"). Determines how multiple terms are combined

required
default_field

Default field to search when no field is specified

required
split_on_whitespace

If True, analyze each term separately; if False (default), analyze term sequences together for multi-word synonyms and shingles

required
See Also
  • DisMaxQueryParser: For user-friendly queries with error tolerance
  • ExtendedDisMaxQueryParser: For advanced user queries combining Lucene syntax with DisMax features

build(*args, **kwargs), facet(...), group(...), highlight(...), more_like_this(...), serialize_configs(params)

These members are inherited unchanged from BaseQueryParser; see the BaseQueryParser documentation above for full parameter listings, return values, and examples.
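A build() sketch for StandardParser (the output keys shown are illustrative; exact parameter names depend on the library's serialization):

>>> params = StandardParser(query="title:Solr AND content:search").build()
>>> # e.g. {'q': 'title:Solr AND content:search', ...} plus params for any enabled features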

TermsQueryParser

Bases: BaseQueryParser

Terms Query Parser for Apache Solr.

The Terms Query Parser generates a query from multiple comma-separated values, matching documents where the specified field contains any of the provided terms. It's optimized for efficiently searching for multiple discrete values in a field, particularly useful for filtering by IDs, tags, or categories.

Solr Reference

https://solr.apache.org/guide/solr/latest/query-guide/other-parsers.html#terms-query-parser

Key Features
  • Efficient multi-value term matching
  • Configurable separator for term parsing
  • Multiple query implementation methods with different performance characteristics
  • Optimized for large numbers of terms
  • Works with both regular and docValues fields
Query Implementation Methods
  • termsFilter (default): Uses BooleanQuery or TermInSetQuery based on term count. Scales well with index size and moderately with number of terms.
  • booleanQuery: Creates a BooleanQuery. Scales well with index size but poorly with many terms.
  • automaton: Uses automaton-based matching. Good for certain use cases.
  • docValuesTermsFilter: For docValues fields. Automatically chooses between per-segment or top-level implementation.
  • docValuesTermsFilterPerSegment: Per-segment docValues filtering.
  • docValuesTermsFilterTopLevel: Top-level docValues filtering.
Performance Considerations
  • Use termsFilter (default) for general cases
  • Use booleanQuery for small term sets with large indices
  • Use docValues methods only on fields with docValues enabled
  • Term count affects which internal implementation is chosen

Examples:

>>> # Basic usage - search for multiple tags (as filter)
>>> parser = TermsQueryParser(
...     field="tags",
...     terms=["software", "apache", "solr", "lucene"]
... )
>>> # With custom query field
>>> parser = TermsQueryParser(
...     field="tags",
...     terms=["python", "java", "rust"],
...     query="status:active"
... )
>>> # Using space separator with category IDs
>>> parser = TermsQueryParser(
...     field="categoryId",
...     terms=["8", "6", "7", "5309"],
...     separator=" ",
...     method="booleanQuery"
... )
>>> # Filtering by product IDs
>>> parser = TermsQueryParser(
...     field="product_id",
...     terms=["P123", "P456", "P789", "P012"]
... )
>>> # Using with docValues field
>>> parser = TermsQueryParser(
...     field="author_id",
...     terms=["author1", "author2", "author3"],
...     method="docValuesTermsFilter"
... )
>>> # Building query params for use with any Solr client
>>> params = parser.build()
>>> # {'q': '*:*', 'fq': '{!terms f=tags}software,apache,solr,lucene'}

Parameters:

Name Type Description Default
field

The field name to search (required)

required
terms

List of terms to match (required)

required
query

Optional main query string (default: '*:*'). The terms filter is applied as fq.

required
separator

Character(s) to use for joining terms (default: ','). Use ' ' (single space) if you want space-separated terms.

required
method

Query implementation method. Options: 'termsFilter' (default, automatic choice between implementations), 'booleanQuery' (Boolean query approach), 'automaton' (automaton-based matching), 'docValuesTermsFilter' (auto-select docValues approach), 'docValuesTermsFilterPerSegment' (per-segment docValues), 'docValuesTermsFilterTopLevel' (top-level docValues).

required

Returns:

Type Description

Documents where the specified field contains any of the provided terms

Note

When using docValues methods, ensure the target field has docValues enabled in the schema. The cache parameter defaults to false for docValues methods.

See Also
  • StandardParser: For Lucene syntax queries with field specifications
  • DisMaxQueryParser: For multi-field user-friendly queries

facet(...), group(...), highlight(...), more_like_this(...), serialize_configs(params)

These members are inherited unchanged from BaseQueryParser; see the BaseQueryParser documentation above for full parameter listings, return values, and examples.

VectorSimilarityQueryParser

Bases: DenseVectorSearchQueryParser

Vector Similarity Query Parser for Apache Solr Dense Vector Search.

The vectorSimilarity parser matches documents whose vector similarity to the query vector exceeds a minimum threshold. Unlike KNN which returns a fixed number of top results, this parser returns all documents meeting the similarity criteria, making it suitable for threshold-based retrieval.

Solr Reference

https://solr.apache.org/guide/solr/latest/query-guide/dense-vector-search.html

Key Features
  • Threshold-based vector matching (minReturn)
  • Graph traversal control (minTraverse)
  • Pre-filtering support (explicit or implicit)
  • Returns all documents above similarity threshold
  • Useful for minimum quality requirements
How Vector Similarity Works
  1. Query vector is compared against indexed vectors
  2. Documents with similarity >= minReturn are returned
  3. Graph traversal continues for nodes with similarity >= minTraverse
  4. Results are ranked by similarity score
Similarity vs KNN
  • KNN: Returns exactly k results (top k most similar)
  • VectorSimilarity: Returns all results above threshold (0 to unlimited)
  • KNN: Best for "find similar items"
  • VectorSimilarity: Best for "find items similar enough"
Schema Requirements

Field must be DenseVectorField with matching vector dimension:
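A minimal sketch, assuming illustrative type and field names; vectorDimension must match the length of the query vector (4 in the examples below):

<fieldType name="knn_vector" class="solr.DenseVectorField" vectorDimension="4" similarityFunction="cosine"/>
<field name="product_vector" type="knn_vector" indexed="true" stored="true"/>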

Examples:

>>> # Basic similarity search with threshold
>>> parser = VectorSimilarityQueryParser(
...     vector_field="product_vector",
...     vector=[1.0, 2.0, 3.0, 4.0],
...     min_return=0.7  # Only return docs with similarity >= 0.7
... )
>>> # With traversal control
>>> parser = VectorSimilarityQueryParser(
...     vector_field="doc_vector",
...     vector=[0.5, 0.5, 0.5, 0.5],
...     min_return=0.8,  # Return threshold
...     min_traverse=0.6  # Continue graph traversal threshold
... )
>>> # With explicit pre-filtering
>>> parser = VectorSimilarityQueryParser(
...     vector_field="content_vector",
...     vector=[0.2, 0.3, 0.4, 0.5],
...     min_return=0.75,
...     pre_filter=["inStock:true", "price:[* TO 100]"]
... )
>>> # As filter query for hybrid search
>>> # Use with q=*:* to get all docs above similarity threshold
>>> parser = VectorSimilarityQueryParser(
...     vector_field="embedding",
...     vector=[1.5, 2.5, 3.5, 4.5],
...     min_return=0.85
... )

Parameters:

Name Type Description Default
vector

Query vector as list of floats (required, must match field dimension)

required
min_return

Minimum similarity threshold for returned documents (required)

required
min_traverse

Minimum similarity to continue graph traversal (default: -Infinity)

required
vector_field

Name of the DenseVectorField to search (inherited from base)

required
pre_filter

Explicit pre-filter query strings (inherited from base)

required
include_tags

Only use fq filters with these tags for implicit pre-filtering (inherited)

required
exclude_tags

Exclude fq filters with these tags from implicit pre-filtering (inherited)

required

Returns:

Type Description

All documents with vector similarity >= minReturn, ranked by similarity score

Note

Setting minTraverse lower than minReturn allows exploring more of the graph to find potential matches, at the cost of more computation.

See Also
  • KNNQueryParser: For top-k nearest neighbor retrieval
  • KNNTextToVectorQueryParser: For text-based vector similarity search

facet(*, queries=None, fields=None, prefix=None, contains=None, contains_ignore_case=None, matches=None, sort=None, limit=None, offset=None, mincount=None, missing=None, method=None, enum_cache_min_df=None, exists=None, exclude_terms=None, overrequest_count=None, overrequest_ratio=None, threads=None, range_field=None, range_start=None, range_end=None, range_gap=None, range_hardend=None, range_include=None, range_other=None, range_method=None, pivot_fields=None, pivot_mincount=None)

Enable faceting to categorize and count search results.

Faceting breaks down search results into categories with counts, enabling drill-down navigation and data analysis.

Parameters:

Name Type Description Default
queries Optional[List[str]]

Arbitrary queries to generate facet counts for specific terms/expressions.

None
fields Optional[List[str]]

Fields to be treated as facets. Common for categories, brands, tags.

None
prefix Optional[str]

Limits facet terms to those starting with the given prefix.

None
contains Optional[str]

Limits facet terms to those containing the given substring.

None
contains_ignore_case Optional[bool]

If True, ignores case when matching the 'contains' parameter.

None
matches Optional[str]

Only returns facets matching this regular expression.

None
sort Optional[Literal['count', 'index']]

Ordering of facet field terms. Use 'count' (by frequency) or 'index' (alphabetically).

None
limit Optional[int]

Number of facet counts to return. Set to -1 for all.

None
offset Optional[int]

Offset into the facet list for paging.

None
mincount Optional[int]

Minimum count for facets to be included in response. Common to set to 1 to hide empty facets.

None
missing Optional[bool]

If True, include count of results with no facet value.

None
method Optional[Literal['enum', 'fc', 'fcs']]

Algorithm to use for faceting. Use 'enum' (enumerate all terms, good for low-cardinality), 'fc' (field cache, good for high-cardinality), or 'fcs' (per-segment, good for frequent updates).

None
enum_cache_min_df Optional[int]

Minimum document frequency for filterCache usage with enum method.

None
exists Optional[bool]

Cap facet counts by 1 (only for non-trie fields).

None
exclude_terms Optional[str]

Terms to remove from facet counts.

None
overrequest_count Optional[int]

Extra facets to request from each shard for better accuracy in distributed environments.

None
overrequest_ratio Optional[float]

Ratio for overrequesting facets from shards.

None
threads Optional[int]

Number of threads for parallel facet loading. Useful for multiple facets on large datasets.

None
range_field Optional[List[str]]

Fields for range faceting (e.g., price ranges, date ranges).

None
range_start Optional[Dict[str, str]]

Lower bound of ranges per field. Dict mapping field name to start value.

None
range_end Optional[Dict[str, str]]

Upper bound of ranges per field. Dict mapping field name to end value.

None
range_gap Optional[Dict[str, str]]

Size of each range span per field. E.g., {'price': '100'} for $100 increments.

None
range_hardend Optional[bool]

If True, uses exact range_end as upper bound even if it doesn't align with gap.

None
range_include Optional[List[Literal['lower', 'upper', 'edge', 'outer', 'all']]]

Range bounds to include in faceting. List of: 'lower' (include lower bound), 'upper' (include upper bound), 'edge' (first/last include edges), 'outer' (before/after inclusive), 'all'.

None
range_other Optional[List[Literal['before', 'after', 'between', 'none', 'all']]]

Additional range counts to compute. List of: 'before' (below first range), 'after' (above last range), 'between' (within bounds), 'none', or 'all'.

None
range_method Optional[Literal['filter', 'dv']]

Method to use for range faceting. Use 'filter' or 'dv' (for docValues).

None
pivot_fields Optional[List[str]]

Fields to use for pivot (hierarchical) faceting. E.g., ['category,brand'].

None
pivot_mincount Optional[int]

Minimum count for pivot facet inclusion.

None

Returns:

Type Description
Self

A new parser instance with facet configuration applied.

Examples:

Basic field faceting:

>>> parser.facet(fields=["genre", "director"], mincount=1, limit=10)

Range faceting for prices:

>>> parser.facet(
...     range_field=["price"],
...     range_start={"price": "0"},
...     range_end={"price": "1000"},
...     range_gap={"price": "100"}
... )

Filtered facets:

>>> parser.facet(fields=["color"], prefix="bl", mincount=5)

group(*, by=None, func=None, query=None, limit=1, offset=None, sort=None, format='grouped', main=None, ngroups=False, truncate=False, facet=False, cache_percent=0)

Enable result grouping to collapse results by common field values.

Result grouping (also known as field collapsing) combines documents that share a common field value. Useful for showing one result per author, category, or domain.

Parameters:

Name Type Description Default
by Optional[Union[str, List[str]]]

Field(s) to group results by. Shows one representative doc per unique field value. Must be single-valued and indexed. Can specify multiple fields for separate groupings.

None
func Optional[str]

Group by the result of a function query (e.g., 'floor(price)'). Not supported in SolrCloud/distributed searches.

None
query Optional[Union[str, List[str]]]

Create custom groups using arbitrary queries. Each query defines one group. Example: ['price:[0 TO 50]', 'price:[50 TO 100]'] creates price range groups.

None
limit int

Number of documents to return per group. Default: 1 (only top doc per group). Set higher to see more examples from each group.

1
offset Optional[int]

Skip the first N documents within each group. Useful for pagination within groups.

None
sort Optional[str]

How to sort documents within each group (e.g., 'date desc' for newest first). If not specified, uses the main sort parameter.

None
format str

Response structure format. 'grouped' (nested, default) or 'simple' (flat list).

'grouped'
main Optional[bool]

If True, returns first field grouping as main result list, flattening the response.

None
ngroups bool

If True, include total number of unique groups in response. Useful for pagination. Requires co-location in SolrCloud.

False
truncate bool

If True, base facet counts on one doc per group only.

False
facet bool

If True, enable grouped faceting. Can be expensive on large result sets. Requires co-location in SolrCloud.

False
cache_percent int

Enable result grouping cache (0-100). Set to 0 to disable (default). Try 50-100 for complex queries to improve performance.

0

Returns:

Type Description
Self

A new parser instance with group configuration applied.

Examples:

Group by author:

>>> parser.group(by="author", limit=3, sort="date desc", ngroups=True)

Group by price range:

>>> parser.group(
...     query=["price:[0 TO 50]", "price:[50 TO 100]", "price:[100 TO *]"],
...     limit=5
... )

Multiple field groupings:

>>> parser.group(by=["author", "category"], limit=2)

highlight(*, method=None, fields=None, query=None, query_parser=None, require_field_match=None, query_field_pattern=None, use_phrase_highlighter=None, multiterm=None, snippets_per_field=None, fragment_size=None, encoder=None, max_analyzed_chars=None, tag_before=None, tag_after=None, offset_source=None, frag_align_ratio=None, fragsize_is_minimum=None, tag_ellipsis=None, default_summary=None, score_k1=None, score_b=None, score_pivot=None, bs_language=None, bs_country=None, bs_variant=None, bs_type=None, bs_separator=None, weight_matches=None, merge_contiguous=None, max_multivalued_to_examine=None, max_multivalued_to_match=None, alternate_field=None, max_alternate_field_length=None, alternate=None, formatter=None, simple_pre=None, simple_post=None, fragmenter=None, regex_slop=None, regex_pattern=None, regex_max_analyzed_chars=None, preserve_multi=None, payloads=None, frag_list_builder=None, fragments_builder=None, boundary_scanner=None, phrase_limit=None, multivalue_separator=None)

Enable highlighting to show snippets with query terms emphasized.

Highlighting shows snippets of text from documents with query terms emphasized (usually wrapped in tags). Common for search result snippets and previews.

Parameters:

Name Type Description Default
method Optional[Literal['unified', 'original', 'fastVector']]

Highlighting implementation to use. Use 'unified' (most accurate, recommended), 'original' (legacy, widely compatible), or 'fastVector' (fast for large docs, requires term vectors).

None
fields Optional[Union[str, List[str]]]

Fields to generate highlighted snippets for. Must be stored fields. Example: ['title', 'content']. Use '*' for all fields (not recommended).

None
query Optional[str]

Custom query to use for highlighting (overrides main query).

None
query_parser Optional[str]

Query parser for the highlight query (e.g., 'edismax', 'lucene').

None
require_field_match Optional[bool]

If True, only highlight terms in the field they matched. Set to False to highlight query terms in all requested fields.

None
query_field_pattern Optional[str]

Regular expression pattern for fields to consider for highlighting.

None
use_phrase_highlighter Optional[bool]

If True, highlights complete phrases accurately. Default: True.

None
multiterm Optional[bool]

Enable highlighting for wildcard, fuzzy, and range queries. Default: True.

None
snippets_per_field Optional[int]

Maximum number of snippets to return per field. Default: 1. Set higher to show more context passages.

None
fragment_size Optional[int]

Size of highlighted snippets in characters. Default: 100. Set to 0 to highlight entire field (not recommended for large fields).

None
encoder Optional[Literal['', 'html']]

Text encoder for snippets. Use '' (empty string, default) or 'html' to escape HTML and prevent XSS.

None
max_analyzed_chars Optional[int]

Maximum characters to analyze per field. Default: 51200 (50KB). For large documents, limits analysis to improve performance.

None
tag_before Optional[str]

Text/tag to insert before each highlighted term. Default: '<em>'.

None
tag_after Optional[str]

Text/tag to insert after each highlighted term. Default: '</em>'.

None
Unified Highlighter parameters (most accurate, recommended):
offset_source Optional[str]

How offsets are obtained. Options: 'ANALYSIS', 'POSTINGS', 'POSTINGS_WITH_TERM_VECTORS', 'TERM_VECTORS'. Usually auto-detected.

None
frag_align_ratio Optional[float]

Where to position first match in snippet (0.0-1.0). Default: 0.33 (left third, shows context before match).

None
fragsize_is_minimum Optional[bool]

If True, treat fragment_size as minimum. Default: True.

None
tag_ellipsis Optional[str]

Text between multiple snippets (e.g., '...' or ' [...] ').

None
default_summary Optional[bool]

If True, return leading text when no matches found.

None
score_k1 Optional[float]

BM25 term frequency normalization. Default: 1.2.

None
score_b Optional[float]

BM25 length normalization. Default: 0.75.

None
score_pivot Optional[int]

BM25 average passage length in characters. Default: 87.

None
bs_language Optional[str]

BreakIterator language for text segmentation (e.g., 'en', 'ja').

None
bs_country Optional[str]

BreakIterator country code (e.g., 'US', 'GB').

None
bs_variant Optional[str]

BreakIterator variant for specialized locale rules.

None
bs_type Optional[Literal['SEPARATOR', 'SENTENCE', 'WORD', 'CHARACTER', 'LINE', 'WHOLE']]

How to segment text. Use 'SENTENCE' (default, recommended), 'WORD', 'CHARACTER', 'LINE', 'SEPARATOR', or 'WHOLE'.

None
bs_separator Optional[str]

Custom separator character when bs_type=SEPARATOR.

None
weight_matches Optional[bool]

Use Lucene's Weight Matches API for most accurate highlighting.

None
Original Highlighter parameters (legacy):
merge_contiguous Optional[bool]

Merge adjacent fragments.

None
max_multivalued_to_examine Optional[int]

Max entries to examine in multivalued field.

None
max_multivalued_to_match Optional[int]

Max matches in multivalued field.

None
alternate_field Optional[str]

Backup field for summary when no highlights found.

None
max_alternate_field_length Optional[int]

Max length of alternate field.

None
alternate Optional[bool]

Highlight alternate field.

None
formatter Optional[Literal['simple']]

Formatter for highlighted output. Use 'simple'.

None
simple_pre Optional[str]

Text before term (simple formatter).

None
simple_post Optional[str]

Text after term (simple formatter).

None
fragmenter Optional[Literal['gap', 'regex']]

Text snippet generator type. Use 'gap' or 'regex'.

None
regex_slop Optional[float]

Deviation factor for regex fragmenter.

None
regex_pattern Optional[str]

Pattern for regex fragmenter.

None
regex_max_analyzed_chars Optional[int]

Char limit for regex fragmenter.

None
preserve_multi Optional[bool]

Preserve order in multivalued fields.

None
payloads Optional[bool]

Include payloads in highlighting.

None
FastVector Highlighter parameters (requires term vectors):
frag_list_builder Optional[Literal['simple', 'weighted', 'single']]

Snippet fragmenting algorithm. Use 'simple', 'weighted', or 'single'.

None
fragments_builder Optional[Literal['default', 'colored']]

Fragment formatting implementation. Use 'default' or 'colored'.

None
boundary_scanner Optional[str]

Boundary scanner implementation.

None
phrase_limit Optional[int]

Max phrases to analyze for scoring.

None
multivalue_separator Optional[str]

Separator for multivalued fields.

None

Returns:

Type Description
Self

A new parser instance with highlight configuration applied.

Examples:

Basic highlighting:

>>> parser.highlight(fields=["title", "content"], snippets_per_field=3, fragment_size=150)

Custom HTML tags:

>>> parser.highlight(
...     fields=["title"],
...     tag_before='<mark class="highlight">',
...     tag_after='</mark>',
...     encoder="html"
... )

Unified highlighter with sentence breaks:

>>> parser.highlight(
...     fields=["content"],
...     method="unified",
...     bs_type="SENTENCE",
...     fragment_size=200
... )
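
FastVector highlighter, assuming the target field was indexed with term vectors (termVectors, termPositions, and termOffsets enabled); field name is illustrative:

>>> parser.highlight(
...     fields=["content"],
...     method="fastVector",
...     frag_list_builder="weighted",
...     fragments_builder="default"
... )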

more_like_this(*, fields=None, min_term_freq=None, min_doc_freq=None, max_doc_freq=None, max_doc_freq_pct=None, min_word_len=None, max_word_len=None, max_query_terms=None, max_num_tokens_parsed=None, boost=None, query_fields=None, interesting_terms=None, match_include=None, match_offset=None)

Enable MoreLikeThis to find documents similar to a given document.

MoreLikeThis finds documents similar to a given document by analyzing the terms that make it unique. Common for "related articles", recommendations, and content discovery.

Parameters:

Name Type Description Default
fields Optional[Union[str, List[str]]]

Fields to analyze for similarity. Use fields with meaningful content (title, description, body). Can be a single field or list of fields. For best performance, enable term vectors on these fields.

None
min_term_freq Optional[int]

Minimum times a term must appear in the source document to be considered. Default: 2. Lower values include more terms but may add noise.

None
min_doc_freq Optional[int]

Minimum number of documents a term must appear in across the index. Default: 5. Filters out very rare terms that aren't useful for finding similar documents.

None
max_doc_freq Optional[int]

Maximum number of documents a term can appear in. Filters out very common terms (like 'the', 'and'). Use this OR max_doc_freq_pct, not both.

None
max_doc_freq_pct Optional[int]

Maximum document frequency as percentage (0-100). Example: 75 means ignore terms appearing in more than 75% of documents. Use instead of max_doc_freq for relative filtering.

None
min_word_len Optional[int]

Minimum word length in characters. Words shorter than this are ignored. Example: 4 to skip 'the', 'and', 'or'.

None
max_word_len Optional[int]

Maximum word length in characters. Words longer than this are ignored. Useful to filter out long tokens or URLs.

None
max_query_terms Optional[int]

Maximum number of interesting terms to use in the MLT query. Default: 25. Higher values = more comprehensive but slower. Start with default and adjust as needed.

None
max_num_tokens_parsed Optional[int]

Maximum tokens to analyze per field (for fields without term vectors). Default: 5000. Set lower for better performance on large documents.

None
boost Optional[bool]

If True, boost the query by each term's relevance/importance. Default: False. Enable for better relevance ranking of similar documents.

None
query_fields Optional[str]

Query fields with optional boosts, like 'title^2.0 content^1.0'. Fields must also be in 'fields' parameter. Use to emphasize certain fields in similarity matching.

None
interesting_terms Optional[str]

Controls what info about matched terms is returned. Options: 'none' (default), 'list' (term names), 'details' (terms with boost values). Use 'details' for debugging to see which terms were selected.

None
match_include Optional[bool]

If True, includes the source document in the results (useful for comparison). The default varies: True for the MLT handler; for the MLT component it depends on configuration.

None
match_offset Optional[int]

When using with a query, specifies which result doc to use for similarity. 0 = first result, 1 = second, etc. Default: 0.

None

Returns:

Type Description
Self

A new parser instance with more_like_this configuration applied.

Examples:

Basic similarity search:

>>> parser.more_like_this(
...     fields=["title", "content"],
...     min_term_freq=2,
...     min_doc_freq=5,
...     max_query_terms=25
... )

Advanced with filtering:

>>> parser.more_like_this(
...     fields=["content"],
...     min_term_freq=1,
...     min_doc_freq=3,
...     min_word_len=4,
...     max_doc_freq_pct=80,
...     interesting_terms="details"
... )

Boosted fields:

>>> parser.more_like_this(
...     fields=["title", "content"],
...     query_fields="title^2.0 content^1.0",
...     boost=True
... )
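
Controlling which document seeds the similarity query (field names and values are illustrative):

>>> parser.more_like_this(
...     fields=["content"],
...     match_include=False,  # exclude the source document from results
...     match_offset=0,  # seed from the first result of the main query
...     max_doc_freq_pct=75
... )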

serialize_configs(params)

Serialize ParamsConfig objects as top-level params.