Parsers
BaseQueryParser
Bases: CommonParamsMixin
build(*args, **kwargs)
Serialize the parser configuration to Solr-compatible query parameters using Pydantic's model_dump.
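A minimal sketch, assuming a configured parser instance; since build uses Pydantic's model_dump, the result is a plain dict whose keys depend on the configuration:
>>> params = parser.build()
>>> isinstance(params, dict)  # Solr-compatible parameter mapping
True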
facet(*, queries=None, fields=None, prefix=None, contains=None, contains_ignore_case=None, matches=None, sort=None, limit=None, offset=None, mincount=None, missing=None, method=None, enum_cache_min_df=None, exists=None, exclude_terms=None, overrequest_count=None, overrequest_ratio=None, threads=None, range_field=None, range_start=None, range_end=None, range_gap=None, range_hardend=None, range_include=None, range_other=None, range_method=None, pivot_fields=None, pivot_mincount=None)
Enable faceting to categorize and count search results.
Faceting breaks down search results into categories with counts, enabling drill-down navigation and data analysis.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| queries | Optional[List[str]] | Arbitrary queries to generate facet counts for specific terms/expressions. | None |
| fields | Optional[List[str]] | Fields to be treated as facets. Common for categories, brands, tags. | None |
| prefix | Optional[str] | Limits facet terms to those starting with the given prefix. | None |
| contains | Optional[str] | Limits facet terms to those containing the given substring. | None |
| contains_ignore_case | Optional[bool] | If True, ignores case when matching the 'contains' parameter. | None |
| matches | Optional[str] | Only returns facets matching this regular expression. | None |
| sort | Optional[Literal['count', 'index']] | Ordering of facet field terms. Use 'count' (by frequency) or 'index' (alphabetically). | None |
| limit | Optional[int] | Number of facet counts to return. Set to -1 for all. | None |
| offset | Optional[int] | Offset into the facet list for paging. | None |
| mincount | Optional[int] | Minimum count for facets to be included in the response. Commonly set to 1 to hide empty facets. | None |
| missing | Optional[bool] | If True, include the count of results with no facet value. | None |
| method | Optional[Literal['enum', 'fc', 'fcs']] | Faceting algorithm. Use 'enum' (enumerate all terms; good for low cardinality), 'fc' (field cache; good for high cardinality), or 'fcs' (per-segment; good for frequently updated indexes). | None |
| enum_cache_min_df | Optional[int] | Minimum document frequency for filterCache usage with the enum method. | None |
| exists | Optional[bool] | Cap facet counts at 1 (only for non-trie fields). | None |
| exclude_terms | Optional[str] | Terms to remove from facet counts. | None |
| overrequest_count | Optional[int] | Extra facets to request from each shard for better accuracy in distributed environments. | None |
| overrequest_ratio | Optional[float] | Ratio for overrequesting facets from shards. | None |
| threads | Optional[int] | Number of threads for parallel facet loading. Useful for multiple facets on large datasets. | None |
| range_field | Optional[List[str]] | Fields for range faceting (e.g., price ranges, date ranges). | None |
| range_start | Optional[Dict[str, str]] | Lower bound of ranges per field. Dict mapping field name to start value. | None |
| range_end | Optional[Dict[str, str]] | Upper bound of ranges per field. Dict mapping field name to end value. | None |
| range_gap | Optional[Dict[str, str]] | Size of each range span per field, e.g., {'price': '100'} for $100 increments. | None |
| range_hardend | Optional[bool] | If True, uses the exact range_end as the upper bound even if it doesn't align with the gap. | None |
| range_include | Optional[List[Literal['lower', 'upper', 'edge', 'outer', 'all']]] | Range bounds to include in faceting: 'lower' (include lower bound), 'upper' (include upper bound), 'edge' (first/last ranges include edges), 'outer' (before/after ranges are inclusive), or 'all'. | None |
| range_other | Optional[List[Literal['before', 'after', 'between', 'none', 'all']]] | Additional range counts to compute: 'before' (below first range), 'after' (above last range), 'between' (within bounds), 'none', or 'all'. | None |
| range_method | Optional[Literal['filter', 'dv']] | Method for range faceting. Use 'filter' or 'dv' (for docValues). | None |
| pivot_fields | Optional[List[str]] | Fields for pivot (hierarchical) faceting, e.g., ['category,brand']. | None |
| pivot_mincount | Optional[int] | Minimum count for pivot facet inclusion. | None |
Returns:
| Type | Description |
|---|---|
| Self | A new parser instance with facet configuration applied. |
Examples:
Basic field faceting:
>>> parser.facet(fields=["genre", "director"], mincount=1, limit=10)
Range faceting for prices:
>>> parser.facet(
... range_field=["price"],
... range_start={"price": "0"},
... range_end={"price": "1000"},
... range_gap={"price": "100"}
... )
Filtered facets:
>>> parser.facet(fields=["color"], prefix="bl", mincount=5)
group(*, by=None, func=None, query=None, limit=1, offset=None, sort=None, format='grouped', main=None, ngroups=False, truncate=False, facet=False, cache_percent=0)
Enable result grouping to collapse results by common field values.
Result grouping (also known as field collapsing) combines documents that share a common field value. Useful for showing one result per author, category, or domain.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| by | Optional[Union[str, List[str]]] | Field(s) to group results by. Shows one representative doc per unique field value. Must be single-valued and indexed. Can specify multiple fields for separate groupings. | None |
| func | Optional[str] | Group by the result of a function query (e.g., 'floor(price)'). Not supported in SolrCloud/distributed searches. | None |
| query | Optional[Union[str, List[str]]] | Create custom groups using arbitrary queries. Each query defines one group. Example: ['price:[0 TO 50]', 'price:[50 TO 100]'] creates price range groups. | None |
| limit | int | Number of documents to return per group. Default: 1 (only the top doc per group). Set higher to see more examples from each group. | 1 |
| offset | Optional[int] | Skip the first N documents within each group. Useful for pagination within groups. | None |
| sort | Optional[str] | How to sort documents within each group (e.g., 'date desc' for newest first). If not specified, uses the main sort parameter. | None |
| format | str | Response structure format: 'grouped' (nested, default) or 'simple' (flat list). | 'grouped' |
| main | Optional[bool] | If True, returns the first field grouping as the main result list, flattening the response. | None |
| ngroups | bool | If True, include the total number of unique groups in the response. Useful for pagination. Requires co-location in SolrCloud. | False |
| truncate | bool | If True, base facet counts on one doc per group only. | False |
| facet | bool | If True, enable grouped faceting. Can be expensive on large result sets. Requires co-location in SolrCloud. | False |
| cache_percent | int | Enable the result grouping cache (0-100). Set to 0 to disable (default). Try 50-100 for complex queries to improve performance. | 0 |
Returns:
| Type | Description |
|---|---|
| Self | A new parser instance with group configuration applied. |
Examples:
Group by author:
>>> parser.group(by="author", limit=3, sort="date desc", ngroups=True)
Group by price range:
>>> parser.group(
... query=["price:[0 TO 50]", "price:[50 TO 100]", "price:[100 TO *]"],
... limit=5
... )
Multiple field groupings:
>>> parser.group(by=["author", "category"], limit=2)
highlight(*, method=None, fields=None, query=None, query_parser=None, require_field_match=None, query_field_pattern=None, use_phrase_highlighter=None, multiterm=None, snippets_per_field=None, fragment_size=None, encoder=None, max_analyzed_chars=None, tag_before=None, tag_after=None, offset_source=None, frag_align_ratio=None, fragsize_is_minimum=None, tag_ellipsis=None, default_summary=None, score_k1=None, score_b=None, score_pivot=None, bs_language=None, bs_country=None, bs_variant=None, bs_type=None, bs_separator=None, weight_matches=None, merge_contiguous=None, max_multivalued_to_examine=None, max_multivalued_to_match=None, alternate_field=None, max_alternate_field_length=None, alternate=None, formatter=None, simple_pre=None, simple_post=None, fragmenter=None, regex_slop=None, regex_pattern=None, regex_max_analyzed_chars=None, preserve_multi=None, payloads=None, frag_list_builder=None, fragments_builder=None, boundary_scanner=None, phrase_limit=None, multivalue_separator=None)
Enable highlighting to show snippets with query terms emphasized.
Highlighting shows snippets of text from documents with query terms emphasized (usually wrapped in tags). Common for search result snippets and previews.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| method | Optional[Literal['unified', 'original', 'fastVector']] | Highlighting implementation to use. Use 'unified' (most accurate, recommended), 'original' (legacy, widely compatible), or 'fastVector' (fast for large docs, requires term vectors). | None |
| fields | Optional[Union[str, List[str]]] | Fields to generate highlighted snippets for. Must be stored fields. Example: ['title', 'content']. Use '*' for all fields (not recommended). | None |
| query | Optional[str] | Custom query to use for highlighting (overrides the main query). | None |
| query_parser | Optional[str] | Query parser for the highlight query (e.g., 'edismax', 'lucene'). | None |
| require_field_match | Optional[bool] | If True, only highlight terms in the field they matched. Set to False to highlight query terms in all requested fields. | None |
| query_field_pattern | Optional[str] | Regular expression pattern for fields to consider for highlighting. | None |
| use_phrase_highlighter | Optional[bool] | If True, highlights complete phrases accurately. Default: True. | None |
| multiterm | Optional[bool] | Enable highlighting for wildcard, fuzzy, and range queries. Default: True. | None |
| snippets_per_field | Optional[int] | Maximum number of snippets to return per field. Default: 1. Set higher to show more context passages. | None |
| fragment_size | Optional[int] | Size of highlighted snippets in characters. Default: 100. Set to 0 to highlight the entire field (not recommended for large fields). | None |
| encoder | Optional[Literal['', 'html']] | Text encoder for snippets. Use '' (empty string, default) or 'html' to escape HTML and prevent XSS. | None |
| max_analyzed_chars | Optional[int] | Maximum characters to analyze per field. Default: 51200 (50KB). For large documents, limits analysis to improve performance. | None |
| tag_before | Optional[str] | Text/tag to insert before each highlighted term. Default: '<em>'. | None |
| tag_after | Optional[str] | Text/tag to insert after each highlighted term. Default: '</em>'. | None |

Unified Highlighter specific (most accurate, recommended):

| Name | Type | Description | Default |
|---|---|---|---|
| offset_source | Optional[str] | How offsets are obtained. Options: 'ANALYSIS', 'POSTINGS', 'POSTINGS_WITH_TERM_VECTORS', 'TERM_VECTORS'. Usually auto-detected. | None |
| frag_align_ratio | Optional[float] | Where to position the first match in the snippet (0.0-1.0). Default: 0.33 (left third, shows context before the match). | None |
| fragsize_is_minimum | Optional[bool] | If True, treat fragment_size as a minimum. Default: True. | None |
| tag_ellipsis | Optional[str] | Text between multiple snippets (e.g., '...' or ' [...] '). | None |
| default_summary | Optional[bool] | If True, return leading text when no matches are found. | None |
| score_k1 | Optional[float] | BM25 term frequency normalization. Default: 1.2. | None |
| score_b | Optional[float] | BM25 length normalization. Default: 0.75. | None |
| score_pivot | Optional[int] | BM25 average passage length in characters. Default: 87. | None |
| bs_language | Optional[str] | BreakIterator language for text segmentation (e.g., 'en', 'ja'). | None |
| bs_country | Optional[str] | BreakIterator country code (e.g., 'US', 'GB'). | None |
| bs_variant | Optional[str] | BreakIterator variant for specialized locale rules. | None |
| bs_type | Optional[Literal['SEPARATOR', 'SENTENCE', 'WORD', 'CHARACTER', 'LINE', 'WHOLE']] | How to segment text. Use 'SENTENCE' (default, recommended), 'WORD', 'CHARACTER', 'LINE', 'SEPARATOR', or 'WHOLE'. | None |
| bs_separator | Optional[str] | Custom separator character when bs_type=SEPARATOR. | None |
| weight_matches | Optional[bool] | Use Lucene's Weight Matches API for the most accurate highlighting. | None |

Original Highlighter specific (legacy):

| Name | Type | Description | Default |
|---|---|---|---|
| merge_contiguous | Optional[bool] | Merge adjacent fragments. | None |
| max_multivalued_to_examine | Optional[int] | Maximum entries to examine in a multivalued field. | None |
| max_multivalued_to_match | Optional[int] | Maximum matches in a multivalued field. | None |
| alternate_field | Optional[str] | Backup field for the summary when no highlights are found. | None |
| max_alternate_field_length | Optional[int] | Maximum length of the alternate field. | None |
| alternate | Optional[bool] | Highlight the alternate field. | None |
| formatter | Optional[Literal['simple']] | Formatter for highlighted output. Use 'simple'. | None |
| simple_pre | Optional[str] | Text before each term (simple formatter). | None |
| simple_post | Optional[str] | Text after each term (simple formatter). | None |
| fragmenter | Optional[Literal['gap', 'regex']] | Text snippet generator type. Use 'gap' or 'regex'. | None |
| regex_slop | Optional[float] | Deviation factor for the regex fragmenter. | None |
| regex_pattern | Optional[str] | Pattern for the regex fragmenter. | None |
| regex_max_analyzed_chars | Optional[int] | Character limit for the regex fragmenter. | None |
| preserve_multi | Optional[bool] | Preserve order in multivalued fields. | None |
| payloads | Optional[bool] | Include payloads in highlighting. | None |

FastVector Highlighter specific (requires term vectors):

| Name | Type | Description | Default |
|---|---|---|---|
| frag_list_builder | Optional[Literal['simple', 'weighted', 'single']] | Snippet fragmenting algorithm. Use 'simple', 'weighted', or 'single'. | None |
| fragments_builder | Optional[Literal['default', 'colored']] | Fragment formatting implementation. Use 'default' or 'colored'. | None |
| boundary_scanner | Optional[str] | Boundary scanner implementation. | None |
| phrase_limit | Optional[int] | Maximum phrases to analyze for scoring. | None |
| multivalue_separator | Optional[str] | Separator for multivalued fields. | None |
Returns:
| Type | Description |
|---|---|
| Self | A new parser instance with highlight configuration applied. |
Examples:
Basic highlighting:
>>> parser.highlight(fields=["title", "content"], snippets_per_field=3, fragment_size=150)
Custom HTML tags:
>>> parser.highlight(
... fields=["title"],
... tag_before='<mark class="highlight">',
... tag_after='</mark>',
... encoder="html"
... )
Unified highlighter with sentence breaks:
>>> parser.highlight(
... fields=["content"],
... method="unified",
... bs_type="SENTENCE",
... fragment_size=200
... )
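FastVector highlighter, a sketch using the fastVector-specific parameters listed above (requires term vectors on the highlighted fields):
>>> parser.highlight(
...     fields=["content"],
...     method="fastVector",
...     frag_list_builder="weighted",
...     phrase_limit=256
... )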
more_like_this(*, fields=None, min_term_freq=None, min_doc_freq=None, max_doc_freq=None, max_doc_freq_pct=None, min_word_len=None, max_word_len=None, max_query_terms=None, max_num_tokens_parsed=None, boost=None, query_fields=None, interesting_terms=None, match_include=None, match_offset=None)
Enable MoreLikeThis to find documents similar to a given document.
MoreLikeThis finds documents similar to a given document by analyzing the terms that make it unique. Common for "related articles", recommendations, and content discovery.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| fields | Optional[Union[str, List[str]]] | Fields to analyze for similarity. Use fields with meaningful content (title, description, body). Can be a single field or a list of fields. For best performance, enable term vectors on these fields. | None |
| min_term_freq | Optional[int] | Minimum times a term must appear in the source document to be considered. Default: 2. Lower values include more terms but may add noise. | None |
| min_doc_freq | Optional[int] | Minimum number of documents a term must appear in across the index. Default: 5. Filters out very rare terms that aren't useful for finding similar documents. | None |
| max_doc_freq | Optional[int] | Maximum number of documents a term can appear in. Filters out very common terms (like 'the', 'and'). Use this OR max_doc_freq_pct, not both. | None |
| max_doc_freq_pct | Optional[int] | Maximum document frequency as a percentage (0-100). Example: 75 means ignore terms appearing in more than 75% of documents. Use instead of max_doc_freq for relative filtering. | None |
| min_word_len | Optional[int] | Minimum word length in characters. Words shorter than this are ignored. Example: 4 to skip 'the', 'and', 'or'. | None |
| max_word_len | Optional[int] | Maximum word length in characters. Words longer than this are ignored. Useful to filter out long tokens or URLs. | None |
| max_query_terms | Optional[int] | Maximum number of interesting terms to use in the MLT query. Default: 25. Higher values are more comprehensive but slower. Start with the default and adjust as needed. | None |
| max_num_tokens_parsed | Optional[int] | Maximum tokens to analyze per field (for fields without term vectors). Default: 5000. Set lower for better performance on large documents. | None |
| boost | Optional[bool] | If True, boost the query by each term's relevance/importance. Default: False. Enable for better relevance ranking of similar documents. | None |
| query_fields | Optional[str] | Query fields with optional boosts, like 'title^2.0 content^1.0'. Fields must also be in the 'fields' parameter. Use to emphasize certain fields in similarity matching. | None |
| interesting_terms | Optional[str] | Controls what info about matched terms is returned. Options: 'none' (default), 'list' (term names), 'details' (terms with boost values). Use 'details' for debugging to see which terms were selected. | None |
| match_include | Optional[bool] | If True, includes the source document in results (useful for comparison). Default varies: true for the MLT handler; depends on configuration for the component. | None |
| match_offset | Optional[int] | When used with a query, specifies which result doc to use for similarity. 0 = first result, 1 = second, etc. Default: 0. | None |
Returns:
| Type | Description |
|---|---|
| Self | A new parser instance with more_like_this configuration applied. |
Examples:
Basic similarity search:
>>> parser.more_like_this(
... fields=["title", "content"],
... min_term_freq=2,
... min_doc_freq=5,
... max_query_terms=25
... )
Advanced with filtering:
>>> parser.more_like_this(
... fields=["content"],
... min_term_freq=1,
... min_doc_freq=3,
... min_word_len=4,
... max_doc_freq_pct=80,
... interesting_terms="details"
... )
Boosted fields:
>>> parser.more_like_this(
... fields=["title", "content"],
... query_fields="title^2.0 content^1.0",
... boost=True
... )
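Choosing the seed document, a sketch using the documented match parameters (seed from the first query result, excluded from the results):
>>> parser.more_like_this(
...     fields=["content"],
...     match_offset=0,
...     match_include=False
... )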
serialize_configs(params)
Serialize ParamsConfig objects as top-level params.
DisMaxQueryParser
Bases: BaseQueryParser
DisMax (Disjunction Max) Query Parser for Apache Solr.
The DisMax query parser is designed for user-friendly queries, providing an experience similar to popular search engines like Google. It handles queries gracefully even when they contain errors, making it ideal for end-user facing applications. DisMax distributes terms across multiple fields with individual boosts and combines results using disjunction max scoring.
Solr Reference
https://solr.apache.org/guide/solr/latest/query-guide/dismax-query-parser.html
Key Features
- Simplified query syntax (no need for field names)
- Error-tolerant parsing
- Multi-field search with individual field boosts (qf)
- Phrase boosting for proximity matches (pf)
- Minimum should match logic (mm)
- Tie-breaker for scoring across fields
- Boost queries and functions for result tuning
How DisMax Scoring Works
The "tie" parameter controls how field scores are combined: - tie=0.0 (default): Only the highest scoring field contributes - tie=1.0: All field scores are summed - tie=0.1 (typical): Highest score + 0.1 * sum of other scores
Examples:
>>> # Basic multi-field search
>>> parser = DisMaxQueryParser(
... query="ipod",
... query_fields={"name": 2.0, "features": 1.0, "text": 0.5}
... )
>>> # With phrase boosting and minimum match
>>> parser = DisMaxQueryParser(
... query="belkin ipod",
... query_fields={"name": 5.0, "text": 2.0},
... phrase_fields={"name": 10.0, "text": 3.0},
... phrase_slop=2,
... min_match="75%"
... )
>>> # With boost queries
>>> parser = DisMaxQueryParser(
... query="video",
... query_fields={"features": 20.0, "text": 0.3},
... boost_queries="cat:electronics^5.0"
... )
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | | Main query string (the user's search terms) | required |
| alternate_query | | Fallback query if q is not specified | required |
| query_fields | | Fields to search with boosts, e.g., {'title': 2.0, 'body': 1.0} | required |
| query_slop | | Phrase slop for explicit phrase queries in user input | required |
| min_match | | Minimum should match specification (e.g., '75%', '2<-25% 9<-3') | required |
| phrase_fields | | Fields for phrase boosting with boosts | required |
| phrase_slop | | Maximum position distance for phrase queries | required |
| tie_breaker | | Tie-breaker value (0.0 to 1.0) for multi-field scoring | required |
| boost_queries | | Additional queries to boost matching documents (additive) | required |
| boost_functons | | Function queries to boost scores (additive) | required |
Note
For multiplicative boosting (more predictable), use ExtendedDisMaxQueryParser with the boost parameter instead of bq/bf.
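A hedged sketch of the multiplicative alternative (the boost argument is assumed from the eDisMax notes below; the function query is illustrative):
>>> parser = ExtendedDisMaxQueryParser(
...     query="video",
...     query_fields={"features": 20.0, "text": 0.3},
...     boost="log(popularity)"  # assumed parameter: multiplies the score rather than adding to it
... )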
See Also
- ExtendedDisMaxQueryParser: Enhanced version with additional features
- StandardParser: For more precise Lucene syntax queries
build, facet, group, highlight, more_like_this, serialize_configs
Inherited from BaseQueryParser; see the full method documentation above.
ExtendedDisMaxQueryParser
Bases: DisMaxQueryParser
Extended DisMax (eDisMax) Query Parser for Apache Solr.
The Extended DisMax (eDisMax) query parser is an improved version of DisMax that handles full Lucene query syntax while maintaining error tolerance. It's the most flexible parser for user-facing search applications, combining the precision of the Standard parser with the user-friendliness of DisMax.
Solr Reference
https://solr.apache.org/guide/solr/latest/query-guide/edismax-query-parser.html
Key Enhancements over DisMax
- Full Lucene query syntax support (field names, boolean operators, wildcards)
- Automatic mm (minimum match) adjustment when stopwords are removed
- Lowercase operator support ("and", "or" as operators)
- Bigram (pf2) and trigram (pf3) phrase boosting
- Multiplicative boosting via boost parameter
- Fine-grained control over stopword handling
- User field restrictions (uf) for security
Advanced Phrase Boosting
- pf: Standard phrase fields (all query terms)
- pf2: Bigram phrase fields (word pairs)
- pf3: Trigram phrase fields (word triplets)
Each has independent slop control (ps, ps2, ps3).
Examples:
>>> # Basic eDisMax query with Lucene syntax
>>> parser = ExtendedDisMaxQueryParser(
... query="title:solr OR (content:search AND type:guide)",
... query_fields={"title": 2.0, "content": 1.0}
... )
>>> # Multi-level phrase boosting
>>> parser = ExtendedDisMaxQueryParser(
... query="apache solr search",
... query_fields={"title": 5.0, "body": 1.0},
... phrase_fields={"title": 50.0}, # All 3 words
... phrase_fields_bigram={"title": 20.0}, # 2-word phrases
... phrase_fields_trigram={"title": 30.0}, # 3-word phrases
... phrase_slop=2,
... phrase_slop_bigram=1
... )
>>> # With field aliasing and restrictions
>>> parser = ExtendedDisMaxQueryParser(
... query="name:Mike sysadmin",
... query_fields={"title": 1.0, "text": 1.0},
... user_fields=["title", "text", "last_name", "first_name"]
... )
>>> # With automatic mm relaxation
>>> parser = ExtendedDisMaxQueryParser(
... query="the quick brown fox",
... query_fields={"content": 1.0, "title": 2.0},
... min_match="75%",
... min_match_auto_relax=True # Adjust if stopwords removed
... )
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| split_on_whitespace | | If True, analyze each term separately; if False (default), analyze term sequences for multi-word synonyms | required |
| min_match_auto_relax | | Auto-relax mm when stopwords are removed unevenly | required |
| lowercase_operators | | Treat lowercase 'and'/'or' as boolean operators | required |
| phrase_fields_bigram | | Fields for bigram phrase boosting (pf2) | required |
| phrase_slop_bigram | | Slop for bigram phrases (ps2) | required |
| phrase_fields_trigram | | Fields for trigram phrase boosting (pf3) | required |
| phrase_slop_trigram | | Slop for trigram phrases (ps3) | required |
| stopwords | | If False, ignore StopFilterFactory in the query analyzer | required |
| user_fields | | Whitelist of fields users can explicitly query (the uf parameter) | required |
Inherits from DisMaxQueryParser
query, alternate_query, query_fields, min_match, phrase_fields, phrase_slop, tie_breaker, boost_queries, boost_functons
Note
The eDisMax default mm behavior differs from DisMax:
- mm=0% if the query contains explicit operators (-, +, OR, NOT) or q.op=OR
- mm=100% if q.op=AND and the query uses only AND operators
See Also
- DisMaxQueryParser: Simpler version without Lucene syntax support
- StandardParser: For pure Lucene syntax without DisMax features
build(*args, **kwargs)
Serialize the parser configuration to Solr-compatible query parameters using Pydantic's model_dump.
facet(*, queries=None, fields=None, prefix=None, contains=None, contains_ignore_case=None, matches=None, sort=None, limit=None, offset=None, mincount=None, missing=None, method=None, enum_cache_min_df=None, exists=None, exclude_terms=None, overrequest_count=None, overrequest_ratio=None, threads=None, range_field=None, range_start=None, range_end=None, range_gap=None, range_hardend=None, range_include=None, range_other=None, range_method=None, pivot_fields=None, pivot_mincount=None)
Enable faceting to categorize and count search results.
Faceting breaks down search results into categories with counts, enabling drill-down navigation and data analysis.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
queries
|
Optional[List[str]]
|
Arbitrary queries to generate facet counts for specific terms/expressions. |
None
|
fields
|
Optional[List[str]]
|
Fields to be treated as facets. Common for categories, brands, tags. |
None
|
prefix
|
Optional[str]
|
Limits facet terms to those starting with the given prefix. |
None
|
contains
|
Optional[str]
|
Limits facet terms to those containing the given substring. |
None
|
contains_ignore_case
|
Optional[bool]
|
If True, ignores case when matching the 'contains' parameter. |
None
|
matches
|
Optional[str]
|
Only returns facets matching this regular expression. |
None
|
sort
|
Optional[Literal['count', 'index']]
|
Ordering of facet field terms. Use 'count' (by frequency) or 'index' (alphabetically). |
None
|
limit
|
Optional[int]
|
Number of facet counts to return. Set to -1 for all. |
None
|
offset
|
Optional[int]
|
Offset into the facet list for paging. |
None
|
mincount
|
Optional[int]
|
Minimum count for facets to be included in response. Common to set to 1 to hide empty facets. |
None
|
missing
|
Optional[bool]
|
If True, include count of results with no facet value. |
None
|
method
|
Optional[Literal['enum', 'fc', 'fcs']]
|
Algorithm to use for faceting. Use 'enum' (enumerate all terms, good for low-cardinality), 'fc' (field cache, good for high-cardinality), or 'fcs' (per-segment, good for frequent updates). |
None
|
enum_cache_min_df
|
Optional[int]
|
Minimum document frequency for filterCache usage with enum method. |
None
|
exists
|
Optional[bool]
|
Cap facet counts by 1 (only for non-trie fields). |
None
|
exclude_terms
|
Optional[str]
|
Terms to remove from facet counts. |
None
|
overrequest_count
|
Optional[int]
|
Extra facets to request from each shard for better accuracy in distributed environments. |
None
|
overrequest_ratio
|
Optional[float]
|
Ratio for overrequesting facets from shards. |
None
|
threads
|
Optional[int]
|
Number of threads for parallel facet loading. Useful for multiple facets on large datasets. |
None
|
range_field
|
Optional[List[str]]
|
Fields for range faceting (e.g., price ranges, date ranges). |
None
|
range_start
|
Optional[Dict[str, str]]
|
Lower bound of ranges per field. Dict mapping field name to start value. |
None
|
range_end
|
Optional[Dict[str, str]]
|
Upper bound of ranges per field. Dict mapping field name to end value. |
None
|
range_gap
|
Optional[Dict[str, str]]
|
Size of each range span per field. E.g., {'price': '100'} for $100 increments. |
None
|
range_hardend
|
Optional[bool]
|
If True, uses exact range_end as upper bound even if it doesn't align with gap. |
None
|
range_include
|
Optional[List[Literal['lower', 'upper', 'edge', 'outer', 'all']]]
|
Range bounds to include in faceting. List of: 'lower' (include lower bound), 'upper' (include upper bound), 'edge' (first/last include edges), 'outer' (before/after inclusive), 'all'. |
None
|
range_other
|
Optional[List[Literal['before', 'after', 'between', 'none', 'all']]]
|
Additional range counts to compute. List of: 'before' (below first range), 'after' (above last range), 'between' (within bounds), 'none', or 'all'. |
None
|
range_method
|
Optional[Literal['filter', 'dv']]
|
Method to use for range faceting. Use 'filter' or 'dv' (for docValues). |
None
|
pivot_fields
|
Optional[List[str]]
|
Fields to use for pivot (hierarchical) faceting. E.g., ['category,brand']. |
None
|
pivot_mincount
|
Optional[int]
|
Minimum count for pivot facet inclusion. |
None
|
Returns:
| Type | Description |
|---|---|
Self
|
A new parser instance with facet configuration applied. |
Examples:
Basic field faceting:
>>> parser.facet(fields=["genre", "director"], mincount=1, limit=10)
Range faceting for prices:
>>> parser.facet(
... range_field=["price"],
... range_start={"price": "0"},
... range_end={"price": "1000"},
... range_gap={"price": "100"}
... )
Filtered facets:
>>> parser.facet(fields=["color"], prefix="bl", mincount=5)
group(*, by=None, func=None, query=None, limit=1, offset=None, sort=None, format='grouped', main=None, ngroups=False, truncate=False, facet=False, cache_percent=0)
Enable result grouping to collapse results by common field values.
Result grouping (also known as field collapsing) combines documents that share a common field value. Useful for showing one result per author, category, or domain.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
by
|
Optional[Union[str, List[str]]]
|
Field(s) to group results by. Shows one representative doc per unique field value. Must be single-valued and indexed. Can specify multiple fields for separate groupings. |
None
|
func
|
Optional[str]
|
Group by the result of a function query (e.g., 'floor(price)'). Not supported in SolrCloud/distributed searches. |
None
|
query
|
Optional[Union[str, List[str]]]
|
Create custom groups using arbitrary queries. Each query defines one group. Example: ['price:[0 TO 50]', 'price:[50 TO 100]'] creates price range groups. |
None
|
limit
|
int
|
Number of documents to return per group. Default: 1 (only top doc per group). Set higher to see more examples from each group. |
1
|
offset
|
Optional[int]
|
Skip the first N documents within each group. Useful for pagination within groups. |
None
|
sort
|
Optional[str]
|
How to sort documents within each group (e.g., 'date desc' for newest first). If not specified, uses the main sort parameter. |
None
|
format
|
str
|
Response structure format. 'grouped' (nested, default) or 'simple' (flat list). |
'grouped'
|
main
|
Optional[bool]
|
If True, returns first field grouping as main result list, flattening the response. |
None
|
ngroups
|
bool
|
If True, include total number of unique groups in response. Useful for pagination. Requires co-location in SolrCloud. |
False
|
truncate
|
bool
|
If True, base facet counts on one doc per group only. |
False
|
facet
|
bool
|
If True, enable grouped faceting. Can be expensive on large result sets. Requires co-location in SolrCloud. |
False
|
cache_percent
|
int
|
Enable result grouping cache (0-100). Set to 0 to disable (default). Try 50-100 for complex queries to improve performance. |
0
|
Returns:
| Type | Description |
|---|---|
Self
|
A new parser instance with group configuration applied. |
Examples:
Group by author:
>>> parser.group(by="author", limit=3, sort="date desc", ngroups=True)
Group by price range:
>>> parser.group(
... query=["price:[0 TO 50]", "price:[50 TO 100]", "price:[100 TO *]"],
... limit=5
... )
Multiple field groupings:
>>> parser.group(by=["author", "category"], limit=2)
highlight(*, method=None, fields=None, query=None, query_parser=None, require_field_match=None, query_field_pattern=None, use_phrase_highlighter=None, multiterm=None, snippets_per_field=None, fragment_size=None, encoder=None, max_analyzed_chars=None, tag_before=None, tag_after=None, offset_source=None, frag_align_ratio=None, fragsize_is_minimum=None, tag_ellipsis=None, default_summary=None, score_k1=None, score_b=None, score_pivot=None, bs_language=None, bs_country=None, bs_variant=None, bs_type=None, bs_separator=None, weight_matches=None, merge_contiguous=None, max_multivalued_to_examine=None, max_multivalued_to_match=None, alternate_field=None, max_alternate_field_length=None, alternate=None, formatter=None, simple_pre=None, simple_post=None, fragmenter=None, regex_slop=None, regex_pattern=None, regex_max_analyzed_chars=None, preserve_multi=None, payloads=None, frag_list_builder=None, fragments_builder=None, boundary_scanner=None, phrase_limit=None, multivalue_separator=None)
Enable highlighting to show snippets with query terms emphasized.
Highlighting shows snippets of text from documents with query terms emphasized (usually wrapped in tags). Common for search result snippets and previews.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
method
|
Optional[Literal['unified', 'original', 'fastVector']]
|
Highlighting implementation to use. Use 'unified' (most accurate, recommended), 'original' (legacy, widely compatible), or 'fastVector' (fast for large docs, requires term vectors). |
None
|
fields
|
Optional[Union[str, List[str]]]
|
Fields to generate highlighted snippets for. Must be stored fields. Example: ['title', 'content']. Use '*' for all fields (not recommended). |
None
|
query
|
Optional[str]
|
Custom query to use for highlighting (overrides main query). |
None
|
query_parser
|
Optional[str]
|
Query parser for the highlight query (e.g., 'edismax', 'lucene'). |
None
|
require_field_match
|
Optional[bool]
|
If True, only highlight terms in the field they matched. Set to False to highlight query terms in all requested fields. |
None
|
query_field_pattern
|
Optional[str]
|
Regular expression pattern for fields to consider for highlighting. |
None
|
use_phrase_highlighter
|
Optional[bool]
|
If True, highlights complete phrases accurately. Default: True. |
None
|
multiterm
|
Optional[bool]
|
Enable highlighting for wildcard, fuzzy, and range queries. Default: True. |
None
|
snippets_per_field
|
Optional[int]
|
Maximum number of snippets to return per field. Default: 1. Set higher to show more context passages. |
None
|
fragment_size
|
Optional[int]
|
Size of highlighted snippets in characters. Default: 100. Set to 0 to highlight entire field (not recommended for large fields). |
None
|
encoder
|
Optional[Literal['', 'html']]
|
Text encoder for snippets. Use '' (empty string, default) or 'html' to escape HTML and prevent XSS. |
None
|
max_analyzed_chars
|
Optional[int]
|
Maximum characters to analyze per field. Default: 51200 (50KB). For large documents, limits analysis to improve performance. |
None
|
tag_before
|
Optional[str]
|
Text/tag to insert before each highlighted term. Default: ''. Example: ''. |
None
|
tag_after
|
Optional[str]
|
Text/tag to insert after each highlighted term. Default: ''. |
None
|
# Unified Highlighter specific
|
most accurate, recommended
|
|
required |
offset_source
|
Optional[str]
|
How offsets are obtained. Options: 'ANALYSIS', 'POSTINGS', 'POSTINGS_WITH_TERM_VECTORS', 'TERM_VECTORS'. Usually auto-detected. |
None
|
frag_align_ratio
|
Optional[float]
|
Where to position first match in snippet (0.0-1.0). Default: 0.33 (left third, shows context before match). |
None
|
fragsize_is_minimum
|
Optional[bool]
|
If True, treat fragment_size as minimum. Default: True. |
None
|
tag_ellipsis
|
Optional[str]
|
Text between multiple snippets (e.g., '...' or ' [...] '). |
None
|
default_summary
|
Optional[bool]
|
If True, return leading text when no matches found. |
None
|
score_k1
|
Optional[float]
|
BM25 term frequency normalization. Default: 1.2. |
None
|
score_b
|
Optional[float]
|
BM25 length normalization. Default: 0.75. |
None
|
score_pivot
|
Optional[int]
|
BM25 average passage length in characters. Default: 87. |
None
|
bs_language
|
Optional[str]
|
BreakIterator language for text segmentation (e.g., 'en', 'ja'). |
None
|
bs_country
|
Optional[str]
|
BreakIterator country code (e.g., 'US', 'GB'). |
None
|
bs_variant
|
Optional[str]
|
BreakIterator variant for specialized locale rules. |
None
|
bs_type
|
Optional[Literal['SEPARATOR', 'SENTENCE', 'WORD', 'CHARACTER', 'LINE', 'WHOLE']]
|
How to segment text. Use 'SENTENCE' (default, recommended), 'WORD', 'CHARACTER', 'LINE', 'SEPARATOR', or 'WHOLE'. |
None
|
bs_separator
|
Optional[str]
|
Custom separator character when bs_type=SEPARATOR. |
None
|
weight_matches
|
Optional[bool]
|
Use Lucene's Weight Matches API for most accurate highlighting. |
None
|
# Original Highlighter specific
|
legacy
|
|
required |
merge_contiguous
|
Optional[bool]
|
Merge adjacent fragments. |
None
|
max_multivalued_to_examine
|
Optional[int]
|
Max entries to examine in multivalued field. |
None
|
max_multivalued_to_match
|
Optional[int]
|
Max matches in multivalued field. |
None
|
alternate_field
|
Optional[str]
|
Backup field for summary when no highlights found. |
None
|
max_alternate_field_length
|
Optional[int]
|
Max length of alternate field. |
None
|
alternate
|
Optional[bool]
|
Highlight alternate field. |
None
|
formatter
|
Optional[Literal['simple']]
|
Formatter for highlighted output. Use 'simple'. |
None
|
simple_pre
|
Optional[str]
|
Text before term (simple formatter). |
None
|
simple_post
|
Optional[str]
|
Text after term (simple formatter). |
None
|
fragmenter
|
Optional[Literal['gap', 'regex']]
|
Text snippet generator type. Use 'gap' or 'regex'. |
None
|
regex_slop
|
Optional[float]
|
Deviation factor for regex fragmenter. |
None
|
regex_pattern
|
Optional[str]
|
Pattern for regex fragmenter. |
None
|
regex_max_analyzed_chars
|
Optional[int]
|
Char limit for regex fragmenter. |
None
|
preserve_multi
|
Optional[bool]
|
Preserve order in multivalued fields. |
None
|
payloads
|
Optional[bool]
|
Include payloads in highlighting. |
None
|
FastVector Highlighter specific (requires term vectors):
frag_list_builder
|
Optional[Literal['simple', 'weighted', 'single']]
|
Snippet fragmenting algorithm. Use 'simple', 'weighted', or 'single'. |
None
|
fragments_builder
|
Optional[Literal['default', 'colored']]
|
Fragment formatting implementation. Use 'default' or 'colored'. |
None
|
boundary_scanner
|
Optional[str]
|
Boundary scanner implementation. |
None
|
phrase_limit
|
Optional[int]
|
Max phrases to analyze for scoring. |
None
|
multivalue_separator
|
Optional[str]
|
Separator for multivalued fields. |
None
|
Returns:
| Type | Description |
|---|---|
Self
|
A new parser instance with highlight configuration applied. |
Examples:
Basic highlighting:
>>> parser.highlight(fields=["title", "content"], snippets_per_field=3, fragment_size=150)
Custom HTML tags:
>>> parser.highlight(
... fields=["title"],
... tag_before='<mark class="highlight">',
... tag_after='</mark>',
... encoder="html"
... )
Unified highlighter with sentence breaks:
>>> parser.highlight(
... fields=["content"],
... method="unified",
... bs_type="SENTENCE",
... fragment_size=200
... )
more_like_this(*, fields=None, min_term_freq=None, min_doc_freq=None, max_doc_freq=None, max_doc_freq_pct=None, min_word_len=None, max_word_len=None, max_query_terms=None, max_num_tokens_parsed=None, boost=None, query_fields=None, interesting_terms=None, match_include=None, match_offset=None)
Enable MoreLikeThis to find documents similar to a given document.
MoreLikeThis finds documents similar to a given document by analyzing the terms that make it unique. Common for "related articles", recommendations, and content discovery.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
fields
|
Optional[Union[str, List[str]]]
|
Fields to analyze for similarity. Use fields with meaningful content (title, description, body). Can be a single field or list of fields. For best performance, enable term vectors on these fields. |
None
|
min_term_freq
|
Optional[int]
|
Minimum times a term must appear in the source document to be considered. Default: 2. Lower values include more terms but may add noise. |
None
|
min_doc_freq
|
Optional[int]
|
Minimum number of documents a term must appear in across the index. Default: 5. Filters out very rare terms that aren't useful for finding similar documents. |
None
|
max_doc_freq
|
Optional[int]
|
Maximum number of documents a term can appear in. Filters out very common terms (like 'the', 'and'). Use this OR max_doc_freq_pct, not both. |
None
|
max_doc_freq_pct
|
Optional[int]
|
Maximum document frequency as percentage (0-100). Example: 75 means ignore terms appearing in more than 75% of documents. Use instead of max_doc_freq for relative filtering. |
None
|
min_word_len
|
Optional[int]
|
Minimum word length in characters. Words shorter than this are ignored. Example: 4 to skip 'the', 'and', 'or'. |
None
|
max_word_len
|
Optional[int]
|
Maximum word length in characters. Words longer than this are ignored. Useful to filter out long tokens or URLs. |
None
|
max_query_terms
|
Optional[int]
|
Maximum number of interesting terms to use in the MLT query. Default: 25. Higher values = more comprehensive but slower. Start with default and adjust as needed. |
None
|
max_num_tokens_parsed
|
Optional[int]
|
Maximum tokens to analyze per field (for fields without term vectors). Default: 5000. Set lower for better performance on large documents. |
None
|
boost
|
Optional[bool]
|
If True, boost the query by each term's relevance/importance. Default: False. Enable for better relevance ranking of similar documents. |
None
|
query_fields
|
Optional[str]
|
Query fields with optional boosts, like 'title^2.0 content^1.0'. Fields must also be in 'fields' parameter. Use to emphasize certain fields in similarity matching. |
None
|
interesting_terms
|
Optional[str]
|
Controls what info about matched terms is returned. Options: 'none' (default), 'list' (term names), 'details' (terms with boost values). Use 'details' for debugging to see which terms were selected. |
None
|
match_include
|
Optional[bool]
|
If True, includes the source document in results (useful to compare). Default varies: true for MLT handler, depends on configuration for component. |
None
|
match_offset
|
Optional[int]
|
When using with a query, specifies which result doc to use for similarity. 0 = first result, 1 = second, etc. Default: 0. |
None
|
Returns:
| Type | Description |
|---|---|
Self
|
A new parser instance with more_like_this configuration applied. |
Examples:
Basic similarity search:
>>> parser.more_like_this(
... fields=["title", "content"],
... min_term_freq=2,
... min_doc_freq=5,
... max_query_terms=25
... )
Advanced with filtering:
>>> parser.more_like_this(
... fields=["content"],
... min_term_freq=1,
... min_doc_freq=3,
... min_word_len=4,
... max_doc_freq_pct=80,
... interesting_terms="details"
... )
Boosted fields:
>>> parser.more_like_this(
... fields=["title", "content"],
... query_fields="title^2.0 content^1.0",
... boost=True
... )
serialize_configs(params)
Serialize ParamsConfig objects as top-level params.
GeoFilterQueryParser
Bases: SpatialQueryParser
Geospatial Filter Query Parsers (geofilt and bbox) for Apache Solr.
The geofilt and bbox parsers enable location-based filtering in Solr, allowing you to find documents within a certain distance from a point. Geofilt uses a circular radius (precise), while bbox uses a rectangular bounding box (faster but less precise).
Solr Reference
https://solr.apache.org/guide/solr/latest/query-guide/spatial-search.html
Key Features
- Circular radius search (geofilt) or bounding box search (bbox)
- Distance-based filtering and scoring
- Configurable distance units (kilometers, miles, degrees)
- Cache control for performance optimization
- Compatible with LatLonPointSpatialField and RPT fields
How Geospatial Filtering Works
geofilt:
- Creates a circular search area around a center point
- Precise: only includes documents within the exact radius
- Slightly slower but more accurate
bbox:
- Creates a rectangular bounding box around a center point
- Faster: uses simpler rectangular calculations
- May include points outside the circular radius
Distance Scoring
Use the score parameter to return distance as the relevance score:
- none: fixed score of 1.0 (default)
- kilometers: distance in km
- miles: distance in miles
- degrees: distance in degrees
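For orientation, a rough sketch of the raw Solr parameters these options map to (field name and values are assumptions here, not output of this library):
>>> # Hypothetical raw filter queries; the parser builds these for you
>>> geofilt_fq = "{!geofilt sfield=store_location pt=45.15,-93.85 d=5}"
>>> bbox_fq = "{!bbox sfield=store_location pt=45.15,-93.85 d=5}"
>>> params = {
...     "q": "*:*",
...     "fq": geofilt_fq,  # circular, precise
...     "sort": "geodist(store_location,45.15,-93.85) asc",  # nearest first
... }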
Schema Requirements
Spatial field must be indexed with appropriate field type:
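A minimal sketch of registering a suitable field via Solr's Schema API (collection and field names are assumptions):
>>> import requests
>>> SCHEMA = "http://localhost:8983/solr/my_collection/schema"
>>> # LatLonPointSpatialField with docValues supports filtering and distance sorting
>>> requests.post(SCHEMA, json={"add-field-type": {
...     "name": "location",
...     "class": "solr.LatLonPointSpatialField",
...     "docValues": True,
... }})
>>> requests.post(SCHEMA, json={"add-field": {
...     "name": "store_location", "type": "location", "indexed": True, "stored": True,
... }})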
Examples:
>>> # Circular geospatial filter (precise)
>>> parser = GeoFilterQueryParser(
... spatial_field="store_location",
... center_point=[45.15, -93.85],
... distance=5, # 5 km radius
... filter_type="geofilt"
... )
>>> # Bounding box filter (faster)
>>> parser = GeoFilterQueryParser(
... spatial_field="restaurant_coords",
... center_point=[37.7749, -122.4194], # San Francisco
... distance=10, # 10 km
... filter_type="bbox"
... )
>>> # With distance scoring
>>> parser = GeoFilterQueryParser(
... spatial_field="hotel_location",
... center_point=[51.5074, -0.1278], # London
... distance=2,
... filter_type="geofilt",
... score="kilometers" # Return distance as score
... )
>>> # Disable caching for dynamic queries
>>> parser = GeoFilterQueryParser(
... spatial_field="user_location",
... center_point=[40.7128, -74.0060], # NYC
... distance=1,
... filter_type="geofilt",
... cache=False # Don't cache this filter
... )
>>> # Filter with sorting by distance
>>> # Combine with geodist() function query for sorting
>>> parser = GeoFilterQueryParser(
... spatial_field="store",
... center_point=[45.15, -93.85],
... distance=50
... )
>>> # Add: &sort=geodist() asc to request
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
filter_type
|
'geofilt' for circular (precise) or 'bbox' for bounding box (faster) |
required | |
spatial_field
|
Name of the spatial indexed field (inherited from base, required) |
required | |
center_point
|
[lat, lon] or [x, y] coordinates of search center (inherited, required) |
required | |
distance
|
Radial distance from center point (inherited, required) |
required | |
score
|
Scoring mode (none, kilometers, miles, degrees) (inherited from base) |
required | |
cache
|
Whether to cache the filter query (inherited from base) |
required |
Returns:
| Type | Description |
|---|---|
|
Filter query (fq) matching documents within the specified distance |
Performance Tips
- Use bbox for large radius searches where precision isn't critical
- Set cache=false for highly variable queries (e.g., user location)
- Use geofilt for small radius searches requiring precision
- Consider using docValues for better spatial query performance
See Also
- BBoxQueryParser: For querying indexed bounding boxes with spatial predicates
- geodist() function: For distance calculations and sorting
- Solr Spatial Search Guide: https://solr.apache.org/guide/solr/latest/query-guide/spatial-search.html
build(*args, **kwargs)
Build query parameters, excluding mixin keys.
facet(*, queries=None, fields=None, prefix=None, contains=None, contains_ignore_case=None, matches=None, sort=None, limit=None, offset=None, mincount=None, missing=None, method=None, enum_cache_min_df=None, exists=None, exclude_terms=None, overrequest_count=None, overrequest_ratio=None, threads=None, range_field=None, range_start=None, range_end=None, range_gap=None, range_hardend=None, range_include=None, range_other=None, range_method=None, pivot_fields=None, pivot_mincount=None)
Enable faceting to categorize and count search results.
Faceting breaks down search results into categories with counts, enabling drill-down navigation and data analysis.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
queries
|
Optional[List[str]]
|
Arbitrary queries to generate facet counts for specific terms/expressions. |
None
|
fields
|
Optional[List[str]]
|
Fields to be treated as facets. Common for categories, brands, tags. |
None
|
prefix
|
Optional[str]
|
Limits facet terms to those starting with the given prefix. |
None
|
contains
|
Optional[str]
|
Limits facet terms to those containing the given substring. |
None
|
contains_ignore_case
|
Optional[bool]
|
If True, ignores case when matching the 'contains' parameter. |
None
|
matches
|
Optional[str]
|
Only returns facets matching this regular expression. |
None
|
sort
|
Optional[Literal['count', 'index']]
|
Ordering of facet field terms. Use 'count' (by frequency) or 'index' (alphabetically). |
None
|
limit
|
Optional[int]
|
Number of facet counts to return. Set to -1 for all. |
None
|
offset
|
Optional[int]
|
Offset into the facet list for paging. |
None
|
mincount
|
Optional[int]
|
Minimum count for facets to be included in response. Common to set to 1 to hide empty facets. |
None
|
missing
|
Optional[bool]
|
If True, include count of results with no facet value. |
None
|
method
|
Optional[Literal['enum', 'fc', 'fcs']]
|
Algorithm to use for faceting. Use 'enum' (enumerate all terms, good for low-cardinality), 'fc' (field cache, good for high-cardinality), or 'fcs' (per-segment, good for frequent updates). |
None
|
enum_cache_min_df
|
Optional[int]
|
Minimum document frequency for filterCache usage with enum method. |
None
|
exists
|
Optional[bool]
|
Cap facet counts by 1 (only for non-trie fields). |
None
|
exclude_terms
|
Optional[str]
|
Terms to remove from facet counts. |
None
|
overrequest_count
|
Optional[int]
|
Extra facets to request from each shard for better accuracy in distributed environments. |
None
|
overrequest_ratio
|
Optional[float]
|
Ratio for overrequesting facets from shards. |
None
|
threads
|
Optional[int]
|
Number of threads for parallel facet loading. Useful for multiple facets on large datasets. |
None
|
range_field
|
Optional[List[str]]
|
Fields for range faceting (e.g., price ranges, date ranges). |
None
|
range_start
|
Optional[Dict[str, str]]
|
Lower bound of ranges per field. Dict mapping field name to start value. |
None
|
range_end
|
Optional[Dict[str, str]]
|
Upper bound of ranges per field. Dict mapping field name to end value. |
None
|
range_gap
|
Optional[Dict[str, str]]
|
Size of each range span per field. E.g., {'price': '100'} for $100 increments. |
None
|
range_hardend
|
Optional[bool]
|
If True, uses exact range_end as upper bound even if it doesn't align with gap. |
None
|
range_include
|
Optional[List[Literal['lower', 'upper', 'edge', 'outer', 'all']]]
|
Range bounds to include in faceting. List of: 'lower' (include lower bound), 'upper' (include upper bound), 'edge' (first/last include edges), 'outer' (before/after inclusive), 'all'. |
None
|
range_other
|
Optional[List[Literal['before', 'after', 'between', 'none', 'all']]]
|
Additional range counts to compute. List of: 'before' (below first range), 'after' (above last range), 'between' (within bounds), 'none', or 'all'. |
None
|
range_method
|
Optional[Literal['filter', 'dv']]
|
Method to use for range faceting. Use 'filter' or 'dv' (for docValues). |
None
|
pivot_fields
|
Optional[List[str]]
|
Fields to use for pivot (hierarchical) faceting. E.g., ['category,brand']. |
None
|
pivot_mincount
|
Optional[int]
|
Minimum count for pivot facet inclusion. |
None
|
Returns:
| Type | Description |
|---|---|
Self
|
A new parser instance with facet configuration applied. |
Examples:
Basic field faceting:
>>> parser.facet(fields=["genre", "director"], mincount=1, limit=10)
Range faceting for prices:
>>> parser.facet(
... range_field=["price"],
... range_start={"price": "0"},
... range_end={"price": "1000"},
... range_gap={"price": "100"}
... )
Filtered facets:
>>> parser.facet(fields=["color"], prefix="bl", mincount=5)
group(*, by=None, func=None, query=None, limit=1, offset=None, sort=None, format='grouped', main=None, ngroups=False, truncate=False, facet=False, cache_percent=0)
Enable result grouping to collapse results by common field values.
Result grouping (also known as field collapsing) combines documents that share a common field value. Useful for showing one result per author, category, or domain.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
by
|
Optional[Union[str, List[str]]]
|
Field(s) to group results by. Shows one representative doc per unique field value. Must be single-valued and indexed. Can specify multiple fields for separate groupings. |
None
|
func
|
Optional[str]
|
Group by the result of a function query (e.g., 'floor(price)'). Not supported in SolrCloud/distributed searches. |
None
|
query
|
Optional[Union[str, List[str]]]
|
Create custom groups using arbitrary queries. Each query defines one group. Example: ['price:[0 TO 50]', 'price:[50 TO 100]'] creates price range groups. |
None
|
limit
|
int
|
Number of documents to return per group. Default: 1 (only top doc per group). Set higher to see more examples from each group. |
1
|
offset
|
Optional[int]
|
Skip the first N documents within each group. Useful for pagination within groups. |
None
|
sort
|
Optional[str]
|
How to sort documents within each group (e.g., 'date desc' for newest first). If not specified, uses the main sort parameter. |
None
|
format
|
str
|
Response structure format. 'grouped' (nested, default) or 'simple' (flat list). |
'grouped'
|
main
|
Optional[bool]
|
If True, returns first field grouping as main result list, flattening the response. |
None
|
ngroups
|
bool
|
If True, include total number of unique groups in response. Useful for pagination. Requires co-location in SolrCloud. |
False
|
truncate
|
bool
|
If True, base facet counts on one doc per group only. |
False
|
facet
|
bool
|
If True, enable grouped faceting. Can be expensive on large result sets. Requires co-location in SolrCloud. |
False
|
cache_percent
|
int
|
Enable result grouping cache (0-100). Set to 0 to disable (default). Try 50-100 for complex queries to improve performance. |
0
|
Returns:
| Type | Description |
|---|---|
Self
|
A new parser instance with group configuration applied. |
Examples:
Group by author:
>>> parser.group(by="author", limit=3, sort="date desc", ngroups=True)
Group by price range:
>>> parser.group(
... query=["price:[0 TO 50]", "price:[50 TO 100]", "price:[100 TO *]"],
... limit=5
... )
Multiple field groupings:
>>> parser.group(by=["author", "category"], limit=2)
highlight(*, method=None, fields=None, query=None, query_parser=None, require_field_match=None, query_field_pattern=None, use_phrase_highlighter=None, multiterm=None, snippets_per_field=None, fragment_size=None, encoder=None, max_analyzed_chars=None, tag_before=None, tag_after=None, offset_source=None, frag_align_ratio=None, fragsize_is_minimum=None, tag_ellipsis=None, default_summary=None, score_k1=None, score_b=None, score_pivot=None, bs_language=None, bs_country=None, bs_variant=None, bs_type=None, bs_separator=None, weight_matches=None, merge_contiguous=None, max_multivalued_to_examine=None, max_multivalued_to_match=None, alternate_field=None, max_alternate_field_length=None, alternate=None, formatter=None, simple_pre=None, simple_post=None, fragmenter=None, regex_slop=None, regex_pattern=None, regex_max_analyzed_chars=None, preserve_multi=None, payloads=None, frag_list_builder=None, fragments_builder=None, boundary_scanner=None, phrase_limit=None, multivalue_separator=None)
Enable highlighting to show snippets with query terms emphasized.
Highlighting shows snippets of text from documents with query terms emphasized (usually wrapped in tags). Common for search result snippets and previews.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
method
|
Optional[Literal['unified', 'original', 'fastVector']]
|
Highlighting implementation to use. Use 'unified' (most accurate, recommended), 'original' (legacy, widely compatible), or 'fastVector' (fast for large docs, requires term vectors). |
None
|
fields
|
Optional[Union[str, List[str]]]
|
Fields to generate highlighted snippets for. Must be stored fields. Example: ['title', 'content']. Use '*' for all fields (not recommended). |
None
|
query
|
Optional[str]
|
Custom query to use for highlighting (overrides main query). |
None
|
query_parser
|
Optional[str]
|
Query parser for the highlight query (e.g., 'edismax', 'lucene'). |
None
|
require_field_match
|
Optional[bool]
|
If True, only highlight terms in the field they matched. Set to False to highlight query terms in all requested fields. |
None
|
query_field_pattern
|
Optional[str]
|
Regular expression pattern for fields to consider for highlighting. |
None
|
use_phrase_highlighter
|
Optional[bool]
|
If True, highlights complete phrases accurately. Default: True. |
None
|
multiterm
|
Optional[bool]
|
Enable highlighting for wildcard, fuzzy, and range queries. Default: True. |
None
|
snippets_per_field
|
Optional[int]
|
Maximum number of snippets to return per field. Default: 1. Set higher to show more context passages. |
None
|
fragment_size
|
Optional[int]
|
Size of highlighted snippets in characters. Default: 100. Set to 0 to highlight entire field (not recommended for large fields). |
None
|
encoder
|
Optional[Literal['', 'html']]
|
Text encoder for snippets. Use '' (empty string, default) or 'html' to escape HTML and prevent XSS. |
None
|
max_analyzed_chars
|
Optional[int]
|
Maximum characters to analyze per field. Default: 51200 (50KB). For large documents, limits analysis to improve performance. |
None
|
tag_before
|
Optional[str]
|
Text/tag to insert before each highlighted term. Default: '<em>'. |
None
|
tag_after
|
Optional[str]
|
Text/tag to insert after each highlighted term. Default: '</em>'. |
None
|
Unified Highlighter specific (most accurate, recommended):
offset_source
|
Optional[str]
|
How offsets are obtained. Options: 'ANALYSIS', 'POSTINGS', 'POSTINGS_WITH_TERM_VECTORS', 'TERM_VECTORS'. Usually auto-detected. |
None
|
frag_align_ratio
|
Optional[float]
|
Where to position first match in snippet (0.0-1.0). Default: 0.33 (left third, shows context before match). |
None
|
fragsize_is_minimum
|
Optional[bool]
|
If True, treat fragment_size as minimum. Default: True. |
None
|
tag_ellipsis
|
Optional[str]
|
Text between multiple snippets (e.g., '...' or ' [...] '). |
None
|
default_summary
|
Optional[bool]
|
If True, return leading text when no matches found. |
None
|
score_k1
|
Optional[float]
|
BM25 term frequency normalization. Default: 1.2. |
None
|
score_b
|
Optional[float]
|
BM25 length normalization. Default: 0.75. |
None
|
score_pivot
|
Optional[int]
|
BM25 average passage length in characters. Default: 87. |
None
|
bs_language
|
Optional[str]
|
BreakIterator language for text segmentation (e.g., 'en', 'ja'). |
None
|
bs_country
|
Optional[str]
|
BreakIterator country code (e.g., 'US', 'GB'). |
None
|
bs_variant
|
Optional[str]
|
BreakIterator variant for specialized locale rules. |
None
|
bs_type
|
Optional[Literal['SEPARATOR', 'SENTENCE', 'WORD', 'CHARACTER', 'LINE', 'WHOLE']]
|
How to segment text. Use 'SENTENCE' (default, recommended), 'WORD', 'CHARACTER', 'LINE', 'SEPARATOR', or 'WHOLE'. |
None
|
bs_separator
|
Optional[str]
|
Custom separator character when bs_type=SEPARATOR. |
None
|
weight_matches
|
Optional[bool]
|
Use Lucene's Weight Matches API for most accurate highlighting. |
None
|
Original Highlighter specific (legacy):
merge_contiguous
|
Optional[bool]
|
Merge adjacent fragments. |
None
|
max_multivalued_to_examine
|
Optional[int]
|
Max entries to examine in multivalued field. |
None
|
max_multivalued_to_match
|
Optional[int]
|
Max matches in multivalued field. |
None
|
alternate_field
|
Optional[str]
|
Backup field for summary when no highlights found. |
None
|
max_alternate_field_length
|
Optional[int]
|
Max length of alternate field. |
None
|
alternate
|
Optional[bool]
|
Highlight alternate field. |
None
|
formatter
|
Optional[Literal['simple']]
|
Formatter for highlighted output. Use 'simple'. |
None
|
simple_pre
|
Optional[str]
|
Text before term (simple formatter). |
None
|
simple_post
|
Optional[str]
|
Text after term (simple formatter). |
None
|
fragmenter
|
Optional[Literal['gap', 'regex']]
|
Text snippet generator type. Use 'gap' or 'regex'. |
None
|
regex_slop
|
Optional[float]
|
Deviation factor for regex fragmenter. |
None
|
regex_pattern
|
Optional[str]
|
Pattern for regex fragmenter. |
None
|
regex_max_analyzed_chars
|
Optional[int]
|
Char limit for regex fragmenter. |
None
|
preserve_multi
|
Optional[bool]
|
Preserve order in multivalued fields. |
None
|
payloads
|
Optional[bool]
|
Include payloads in highlighting. |
None
|
FastVector Highlighter specific (requires term vectors):
frag_list_builder
|
Optional[Literal['simple', 'weighted', 'single']]
|
Snippet fragmenting algorithm. Use 'simple', 'weighted', or 'single'. |
None
|
fragments_builder
|
Optional[Literal['default', 'colored']]
|
Fragment formatting implementation. Use 'default' or 'colored'. |
None
|
boundary_scanner
|
Optional[str]
|
Boundary scanner implementation. |
None
|
phrase_limit
|
Optional[int]
|
Max phrases to analyze for scoring. |
None
|
multivalue_separator
|
Optional[str]
|
Separator for multivalued fields. |
None
|
Returns:
| Type | Description |
|---|---|
Self
|
A new parser instance with highlight configuration applied. |
Examples:
Basic highlighting:
>>> parser.highlight(fields=["title", "content"], snippets_per_field=3, fragment_size=150)
Custom HTML tags:
>>> parser.highlight(
... fields=["title"],
... tag_before='<mark class="highlight">',
... tag_after='</mark>',
... encoder="html"
... )
Unified highlighter with sentence breaks:
>>> parser.highlight(
... fields=["content"],
... method="unified",
... bs_type="SENTENCE",
... fragment_size=200
... )
more_like_this(*, fields=None, min_term_freq=None, min_doc_freq=None, max_doc_freq=None, max_doc_freq_pct=None, min_word_len=None, max_word_len=None, max_query_terms=None, max_num_tokens_parsed=None, boost=None, query_fields=None, interesting_terms=None, match_include=None, match_offset=None)
Enable MoreLikeThis to find documents similar to a given document.
MoreLikeThis finds documents similar to a given document by analyzing the terms that make it unique. Common for "related articles", recommendations, and content discovery.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
fields
|
Optional[Union[str, List[str]]]
|
Fields to analyze for similarity. Use fields with meaningful content (title, description, body). Can be a single field or list of fields. For best performance, enable term vectors on these fields. |
None
|
min_term_freq
|
Optional[int]
|
Minimum times a term must appear in the source document to be considered. Default: 2. Lower values include more terms but may add noise. |
None
|
min_doc_freq
|
Optional[int]
|
Minimum number of documents a term must appear in across the index. Default: 5. Filters out very rare terms that aren't useful for finding similar documents. |
None
|
max_doc_freq
|
Optional[int]
|
Maximum number of documents a term can appear in. Filters out very common terms (like 'the', 'and'). Use this OR max_doc_freq_pct, not both. |
None
|
max_doc_freq_pct
|
Optional[int]
|
Maximum document frequency as percentage (0-100). Example: 75 means ignore terms appearing in more than 75% of documents. Use instead of max_doc_freq for relative filtering. |
None
|
min_word_len
|
Optional[int]
|
Minimum word length in characters. Words shorter than this are ignored. Example: 4 to skip 'the', 'and', 'or'. |
None
|
max_word_len
|
Optional[int]
|
Maximum word length in characters. Words longer than this are ignored. Useful to filter out long tokens or URLs. |
None
|
max_query_terms
|
Optional[int]
|
Maximum number of interesting terms to use in the MLT query. Default: 25. Higher values = more comprehensive but slower. Start with default and adjust as needed. |
None
|
max_num_tokens_parsed
|
Optional[int]
|
Maximum tokens to analyze per field (for fields without term vectors). Default: 5000. Set lower for better performance on large documents. |
None
|
boost
|
Optional[bool]
|
If True, boost the query by each term's relevance/importance. Default: False. Enable for better relevance ranking of similar documents. |
None
|
query_fields
|
Optional[str]
|
Query fields with optional boosts, like 'title^2.0 content^1.0'. Fields must also be in 'fields' parameter. Use to emphasize certain fields in similarity matching. |
None
|
interesting_terms
|
Optional[str]
|
Controls what info about matched terms is returned. Options: 'none' (default), 'list' (term names), 'details' (terms with boost values). Use 'details' for debugging to see which terms were selected. |
None
|
match_include
|
Optional[bool]
|
If True, includes the source document in results (useful to compare). Default varies: true for MLT handler, depends on configuration for component. |
None
|
match_offset
|
Optional[int]
|
When using with a query, specifies which result doc to use for similarity. 0 = first result, 1 = second, etc. Default: 0. |
None
|
Returns:
| Type | Description |
|---|---|
Self
|
A new parser instance with more_like_this configuration applied. |
Examples:
Basic similarity search:
>>> parser.more_like_this(
... fields=["title", "content"],
... min_term_freq=2,
... min_doc_freq=5,
... max_query_terms=25
... )
Advanced with filtering:
>>> parser.more_like_this(
... fields=["content"],
... min_term_freq=1,
... min_doc_freq=3,
... min_word_len=4,
... max_doc_freq_pct=80,
... interesting_terms="details"
... )
Boosted fields:
>>> parser.more_like_this(
... fields=["title", "content"],
... query_fields="title^2.0 content^1.0",
... boost=True
... )
serialize_configs(params)
Serialize ParamsConfig objects as top-level params.
spatial_params()
Build the spatial search parameters string for use in filter queries.
KNNQueryParser
Bases: DenseVectorSearchQueryParser
K-Nearest Neighbors (KNN) Query Parser for Apache Solr Dense Vector Search.
The KNN query parser enables efficient similarity searches on dense vector fields using the k-nearest neighbors algorithm. It finds the topK documents whose vectors are most similar to the query vector according to the configured similarity function (cosine, dot product, or euclidean).
Solr Reference
https://solr.apache.org/guide/solr/latest/query-guide/dense-vector-search.html
Key Features
- Efficient vector similarity search using HNSW algorithm
- Configurable k (topK) for number of results
- Pre-filtering support (explicit or implicit)
- Re-ranking capability for hybrid search
- Multiple similarity functions: cosine, dot_product, euclidean
How KNN Search Works
- Query vector is compared against indexed vectors
- HNSW (Hierarchical Navigable Small World) algorithm efficiently finds neighbors
- Top k most similar vectors are returned
- Similarity score is used for ranking
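For reference, a sketch of the raw local-params query this corresponds to (field name assumed; the parser assembles it from vector_field, vector, and top_k):
>>> vector = [0.1, 0.2, 0.3, 0.4, 0.5]
>>> q = "{!knn f=film_vector topK=10}" + str(vector)
>>> params = {"q": q, "fl": "id,score"}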
Pre-Filtering
- Implicit: All fq filters (except post filters) automatically pre-filter when knn is main query
- Explicit: Use preFilter parameter to specify filtering criteria
- Tagged: Use includeTags/excludeTags to control which fq filters apply
Schema Requirements
Field must be DenseVectorField with matching vector dimension:
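A minimal sketch of defining such a field via the Schema API (names and dimension are assumptions; vectorDimension must match the length of your query vectors):
>>> import requests
>>> SCHEMA = "http://localhost:8983/solr/my_collection/schema"
>>> requests.post(SCHEMA, json={"add-field-type": {
...     "name": "knn_vector",
...     "class": "solr.DenseVectorField",
...     "vectorDimension": 5,
...     "similarityFunction": "cosine",  # or dot_product / euclidean
... }})
>>> requests.post(SCHEMA, json={"add-field": {
...     "name": "film_vector", "type": "knn_vector", "indexed": True, "stored": True,
... }})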
Examples:
>>> # Basic KNN search
>>> parser = KNNQueryParser(
... vector_field="film_vector",
... vector=[0.1, 0.2, 0.3, 0.4, 0.5],
... top_k=10
... )
>>> # With explicit pre-filtering
>>> parser = KNNQueryParser(
... vector_field="product_vector",
... vector=[1.0, 2.0, 3.0, 4.0],
... top_k=20,
... pre_filter=["category:electronics", "inStock:true"]
... )
>>> # With tagged filtering
>>> parser = KNNQueryParser(
... vector_field="doc_vector",
... vector=[0.5, 0.5, 0.5, 0.5],
... top_k=50,
... include_tags=["for_knn"]
... )
>>> # For re-ranking (use as rq parameter)
>>> parser = KNNQueryParser(
... vector_field="content_vector",
... vector=[0.2, 0.3, 0.4, 0.5],
... top_k=100 # Searches whole index in re-ranking context
... )
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
vector
|
Query vector as list of floats (required, must match field dimension) |
required | |
top_k
|
Number of nearest neighbors to return (default: 10) |
required | |
vector_field
|
Name of the DenseVectorField to search (inherited from base) |
required | |
pre_filter
|
Explicit pre-filter query strings (inherited from base) |
required | |
include_tags
|
Only use fq filters with these tags for implicit pre-filtering (inherited) |
required | |
exclude_tags
|
Exclude fq filters with these tags from implicit pre-filtering (inherited) |
required |
Returns:
| Type | Description |
|---|---|
|
Query results ranked by vector similarity score |
Note
When used in re-ranking (rq parameter), topK refers to k-nearest neighbors in the whole index, not just the initial result set.
See Also
- VectorSimilarityQueryParser: For threshold-based vector search
- KNNTextToVectorQueryParser: For text-to-vector conversion with KNN search
facet(*, queries=None, fields=None, prefix=None, contains=None, contains_ignore_case=None, matches=None, sort=None, limit=None, offset=None, mincount=None, missing=None, method=None, enum_cache_min_df=None, exists=None, exclude_terms=None, overrequest_count=None, overrequest_ratio=None, threads=None, range_field=None, range_start=None, range_end=None, range_gap=None, range_hardend=None, range_include=None, range_other=None, range_method=None, pivot_fields=None, pivot_mincount=None)
Enable faceting to categorize and count search results.
Faceting breaks down search results into categories with counts, enabling drill-down navigation and data analysis.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
queries
|
Optional[List[str]]
|
Arbitrary queries to generate facet counts for specific terms/expressions. |
None
|
fields
|
Optional[List[str]]
|
Fields to be treated as facets. Common for categories, brands, tags. |
None
|
prefix
|
Optional[str]
|
Limits facet terms to those starting with the given prefix. |
None
|
contains
|
Optional[str]
|
Limits facet terms to those containing the given substring. |
None
|
contains_ignore_case
|
Optional[bool]
|
If True, ignores case when matching the 'contains' parameter. |
None
|
matches
|
Optional[str]
|
Only returns facets matching this regular expression. |
None
|
sort
|
Optional[Literal['count', 'index']]
|
Ordering of facet field terms. Use 'count' (by frequency) or 'index' (alphabetically). |
None
|
limit
|
Optional[int]
|
Number of facet counts to return. Set to -1 for all. |
None
|
offset
|
Optional[int]
|
Offset into the facet list for paging. |
None
|
mincount
|
Optional[int]
|
Minimum count for facets to be included in response. Common to set to 1 to hide empty facets. |
None
|
missing
|
Optional[bool]
|
If True, include count of results with no facet value. |
None
|
method
|
Optional[Literal['enum', 'fc', 'fcs']]
|
Algorithm to use for faceting. Use 'enum' (enumerate all terms, good for low-cardinality), 'fc' (field cache, good for high-cardinality), or 'fcs' (per-segment, good for frequent updates). |
None
|
enum_cache_min_df
|
Optional[int]
|
Minimum document frequency for filterCache usage with enum method. |
None
|
exists
|
Optional[bool]
|
Cap facet counts by 1 (only for non-trie fields). |
None
|
exclude_terms
|
Optional[str]
|
Terms to remove from facet counts. |
None
|
overrequest_count
|
Optional[int]
|
Extra facets to request from each shard for better accuracy in distributed environments. |
None
|
overrequest_ratio
|
Optional[float]
|
Ratio for overrequesting facets from shards. |
None
|
threads
|
Optional[int]
|
Number of threads for parallel facet loading. Useful for multiple facets on large datasets. |
None
|
range_field
|
Optional[List[str]]
|
Fields for range faceting (e.g., price ranges, date ranges). |
None
|
range_start
|
Optional[Dict[str, str]]
|
Lower bound of ranges per field. Dict mapping field name to start value. |
None
|
range_end
|
Optional[Dict[str, str]]
|
Upper bound of ranges per field. Dict mapping field name to end value. |
None
|
range_gap
|
Optional[Dict[str, str]]
|
Size of each range span per field. E.g., {'price': '100'} for $100 increments. |
None
|
range_hardend
|
Optional[bool]
|
If True, uses exact range_end as upper bound even if it doesn't align with gap. |
None
|
range_include
|
Optional[List[Literal['lower', 'upper', 'edge', 'outer', 'all']]]
|
Range bounds to include in faceting. List of: 'lower' (include lower bound), 'upper' (include upper bound), 'edge' (first/last include edges), 'outer' (before/after inclusive), 'all'. |
None
|
range_other
|
Optional[List[Literal['before', 'after', 'between', 'none', 'all']]]
|
Additional range counts to compute. List of: 'before' (below first range), 'after' (above last range), 'between' (within bounds), 'none', or 'all'. |
None
|
range_method
|
Optional[Literal['filter', 'dv']]
|
Method to use for range faceting. Use 'filter' or 'dv' (for docValues). |
None
|
pivot_fields
|
Optional[List[str]]
|
Fields to use for pivot (hierarchical) faceting. E.g., ['category,brand']. |
None
|
pivot_mincount
|
Optional[int]
|
Minimum count for pivot facet inclusion. |
None
|
Returns:
| Type | Description |
|---|---|
Self
|
A new parser instance with facet configuration applied. |
Examples:
Basic field faceting:
>>> parser.facet(fields=["genre", "director"], mincount=1, limit=10)
Range faceting for prices:
>>> parser.facet(
... range_field=["price"],
... range_start={"price": "0"},
... range_end={"price": "1000"},
... range_gap={"price": "100"}
... )
Filtered facets:
>>> parser.facet(fields=["color"], prefix="bl", mincount=5)
group(*, by=None, func=None, query=None, limit=1, offset=None, sort=None, format='grouped', main=None, ngroups=False, truncate=False, facet=False, cache_percent=0)
Enable result grouping to collapse results by common field values.
Result grouping (also known as field collapsing) combines documents that share a common field value. Useful for showing one result per author, category, or domain.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
by
|
Optional[Union[str, List[str]]]
|
Field(s) to group results by. Shows one representative doc per unique field value. Must be single-valued and indexed. Can specify multiple fields for separate groupings. |
None
|
func
|
Optional[str]
|
Group by the result of a function query (e.g., 'floor(price)'). Not supported in SolrCloud/distributed searches. |
None
|
query
|
Optional[Union[str, List[str]]]
|
Create custom groups using arbitrary queries. Each query defines one group. Example: ['price:[0 TO 50]', 'price:[50 TO 100]'] creates price range groups. |
None
|
limit
|
int
|
Number of documents to return per group. Default: 1 (only top doc per group). Set higher to see more examples from each group. |
1
|
offset
|
Optional[int]
|
Skip the first N documents within each group. Useful for pagination within groups. |
None
|
sort
|
Optional[str]
|
How to sort documents within each group (e.g., 'date desc' for newest first). If not specified, uses the main sort parameter. |
None
|
format
|
str
|
Response structure format. 'grouped' (nested, default) or 'simple' (flat list). |
'grouped'
|
main
|
Optional[bool]
|
If True, returns first field grouping as main result list, flattening the response. |
None
|
ngroups
|
bool
|
If True, include total number of unique groups in response. Useful for pagination. Requires co-location in SolrCloud. |
False
|
truncate
|
bool
|
If True, base facet counts on one doc per group only. |
False
|
facet
|
bool
|
If True, enable grouped faceting. Can be expensive on large result sets. Requires co-location in SolrCloud. |
False
|
cache_percent
|
int
|
Enable result grouping cache (0-100). Set to 0 to disable (default). Try 50-100 for complex queries to improve performance. |
0
|
Returns:
| Type | Description |
|---|---|
Self
|
A new parser instance with group configuration applied. |
Examples:
Group by author:
>>> parser.group(by="author", limit=3, sort="date desc", ngroups=True)
Group by price range:
>>> parser.group(
... query=["price:[0 TO 50]", "price:[50 TO 100]", "price:[100 TO *]"],
... limit=5
... )
Multiple field groupings:
>>> parser.group(by=["author", "category"], limit=2)
highlight(*, method=None, fields=None, query=None, query_parser=None, require_field_match=None, query_field_pattern=None, use_phrase_highlighter=None, multiterm=None, snippets_per_field=None, fragment_size=None, encoder=None, max_analyzed_chars=None, tag_before=None, tag_after=None, offset_source=None, frag_align_ratio=None, fragsize_is_minimum=None, tag_ellipsis=None, default_summary=None, score_k1=None, score_b=None, score_pivot=None, bs_language=None, bs_country=None, bs_variant=None, bs_type=None, bs_separator=None, weight_matches=None, merge_contiguous=None, max_multivalued_to_examine=None, max_multivalued_to_match=None, alternate_field=None, max_alternate_field_length=None, alternate=None, formatter=None, simple_pre=None, simple_post=None, fragmenter=None, regex_slop=None, regex_pattern=None, regex_max_analyzed_chars=None, preserve_multi=None, payloads=None, frag_list_builder=None, fragments_builder=None, boundary_scanner=None, phrase_limit=None, multivalue_separator=None)
Enable highlighting to show snippets with query terms emphasized.
Highlighting shows snippets of text from documents with query terms emphasized (usually wrapped in tags). Common for search result snippets and previews.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
method
|
Optional[Literal['unified', 'original', 'fastVector']]
|
Highlighting implementation to use. Use 'unified' (most accurate, recommended), 'original' (legacy, widely compatible), or 'fastVector' (fast for large docs, requires term vectors). |
None
|
fields
|
Optional[Union[str, List[str]]]
|
Fields to generate highlighted snippets for. Must be stored fields. Example: ['title', 'content']. Use '*' for all fields (not recommended). |
None
|
query
|
Optional[str]
|
Custom query to use for highlighting (overrides main query). |
None
|
query_parser
|
Optional[str]
|
Query parser for the highlight query (e.g., 'edismax', 'lucene'). |
None
|
require_field_match
|
Optional[bool]
|
If True, only highlight terms in the field they matched. Set to False to highlight query terms in all requested fields. |
None
|
query_field_pattern
|
Optional[str]
|
Regular expression pattern for fields to consider for highlighting. |
None
|
use_phrase_highlighter
|
Optional[bool]
|
If True, highlights complete phrases accurately. Default: True. |
None
|
multiterm
|
Optional[bool]
|
Enable highlighting for wildcard, fuzzy, and range queries. Default: True. |
None
|
snippets_per_field
|
Optional[int]
|
Maximum number of snippets to return per field. Default: 1. Set higher to show more context passages. |
None
|
fragment_size
|
Optional[int]
|
Size of highlighted snippets in characters. Default: 100. Set to 0 to highlight entire field (not recommended for large fields). |
None
|
encoder
|
Optional[Literal['', 'html']]
|
Text encoder for snippets. Use '' (empty string, default) or 'html' to escape HTML and prevent XSS. |
None
|
max_analyzed_chars
|
Optional[int]
|
Maximum characters to analyze per field. Default: 51200 (50KB). For large documents, limits analysis to improve performance. |
None
|
tag_before
|
Optional[str]
|
Text/tag to insert before each highlighted term. Default: '<em>'. |
None
|
tag_after
|
Optional[str]
|
Text/tag to insert after each highlighted term. Default: '</em>'. |
None
|
Unified Highlighter specific (most accurate, recommended):
offset_source
|
Optional[str]
|
How offsets are obtained. Options: 'ANALYSIS', 'POSTINGS', 'POSTINGS_WITH_TERM_VECTORS', 'TERM_VECTORS'. Usually auto-detected. |
None
|
frag_align_ratio
|
Optional[float]
|
Where to position first match in snippet (0.0-1.0). Default: 0.33 (left third, shows context before match). |
None
|
fragsize_is_minimum
|
Optional[bool]
|
If True, treat fragment_size as minimum. Default: True. |
None
|
tag_ellipsis
|
Optional[str]
|
Text between multiple snippets (e.g., '...' or ' [...] '). |
None
|
default_summary
|
Optional[bool]
|
If True, return leading text when no matches found. |
None
|
score_k1
|
Optional[float]
|
BM25 term frequency normalization. Default: 1.2. |
None
|
score_b
|
Optional[float]
|
BM25 length normalization. Default: 0.75. |
None
|
score_pivot
|
Optional[int]
|
BM25 average passage length in characters. Default: 87. |
None
|
bs_language
|
Optional[str]
|
BreakIterator language for text segmentation (e.g., 'en', 'ja'). |
None
|
bs_country
|
Optional[str]
|
BreakIterator country code (e.g., 'US', 'GB'). |
None
|
bs_variant
|
Optional[str]
|
BreakIterator variant for specialized locale rules. |
None
|
bs_type
|
Optional[Literal['SEPARATOR', 'SENTENCE', 'WORD', 'CHARACTER', 'LINE', 'WHOLE']]
|
How to segment text. Use 'SENTENCE' (default, recommended), 'WORD', 'CHARACTER', 'LINE', 'SEPARATOR', or 'WHOLE'. |
None
|
bs_separator
|
Optional[str]
|
Custom separator character when bs_type=SEPARATOR. |
None
|
weight_matches
|
Optional[bool]
|
Use Lucene's Weight Matches API for most accurate highlighting. |
None
|
Original Highlighter specific (legacy):
merge_contiguous
|
Optional[bool]
|
Merge adjacent fragments. |
None
|
max_multivalued_to_examine
|
Optional[int]
|
Max entries to examine in multivalued field. |
None
|
max_multivalued_to_match
|
Optional[int]
|
Max matches in multivalued field. |
None
|
alternate_field
|
Optional[str]
|
Backup field for summary when no highlights found. |
None
|
max_alternate_field_length
|
Optional[int]
|
Max length of alternate field. |
None
|
alternate
|
Optional[bool]
|
Highlight alternate field. |
None
|
formatter
|
Optional[Literal['simple']]
|
Formatter for highlighted output. Use 'simple'. |
None
|
simple_pre
|
Optional[str]
|
Text before term (simple formatter). |
None
|
simple_post
|
Optional[str]
|
Text after term (simple formatter). |
None
|
fragmenter
|
Optional[Literal['gap', 'regex']]
|
Text snippet generator type. Use 'gap' or 'regex'. |
None
|
regex_slop
|
Optional[float]
|
Deviation factor for regex fragmenter. |
None
|
regex_pattern
|
Optional[str]
|
Pattern for regex fragmenter. |
None
|
regex_max_analyzed_chars
|
Optional[int]
|
Char limit for regex fragmenter. |
None
|
preserve_multi
|
Optional[bool]
|
Preserve order in multivalued fields. |
None
|
payloads
|
Optional[bool]
|
Include payloads in highlighting. |
None
|
FastVector Highlighter specific (requires term vectors):
frag_list_builder
|
Optional[Literal['simple', 'weighted', 'single']]
|
Snippet fragmenting algorithm. Use 'simple', 'weighted', or 'single'. |
None
|
fragments_builder
|
Optional[Literal['default', 'colored']]
|
Fragment formatting implementation. Use 'default' or 'colored'. |
None
|
boundary_scanner
|
Optional[str]
|
Boundary scanner implementation. |
None
|
phrase_limit
|
Optional[int]
|
Max phrases to analyze for scoring. |
None
|
multivalue_separator
|
Optional[str]
|
Separator for multivalued fields. |
None
|
Returns:
| Type | Description |
|---|---|
Self
|
A new parser instance with highlight configuration applied. |
Examples:
Basic highlighting:
>>> parser.highlight(fields=["title", "content"], snippets_per_field=3, fragment_size=150)
Custom HTML tags:
>>> parser.highlight(
... fields=["title"],
... tag_before='<mark class="highlight">',
... tag_after='</mark>',
... encoder="html"
... )
Unified highlighter with sentence breaks:
>>> parser.highlight(
... fields=["content"],
... method="unified",
... bs_type="SENTENCE",
... fragment_size=200
... )
more_like_this(*, fields=None, min_term_freq=None, min_doc_freq=None, max_doc_freq=None, max_doc_freq_pct=None, min_word_len=None, max_word_len=None, max_query_terms=None, max_num_tokens_parsed=None, boost=None, query_fields=None, interesting_terms=None, match_include=None, match_offset=None)
Enable MoreLikeThis to find documents similar to a given document.
MoreLikeThis finds documents similar to a given document by analyzing the terms that make it unique. Common for "related articles", recommendations, and content discovery.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
fields
|
Optional[Union[str, List[str]]]
|
Fields to analyze for similarity. Use fields with meaningful content (title, description, body). Can be a single field or list of fields. For best performance, enable term vectors on these fields. |
None
|
min_term_freq
|
Optional[int]
|
Minimum times a term must appear in the source document to be considered. Default: 2. Lower values include more terms but may add noise. |
None
|
min_doc_freq
|
Optional[int]
|
Minimum number of documents a term must appear in across the index. Default: 5. Filters out very rare terms that aren't useful for finding similar documents. |
None
|
max_doc_freq
|
Optional[int]
|
Maximum number of documents a term can appear in. Filters out very common terms (like 'the', 'and'). Use this OR max_doc_freq_pct, not both. |
None
|
max_doc_freq_pct
|
Optional[int]
|
Maximum document frequency as percentage (0-100). Example: 75 means ignore terms appearing in more than 75% of documents. Use instead of max_doc_freq for relative filtering. |
None
|
min_word_len
|
Optional[int]
|
Minimum word length in characters. Words shorter than this are ignored. Example: 4 to skip 'the', 'and', 'or'. |
None
|
max_word_len
|
Optional[int]
|
Maximum word length in characters. Words longer than this are ignored. Useful to filter out long tokens or URLs. |
None
|
max_query_terms
|
Optional[int]
|
Maximum number of interesting terms to use in the MLT query. Default: 25. Higher values = more comprehensive but slower. Start with default and adjust as needed. |
None
|
max_num_tokens_parsed
|
Optional[int]
|
Maximum tokens to analyze per field (for fields without term vectors). Default: 5000. Set lower for better performance on large documents. |
None
|
boost
|
Optional[bool]
|
If True, boost the query by each term's relevance/importance. Default: False. Enable for better relevance ranking of similar documents. |
None
|
query_fields
|
Optional[str]
|
Query fields with optional boosts, like 'title^2.0 content^1.0'. Fields must also be in 'fields' parameter. Use to emphasize certain fields in similarity matching. |
None
|
interesting_terms
|
Optional[str]
|
Controls what info about matched terms is returned. Options: 'none' (default), 'list' (term names), 'details' (terms with boost values). Use 'details' for debugging to see which terms were selected. |
None
|
match_include
|
Optional[bool]
|
If True, includes the source document in results (useful to compare). Default varies: true for MLT handler, depends on configuration for component. |
None
|
match_offset
|
Optional[int]
|
When using with a query, specifies which result doc to use for similarity. 0 = first result, 1 = second, etc. Default: 0. |
None
|
Returns:
| Type | Description |
|---|---|
Self
|
A new parser instance with more_like_this configuration applied. |
Examples:
Basic similarity search:
>>> parser.more_like_this(
... fields=["title", "content"],
... min_term_freq=2,
... min_doc_freq=5,
... max_query_terms=25
... )
Advanced with filtering:
>>> parser.more_like_this(
... fields=["content"],
... min_term_freq=1,
... min_doc_freq=3,
... min_word_len=4,
... max_doc_freq_pct=80,
... interesting_terms="details"
... )
Boosted fields:
>>> parser.more_like_this(
... fields=["title", "content"],
... query_fields="title^2.0 content^1.0",
... boost=True
... )
serialize_configs(params)
Serialize ParamsConfig objects as top-level params.
KNNTextToVectorQueryParser
Bases: DenseVectorSearchQueryParser
KNN Text-to-Vector Query Parser for Apache Solr Dense Vector Search.
The knn_text_to_vector parser combines text encoding with k-nearest neighbors search, allowing you to search for similar documents using natural language queries instead of pre-computed vectors. It uses a language model to convert query text into a vector, then performs KNN search on that vector.
Solr Reference
https://solr.apache.org/guide/solr/latest/query-guide/dense-vector-search.html
Key Features
- Automatic text-to-vector encoding using language models
- Eliminates need for pre-computing query vectors
- Supports various embedding models (OpenAI, Hugging Face, etc.)
- Combines semantic search with KNN efficiency
- Configurable k (topK) for number of results
How Text-to-Vector KNN Works
- Query text is sent to the configured language model
- Model encodes text into a dense vector
- KNN search is performed using the generated vector
- Top k most similar documents are returned
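A sketch of the raw query this corresponds to (field and model names assumed):
>>> params = {
...     "q": "{!knn_text_to_vector model=openai-embeddings f=content_vector topK=10}"
...          "machine learning algorithms",
...     "fl": "id,score",
... }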
Model Requirements
The model must be loaded into Solr's text-to-vector model store:
- Configure the model in the schema via the REST API
- Supported: OpenAI, Hugging Face, Cohere, etc.
- The model must produce vectors matching the field dimension
Example Model Configuration (OpenAI):
{
  "class": "dev.langchain4j.model.openai.OpenAiEmbeddingModel",
  "name": "openai-embeddings",
  "params": {
    "apiKey": "YOUR_API_KEY",
    "modelName": "text-embedding-ada-002"
  }
}
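A hedged sketch of uploading this configuration to the model store endpoint noted further below (collection name is an assumption):
>>> import requests
>>> STORE = "http://localhost:8983/solr/my_collection/schema/text-to-vector-model-store"
>>> model_config = {
...     "class": "dev.langchain4j.model.openai.OpenAiEmbeddingModel",
...     "name": "openai-embeddings",
...     "params": {"apiKey": "YOUR_API_KEY", "modelName": "text-embedding-ada-002"},
... }
>>> requests.put(STORE, json=model_config)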
Schema Requirements
Field must be DenseVectorField, defined as shown in the KNNQueryParser schema sketch above.
Examples:
>>> # Basic text-to-vector KNN search
>>> parser = KNNTextToVectorQueryParser(
... vector_field="content_vector",
... text="machine learning algorithms",
... model="openai-embeddings",
... top_k=10
... )
>>> # Semantic search with pre-filtering
>>> parser = KNNTextToVectorQueryParser(
... vector_field="article_embedding",
... text="neural networks and deep learning",
... model="huggingface-embedder",
... top_k=20,
... pre_filter=["category:AI", "published:[2020 TO *]"]
... )
>>> # Multi-lingual semantic search
>>> parser = KNNTextToVectorQueryParser(
... vector_field="multilingual_vector",
... text="apprentissage automatique", # French
... model="multilingual-embedder",
... top_k=15
... )
>>> # With tagged filtering
>>> parser = KNNTextToVectorQueryParser(
... vector_field="doc_vector",
... text="search query optimization",
... model="sentence-transformer",
... top_k=50,
... include_tags=["semantic_search"]
... )
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
text
|
Natural language query text to encode (required) |
required | |
model
|
Name of the model in text-to-vector model store (required) |
required | |
top_k
|
Number of nearest neighbors to return (default: 10) |
required | |
vector_field
|
Name of the DenseVectorField to search (inherited from base) |
required | |
pre_filter
|
Explicit pre-filter query strings (inherited from base) |
required | |
include_tags
|
Only use fq filters with these tags for implicit pre-filtering (inherited) |
required | |
exclude_tags
|
Exclude fq filters with these tags from implicit pre-filtering (inherited) |
required |
Returns:
| Type | Description |
|---|---|
|
Query results ranked by semantic similarity to the input text |
Note
The model name must reference an existing model loaded into the /schema/text-to-vector-model-store endpoint.
See Also
- KNNQueryParser: For search with pre-computed vectors
- VectorSimilarityQueryParser: For threshold-based vector search
- Solr Text-to-Vector Models Guide: https://solr.apache.org/guide/solr/latest/query-guide/text-to-vector.html
facet(*, queries=None, fields=None, prefix=None, contains=None, contains_ignore_case=None, matches=None, sort=None, limit=None, offset=None, mincount=None, missing=None, method=None, enum_cache_min_df=None, exists=None, exclude_terms=None, overrequest_count=None, overrequest_ratio=None, threads=None, range_field=None, range_start=None, range_end=None, range_gap=None, range_hardend=None, range_include=None, range_other=None, range_method=None, pivot_fields=None, pivot_mincount=None)
Enable faceting to categorize and count search results.
Faceting breaks down search results into categories with counts, enabling drill-down navigation and data analysis.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| queries | Optional[List[str]] | Arbitrary queries to generate facet counts for specific terms/expressions. | None |
| fields | Optional[List[str]] | Fields to be treated as facets. Common for categories, brands, tags. | None |
| prefix | Optional[str] | Limits facet terms to those starting with the given prefix. | None |
| contains | Optional[str] | Limits facet terms to those containing the given substring. | None |
| contains_ignore_case | Optional[bool] | If True, ignores case when matching the 'contains' parameter. | None |
| matches | Optional[str] | Only returns facets matching this regular expression. | None |
| sort | Optional[Literal['count', 'index']] | Ordering of facet field terms. Use 'count' (by frequency) or 'index' (alphabetically). | None |
| limit | Optional[int] | Number of facet counts to return. Set to -1 for all. | None |
| offset | Optional[int] | Offset into the facet list for paging. | None |
| mincount | Optional[int] | Minimum count for facets to be included in the response. Commonly set to 1 to hide empty facets. | None |
| missing | Optional[bool] | If True, include count of results with no facet value. | None |
| method | Optional[Literal['enum', 'fc', 'fcs']] | Algorithm to use for faceting. Use 'enum' (enumerate all terms, good for low cardinality), 'fc' (field cache, good for high cardinality), or 'fcs' (per-segment, good for frequent updates). | None |
| enum_cache_min_df | Optional[int] | Minimum document frequency for filterCache usage with the enum method. | None |
| exists | Optional[bool] | Cap facet counts by 1 (only for non-trie fields). | None |
| exclude_terms | Optional[str] | Terms to remove from facet counts. | None |
| overrequest_count | Optional[int] | Extra facets to request from each shard for better accuracy in distributed environments. | None |
| overrequest_ratio | Optional[float] | Ratio for overrequesting facets from shards. | None |
| threads | Optional[int] | Number of threads for parallel facet loading. Useful for multiple facets on large datasets. | None |
| range_field | Optional[List[str]] | Fields for range faceting (e.g., price ranges, date ranges). | None |
| range_start | Optional[Dict[str, str]] | Lower bound of ranges per field. Dict mapping field name to start value. | None |
| range_end | Optional[Dict[str, str]] | Upper bound of ranges per field. Dict mapping field name to end value. | None |
| range_gap | Optional[Dict[str, str]] | Size of each range span per field. E.g., {'price': '100'} for $100 increments. | None |
| range_hardend | Optional[bool] | If True, uses exact range_end as upper bound even if it doesn't align with gap. | None |
| range_include | Optional[List[Literal['lower', 'upper', 'edge', 'outer', 'all']]] | Range bounds to include in faceting. List of: 'lower' (include lower bound), 'upper' (include upper bound), 'edge' (first/last include edges), 'outer' (before/after inclusive), 'all'. | None |
| range_other | Optional[List[Literal['before', 'after', 'between', 'none', 'all']]] | Additional range counts to compute. List of: 'before' (below first range), 'after' (above last range), 'between' (within bounds), 'none', or 'all'. | None |
| range_method | Optional[Literal['filter', 'dv']] | Method to use for range faceting. Use 'filter' or 'dv' (for docValues). | None |
| pivot_fields | Optional[List[str]] | Fields to use for pivot (hierarchical) faceting. E.g., ['category,brand']. | None |
| pivot_mincount | Optional[int] | Minimum count for pivot facet inclusion. | None |
Returns:
| Type | Description |
|---|---|
| Self | A new parser instance with facet configuration applied. |
Examples:
Basic field faceting:
>>> parser.facet(fields=["genre", "director"], mincount=1, limit=10)
Range faceting for prices:
>>> parser.facet(
... range_field=["price"],
... range_start={"price": "0"},
... range_end={"price": "1000"},
... range_gap={"price": "100"}
... )
Filtered facets:
>>> parser.facet(fields=["color"], prefix="bl", mincount=5)
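Serialized output (illustrative; facet settings map onto Solr's facet.* parameter names when build() is called):
>>> params = parser.facet(fields=["genre"], mincount=1).build()
>>> # Expect Solr-style keys such as 'facet': 'true',
>>> # 'facet.field': ['genre'], and 'facet.mincount': 1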
group(*, by=None, func=None, query=None, limit=1, offset=None, sort=None, format='grouped', main=None, ngroups=False, truncate=False, facet=False, cache_percent=0)
Enable result grouping to collapse results by common field values.
Result grouping (also known as field collapsing) combines documents that share a common field value. Useful for showing one result per author, category, or domain.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| by | Optional[Union[str, List[str]]] | Field(s) to group results by. Shows one representative doc per unique field value. Must be single-valued and indexed. Can specify multiple fields for separate groupings. | None |
| func | Optional[str] | Group by the result of a function query (e.g., 'floor(price)'). Not supported in SolrCloud/distributed searches. | None |
| query | Optional[Union[str, List[str]]] | Create custom groups using arbitrary queries. Each query defines one group. Example: ['price:[0 TO 50]', 'price:[50 TO 100]'] creates price range groups. | None |
| limit | int | Number of documents to return per group. Default: 1 (only top doc per group). Set higher to see more examples from each group. | 1 |
| offset | Optional[int] | Skip the first N documents within each group. Useful for pagination within groups. | None |
| sort | Optional[str] | How to sort documents within each group (e.g., 'date desc' for newest first). If not specified, uses the main sort parameter. | None |
| format | str | Response structure format. 'grouped' (nested, default) or 'simple' (flat list). | 'grouped' |
| main | Optional[bool] | If True, returns first field grouping as main result list, flattening the response. | None |
| ngroups | bool | If True, include total number of unique groups in response. Useful for pagination. Requires co-location in SolrCloud. | False |
| truncate | bool | If True, base facet counts on one doc per group only. | False |
| facet | bool | If True, enable grouped faceting. Can be expensive on large result sets. Requires co-location in SolrCloud. | False |
| cache_percent | int | Enable result grouping cache (0-100). Set to 0 to disable (default). Try 50-100 for complex queries to improve performance. | 0 |
Returns:
| Type | Description |
|---|---|
| Self | A new parser instance with group configuration applied. |
Examples:
Group by author:
>>> parser.group(by="author", limit=3, sort="date desc", ngroups=True)
Group by price range:
>>> parser.group(
... query=["price:[0 TO 50]", "price:[50 TO 100]", "price:[100 TO *]"],
... limit=5
... )
Multiple field groupings:
>>> parser.group(by=["author", "category"], limit=2)
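Response shape (a sketch of Solr's standard grouped response, for orientation; field and values are illustrative):
>>> # With format='grouped' (default), Solr nests results roughly as:
>>> # {'grouped': {'author': {'matches': 42, 'ngroups': 7, 'groups': [
>>> #     {'groupValue': 'smith', 'doclist': {'numFound': 12, 'docs': [...]}}]}}}
>>> # 'ngroups' appears only when ngroups=True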
highlight(*, method=None, fields=None, query=None, query_parser=None, require_field_match=None, query_field_pattern=None, use_phrase_highlighter=None, multiterm=None, snippets_per_field=None, fragment_size=None, encoder=None, max_analyzed_chars=None, tag_before=None, tag_after=None, offset_source=None, frag_align_ratio=None, fragsize_is_minimum=None, tag_ellipsis=None, default_summary=None, score_k1=None, score_b=None, score_pivot=None, bs_language=None, bs_country=None, bs_variant=None, bs_type=None, bs_separator=None, weight_matches=None, merge_contiguous=None, max_multivalued_to_examine=None, max_multivalued_to_match=None, alternate_field=None, max_alternate_field_length=None, alternate=None, formatter=None, simple_pre=None, simple_post=None, fragmenter=None, regex_slop=None, regex_pattern=None, regex_max_analyzed_chars=None, preserve_multi=None, payloads=None, frag_list_builder=None, fragments_builder=None, boundary_scanner=None, phrase_limit=None, multivalue_separator=None)
Enable highlighting to show snippets with query terms emphasized.
Highlighting shows snippets of text from documents with query terms emphasized (usually wrapped in tags). Common for search result snippets and previews.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| method | Optional[Literal['unified', 'original', 'fastVector']] | Highlighting implementation to use. Use 'unified' (most accurate, recommended), 'original' (legacy, widely compatible), or 'fastVector' (fast for large docs, requires term vectors). | None |
| fields | Optional[Union[str, List[str]]] | Fields to generate highlighted snippets for. Must be stored fields. Example: ['title', 'content']. Use '*' for all fields (not recommended). | None |
| query | Optional[str] | Custom query to use for highlighting (overrides main query). | None |
| query_parser | Optional[str] | Query parser for the highlight query (e.g., 'edismax', 'lucene'). | None |
| require_field_match | Optional[bool] | If True, only highlight terms in the field they matched. Set to False to highlight query terms in all requested fields. | None |
| query_field_pattern | Optional[str] | Regular expression pattern for fields to consider for highlighting. | None |
| use_phrase_highlighter | Optional[bool] | If True, highlights complete phrases accurately. Default: True. | None |
| multiterm | Optional[bool] | Enable highlighting for wildcard, fuzzy, and range queries. Default: True. | None |
| snippets_per_field | Optional[int] | Maximum number of snippets to return per field. Default: 1. Set higher to show more context passages. | None |
| fragment_size | Optional[int] | Size of highlighted snippets in characters. Default: 100. Set to 0 to highlight the entire field (not recommended for large fields). | None |
| encoder | Optional[Literal['', 'html']] | Text encoder for snippets. Use '' (empty string, default) or 'html' to escape HTML and prevent XSS. | None |
| max_analyzed_chars | Optional[int] | Maximum characters to analyze per field. Default: 51200 (50KB). For large documents, limits analysis to improve performance. | None |
| tag_before | Optional[str] | Text/tag to insert before each highlighted term. Default: '<em>'. | None |
| tag_after | Optional[str] | Text/tag to insert after each highlighted term. Default: '</em>'. | None |
| **Unified highlighter options** (most accurate, recommended) | | | |
| offset_source | Optional[str] | How offsets are obtained. Options: 'ANALYSIS', 'POSTINGS', 'POSTINGS_WITH_TERM_VECTORS', 'TERM_VECTORS'. Usually auto-detected. | None |
| frag_align_ratio | Optional[float] | Where to position the first match in a snippet (0.0-1.0). Default: 0.33 (left third, shows context before the match). | None |
| fragsize_is_minimum | Optional[bool] | If True, treat fragment_size as a minimum. Default: True. | None |
| tag_ellipsis | Optional[str] | Text between multiple snippets (e.g., '...' or ' [...] '). | None |
| default_summary | Optional[bool] | If True, return leading text when no matches are found. | None |
| score_k1 | Optional[float] | BM25 term frequency normalization. Default: 1.2. | None |
| score_b | Optional[float] | BM25 length normalization. Default: 0.75. | None |
| score_pivot | Optional[int] | BM25 average passage length in characters. Default: 87. | None |
| bs_language | Optional[str] | BreakIterator language for text segmentation (e.g., 'en', 'ja'). | None |
| bs_country | Optional[str] | BreakIterator country code (e.g., 'US', 'GB'). | None |
| bs_variant | Optional[str] | BreakIterator variant for specialized locale rules. | None |
| bs_type | Optional[Literal['SEPARATOR', 'SENTENCE', 'WORD', 'CHARACTER', 'LINE', 'WHOLE']] | How to segment text. Use 'SENTENCE' (default, recommended), 'WORD', 'CHARACTER', 'LINE', 'SEPARATOR', or 'WHOLE'. | None |
| bs_separator | Optional[str] | Custom separator character when bs_type=SEPARATOR. | None |
| weight_matches | Optional[bool] | Use Lucene's Weight Matches API for the most accurate highlighting. | None |
| **Original highlighter options** (legacy) | | | |
| merge_contiguous | Optional[bool] | Merge adjacent fragments. | None |
| max_multivalued_to_examine | Optional[int] | Max entries to examine in a multivalued field. | None |
| max_multivalued_to_match | Optional[int] | Max matches in a multivalued field. | None |
| alternate_field | Optional[str] | Backup field for the summary when no highlights are found. | None |
| max_alternate_field_length | Optional[int] | Max length of the alternate field. | None |
| alternate | Optional[bool] | Highlight the alternate field. | None |
| formatter | Optional[Literal['simple']] | Formatter for highlighted output. Use 'simple'. | None |
| simple_pre | Optional[str] | Text before term (simple formatter). | None |
| simple_post | Optional[str] | Text after term (simple formatter). | None |
| fragmenter | Optional[Literal['gap', 'regex']] | Text snippet generator type. Use 'gap' or 'regex'. | None |
| regex_slop | Optional[float] | Deviation factor for the regex fragmenter. | None |
| regex_pattern | Optional[str] | Pattern for the regex fragmenter. | None |
| regex_max_analyzed_chars | Optional[int] | Char limit for the regex fragmenter. | None |
| preserve_multi | Optional[bool] | Preserve order in multivalued fields. | None |
| payloads | Optional[bool] | Include payloads in highlighting. | None |
| **FastVector highlighter options** (requires term vectors) | | | |
| frag_list_builder | Optional[Literal['simple', 'weighted', 'single']] | Snippet fragmenting algorithm. Use 'simple', 'weighted', or 'single'. | None |
| fragments_builder | Optional[Literal['default', 'colored']] | Fragment formatting implementation. Use 'default' or 'colored'. | None |
| boundary_scanner | Optional[str] | Boundary scanner implementation. | None |
| phrase_limit | Optional[int] | Max phrases to analyze for scoring. | None |
| multivalue_separator | Optional[str] | Separator for multivalued fields. | None |
Returns:
| Type | Description |
|---|---|
| Self | A new parser instance with highlight configuration applied. |
Examples:
Basic highlighting:
>>> parser.highlight(fields=["title", "content"], snippets_per_field=3, fragment_size=150)
Custom HTML tags:
>>> parser.highlight(
... fields=["title"],
... tag_before='<mark class="highlight">',
... tag_after='</mark>',
... encoder="html"
... )
Unified highlighter with sentence breaks:
>>> parser.highlight(
... fields=["content"],
... method="unified",
... bs_type="SENTENCE",
... fragment_size=200
... )
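Reading highlights (a sketch of Solr's standard highlighting response; the field and document id are illustrative):
>>> # Snippets are returned in a top-level 'highlighting' section keyed by document id:
>>> # response['highlighting']['doc-1']['content']
>>> # ['A snippet with the <em>query</em> term emphasized ...']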
more_like_this(*, fields=None, min_term_freq=None, min_doc_freq=None, max_doc_freq=None, max_doc_freq_pct=None, min_word_len=None, max_word_len=None, max_query_terms=None, max_num_tokens_parsed=None, boost=None, query_fields=None, interesting_terms=None, match_include=None, match_offset=None)
Enable MoreLikeThis to find documents similar to a given document.
MoreLikeThis finds documents similar to a given document by analyzing the terms that make it unique. Common for "related articles", recommendations, and content discovery.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| fields | Optional[Union[str, List[str]]] | Fields to analyze for similarity. Use fields with meaningful content (title, description, body). Can be a single field or list of fields. For best performance, enable term vectors on these fields. | None |
| min_term_freq | Optional[int] | Minimum times a term must appear in the source document to be considered. Default: 2. Lower values include more terms but may add noise. | None |
| min_doc_freq | Optional[int] | Minimum number of documents a term must appear in across the index. Default: 5. Filters out very rare terms that aren't useful for finding similar documents. | None |
| max_doc_freq | Optional[int] | Maximum number of documents a term can appear in. Filters out very common terms (like 'the', 'and'). Use this OR max_doc_freq_pct, not both. | None |
| max_doc_freq_pct | Optional[int] | Maximum document frequency as a percentage (0-100). Example: 75 means ignore terms appearing in more than 75% of documents. Use instead of max_doc_freq for relative filtering. | None |
| min_word_len | Optional[int] | Minimum word length in characters. Words shorter than this are ignored. Example: 4 to skip 'the', 'and', 'or'. | None |
| max_word_len | Optional[int] | Maximum word length in characters. Words longer than this are ignored. Useful to filter out long tokens or URLs. | None |
| max_query_terms | Optional[int] | Maximum number of interesting terms to use in the MLT query. Default: 25. Higher values = more comprehensive but slower. Start with the default and adjust as needed. | None |
| max_num_tokens_parsed | Optional[int] | Maximum tokens to analyze per field (for fields without term vectors). Default: 5000. Set lower for better performance on large documents. | None |
| boost | Optional[bool] | If True, boost the query by each term's relevance/importance. Default: False. Enable for better relevance ranking of similar documents. | None |
| query_fields | Optional[str] | Query fields with optional boosts, like 'title^2.0 content^1.0'. Fields must also be in the 'fields' parameter. Use to emphasize certain fields in similarity matching. | None |
| interesting_terms | Optional[str] | Controls what info about matched terms is returned. Options: 'none' (default), 'list' (term names), 'details' (terms with boost values). Use 'details' for debugging to see which terms were selected. | None |
| match_include | Optional[bool] | If True, includes the source document in results (useful to compare). Default varies: true for the MLT handler, configuration-dependent for the component. | None |
| match_offset | Optional[int] | When used with a query, specifies which result doc to use for similarity. 0 = first result, 1 = second, etc. Default: 0. | None |
Returns:
| Type | Description |
|---|---|
| Self | A new parser instance with more_like_this configuration applied. |
Examples:
Basic similarity search:
>>> parser.more_like_this(
... fields=["title", "content"],
... min_term_freq=2,
... min_doc_freq=5,
... max_query_terms=25
... )
Advanced with filtering:
>>> parser.more_like_this(
... fields=["content"],
... min_term_freq=1,
... min_doc_freq=3,
... min_word_len=4,
... max_doc_freq_pct=80,
... interesting_terms="details"
... )
Boosted fields:
>>> parser.more_like_this(
... fields=["title", "content"],
... query_fields="title^2.0 content^1.0",
... boost=True
... )
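Serialized output (illustrative; MoreLikeThis settings map onto Solr's mlt.* parameter names when build() is called):
>>> params = parser.more_like_this(fields=["title", "content"], min_term_freq=2).build()
>>> # Expect keys such as 'mlt': 'true', 'mlt.fl': 'title,content',
>>> # and 'mlt.mintf': 2 alongside the main query parameters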
serialize_configs(params)
Serialize ParamsConfig objects as top level params.
StandardParser
Bases: BaseQueryParser
Standard Query Parser (Lucene syntax) for Apache Solr.
The Standard Query Parser is Solr's default query parser, supporting full Lucene query syntax including field-specific searches, boolean operators, wildcards, proximity searches, range queries, boosting, and fuzzy searches. It offers greater precision but requires more exact syntax.
Solr Reference
https://solr.apache.org/guide/solr/latest/query-guide/standard-query-parser.html
Features
- Field-specific queries: title:"The Right Way" AND text:go
- Boolean operators: AND, OR, NOT, +, -
- Wildcards: te?t, test*, *esting
- Proximity searches: "jakarta apache"~10
- Range queries: [1 TO 5], {A TO Z}
- Boosting: jakarta^4 apache
- Fuzzy searches: roam~0.8
- Grouping with parentheses: (jakarta OR apache) AND website
- Constant score queries: description:blue^=1.0
Examples:
>>> # Basic field search
>>> parser = StandardParser(query="title:Solr AND content:search")
>>> # Range query
>>> parser = StandardParser(query="price:[10 TO 100]")
>>> # Proximity search
>>> parser = StandardParser(query='"apache solr"~5')
>>> # With default field and operator
>>> parser = StandardParser(
... query="apache solr",
... default_field="content",
... query_operator="AND"
... )
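Building query params (illustrative output; the exact keys depend on which fields are set):
>>> parser = StandardParser(
...     query="apache solr",
...     default_field="content",
...     query_operator="AND"
... )
>>> params = parser.build()
>>> # Expect standard Solr keys such as 'q': 'apache solr',
>>> # 'df': 'content', and 'q.op': 'AND'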
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | | Query string using Lucene syntax (required) | required |
| query_operator | | Default operator ("AND" or "OR"). Determines how multiple terms are combined. | required |
| default_field | | Default field to search when no field is specified. | required |
| split_on_whitespace | | If True, analyze each term separately; if False (default), analyze term sequences together for multi-word synonyms and shingles. | required |
See Also
- DisMaxQueryParser: For user-friendly queries with error tolerance
- ExtendedDisMaxQueryParser: For advanced user queries combining Lucene syntax with DisMax features
build(*args, **kwargs)
Serialize the parser configuration to Solr-compatible query parameters using Pydantic's model_dump.
facet(*, queries=None, fields=None, prefix=None, contains=None, contains_ignore_case=None, matches=None, sort=None, limit=None, offset=None, mincount=None, missing=None, method=None, enum_cache_min_df=None, exists=None, exclude_terms=None, overrequest_count=None, overrequest_ratio=None, threads=None, range_field=None, range_start=None, range_end=None, range_gap=None, range_hardend=None, range_include=None, range_other=None, range_method=None, pivot_fields=None, pivot_mincount=None)
Enable faceting to categorize and count search results.
Faceting breaks down search results into categories with counts, enabling drill-down navigation and data analysis.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| queries | Optional[List[str]] | Arbitrary queries to generate facet counts for specific terms/expressions. | None |
| fields | Optional[List[str]] | Fields to be treated as facets. Common for categories, brands, tags. | None |
| prefix | Optional[str] | Limits facet terms to those starting with the given prefix. | None |
| contains | Optional[str] | Limits facet terms to those containing the given substring. | None |
| contains_ignore_case | Optional[bool] | If True, ignores case when matching the 'contains' parameter. | None |
| matches | Optional[str] | Only returns facets matching this regular expression. | None |
| sort | Optional[Literal['count', 'index']] | Ordering of facet field terms. Use 'count' (by frequency) or 'index' (alphabetically). | None |
| limit | Optional[int] | Number of facet counts to return. Set to -1 for all. | None |
| offset | Optional[int] | Offset into the facet list for paging. | None |
| mincount | Optional[int] | Minimum count for facets to be included in the response. Commonly set to 1 to hide empty facets. | None |
| missing | Optional[bool] | If True, include count of results with no facet value. | None |
| method | Optional[Literal['enum', 'fc', 'fcs']] | Algorithm to use for faceting. Use 'enum' (enumerate all terms, good for low cardinality), 'fc' (field cache, good for high cardinality), or 'fcs' (per-segment, good for frequent updates). | None |
| enum_cache_min_df | Optional[int] | Minimum document frequency for filterCache usage with the enum method. | None |
| exists | Optional[bool] | Cap facet counts by 1 (only for non-trie fields). | None |
| exclude_terms | Optional[str] | Terms to remove from facet counts. | None |
| overrequest_count | Optional[int] | Extra facets to request from each shard for better accuracy in distributed environments. | None |
| overrequest_ratio | Optional[float] | Ratio for overrequesting facets from shards. | None |
| threads | Optional[int] | Number of threads for parallel facet loading. Useful for multiple facets on large datasets. | None |
| range_field | Optional[List[str]] | Fields for range faceting (e.g., price ranges, date ranges). | None |
| range_start | Optional[Dict[str, str]] | Lower bound of ranges per field. Dict mapping field name to start value. | None |
| range_end | Optional[Dict[str, str]] | Upper bound of ranges per field. Dict mapping field name to end value. | None |
| range_gap | Optional[Dict[str, str]] | Size of each range span per field. E.g., {'price': '100'} for $100 increments. | None |
| range_hardend | Optional[bool] | If True, uses exact range_end as upper bound even if it doesn't align with gap. | None |
| range_include | Optional[List[Literal['lower', 'upper', 'edge', 'outer', 'all']]] | Range bounds to include in faceting. List of: 'lower' (include lower bound), 'upper' (include upper bound), 'edge' (first/last include edges), 'outer' (before/after inclusive), 'all'. | None |
| range_other | Optional[List[Literal['before', 'after', 'between', 'none', 'all']]] | Additional range counts to compute. List of: 'before' (below first range), 'after' (above last range), 'between' (within bounds), 'none', or 'all'. | None |
| range_method | Optional[Literal['filter', 'dv']] | Method to use for range faceting. Use 'filter' or 'dv' (for docValues). | None |
| pivot_fields | Optional[List[str]] | Fields to use for pivot (hierarchical) faceting. E.g., ['category,brand']. | None |
| pivot_mincount | Optional[int] | Minimum count for pivot facet inclusion. | None |
Returns:
| Type | Description |
|---|---|
| Self | A new parser instance with facet configuration applied. |
Examples:
Basic field faceting:
>>> parser.facet(fields=["genre", "director"], mincount=1, limit=10)
Range faceting for prices:
>>> parser.facet(
... range_field=["price"],
... range_start={"price": "0"},
... range_end={"price": "1000"},
... range_gap={"price": "100"}
... )
Filtered facets:
>>> parser.facet(fields=["color"], prefix="bl", mincount=5)
group(*, by=None, func=None, query=None, limit=1, offset=None, sort=None, format='grouped', main=None, ngroups=False, truncate=False, facet=False, cache_percent=0)
Enable result grouping to collapse results by common field values.
Result grouping (also known as field collapsing) combines documents that share a common field value. Useful for showing one result per author, category, or domain.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| by | Optional[Union[str, List[str]]] | Field(s) to group results by. Shows one representative doc per unique field value. Must be single-valued and indexed. Can specify multiple fields for separate groupings. | None |
| func | Optional[str] | Group by the result of a function query (e.g., 'floor(price)'). Not supported in SolrCloud/distributed searches. | None |
| query | Optional[Union[str, List[str]]] | Create custom groups using arbitrary queries. Each query defines one group. Example: ['price:[0 TO 50]', 'price:[50 TO 100]'] creates price range groups. | None |
| limit | int | Number of documents to return per group. Default: 1 (only top doc per group). Set higher to see more examples from each group. | 1 |
| offset | Optional[int] | Skip the first N documents within each group. Useful for pagination within groups. | None |
| sort | Optional[str] | How to sort documents within each group (e.g., 'date desc' for newest first). If not specified, uses the main sort parameter. | None |
| format | str | Response structure format. 'grouped' (nested, default) or 'simple' (flat list). | 'grouped' |
| main | Optional[bool] | If True, returns first field grouping as main result list, flattening the response. | None |
| ngroups | bool | If True, include total number of unique groups in response. Useful for pagination. Requires co-location in SolrCloud. | False |
| truncate | bool | If True, base facet counts on one doc per group only. | False |
| facet | bool | If True, enable grouped faceting. Can be expensive on large result sets. Requires co-location in SolrCloud. | False |
| cache_percent | int | Enable result grouping cache (0-100). Set to 0 to disable (default). Try 50-100 for complex queries to improve performance. | 0 |
Returns:
| Type | Description |
|---|---|
| Self | A new parser instance with group configuration applied. |
Examples:
Group by author:
>>> parser.group(by="author", limit=3, sort="date desc", ngroups=True)
Group by price range:
>>> parser.group(
... query=["price:[0 TO 50]", "price:[50 TO 100]", "price:[100 TO *]"],
... limit=5
... )
Multiple field groupings:
>>> parser.group(by=["author", "category"], limit=2)
highlight(*, method=None, fields=None, query=None, query_parser=None, require_field_match=None, query_field_pattern=None, use_phrase_highlighter=None, multiterm=None, snippets_per_field=None, fragment_size=None, encoder=None, max_analyzed_chars=None, tag_before=None, tag_after=None, offset_source=None, frag_align_ratio=None, fragsize_is_minimum=None, tag_ellipsis=None, default_summary=None, score_k1=None, score_b=None, score_pivot=None, bs_language=None, bs_country=None, bs_variant=None, bs_type=None, bs_separator=None, weight_matches=None, merge_contiguous=None, max_multivalued_to_examine=None, max_multivalued_to_match=None, alternate_field=None, max_alternate_field_length=None, alternate=None, formatter=None, simple_pre=None, simple_post=None, fragmenter=None, regex_slop=None, regex_pattern=None, regex_max_analyzed_chars=None, preserve_multi=None, payloads=None, frag_list_builder=None, fragments_builder=None, boundary_scanner=None, phrase_limit=None, multivalue_separator=None)
Enable highlighting to show snippets with query terms emphasized.
Highlighting shows snippets of text from documents with query terms emphasized (usually wrapped in tags). Common for search result snippets and previews.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| method | Optional[Literal['unified', 'original', 'fastVector']] | Highlighting implementation to use. Use 'unified' (most accurate, recommended), 'original' (legacy, widely compatible), or 'fastVector' (fast for large docs, requires term vectors). | None |
| fields | Optional[Union[str, List[str]]] | Fields to generate highlighted snippets for. Must be stored fields. Example: ['title', 'content']. Use '*' for all fields (not recommended). | None |
| query | Optional[str] | Custom query to use for highlighting (overrides main query). | None |
| query_parser | Optional[str] | Query parser for the highlight query (e.g., 'edismax', 'lucene'). | None |
| require_field_match | Optional[bool] | If True, only highlight terms in the field they matched. Set to False to highlight query terms in all requested fields. | None |
| query_field_pattern | Optional[str] | Regular expression pattern for fields to consider for highlighting. | None |
| use_phrase_highlighter | Optional[bool] | If True, highlights complete phrases accurately. Default: True. | None |
| multiterm | Optional[bool] | Enable highlighting for wildcard, fuzzy, and range queries. Default: True. | None |
| snippets_per_field | Optional[int] | Maximum number of snippets to return per field. Default: 1. Set higher to show more context passages. | None |
| fragment_size | Optional[int] | Size of highlighted snippets in characters. Default: 100. Set to 0 to highlight the entire field (not recommended for large fields). | None |
| encoder | Optional[Literal['', 'html']] | Text encoder for snippets. Use '' (empty string, default) or 'html' to escape HTML and prevent XSS. | None |
| max_analyzed_chars | Optional[int] | Maximum characters to analyze per field. Default: 51200 (50KB). For large documents, limits analysis to improve performance. | None |
| tag_before | Optional[str] | Text/tag to insert before each highlighted term. Default: '<em>'. | None |
| tag_after | Optional[str] | Text/tag to insert after each highlighted term. Default: '</em>'. | None |
| **Unified highlighter options** (most accurate, recommended) | | | |
| offset_source | Optional[str] | How offsets are obtained. Options: 'ANALYSIS', 'POSTINGS', 'POSTINGS_WITH_TERM_VECTORS', 'TERM_VECTORS'. Usually auto-detected. | None |
| frag_align_ratio | Optional[float] | Where to position the first match in a snippet (0.0-1.0). Default: 0.33 (left third, shows context before the match). | None |
| fragsize_is_minimum | Optional[bool] | If True, treat fragment_size as a minimum. Default: True. | None |
| tag_ellipsis | Optional[str] | Text between multiple snippets (e.g., '...' or ' [...] '). | None |
| default_summary | Optional[bool] | If True, return leading text when no matches are found. | None |
| score_k1 | Optional[float] | BM25 term frequency normalization. Default: 1.2. | None |
| score_b | Optional[float] | BM25 length normalization. Default: 0.75. | None |
| score_pivot | Optional[int] | BM25 average passage length in characters. Default: 87. | None |
| bs_language | Optional[str] | BreakIterator language for text segmentation (e.g., 'en', 'ja'). | None |
| bs_country | Optional[str] | BreakIterator country code (e.g., 'US', 'GB'). | None |
| bs_variant | Optional[str] | BreakIterator variant for specialized locale rules. | None |
| bs_type | Optional[Literal['SEPARATOR', 'SENTENCE', 'WORD', 'CHARACTER', 'LINE', 'WHOLE']] | How to segment text. Use 'SENTENCE' (default, recommended), 'WORD', 'CHARACTER', 'LINE', 'SEPARATOR', or 'WHOLE'. | None |
| bs_separator | Optional[str] | Custom separator character when bs_type=SEPARATOR. | None |
| weight_matches | Optional[bool] | Use Lucene's Weight Matches API for the most accurate highlighting. | None |
| **Original highlighter options** (legacy) | | | |
| merge_contiguous | Optional[bool] | Merge adjacent fragments. | None |
| max_multivalued_to_examine | Optional[int] | Max entries to examine in a multivalued field. | None |
| max_multivalued_to_match | Optional[int] | Max matches in a multivalued field. | None |
| alternate_field | Optional[str] | Backup field for the summary when no highlights are found. | None |
| max_alternate_field_length | Optional[int] | Max length of the alternate field. | None |
| alternate | Optional[bool] | Highlight the alternate field. | None |
| formatter | Optional[Literal['simple']] | Formatter for highlighted output. Use 'simple'. | None |
| simple_pre | Optional[str] | Text before term (simple formatter). | None |
| simple_post | Optional[str] | Text after term (simple formatter). | None |
| fragmenter | Optional[Literal['gap', 'regex']] | Text snippet generator type. Use 'gap' or 'regex'. | None |
| regex_slop | Optional[float] | Deviation factor for the regex fragmenter. | None |
| regex_pattern | Optional[str] | Pattern for the regex fragmenter. | None |
| regex_max_analyzed_chars | Optional[int] | Char limit for the regex fragmenter. | None |
| preserve_multi | Optional[bool] | Preserve order in multivalued fields. | None |
| payloads | Optional[bool] | Include payloads in highlighting. | None |
| **FastVector highlighter options** (requires term vectors) | | | |
| frag_list_builder | Optional[Literal['simple', 'weighted', 'single']] | Snippet fragmenting algorithm. Use 'simple', 'weighted', or 'single'. | None |
| fragments_builder | Optional[Literal['default', 'colored']] | Fragment formatting implementation. Use 'default' or 'colored'. | None |
| boundary_scanner | Optional[str] | Boundary scanner implementation. | None |
| phrase_limit | Optional[int] | Max phrases to analyze for scoring. | None |
| multivalue_separator | Optional[str] | Separator for multivalued fields. | None |
Returns:
| Type | Description |
|---|---|
| Self | A new parser instance with highlight configuration applied. |
Examples:
Basic highlighting:
>>> parser.highlight(fields=["title", "content"], snippets_per_field=3, fragment_size=150)
Custom HTML tags:
>>> parser.highlight(
... fields=["title"],
... tag_before='<mark class="highlight">',
... tag_after='</mark>',
... encoder="html"
... )
Unified highlighter with sentence breaks:
>>> parser.highlight(
... fields=["content"],
... method="unified",
... bs_type="SENTENCE",
... fragment_size=200
... )
more_like_this(*, fields=None, min_term_freq=None, min_doc_freq=None, max_doc_freq=None, max_doc_freq_pct=None, min_word_len=None, max_word_len=None, max_query_terms=None, max_num_tokens_parsed=None, boost=None, query_fields=None, interesting_terms=None, match_include=None, match_offset=None)
Enable MoreLikeThis to find documents similar to a given document.
MoreLikeThis finds documents similar to a given document by analyzing the terms that make it unique. Common for "related articles", recommendations, and content discovery.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| fields | Optional[Union[str, List[str]]] | Fields to analyze for similarity. Use fields with meaningful content (title, description, body). Can be a single field or list of fields. For best performance, enable term vectors on these fields. | None |
| min_term_freq | Optional[int] | Minimum times a term must appear in the source document to be considered. Default: 2. Lower values include more terms but may add noise. | None |
| min_doc_freq | Optional[int] | Minimum number of documents a term must appear in across the index. Default: 5. Filters out very rare terms that aren't useful for finding similar documents. | None |
| max_doc_freq | Optional[int] | Maximum number of documents a term can appear in. Filters out very common terms (like 'the', 'and'). Use this OR max_doc_freq_pct, not both. | None |
| max_doc_freq_pct | Optional[int] | Maximum document frequency as a percentage (0-100). Example: 75 means ignore terms appearing in more than 75% of documents. Use instead of max_doc_freq for relative filtering. | None |
| min_word_len | Optional[int] | Minimum word length in characters. Words shorter than this are ignored. Example: 4 to skip 'the', 'and', 'or'. | None |
| max_word_len | Optional[int] | Maximum word length in characters. Words longer than this are ignored. Useful to filter out long tokens or URLs. | None |
| max_query_terms | Optional[int] | Maximum number of interesting terms to use in the MLT query. Default: 25. Higher values = more comprehensive but slower. Start with the default and adjust as needed. | None |
| max_num_tokens_parsed | Optional[int] | Maximum tokens to analyze per field (for fields without term vectors). Default: 5000. Set lower for better performance on large documents. | None |
| boost | Optional[bool] | If True, boost the query by each term's relevance/importance. Default: False. Enable for better relevance ranking of similar documents. | None |
| query_fields | Optional[str] | Query fields with optional boosts, like 'title^2.0 content^1.0'. Fields must also be in the 'fields' parameter. Use to emphasize certain fields in similarity matching. | None |
| interesting_terms | Optional[str] | Controls what info about matched terms is returned. Options: 'none' (default), 'list' (term names), 'details' (terms with boost values). Use 'details' for debugging to see which terms were selected. | None |
| match_include | Optional[bool] | If True, includes the source document in results (useful to compare). Default varies: true for the MLT handler, configuration-dependent for the component. | None |
| match_offset | Optional[int] | When used with a query, specifies which result doc to use for similarity. 0 = first result, 1 = second, etc. Default: 0. | None |
Returns:
| Type | Description |
|---|---|
| Self | A new parser instance with more_like_this configuration applied. |
Examples:
Basic similarity search:
>>> parser.more_like_this(
... fields=["title", "content"],
... min_term_freq=2,
... min_doc_freq=5,
... max_query_terms=25
... )
Advanced with filtering:
>>> parser.more_like_this(
... fields=["content"],
... min_term_freq=1,
... min_doc_freq=3,
... min_word_len=4,
... max_doc_freq_pct=80,
... interesting_terms="details"
... )
Boosted fields:
>>> parser.more_like_this(
... fields=["title", "content"],
... query_fields="title^2.0 content^1.0",
... boost=True
... )
serialize_configs(params)
Serialize ParamsConfig objects as top level params.
TermsQueryParser
Bases: BaseQueryParser
Terms Query Parser for Apache Solr.
The Terms Query Parser generates a query from multiple comma-separated values, matching documents where the specified field contains any of the provided terms. It's optimized for efficiently searching for multiple discrete values in a field, particularly useful for filtering by IDs, tags, or categories.
Solr Reference
https://solr.apache.org/guide/solr/latest/query-guide/other-parsers.html#terms-query-parser
Key Features
- Efficient multi-value term matching
- Configurable separator for term parsing
- Multiple query implementation methods with different performance characteristics
- Optimized for large numbers of terms
- Works with both regular and docValues fields
Query Implementation Methods
- termsFilter (default): Uses BooleanQuery or TermInSetQuery based on term count. Scales well with index size and moderately with number of terms.
- booleanQuery: Creates a BooleanQuery. Scales well with index size but poorly with many terms.
- automaton: Uses automaton-based matching. Good for certain use cases.
- docValuesTermsFilter: For docValues fields. Automatically chooses between per-segment or top-level implementation.
- docValuesTermsFilterPerSegment: Per-segment docValues filtering.
- docValuesTermsFilterTopLevel: Top-level docValues filtering.
Performance Considerations
- Use termsFilter (default) for general cases
- Use booleanQuery for small term sets with large indices
- Use docValues methods only on fields with docValues enabled
- Term count affects which internal implementation is chosen
Examples:
>>> # Basic usage - search for multiple tags (as filter)
>>> parser = TermsQueryParser(
... field="tags",
... terms=["software", "apache", "solr", "lucene"]
... )
>>> # With custom query field
>>> parser = TermsQueryParser(
... field="tags",
... terms=["python", "java", "rust"],
... query="status:active"
... )
>>> # Using space separator with category IDs
>>> parser = TermsQueryParser(
... field="categoryId",
... terms=["8", "6", "7", "5309"],
... separator=" ",
... method="booleanQuery"
... )
>>> # Filtering by product IDs
>>> parser = TermsQueryParser(
... field="product_id",
... terms=["P123", "P456", "P789", "P012"]
... )
>>> # Using with docValues field
>>> parser = TermsQueryParser(
... field="author_id",
... terms=["author1", "author2", "author3"],
... method="docValuesTermsFilter"
... )
>>> # Building query params for use with any Solr client
>>> params = parser.build()
>>> # {'q': '*:*', 'fq': '{!terms f=tags}software,apache,solr,lucene'}
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| field | | The field name to search (required) | required |
| terms | | List of terms to match (required) | required |
| query | | Optional main query string (default: '*:*'). The terms filter is applied as fq. | required |
| separator | | Character(s) to use for joining terms (default: ','). Use ' ' (single space) if you want space-separated terms. | required |
| method | | Query implementation method. Options: termsFilter (default, automatic choice between implementations); booleanQuery (boolean query approach); automaton (automaton-based matching); docValuesTermsFilter (auto-select docValues approach); docValuesTermsFilterPerSegment (per-segment docValues); docValuesTermsFilterTopLevel (top-level docValues). | required |
Returns:
| Type | Description |
|---|---|
| | Documents where the specified field contains any of the provided terms |
Note
When using docValues methods, ensure the target field has docValues enabled in the schema. The cache parameter defaults to false for docValues methods.
See Also
- StandardParser: For Lucene syntax queries with field specifications
- DisMaxQueryParser: For multi-field user-friendly queries
facet(*, queries=None, fields=None, prefix=None, contains=None, contains_ignore_case=None, matches=None, sort=None, limit=None, offset=None, mincount=None, missing=None, method=None, enum_cache_min_df=None, exists=None, exclude_terms=None, overrequest_count=None, overrequest_ratio=None, threads=None, range_field=None, range_start=None, range_end=None, range_gap=None, range_hardend=None, range_include=None, range_other=None, range_method=None, pivot_fields=None, pivot_mincount=None)
Enable faceting to categorize and count search results.
Faceting breaks down search results into categories with counts, enabling drill-down navigation and data analysis.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| queries | Optional[List[str]] | Arbitrary queries to generate facet counts for specific terms/expressions. | None |
| fields | Optional[List[str]] | Fields to be treated as facets. Common for categories, brands, tags. | None |
| prefix | Optional[str] | Limits facet terms to those starting with the given prefix. | None |
| contains | Optional[str] | Limits facet terms to those containing the given substring. | None |
| contains_ignore_case | Optional[bool] | If True, ignores case when matching the 'contains' parameter. | None |
| matches | Optional[str] | Only returns facets matching this regular expression. | None |
| sort | Optional[Literal['count', 'index']] | Ordering of facet field terms. Use 'count' (by frequency) or 'index' (alphabetically). | None |
| limit | Optional[int] | Number of facet counts to return. Set to -1 for all. | None |
| offset | Optional[int] | Offset into the facet list for paging. | None |
| mincount | Optional[int] | Minimum count for facets to be included in the response. Commonly set to 1 to hide empty facets. | None |
| missing | Optional[bool] | If True, include count of results with no facet value. | None |
| method | Optional[Literal['enum', 'fc', 'fcs']] | Algorithm to use for faceting. Use 'enum' (enumerate all terms, good for low cardinality), 'fc' (field cache, good for high cardinality), or 'fcs' (per-segment, good for frequent updates). | None |
| enum_cache_min_df | Optional[int] | Minimum document frequency for filterCache usage with the enum method. | None |
| exists | Optional[bool] | Cap facet counts by 1 (only for non-trie fields). | None |
| exclude_terms | Optional[str] | Terms to remove from facet counts. | None |
| overrequest_count | Optional[int] | Extra facets to request from each shard for better accuracy in distributed environments. | None |
| overrequest_ratio | Optional[float] | Ratio for overrequesting facets from shards. | None |
| threads | Optional[int] | Number of threads for parallel facet loading. Useful for multiple facets on large datasets. | None |
| range_field | Optional[List[str]] | Fields for range faceting (e.g., price ranges, date ranges). | None |
| range_start | Optional[Dict[str, str]] | Lower bound of ranges per field. Dict mapping field name to start value. | None |
| range_end | Optional[Dict[str, str]] | Upper bound of ranges per field. Dict mapping field name to end value. | None |
| range_gap | Optional[Dict[str, str]] | Size of each range span per field. E.g., {'price': '100'} for $100 increments. | None |
| range_hardend | Optional[bool] | If True, uses exact range_end as upper bound even if it doesn't align with gap. | None |
| range_include | Optional[List[Literal['lower', 'upper', 'edge', 'outer', 'all']]] | Range bounds to include in faceting. List of: 'lower' (include lower bound), 'upper' (include upper bound), 'edge' (first/last include edges), 'outer' (before/after inclusive), 'all'. | None |
| range_other | Optional[List[Literal['before', 'after', 'between', 'none', 'all']]] | Additional range counts to compute. List of: 'before' (below first range), 'after' (above last range), 'between' (within bounds), 'none', or 'all'. | None |
| range_method | Optional[Literal['filter', 'dv']] | Method to use for range faceting. Use 'filter' or 'dv' (for docValues). | None |
| pivot_fields | Optional[List[str]] | Fields to use for pivot (hierarchical) faceting. E.g., ['category,brand']. | None |
| pivot_mincount | Optional[int] | Minimum count for pivot facet inclusion. | None |
Returns:
| Type | Description |
|---|---|
| Self | A new parser instance with facet configuration applied. |
Examples:
Basic field faceting:
>>> parser.facet(fields=["genre", "director"], mincount=1, limit=10)
Range faceting for prices:
>>> parser.facet(
... range_field=["price"],
... range_start={"price": "0"},
... range_end={"price": "1000"},
... range_gap={"price": "100"}
... )
Filtered facets:
>>> parser.facet(fields=["color"], prefix="bl", mincount=5)
group(*, by=None, func=None, query=None, limit=1, offset=None, sort=None, format='grouped', main=None, ngroups=False, truncate=False, facet=False, cache_percent=0)
Enable result grouping to collapse results by common field values.
Result grouping (also known as field collapsing) combines documents that share a common field value. Useful for showing one result per author, category, or domain.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| by | Optional[Union[str, List[str]]] | Field(s) to group results by. Shows one representative doc per unique field value. Must be single-valued and indexed. Can specify multiple fields for separate groupings. | None |
| func | Optional[str] | Group by the result of a function query (e.g., 'floor(price)'). Not supported in SolrCloud/distributed searches. | None |
| query | Optional[Union[str, List[str]]] | Create custom groups using arbitrary queries. Each query defines one group. Example: ['price:[0 TO 50]', 'price:[50 TO 100]'] creates price range groups. | None |
| limit | int | Number of documents to return per group. Default: 1 (only top doc per group). Set higher to see more examples from each group. | 1 |
| offset | Optional[int] | Skip the first N documents within each group. Useful for pagination within groups. | None |
| sort | Optional[str] | How to sort documents within each group (e.g., 'date desc' for newest first). If not specified, uses the main sort parameter. | None |
| format | str | Response structure format. 'grouped' (nested, default) or 'simple' (flat list). | 'grouped' |
| main | Optional[bool] | If True, returns first field grouping as main result list, flattening the response. | None |
| ngroups | bool | If True, include total number of unique groups in response. Useful for pagination. Requires co-location in SolrCloud. | False |
| truncate | bool | If True, base facet counts on one doc per group only. | False |
| facet | bool | If True, enable grouped faceting. Can be expensive on large result sets. Requires co-location in SolrCloud. | False |
| cache_percent | int | Enable result grouping cache (0-100). Set to 0 to disable (default). Try 50-100 for complex queries to improve performance. | 0 |
Returns:
| Type | Description |
|---|---|
| Self | A new parser instance with group configuration applied. |
Examples:
Group by author:
>>> parser.group(by="author", limit=3, sort="date desc", ngroups=True)
Group by price range:
>>> parser.group(
... query=["price:[0 TO 50]", "price:[50 TO 100]", "price:[100 TO *]"],
... limit=5
... )
Multiple field groupings:
>>> parser.group(by=["author", "category"], limit=2)
highlight(*, method=None, fields=None, query=None, query_parser=None, require_field_match=None, query_field_pattern=None, use_phrase_highlighter=None, multiterm=None, snippets_per_field=None, fragment_size=None, encoder=None, max_analyzed_chars=None, tag_before=None, tag_after=None, offset_source=None, frag_align_ratio=None, fragsize_is_minimum=None, tag_ellipsis=None, default_summary=None, score_k1=None, score_b=None, score_pivot=None, bs_language=None, bs_country=None, bs_variant=None, bs_type=None, bs_separator=None, weight_matches=None, merge_contiguous=None, max_multivalued_to_examine=None, max_multivalued_to_match=None, alternate_field=None, max_alternate_field_length=None, alternate=None, formatter=None, simple_pre=None, simple_post=None, fragmenter=None, regex_slop=None, regex_pattern=None, regex_max_analyzed_chars=None, preserve_multi=None, payloads=None, frag_list_builder=None, fragments_builder=None, boundary_scanner=None, phrase_limit=None, multivalue_separator=None)
Enable highlighting to show snippets with query terms emphasized.
Highlighting shows snippets of text from documents with query terms emphasized (usually wrapped in tags). Common for search result snippets and previews.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| method | Optional[Literal['unified', 'original', 'fastVector']] | Highlighting implementation to use. Use 'unified' (most accurate, recommended), 'original' (legacy, widely compatible), or 'fastVector' (fast for large docs, requires term vectors). | None |
| fields | Optional[Union[str, List[str]]] | Fields to generate highlighted snippets for. Must be stored fields. Example: ['title', 'content']. Use '*' for all fields (not recommended). | None |
| query | Optional[str] | Custom query to use for highlighting (overrides main query). | None |
| query_parser | Optional[str] | Query parser for the highlight query (e.g., 'edismax', 'lucene'). | None |
| require_field_match | Optional[bool] | If True, only highlight terms in the field they matched. Set to False to highlight query terms in all requested fields. | None |
| query_field_pattern | Optional[str] | Regular expression pattern for fields to consider for highlighting. | None |
| use_phrase_highlighter | Optional[bool] | If True, highlights complete phrases accurately. Default: True. | None |
| multiterm | Optional[bool] | Enable highlighting for wildcard, fuzzy, and range queries. Default: True. | None |
| snippets_per_field | Optional[int] | Maximum number of snippets to return per field. Default: 1. Set higher to show more context passages. | None |
| fragment_size | Optional[int] | Size of highlighted snippets in characters. Default: 100. Set to 0 to highlight the entire field (not recommended for large fields). | None |
| encoder | Optional[Literal['', 'html']] | Text encoder for snippets. Use '' (empty string, default) or 'html' to escape HTML and prevent XSS. | None |
| max_analyzed_chars | Optional[int] | Maximum characters to analyze per field. Default: 51200 (50KB). For large documents, limits analysis to improve performance. | None |
| tag_before | Optional[str] | Text/tag to insert before each highlighted term. Default: '<em>'. | None |
| tag_after | Optional[str] | Text/tag to insert after each highlighted term. Default: '</em>'. | None |
| **Unified highlighter options** (most accurate, recommended) | | | |
| offset_source | Optional[str] | How offsets are obtained. Options: 'ANALYSIS', 'POSTINGS', 'POSTINGS_WITH_TERM_VECTORS', 'TERM_VECTORS'. Usually auto-detected. | None |
| frag_align_ratio | Optional[float] | Where to position the first match in a snippet (0.0-1.0). Default: 0.33 (left third, shows context before the match). | None |
| fragsize_is_minimum | Optional[bool] | If True, treat fragment_size as a minimum. Default: True. | None |
| tag_ellipsis | Optional[str] | Text between multiple snippets (e.g., '...' or ' [...] '). | None |
| default_summary | Optional[bool] | If True, return leading text when no matches are found. | None |
| score_k1 | Optional[float] | BM25 term frequency normalization. Default: 1.2. | None |
| score_b | Optional[float] | BM25 length normalization. Default: 0.75. | None |
| score_pivot | Optional[int] | BM25 average passage length in characters. Default: 87. | None |
| bs_language | Optional[str] | BreakIterator language for text segmentation (e.g., 'en', 'ja'). | None |
| bs_country | Optional[str] | BreakIterator country code (e.g., 'US', 'GB'). | None |
| bs_variant | Optional[str] | BreakIterator variant for specialized locale rules. | None |
| bs_type | Optional[Literal['SEPARATOR', 'SENTENCE', 'WORD', 'CHARACTER', 'LINE', 'WHOLE']] | How to segment text. Use 'SENTENCE' (default, recommended), 'WORD', 'CHARACTER', 'LINE', 'SEPARATOR', or 'WHOLE'. | None |
| bs_separator | Optional[str] | Custom separator character when bs_type=SEPARATOR. | None |
| weight_matches | Optional[bool] | Use Lucene's Weight Matches API for the most accurate highlighting. | None |
| **Original highlighter options** (legacy) | | | |
| merge_contiguous | Optional[bool] | Merge adjacent fragments. | None |
| max_multivalued_to_examine | Optional[int] | Max entries to examine in a multivalued field. | None |
| max_multivalued_to_match | Optional[int] | Max matches in a multivalued field. | None |
| alternate_field | Optional[str] | Backup field for the summary when no highlights are found. | None |
| max_alternate_field_length | Optional[int] | Max length of the alternate field. | None |
| alternate | Optional[bool] | Highlight the alternate field. | None |
| formatter | Optional[Literal['simple']] | Formatter for highlighted output. Use 'simple'. | None |
| simple_pre | Optional[str] | Text before term (simple formatter). | None |
| simple_post | Optional[str] | Text after term (simple formatter). | None |
| fragmenter | Optional[Literal['gap', 'regex']] | Text snippet generator type. Use 'gap' or 'regex'. | None |
| regex_slop | Optional[float] | Deviation factor for the regex fragmenter. | None |
| regex_pattern | Optional[str] | Pattern for the regex fragmenter. | None |
| regex_max_analyzed_chars | Optional[int] | Char limit for the regex fragmenter. | None |
| preserve_multi | Optional[bool] | Preserve order in multivalued fields. | None |
| payloads | Optional[bool] | Include payloads in highlighting. | None |
| **FastVector highlighter options** (requires term vectors) | | | |
| frag_list_builder | Optional[Literal['simple', 'weighted', 'single']] | Snippet fragmenting algorithm. Use 'simple', 'weighted', or 'single'. | None |
| fragments_builder | Optional[Literal['default', 'colored']] | Fragment formatting implementation. Use 'default' or 'colored'. | None |
| boundary_scanner | Optional[str] | Boundary scanner implementation. | None |
| phrase_limit | Optional[int] | Max phrases to analyze for scoring. | None |
| multivalue_separator | Optional[str] | Separator for multivalued fields. | None |
Returns:
| Type | Description |
|---|---|
| Self | A new parser instance with highlight configuration applied. |
Examples:
Basic highlighting:
>>> parser.highlight(fields=["title", "content"], snippets_per_field=3, fragment_size=150)
Custom HTML tags:
>>> parser.highlight(
... fields=["title"],
... tag_before='<mark class="highlight">',
... tag_after='</mark>',
... encoder="html"
... )
Unified highlighter with sentence breaks:
>>> parser.highlight(
... fields=["content"],
... method="unified",
... bs_type="SENTENCE",
... fragment_size=200
... )
more_like_this(*, fields=None, min_term_freq=None, min_doc_freq=None, max_doc_freq=None, max_doc_freq_pct=None, min_word_len=None, max_word_len=None, max_query_terms=None, max_num_tokens_parsed=None, boost=None, query_fields=None, interesting_terms=None, match_include=None, match_offset=None)
Enable MoreLikeThis to find documents similar to a given document.
MoreLikeThis finds documents similar to a given document by analyzing the terms that make it unique. Common for "related articles", recommendations, and content discovery.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| fields | Optional[Union[str, List[str]]] | Fields to analyze for similarity. Use fields with meaningful content (title, description, body). Can be a single field or list of fields. For best performance, enable term vectors on these fields. | None |
| min_term_freq | Optional[int] | Minimum times a term must appear in the source document to be considered. Default: 2. Lower values include more terms but may add noise. | None |
| min_doc_freq | Optional[int] | Minimum number of documents a term must appear in across the index. Default: 5. Filters out very rare terms that aren't useful for finding similar documents. | None |
| max_doc_freq | Optional[int] | Maximum number of documents a term can appear in. Filters out very common terms (like 'the', 'and'). Use this OR max_doc_freq_pct, not both. | None |
| max_doc_freq_pct | Optional[int] | Maximum document frequency as a percentage (0-100). Example: 75 means ignore terms appearing in more than 75% of documents. Use instead of max_doc_freq for relative filtering. | None |
| min_word_len | Optional[int] | Minimum word length in characters. Words shorter than this are ignored. Example: 4 to skip 'the', 'and', 'or'. | None |
| max_word_len | Optional[int] | Maximum word length in characters. Words longer than this are ignored. Useful to filter out long tokens or URLs. | None |
| max_query_terms | Optional[int] | Maximum number of interesting terms to use in the MLT query. Default: 25. Higher values = more comprehensive but slower. Start with the default and adjust as needed. | None |
| max_num_tokens_parsed | Optional[int] | Maximum tokens to analyze per field (for fields without term vectors). Default: 5000. Set lower for better performance on large documents. | None |
| boost | Optional[bool] | If True, boost the query by each term's relevance/importance. Default: False. Enable for better relevance ranking of similar documents. | None |
| query_fields | Optional[str] | Query fields with optional boosts, like 'title^2.0 content^1.0'. Fields must also be in the 'fields' parameter. Use to emphasize certain fields in similarity matching. | None |
| interesting_terms | Optional[str] | Controls what info about matched terms is returned. Options: 'none' (default), 'list' (term names), 'details' (terms with boost values). Use 'details' for debugging to see which terms were selected. | None |
| match_include | Optional[bool] | If True, includes the source document in results (useful to compare). Default varies: true for the MLT handler, configuration-dependent for the component. | None |
| match_offset | Optional[int] | When used with a query, specifies which result doc to use for similarity. 0 = first result, 1 = second, etc. Default: 0. | None |
Returns:
| Type | Description |
|---|---|
Self
|
A new parser instance with more_like_this configuration applied. |
Examples:
Basic similarity search:
>>> parser.more_like_this(
... fields=["title", "content"],
... min_term_freq=2,
... min_doc_freq=5,
... max_query_terms=25
... )
Advanced with filtering:
>>> parser.more_like_this(
... fields=["content"],
... min_term_freq=1,
... min_doc_freq=3,
... min_word_len=4,
... max_doc_freq_pct=80,
... interesting_terms="details"
... )
Boosted fields:
>>> parser.more_like_this(
... fields=["title", "content"],
... query_fields="title^2.0 content^1.0",
... boost=True
... )
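For orientation, the basic similarity example above corresponds roughly to the following raw Solr MoreLikeThis parameters. This is a hedged sketch using the documented names mlt.fl, mlt.mintf, mlt.mindf, and mlt.maxqt; the library's exact serialized output may differ:
>>> # Assumed raw Solr parameters for the basic example (illustrative only)
>>> expected = {
...     "mlt": "true",
...     "mlt.fl": "title,content",
...     "mlt.mintf": 2,
...     "mlt.mindf": 5,
...     "mlt.maxqt": 25,
... }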
serialize_configs(params)
Serialize ParamsConfig objects as top-level params.
VectorSimilarityQueryParser
Bases: DenseVectorSearchQueryParser
Vector Similarity Query Parser for Apache Solr Dense Vector Search.
The vectorSimilarity parser matches documents whose vector similarity to the query vector exceeds a minimum threshold. Unlike KNN which returns a fixed number of top results, this parser returns all documents meeting the similarity criteria, making it suitable for threshold-based retrieval.
Solr Reference
https://solr.apache.org/guide/solr/latest/query-guide/dense-vector-search.html
Key Features
- Threshold-based vector matching (minReturn)
- Graph traversal control (minTraverse)
- Pre-filtering support (explicit or implicit)
- Returns all documents above similarity threshold
- Useful for minimum quality requirements
How Vector Similarity Works
- Query vector is compared against indexed vectors
- Documents with similarity >= minReturn are returned
- Graph traversal continues for nodes with similarity >= minTraverse
- Results are ranked by similarity score
Similarity vs KNN
- KNN: Returns exactly k results (top k most similar)
- VectorSimilarity: Returns all results above threshold (0 to unlimited)
- KNN: Best for "find similar items"
- VectorSimilarity: Best for "find items similar enough"
Schema Requirements
The field must be a DenseVectorField whose vectorDimension matches the query vector's length, for example:
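The snippet below is illustrative; the fieldType name, field name, dimension, and similarity function are placeholders to adapt to your schema:
<!-- Illustrative schema snippet; vectorDimension must equal the query vector length -->
<fieldType name="knn_vector" class="solr.DenseVectorField" vectorDimension="4" similarityFunction="cosine"/>
<field name="product_vector" type="knn_vector" indexed="true" stored="true"/>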
Examples:
>>> # Basic similarity search with threshold
>>> parser = VectorSimilarityQueryParser(
... vector_field="product_vector",
... vector=[1.0, 2.0, 3.0, 4.0],
... min_return=0.7 # Only return docs with similarity >= 0.7
... )
>>> # With traversal control
>>> parser = VectorSimilarityQueryParser(
... vector_field="doc_vector",
... vector=[0.5, 0.5, 0.5, 0.5],
... min_return=0.8, # Return threshold
... min_traverse=0.6 # Continue graph traversal threshold
... )
>>> # With explicit pre-filtering
>>> parser = VectorSimilarityQueryParser(
... vector_field="content_vector",
... vector=[0.2, 0.3, 0.4, 0.5],
... min_return=0.75,
... pre_filter=["inStock:true", "price:[* TO 100]"]
... )
>>> # As filter query for hybrid search
>>> # Use with q=*:* to get all docs above similarity threshold
>>> parser = VectorSimilarityQueryParser(
... vector_field="embedding",
... vector=[1.5, 2.5, 3.5, 4.5],
... min_return=0.85
... )
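For orientation, the last example above corresponds roughly to Solr's documented local-params syntax for this parser (a hedged sketch; the library's exact serialization may differ):
>>> # Assumed raw Solr query string (illustrative only)
>>> # q={!vectorSimilarity f=embedding minReturn=0.85}[1.5, 2.5, 3.5, 4.5]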
Parameters:
| Name | Description | Default |
|---|---|---|
| vector | Query vector as a list of floats (required; must match the field dimension) | required |
| min_return | Minimum similarity threshold for returned documents (required) | required |
| min_traverse | Minimum similarity to continue graph traversal (default: -Infinity) | required |
| vector_field | Name of the DenseVectorField to search (inherited from base) | required |
| pre_filter | Explicit pre-filter query strings (inherited from base) | required |
| include_tags | Only use fq filters with these tags for implicit pre-filtering (inherited) | required |
| exclude_tags | Exclude fq filters with these tags from implicit pre-filtering (inherited) | required |

Returns:
All documents with vector similarity >= minReturn, ranked by similarity score.
Note
Setting minTraverse lower than minReturn allows exploring more of the graph to find potential matches, at the cost of more computation.
See Also
- KNNQueryParser: For top-k nearest neighbor retrieval
- KNNTextToVectorQueryParser: For text-based vector similarity search
facet(*, queries=None, fields=None, prefix=None, contains=None, contains_ignore_case=None, matches=None, sort=None, limit=None, offset=None, mincount=None, missing=None, method=None, enum_cache_min_df=None, exists=None, exclude_terms=None, overrequest_count=None, overrequest_ratio=None, threads=None, range_field=None, range_start=None, range_end=None, range_gap=None, range_hardend=None, range_include=None, range_other=None, range_method=None, pivot_fields=None, pivot_mincount=None)
Enable faceting to categorize and count search results.
Faceting breaks down search results into categories with counts, enabling drill-down navigation and data analysis.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| queries | Optional[List[str]] | Arbitrary queries to generate facet counts for specific terms/expressions. | None |
| fields | Optional[List[str]] | Fields to be treated as facets. Common for categories, brands, tags. | None |
| prefix | Optional[str] | Limits facet terms to those starting with the given prefix. | None |
| contains | Optional[str] | Limits facet terms to those containing the given substring. | None |
| contains_ignore_case | Optional[bool] | If True, ignores case when matching the 'contains' parameter. | None |
| matches | Optional[str] | Only returns facets matching this regular expression. | None |
| sort | Optional[Literal['count', 'index']] | Ordering of facet field terms. Use 'count' (by frequency) or 'index' (alphabetically). | None |
| limit | Optional[int] | Number of facet counts to return. Set to -1 for all. | None |
| offset | Optional[int] | Offset into the facet list for paging. | None |
| mincount | Optional[int] | Minimum count for facets to be included in the response. Commonly set to 1 to hide empty facets. | None |
| missing | Optional[bool] | If True, include the count of results with no facet value. | None |
| method | Optional[Literal['enum', 'fc', 'fcs']] | Algorithm to use for faceting. Use 'enum' (enumerate all terms, good for low cardinality), 'fc' (field cache, good for high cardinality), or 'fcs' (per-segment, good for frequent updates). | None |
| enum_cache_min_df | Optional[int] | Minimum document frequency for filterCache usage with the enum method. | None |
| exists | Optional[bool] | Cap facet counts at 1 (only for non-trie fields). | None |
| exclude_terms | Optional[str] | Terms to remove from facet counts. | None |
| overrequest_count | Optional[int] | Extra facets to request from each shard for better accuracy in distributed environments. | None |
| overrequest_ratio | Optional[float] | Ratio for overrequesting facets from shards. | None |
| threads | Optional[int] | Number of threads for parallel facet loading. Useful for multiple facets on large datasets. | None |
| range_field | Optional[List[str]] | Fields for range faceting (e.g., price ranges, date ranges). | None |
| range_start | Optional[Dict[str, str]] | Lower bound of ranges per field. Dict mapping field name to start value. | None |
| range_end | Optional[Dict[str, str]] | Upper bound of ranges per field. Dict mapping field name to end value. | None |
| range_gap | Optional[Dict[str, str]] | Size of each range span per field. E.g., {'price': '100'} for $100 increments. | None |
| range_hardend | Optional[bool] | If True, uses the exact range_end as the upper bound even if it doesn't align with the gap. | None |
| range_include | Optional[List[Literal['lower', 'upper', 'edge', 'outer', 'all']]] | Range bounds to include in faceting. List of: 'lower' (include lower bound), 'upper' (include upper bound), 'edge' (first/last ranges include edges), 'outer' (before/after inclusive), or 'all'. | None |
| range_other | Optional[List[Literal['before', 'after', 'between', 'none', 'all']]] | Additional range counts to compute. List of: 'before' (below first range), 'after' (above last range), 'between' (within bounds), 'none', or 'all'. | None |
| range_method | Optional[Literal['filter', 'dv']] | Method to use for range faceting. Use 'filter' or 'dv' (for docValues). | None |
| pivot_fields | Optional[List[str]] | Fields to use for pivot (hierarchical) faceting. E.g., ['category,brand']. | None |
| pivot_mincount | Optional[int] | Minimum count for pivot facet inclusion. | None |

Returns:
| Type | Description |
|---|---|
| Self | A new parser instance with facet configuration applied. |
Examples:
Basic field faceting:
>>> parser.facet(fields=["genre", "director"], mincount=1, limit=10)
Range faceting for prices:
>>> parser.facet(
... range_field=["price"],
... range_start={"price": "0"},
... range_end={"price": "1000"},
... range_gap={"price": "100"}
... )
Filtered facets:
>>> parser.facet(fields=["color"], prefix="bl", mincount=5)
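For orientation, the basic field-faceting example above corresponds roughly to the following raw Solr facet parameters. This is a hedged sketch using the documented names facet, facet.field, facet.mincount, and facet.limit (multi-valued parameters shown as lists); the library's exact serialized output may differ:
>>> # Assumed raw Solr parameters for the basic example (illustrative only)
>>> expected = {
...     "facet": "true",
...     "facet.field": ["genre", "director"],
...     "facet.mincount": 1,
...     "facet.limit": 10,
... }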
group(*, by=None, func=None, query=None, limit=1, offset=None, sort=None, format='grouped', main=None, ngroups=False, truncate=False, facet=False, cache_percent=0)
Enable result grouping to collapse results by common field values.
Result grouping (also known as field collapsing) combines documents that share a common field value. Useful for showing one result per author, category, or domain.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| by | Optional[Union[str, List[str]]] | Field(s) to group results by. Shows one representative doc per unique field value. Must be single-valued and indexed. Can specify multiple fields for separate groupings. | None |
| func | Optional[str] | Group by the result of a function query (e.g., 'floor(price)'). Not supported in SolrCloud/distributed searches. | None |
| query | Optional[Union[str, List[str]]] | Create custom groups using arbitrary queries. Each query defines one group. Example: ['price:[0 TO 50]', 'price:[50 TO 100]'] creates price range groups. | None |
| limit | int | Number of documents to return per group. Default: 1 (only the top doc per group). Set higher to see more examples from each group. | 1 |
| offset | Optional[int] | Skip the first N documents within each group. Useful for pagination within groups. | None |
| sort | Optional[str] | How to sort documents within each group (e.g., 'date desc' for newest first). If not specified, uses the main sort parameter. | None |
| format | str | Response structure format. 'grouped' (nested, default) or 'simple' (flat list). | 'grouped' |
| main | Optional[bool] | If True, returns the first field grouping as the main result list, flattening the response. | None |
| ngroups | bool | If True, include the total number of unique groups in the response. Useful for pagination. Requires co-location in SolrCloud. | False |
| truncate | bool | If True, base facet counts on one doc per group only. | False |
| facet | bool | If True, enable grouped faceting. Can be expensive on large result sets. Requires co-location in SolrCloud. | False |
| cache_percent | int | Enable the result grouping cache (0-100). Set to 0 to disable (default). Try 50-100 for complex queries to improve performance. | 0 |

Returns:
| Type | Description |
|---|---|
| Self | A new parser instance with group configuration applied. |
Examples:
Group by author:
>>> parser.group(by="author", limit=3, sort="date desc", ngroups=True)
Group by price range:
>>> parser.group(
... query=["price:[0 TO 50]", "price:[50 TO 100]", "price:[100 TO *]"],
... limit=5
... )
Multiple field groupings:
>>> parser.group(by=["author", "category"], limit=2)
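For orientation, the author-grouping example above corresponds roughly to the following raw Solr grouping parameters (a hedged sketch using the documented names group, group.field, group.limit, group.sort, and group.ngroups; the library's exact serialized output may differ). In the default 'grouped' response format, Solr returns one entry per group containing a groupValue and a doclist of that group's documents:
>>> # Assumed raw Solr parameters for the author-grouping example (illustrative only)
>>> expected = {
...     "group": "true",
...     "group.field": "author",
...     "group.limit": 3,
...     "group.sort": "date desc",
...     "group.ngroups": "true",
... }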
highlight(*, method=None, fields=None, query=None, query_parser=None, require_field_match=None, query_field_pattern=None, use_phrase_highlighter=None, multiterm=None, snippets_per_field=None, fragment_size=None, encoder=None, max_analyzed_chars=None, tag_before=None, tag_after=None, offset_source=None, frag_align_ratio=None, fragsize_is_minimum=None, tag_ellipsis=None, default_summary=None, score_k1=None, score_b=None, score_pivot=None, bs_language=None, bs_country=None, bs_variant=None, bs_type=None, bs_separator=None, weight_matches=None, merge_contiguous=None, max_multivalued_to_examine=None, max_multivalued_to_match=None, alternate_field=None, max_alternate_field_length=None, alternate=None, formatter=None, simple_pre=None, simple_post=None, fragmenter=None, regex_slop=None, regex_pattern=None, regex_max_analyzed_chars=None, preserve_multi=None, payloads=None, frag_list_builder=None, fragments_builder=None, boundary_scanner=None, phrase_limit=None, multivalue_separator=None)
Enable highlighting to show snippets with query terms emphasized.
Highlighting shows snippets of text from documents with query terms emphasized (usually wrapped in tags). Common for search result snippets and previews.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| method | Optional[Literal['unified', 'original', 'fastVector']] | Highlighting implementation to use. Use 'unified' (most accurate, recommended), 'original' (legacy, widely compatible), or 'fastVector' (fast for large docs, requires term vectors). | None |
| fields | Optional[Union[str, List[str]]] | Fields to generate highlighted snippets for. Must be stored fields. Example: ['title', 'content']. Use '*' for all fields (not recommended). | None |
| query | Optional[str] | Custom query to use for highlighting (overrides the main query). | None |
| query_parser | Optional[str] | Query parser for the highlight query (e.g., 'edismax', 'lucene'). | None |
| require_field_match | Optional[bool] | If True, only highlight terms in the field they matched. Set to False to highlight query terms in all requested fields. | None |
| query_field_pattern | Optional[str] | Regular expression pattern for fields to consider for highlighting. | None |
| use_phrase_highlighter | Optional[bool] | If True, highlights complete phrases accurately. Default: True. | None |
| multiterm | Optional[bool] | Enable highlighting for wildcard, fuzzy, and range queries. Default: True. | None |
| snippets_per_field | Optional[int] | Maximum number of snippets to return per field. Default: 1. Set higher to show more context passages. | None |
| fragment_size | Optional[int] | Size of highlighted snippets in characters. Default: 100. Set to 0 to highlight the entire field (not recommended for large fields). | None |
| encoder | Optional[Literal['', 'html']] | Text encoder for snippets. Use '' (empty string, default) or 'html' to escape HTML and prevent XSS. | None |
| max_analyzed_chars | Optional[int] | Maximum characters to analyze per field. Default: 51200 (50KB). For large documents, limits analysis to improve performance. | None |
| tag_before | Optional[str] | Text/tag to insert before each highlighted term. Default: '<em>'. | None |
| tag_after | Optional[str] | Text/tag to insert after each highlighted term. Default: '</em>'. | None |

Unified highlighter parameters (most accurate, recommended):
| Name | Type | Description | Default |
|---|---|---|---|
| offset_source | Optional[str] | How offsets are obtained. Options: 'ANALYSIS', 'POSTINGS', 'POSTINGS_WITH_TERM_VECTORS', 'TERM_VECTORS'. Usually auto-detected. | None |
| frag_align_ratio | Optional[float] | Where to position the first match in the snippet (0.0-1.0). Default: 0.33 (left third, shows context before the match). | None |
| fragsize_is_minimum | Optional[bool] | If True, treat fragment_size as a minimum. Default: True. | None |
| tag_ellipsis | Optional[str] | Text between multiple snippets (e.g., '...' or ' [...] '). | None |
| default_summary | Optional[bool] | If True, return leading text when no matches are found. | None |
| score_k1 | Optional[float] | BM25 term frequency normalization. Default: 1.2. | None |
| score_b | Optional[float] | BM25 length normalization. Default: 0.75. | None |
| score_pivot | Optional[int] | BM25 average passage length in characters. Default: 87. | None |
| bs_language | Optional[str] | BreakIterator language for text segmentation (e.g., 'en', 'ja'). | None |
| bs_country | Optional[str] | BreakIterator country code (e.g., 'US', 'GB'). | None |
| bs_variant | Optional[str] | BreakIterator variant for specialized locale rules. | None |
| bs_type | Optional[Literal['SEPARATOR', 'SENTENCE', 'WORD', 'CHARACTER', 'LINE', 'WHOLE']] | How to segment text. Use 'SENTENCE' (default, recommended), 'WORD', 'CHARACTER', 'LINE', 'SEPARATOR', or 'WHOLE'. | None |
| bs_separator | Optional[str] | Custom separator character when bs_type='SEPARATOR'. | None |
| weight_matches | Optional[bool] | Use Lucene's Weight Matches API for the most accurate highlighting. | None |

Original highlighter parameters (legacy):
| Name | Type | Description | Default |
|---|---|---|---|
| merge_contiguous | Optional[bool] | Merge adjacent fragments. | None |
| max_multivalued_to_examine | Optional[int] | Maximum entries to examine in a multivalued field. | None |
| max_multivalued_to_match | Optional[int] | Maximum matches in a multivalued field. | None |
| alternate_field | Optional[str] | Backup field for the summary when no highlights are found. | None |
| max_alternate_field_length | Optional[int] | Maximum length of the alternate field. | None |
| alternate | Optional[bool] | Highlight the alternate field. | None |
| formatter | Optional[Literal['simple']] | Formatter for highlighted output. Use 'simple'. | None |
| simple_pre | Optional[str] | Text before each term (simple formatter). | None |
| simple_post | Optional[str] | Text after each term (simple formatter). | None |
| fragmenter | Optional[Literal['gap', 'regex']] | Text snippet generator type. Use 'gap' or 'regex'. | None |
| regex_slop | Optional[float] | Deviation factor for the regex fragmenter. | None |
| regex_pattern | Optional[str] | Pattern for the regex fragmenter. | None |
| regex_max_analyzed_chars | Optional[int] | Character limit for the regex fragmenter. | None |
| preserve_multi | Optional[bool] | Preserve order in multivalued fields. | None |
| payloads | Optional[bool] | Include payloads in highlighting. | None |

FastVector highlighter parameters (requires term vectors):
| Name | Type | Description | Default |
|---|---|---|---|
| frag_list_builder | Optional[Literal['simple', 'weighted', 'single']] | Snippet fragmenting algorithm. Use 'simple', 'weighted', or 'single'. | None |
| fragments_builder | Optional[Literal['default', 'colored']] | Fragment formatting implementation. Use 'default' or 'colored'. | None |
| boundary_scanner | Optional[str] | Boundary scanner implementation. | None |
| phrase_limit | Optional[int] | Maximum phrases to analyze for scoring. | None |
| multivalue_separator | Optional[str] | Separator for multivalued fields. | None |

Returns:
| Type | Description |
|---|---|
| Self | A new parser instance with highlight configuration applied. |
Examples:
Basic highlighting:
>>> parser.highlight(fields=["title", "content"], snippets_per_field=3, fragment_size=150)
Custom HTML tags:
>>> parser.highlight(
... fields=["title"],
... tag_before='<mark class="highlight">',
... tag_after='</mark>',
... encoder="html"
... )
Unified highlighter with sentence breaks:
>>> parser.highlight(
... fields=["content"],
... method="unified",
... bs_type="SENTENCE",
... fragment_size=200
... )
more_like_this(*, fields=None, min_term_freq=None, min_doc_freq=None, max_doc_freq=None, max_doc_freq_pct=None, min_word_len=None, max_word_len=None, max_query_terms=None, max_num_tokens_parsed=None, boost=None, query_fields=None, interesting_terms=None, match_include=None, match_offset=None)
Enable MoreLikeThis to find documents similar to a given document.
MoreLikeThis analyzes the terms that make a source document distinctive and retrieves other documents that share them. Common for "related articles", recommendations, and content discovery.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| fields | Optional[Union[str, List[str]]] | Fields to analyze for similarity. Use fields with meaningful content (title, description, body). Can be a single field or a list of fields. For best performance, enable term vectors on these fields. | None |
| min_term_freq | Optional[int] | Minimum times a term must appear in the source document to be considered. Default: 2. Lower values include more terms but may add noise. | None |
| min_doc_freq | Optional[int] | Minimum number of documents a term must appear in across the index. Default: 5. Filters out very rare terms that aren't useful for finding similar documents. | None |
| max_doc_freq | Optional[int] | Maximum number of documents a term can appear in. Filters out very common terms (like 'the', 'and'). Use this or max_doc_freq_pct, not both. | None |
| max_doc_freq_pct | Optional[int] | Maximum document frequency as a percentage (0-100). Example: 75 means ignore terms appearing in more than 75% of documents. Use instead of max_doc_freq for relative filtering. | None |
| min_word_len | Optional[int] | Minimum word length in characters. Words shorter than this are ignored. Example: 4 to skip 'the', 'and', 'or'. | None |
| max_word_len | Optional[int] | Maximum word length in characters. Words longer than this are ignored. Useful to filter out long tokens or URLs. | None |
| max_query_terms | Optional[int] | Maximum number of interesting terms to use in the MLT query. Default: 25. Higher values are more comprehensive but slower. Start with the default and adjust as needed. | None |
| max_num_tokens_parsed | Optional[int] | Maximum tokens to analyze per field (for fields without term vectors). Default: 5000. Set lower for better performance on large documents. | None |
| boost | Optional[bool] | If True, boost the query by each term's relevance/importance. Default: False. Enable for better relevance ranking of similar documents. | None |
| query_fields | Optional[str] | Query fields with optional boosts, like 'title^2.0 content^1.0'. Fields must also be in the 'fields' parameter. Use to emphasize certain fields in similarity matching. | None |
| interesting_terms | Optional[str] | Controls what info about matched terms is returned. Options: 'none' (default), 'list' (term names), 'details' (terms with boost values). Use 'details' for debugging to see which terms were selected. | None |
| match_include | Optional[bool] | If True, includes the source document in results (useful for comparison). Default varies: true for the MLT handler; depends on configuration for the component. | None |
| match_offset | Optional[int] | When used with a query, specifies which result doc to use for similarity. 0 = first result, 1 = second, etc. Default: 0. | None |

Returns:
| Type | Description |
|---|---|
| Self | A new parser instance with more_like_this configuration applied. |
Examples:
Basic similarity search:
>>> parser.more_like_this(
... fields=["title", "content"],
... min_term_freq=2,
... min_doc_freq=5,
... max_query_terms=25
... )
Advanced with filtering:
>>> parser.more_like_this(
... fields=["content"],
... min_term_freq=1,
... min_doc_freq=3,
... min_word_len=4,
... max_doc_freq_pct=80,
... interesting_terms="details"
... )
Boosted fields:
>>> parser.more_like_this(
... fields=["title", "content"],
... query_fields="title^2.0 content^1.0",
... boost=True
... )
serialize_configs(params)
Serialize ParamsConfig objects as top-level params.