Sparse Query Parsers
Sparse query parsers are traditional keyword-based parsers that rely on text analysis and the inverted index. Taiyo supports the three main Solr sparse parsers: StandardParser (Lucene), DisMaxQueryParser, and ExtendedDisMaxQueryParser.
StandardParser (Lucene)
The StandardParser uses Solr's default Lucene query syntax, providing powerful and precise query capabilities.
Solr Documentation: Standard Query Parser
Basic Usage
from taiyo.parsers import StandardParser
parser = StandardParser(
query="python programming", query_operator="AND", default_field="content"
)
results = client.search(parser)
Parameters
parser = StandardParser(
query="search term", # Query string
query_operator="OR", # Default operator: OR or AND
default_field="content", # Default field when not specified
split_on_whitespace=True, # Split query on whitespace
# Common parameters
rows=10,
start=0,
field_list=["id", "title"],
sort="score desc",
filters=["status:active"],
)
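For readers who know Solr, these arguments line up with the standard parser's request parameters. A minimal sketch of the raw parameters the call above would presumably produce (the parameter names are standard Solr; the exact serialization is up to Taiyo, so treat the dict as illustrative only):
# Hypothetical raw Solr parameters for the call above (illustrative only)
solr_params = {
    "q": "search term",       # query
    "q.op": "OR",             # query_operator
    "df": "content",          # default_field
    "sow": "true",            # split_on_whitespace
    "rows": 10,
    "start": 0,
    "fl": "id,title",         # field_list
    "sort": "score desc",
    "fq": ["status:active"],  # filters
}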
Query Syntax Examples
# Simple term
parser = StandardParser(query="python")
# Field-specific search
parser = StandardParser(query="title:python")
# Boolean operators
parser = StandardParser(query="python AND programming")
parser = StandardParser(query="python OR java")
parser = StandardParser(query="python NOT perl")
# Phrase search
parser = StandardParser(query='"machine learning"')
# Proximity search (within 5 words)
parser = StandardParser(query='"apache solr"~5')
# Wildcard search
parser = StandardParser(query="prog*")
parser = StandardParser(query="te?t")
# Fuzzy search (edit distance)
parser = StandardParser(query="python~2")
# Range queries
parser = StandardParser(query="price:[100 TO 500]")
parser = StandardParser(query="date:[NOW-1YEAR TO NOW]")
# Boost terms
parser = StandardParser(query="python^2 java")
# Grouped queries
parser = StandardParser(query="(python OR java) AND programming")
# Field exists
parser = StandardParser(query="description:*")
# Field doesn't exist
parser = StandardParser(query="-description:*")
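The standard syntax also reserves a set of special characters (+ - && || ! ( ) { } [ ] ^ " ~ * ? : \ /). Escape them with a backslash when they should be matched literally; a Python raw string keeps the backslashes intact:
# Escaped colon: match the literal value "doc:42" in the id field
parser = StandardParser(query=r"id:doc\:42")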
Example
from taiyo.parsers import StandardParser
parser = (
StandardParser(
query='(title:"machine learning" OR content:AI) AND category:tech',
query_operator="AND",
default_field="content",
rows=20,
field_list=["id", "title", "author", "published_date"],
sort="published_date desc",
filters=["status:published", "published_date:[NOW-1YEAR TO NOW]"],
)
.facet(field_list=["category", "author"], mincount=1)
.highlight(field_list=["title", "content"], fragment_size=150)
)
results = client.search(parser)
DisMaxQueryParser
The DisMaxQueryParser (Disjunction Max) is designed for simple, user-entered queries. It searches across multiple fields and scores each document primarily by its best-matching field.
Solr Documentation: DisMax Query Parser
Basic Usage
from taiyo.parsers import DisMaxQueryParser
parser = DisMaxQueryParser(
query="python programming",
query_fields={"title": 3.0, "content": 1.0},
min_match="75%",
)
results = client.search(parser)
Parameters
parser = DisMaxQueryParser(
query="search term", # User query
query_fields={"title": 2.0, "content": 1.0}, # Fields to search with boosts
query_slop=0, # Phrase slop
phrase_fields={"title": 3.0}, # Phrase boosting
phrase_slop=0, # Phrase proximity slop
tie_breaker=0.1, # How to combine field scores
min_match="75%", # Minimum should match
boost_queries=["featured:true^5"], # Additional query boosts
boost_functions=["recip(ms(NOW,date),3.16e-11,1,1)"], # Function boosts
)
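These arguments correspond to the classic DisMax request parameters. Assuming a one-to-one mapping, the call above would roughly translate to the following (parameter names are standard Solr; the dict itself is only a sketch):
# Hypothetical raw Solr parameters for the call above (illustrative only)
solr_params = {
    "defType": "dismax",
    "q": "search term",
    "qf": "title^2.0 content^1.0",  # query_fields
    "qs": 0,                        # query_slop
    "pf": "title^3.0",              # phrase_fields
    "ps": 0,                        # phrase_slop
    "tie": 0.1,                     # tie_breaker
    "mm": "75%",                    # min_match
    "bq": ["featured:true^5"],      # boost_queries
    "bf": ["recip(ms(NOW,date),3.16e-11,1,1)"],  # boost_functions
}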
Query Fields
Weight different fields differently:
parser = DisMaxQueryParser(
query="python programming",
query_fields={
"title": 5.0, # Title matches are most important
"abstract": 3.0, # Abstract matches are important
"content": 1.0, # Body matches are standard
"tags": 2.0, # Tag matches are moderately important
},
)
Minimum Match
Control how many terms must match:
# At least 75% of query terms must match
parser = DisMaxQueryParser(
query="python programming language",
query_fields={"title": 2.0, "content": 1.0},
min_match="75%", # 3 terms → 2 must match
)
# Absolute number
parser = DisMaxQueryParser(
query="python programming language",
query_fields={"title": 2.0, "content": 1.0},
min_match="2", # At least 2 terms must match
)
# Complex conditional
parser = DisMaxQueryParser(
query="search terms here",
query_fields={"title": 2.0, "content": 1.0},
min_match="2<-25% 9<-3", # 1-2 terms: all required; 3-9: 25% may be missing; 10+: 3 may be missing
)
Phrase Boosting
Boost documents where terms appear together:
parser = DisMaxQueryParser(
query="machine learning",
query_fields={"title": 2.0, "content": 1.0},
phrase_fields={"title": 10.0}, # Boost exact phrase in title
phrase_slop=2, # Allow 2 words between terms
)
Tie Breaker
Control how scores from different fields combine:
# tie_breaker=0: Only use highest field score (default DisMax behavior)
# tie_breaker=1: Sum all field scores
# tie_breaker=0.1: Mostly use highest, slightly influenced by others
parser = DisMaxQueryParser(
query="python",
query_fields={"title": 2.0, "content": 1.0},
tie_breaker=0.1, # Slight influence from lower-scoring fields
)
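Concretely, the combined score is the best field score plus tie_breaker times the sum of the remaining field scores. A quick illustration with made-up per-field scores:
# Per-field scores here are invented purely for illustration
field_scores = {"title": 3.0, "content": 1.2}

def combined_score(scores, tie_breaker):
    best = max(scores.values())
    others = sum(scores.values()) - best
    return best + tie_breaker * others

print(combined_score(field_scores, 0.0))  # 3.0  -> winner takes all
print(combined_score(field_scores, 0.1))  # 3.12 -> mostly the best field
print(combined_score(field_scores, 1.0))  # 4.2  -> plain sum of all fields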
Example
from taiyo.parsers import DisMaxQueryParser
parser = DisMaxQueryParser(
query="python web framework",
query_fields={"title": 5.0, "description": 3.0, "content": 1.0, "tags": 2.0},
phrase_fields={
"title": 10.0, # Phrase in title = big boost
"description": 5.0,
},
phrase_slop=2,
min_match="75%",
tie_breaker=0.1,
boost_queries=[
"featured:true^10", # Featured items boosted
"recent:true^5", # Recent items boosted
],
rows=20,
filters=["status:published"],
).facet(field_list=["category", "author"], mincount=1)
results = client.search(parser)
ExtendedDisMaxQueryParser
The ExtendedDisMaxQueryParser (eDisMax) extends DisMax with more advanced features while maintaining user-friendliness.
Solr Documentation: Extended DisMax Query Parser
Basic Usage
from taiyo.parsers import ExtendedDisMaxQueryParser
parser = ExtendedDisMaxQueryParser(
query="python programming",
query_fields={"title": 3.0, "content": 1.0},
min_match="75%",
)
results = client.search(parser)
All DisMax Features Plus...
eDisMax includes all DisMax parameters and adds:
parser = ExtendedDisMaxQueryParser(
# All DisMax parameters
query="python programming",
query_fields={"title": 3.0, "content": 1.0},
min_match="75%",
tie_breaker=0.1,
# Extended features
stop_words=True, # Remove stop words
lowercase_operators=True, # Allow 'and', 'or', 'not' (lowercase)
user_fields={"author": 2.0}, # Additional searchable fields
boost_params=["featured^10"], # Additional boost parameters
)
User Fields
Let end users query additional fields explicitly (as in author:guido below):
parser = ExtendedDisMaxQueryParser(
query="python programming author:guido",
query_fields={"title": 3.0, "content": 1.0},
user_fields={"author": 2.0}, # Allow searching author field
)
Phrase Fields with Slop Variants
Boost longer runs of adjacent query terms, corresponding to Solr's pf2/pf3 phrase fields:
parser = ExtendedDisMaxQueryParser(
query="machine learning",
query_fields={"title": 2.0, "content": 1.0},
# Boost the full query as a phrase
phrase_fields={"title": 10.0},
# Boost pairs of adjacent query terms appearing as phrases
phrase_slop_2_fields={"title": 5.0, "content": 2.0},
# Boost triples of adjacent query terms appearing as phrases
phrase_slop_3_fields={"title": 3.0, "content": 1.5},
)
Boost Functions
Apply function-based boosts:
parser = ExtendedDisMaxQueryParser(
query="python",
query_fields={"title": 2.0, "content": 1.0},
boost_functions=[
"recip(ms(NOW,published_date),3.16e-11,1,1)", # Boost recent docs
"log(views)", # Boost by popularity
"product(rating,10)", # Boost by rating
],
)
Field Aliasing
Let users refer to fields by shorter aliases:
parser = ExtendedDisMaxQueryParser(
query="t:python c:web", # t = title, c = content
field_aliases={"t": "title", "c": "content", "a": "author"},
)
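In Solr itself, eDisMax aliasing is expressed as a per-alias qf override (f.<alias>.qf). Assuming Taiyo serializes field_aliases that way, the example above would roughly produce:
# Hypothetical raw Solr parameters for the aliasing example (illustrative only)
solr_params = {
    "defType": "edismax",
    "q": "t:python c:web",
    "f.t.qf": "title",    # alias t -> title
    "f.c.qf": "content",  # alias c -> content
    "f.a.qf": "author",   # alias a -> author
}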
Example
from taiyo.parsers import ExtendedDisMaxQueryParser
parser = (
ExtendedDisMaxQueryParser(
query="python web framework -php",
query_fields={"title": 5.0, "description": 3.0, "content": 1.0, "tags": 2.0},
phrase_fields={"title": 10.0, "description": 5.0},
phrase_slop=2,
user_fields={"author": 2.0, "category": 1.5},
min_match="75%",
tie_breaker=0.1,
boost_queries=["featured:true^20", "quality_score:[8 TO *]^10"],
boost_functions=[
"recip(ms(NOW,published_date),3.16e-11,1,1)", # Recency
"log(popularity)", # Popularity
],
stop_words=True,
lowercase_operators=True,
rows=20,
field_list=["id", "title", "author", "published_date", "score"],
sort="score desc, published_date desc",
filters=["status:published", "language:en"],
)
.facet(field_list=["category", "author", "year"], mincount=1, limit=20)
.group(by="author", limit=3, ngroups=True)
.highlight(
field_list=["title", "description", "content"],
fragment_size=150,
snippets_per_field=3,
simple_pre="<mark>",
simple_post="</mark>",
)
)
results = client.search(parser)
# Process results
print(f"Found {results.num_found} documents in {results.query_time}ms")
for doc in results.docs:
    print(f"\n{doc.title} by {doc.author}")
    # Show highlights if available
    if results.highlighting and doc.id in results.highlighting:
        for field, snippets in results.highlighting[doc.id].items():
            print(f" {field}: {', '.join(snippets)}")
# Show facets
if results.facets:
    print("\nCategories:")
    category_facet = results.facets.fields.get("category")
    if category_facet:
        for bucket in category_facet.buckets:
            print(f" {bucket.value}: {bucket.count}")
Comparison
| Feature | Standard | DisMax | eDisMax |
|---|---|---|---|
| User-friendly | No | Yes | Yes |
| Boolean operators | Yes | No | Yes (optional) |
| Field queries | Yes | No | Yes (with user_fields) |
| Multi-field search | Manual | Yes | Yes |
| Field boosting | Manual | Yes | Yes |
| Phrase boosting | Manual | Yes | Yes |
| Function boosting | Manual | Yes | Yes |
| Minimum match | Manual | Yes | Yes |
| Nested queries | Yes | Limited | Yes |
| Best for | Experts | Simple UI | Power users |
Best Practices
Choose the Right Parser
For complex boolean queries:
parser = StandardParser(
query='(title:"machine learning" OR tags:ml) AND category:tech NOT deprecated:true'
)
For simple search interfaces:
parser = DisMaxQueryParser(
query="machine learning", query_fields={"title": 3.0, "content": 1.0}
)
For flexible user queries:
parser = ExtendedDisMaxQueryParser(
query="machine learning -deprecated", query_fields={"title": 3.0, "content": 1.0}
)
Use Appropriate Field Weights
# Consider field importance
parser = DisMaxQueryParser(
query="search term",
query_fields={
"title": 10.0, # Most important
"headline": 5.0, # Very important
"abstract": 3.0, # Important
"content": 1.0, # Normal importance
"metadata": 0.5, # Less important
},
)
Tune Minimum Match
# Strict matching (all terms)
parser = DisMaxQueryParser(query="python web framework", min_match="100%")
# Balanced (most terms)
parser = DisMaxQueryParser(query="python web framework", min_match="75%")
# Loose (any terms)
parser = DisMaxQueryParser(query="python web framework", min_match="1")
Use Phrase Boosting Strategically
# Boost exact phrases significantly
parser = ExtendedDisMaxQueryParser(
query="machine learning",
query_fields={"title": 2.0, "content": 1.0},
phrase_fields={"title": 10.0, "content": 5.0}, # 5x-10x boost
phrase_slop=0, # Exact phrase only
)
Combine with Filters
# Put non-scoring constraints in filters; Solr caches each filter query
parser = DisMaxQueryParser(
query="python",
query_fields={"title": 2.0, "content": 1.0},
filters=[
"status:active", # Cached
"published_date:[NOW-1YEAR TO NOW]", # Cached
"language:en", # Cached
],
)
Next Steps
- Learn about Dense Parsers for vector similarity search
- Explore Spatial Parsers for geographic queries
- Add Faceting for aggregations and filters
- Use Grouping for result organization
- Enable Highlighting for search snippets