MoreLikeThis (MLT)
MoreLikeThis (MLT) surfaces documents that share significant terms with a seed document.
Solr Documentation: MoreLikeThis
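As a rough mental model, the idea fits in a few lines of plain Python. Each term in the seed document is weighted by term frequency times inverse document frequency, the strongest terms are kept, and other documents are ranked by the summed weights of the terms they share. This toy works under simplifying assumptions: whitespace tokenization, a simplified idf, and a hypothetical rank_similar helper. Lucene's real MoreLikeThis instead builds a boosted query from the selected terms and relies on the index's own scoring.

```python
import math
from collections import Counter


def rank_similar(seed_id: str, corpus: dict[str, str], max_query_terms: int = 5) -> list[tuple[str, float]]:
    """Toy MLT (hypothetical helper): rank documents by summed weights of shared seed terms."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in corpus.items()}
    num_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized.values() for term in set(tokens))
    # Weight each seed term by tf * idf (a simplified idf, not Lucene's exact formula).
    seed_tf = Counter(tokenized[seed_id])
    weights = {
        term: tf * (math.log(num_docs / (doc_freq[term] + 1)) + 1)
        for term, tf in seed_tf.items()
    }
    # Keep only the highest-weighted "interesting" terms.
    top_terms = dict(sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:max_query_terms])
    ranked = []
    for doc_id, tokens in tokenized.items():
        if doc_id == seed_id:
            continue  # never recommend the seed to itself
        shared = set(tokens) & set(top_terms)
        if shared:
            ranked.append((doc_id, sum(top_terms[t] for t in shared)))
    return sorted(ranked, key=lambda kv: kv[1], reverse=True)


corpus = {
    "a": "solr search engine relevance tuning",
    "b": "solr relevance ranking and tuning guide",
    "c": "baking sourdough bread at home",
    "d": "search engine internals",
}
print(rank_similar("a", corpus))
```

Document "c" shares no terms with the seed and is dropped. Document "b" outranks "d" because it shares three weighted terms instead of two.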
Configuration Approaches
Pass a MoreLikeThisParamsConfig through the configs argument:
```python
from taiyo.parsers import StandardParser
from taiyo.params import MoreLikeThisParamsConfig

parser = StandardParser(
    query="id:article-123",
    configs=[
        MoreLikeThisParamsConfig(
            fields=["title", "content"],
            min_term_freq=1,
            min_doc_freq=1,
            max_query_terms=20,
            boost=True,
            match_include=False,
        )
    ],
)
```
Alternatively, chain more_like_this() onto the parser:
```python
from taiyo.parsers import StandardParser

parser = StandardParser(query="id:article-123", rows=1).more_like_this(
    fields=["title", "content"],
    min_term_freq=1,
    min_doc_freq=1,
    max_query_terms=20,
    boost=True,
    match_include=False,
)
```
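Both styles describe the same options, which Solr ultimately receives as mlt.* request parameters. The sketch below shows one plausible translation. It assumes a one-to-one mapping from these keyword names to Solr's documented parameter names, and the to_solr_mlt_params helper is hypothetical, not Taiyo's serializer. Which parameters take effect can also depend on whether the MoreLikeThis search component (enabled with mlt=true) or the /mlt request handler serves the request.

```python
# Hypothetical mapping from the keyword names used above to Solr's documented
# mlt.* request parameters (assumed one-to-one; not Taiyo's actual code).
SOLR_MLT_PARAM_NAMES = {
    "fields": "mlt.fl",
    "min_term_freq": "mlt.mintf",
    "min_doc_freq": "mlt.mindf",
    "max_doc_freq_pct": "mlt.maxdfpct",
    "max_query_terms": "mlt.maxqt",
    "min_word_len": "mlt.minwl",
    "max_num_tokens_parsed": "mlt.maxntp",
    "boost": "mlt.boost",
    "query_fields": "mlt.qf",
    "interesting_terms": "mlt.interestingTerms",
    "match_include": "mlt.match.include",
}


def to_solr_mlt_params(**options) -> dict[str, str]:
    """Translate keyword options into Solr request parameters (hypothetical helper)."""
    params = {"mlt": "true"}  # enables the MoreLikeThis search component
    for name, value in options.items():
        if value is None:
            continue  # unset options fall back to Solr's defaults
        solr_name = SOLR_MLT_PARAM_NAMES[name]  # unknown names raise KeyError
        if isinstance(value, bool):  # check bool before generic values
            params[solr_name] = "true" if value else "false"
        elif isinstance(value, (list, tuple)):
            params[solr_name] = ",".join(value)  # mlt.fl takes a comma-separated list
        else:
            params[solr_name] = str(value)
    return params


params = to_solr_mlt_params(
    fields=["title", "content"],
    min_term_freq=1,
    min_doc_freq=1,
    max_query_terms=20,
    boost=True,
    match_include=False,
)
print(params)
```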
Basic Usage
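```python
from taiyo.parsers import StandardParser

# Assumes `client` (a configured Taiyo client) and `Article` (your document model) exist.
target_id = "article-123"

# rows=1: the main result set only needs the seed document itself.
parser = StandardParser(query=f"id:{target_id}", rows=1).more_like_this(
    fields=["title", "content"],
    min_term_freq=1,
    min_doc_freq=1,
    max_query_terms=20,
    boost=True,
    match_include=False,
)
response = client.search(parser, document_model=Article)
```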
Key Parameters
```python
from taiyo.params import MoreLikeThisParamsConfig

config = MoreLikeThisParamsConfig(
    fields=["title", "content"],       # Fields to analyze for similarity
    min_term_freq=1,                   # Minimum term frequency in the seed document
    min_doc_freq=1,                    # Minimum document frequency across the index
    max_doc_freq_pct=80,               # Optional: ignore terms found in more than 80% of documents
    max_query_terms=20,                # Cap on interesting terms used to build the query
    min_word_len=3,                    # Ignore words shorter than 3 characters
    max_num_tokens_parsed=5000,        # Max tokens parsed per field without stored term vectors
    boost=True,                        # Boost query terms by their relevance scores
    query_fields="title^2.0 content",  # Optional per-field boosts (DisMax qf syntax)
    interesting_terms="details",       # Return interesting terms with boosts ("list" for names only)
    match_include=False,               # Exclude the seed document from results
)
```
Refer to the Apache Solr documentation for the full list of parameters and defaults.
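Taken together, these thresholds filter the seed document's candidate terms before the top max_query_terms are kept. Below is a simplified sketch of that selection step in plain Python, assuming a basic tf-idf score. Lucene's actual scoring and analysis differ, and select_interesting_terms is a hypothetical helper. Its defaults follow Solr's documented defaults for mlt.mintf (2), mlt.mindf (5), mlt.minwl (0), and mlt.maxqt (25).

```python
import math
from collections import Counter
from typing import Optional


def select_interesting_terms(
    seed_tokens: list[str],
    doc_freq: dict[str, int],
    num_docs: int,
    min_term_freq: int = 2,
    min_doc_freq: int = 5,
    max_doc_freq_pct: Optional[int] = None,
    min_word_len: int = 0,
    max_query_terms: int = 25,
) -> list[tuple[str, float]]:
    """Filter and rank seed-document terms as the MLT thresholds describe (simplified)."""
    # Convert the percentage threshold into an absolute document count.
    max_df = num_docs * max_doc_freq_pct // 100 if max_doc_freq_pct is not None else None
    scored = []
    for term, tf in Counter(seed_tokens).items():
        if tf < min_term_freq or len(term) < min_word_len:
            continue  # too infrequent in the seed, or too short
        df = doc_freq.get(term, 0)
        if df < min_doc_freq:
            continue  # too rare across the index to be a useful signal
        if max_df is not None and df > max_df:
            continue  # so common it behaves like a stopword
        idf = math.log(num_docs / (df + 1)) + 1  # simplified tf-idf weighting
        scored.append((term, tf * idf))
    scored.sort(key=lambda kv: kv[1], reverse=True)
    return scored[:max_query_terms]


seed = "solr solr search search relevance the the the a".split()
doc_freq = {"solr": 10, "search": 60, "relevance": 5, "the": 95, "a": 99}
terms = select_interesting_terms(
    seed, doc_freq, num_docs=100,
    min_term_freq=1, min_doc_freq=1, max_doc_freq_pct=80,
    min_word_len=3, max_query_terms=20,
)
print([term for term, _ in terms])  # → ['solr', 'relevance', 'search']
```

Calling it with only the defaults (no max_doc_freq_pct, min_word_len=0) lets the near-ubiquitous "the" into the selection. That is exactly what those two thresholds guard against.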
Handling Results
MLT responses are mapped onto typed models in Taiyo to make downstream handling ergonomic:
- SolrResponse.more_like_this: a mapping of source document IDs to SolrMoreLikeThisResult instances.
- SolrMoreLikeThisResult.docs: parsed SolrDocument instances representing similar documents.
- SolrMoreLikeThisResult.interesting_terms: optional per-document metadata, populated when interesting_terms is set to "list" or "details".
Example:
```python
from taiyo.types import SolrMoreLikeThisResult

# `response`, `target_id`, and `Article` come from the Basic Usage example.
mlt_map: dict[str, SolrMoreLikeThisResult[Article]] = response.more_like_this or {}
related = mlt_map.get(target_id)

if related:
    print(f"Found {related.num_found} related docs")
    for doc in related.docs:
        print(doc.id, doc.title)

    if related.interesting_terms:
        print("Interesting terms:")
        if isinstance(related.interesting_terms, dict):
            # interesting_terms="details": term -> boost
            for term, boost in related.interesting_terms.items():
                print(f"  {term}: {boost}")
        else:
            # interesting_terms="list": term names only
            for term in related.interesting_terms:
                print(f"  {term}")
```
With interesting_terms="details", Solr returns each interesting term together with its boost value. Taiyo exposes these on SolrMoreLikeThisResult.interesting_terms as a dictionary mapping each term to its boost ("list" yields only the term names), so you can inspect which terms contributed to the similarity score.
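Taiyo surfaces the parsed terms for you. If you ever inspect the raw JSON yourself, note that Solr's default named-list rendering (json.nl=flat) turns a term-to-boost list into a flat array that alternates terms and boosts. This is how the details form of interesting terms typically arrives. Here is a minimal sketch of folding that array into a dictionary, assuming that flat layout. The helper name and sample terms are illustrative.

```python
def flat_named_list_to_dict(flat: list) -> dict[str, float]:
    """Convert a flat [name1, value1, name2, value2, ...] list into a {name: value} dict."""
    if len(flat) % 2 != 0:
        raise ValueError("expected alternating name/value pairs")
    # Even indexes hold term names, odd indexes hold boosts; a repeated name keeps its last value.
    return {str(name): float(value) for name, value in zip(flat[0::2], flat[1::2])}


# Illustrative payload; real terms are typically rendered as "field:text".
raw_terms = ["content:solr", 1.0, "title:search", 0.75, "content:relevance", 0.5]
for term, boost in flat_named_list_to_dict(raw_terms).items():
    print(f"{term}: {boost}")
```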
Next Steps
- Learn about Faceting for aggregations
- Explore Grouping for result organization
- See Highlighting for search snippets