Overview
Indexing is the process of adding documents to Solr for retrieval. Taiyo provides both synchronous and asynchronous clients for efficient document indexing.
Core Concepts
Documents
Documents are the primary unit of data in Solr. Each document contains fields with values that can be indexed, stored, and searched.
Collections
A collection is a logical index in Solr. Documents are indexed into collections, which can be distributed across multiple nodes for scalability.
Commits
Commits make indexed documents visible for search. Taiyo supports:
- Immediate commits (
commit=True) - Manual commits (
client.commit()) - Batch commits for performance
Indexing Approaches
Single Document
from taiyo import SolrClient, SolrDocument
with SolrClient("http://localhost:8983/solr") as client:
client.set_collection("my_collection")
doc = SolrDocument(title="Document Title", content="Document content")
client.add(doc, commit=True)
Batch Indexing
from taiyo import SolrClient
with SolrClient("http://localhost:8983/solr") as client:
client.set_collection("my_collection")
docs = [
SolrDocument(title=f"Document {i}", content=f"Content {i}")
for i in range(1000)
]
client.add(docs, commit=False)
client.commit()
import asyncio
from taiyo import AsyncSolrClient
async with AsyncSolrClient("http://localhost:8983/solr") as client:
client.set_collection("my_collection")
# Split into batches and process concurrently
batch_size = 100
all_docs = [SolrDocument(title=f"Doc {i}") for i in range(1000)]
batches = [all_docs[i:i + batch_size] for i in range(0, len(all_docs), batch_size)]
# Index all batches concurrently
await asyncio.gather(*[client.add(batch, commit=False) for batch in batches])
await client.commit()
Commit Strategy
Commits are expensive operations. For bulk indexing:
for batch in batches:
client.add(batch, commit=False)
client.commit()
For real-time updates:
client.add(doc, commit=True)
Async for Concurrency
Use async clients when indexing from multiple sources concurrently:
import asyncio
async def index_source(client, source):
docs = await fetch_from_source(source)
await client.add(docs, commit=False)
async with AsyncSolrClient(url) as client:
client.set_collection("my_collection")
await asyncio.gather(
index_source(client, "source1"),
index_source(client, "source2"),
index_source(client, "source3"),
)
await client.commit()
Error Handling
from taiyo import SolrError
try:
client.add(docs, commit=True)
except SolrError as e:
print(f"Indexing failed: {e}")
print(f"Status: {e.status_code}")
See Also
- Examples - Complete indexing examples with real datasets
- Schema Management - Define and manage your Solr schema
- Client Overview - Client configuration and usage