Schemas
Solr Schema Definition Models
Analyzer
Bases: BaseModel
Analyzer configuration for field types.
Defines how text is processed for indexing and querying.
Example
analyzer = Analyzer(
tokenizer=Tokenizer(name="standard"),
filters=[
Filter(name="lowercase"),
Filter(name="stop", ignore_case=True, words="stopwords.txt")
]
)
build(format='xml', indent=' ')
Serialize the analyzer configuration to the requested format.
CharFilter
Bases: BaseModel
Character filter for text analysis.
Applied before tokenization to preprocess text.
Example
from taiyo.schema.enums import SolrCharFilterFactory
# Using enum
char_filter = CharFilter(
solr_class=SolrCharFilterFactory.PATTERN_REPLACE,
pattern="([a-zA-Z])\\1+",
replacement="$1$1"
)
# Or using string
char_filter = CharFilter(
name="patternReplace",
pattern="([a-zA-Z])\\1+",
replacement="$1$1"
)
CopyField
Bases: BaseModel
Solr copyField directive for automatic field data copying.
Copy fields instruct Solr to automatically copy data from a source field (or pattern) to a destination field during indexing. The destination field receives the original pre-analyzed text, which is then analyzed according to its own field type.
This enables:
- Catch-all search fields aggregating multiple source fields
- Same content analyzed differently for different purposes
- Faceting on different field types than used for searching
- Creating summary fields with character limits
Attributes:

| Name | Type | Description |
|---|---|---|
| source | str | Source field name (supports wildcards like '*_txt' or 'attr_*') |
| dest | str | Destination field name (must be a defined field, no wildcards) |
| maxChars | str | Optional character limit for copied content (truncates if exceeded) |
Example
from taiyo.schema import CopyField
# Copy single field to catch-all
title_copy = CopyField(source="title", dest="text")
content_copy = CopyField(source="content", dest="text")
# Copy all text fields with wildcard
all_text_copy = CopyField(source="*_txt", dest="text")
# Copy with character limit for summaries
summary_copy = CopyField(
source="content",
dest="content_summary",
maxChars=500
)
# Copy for different analysis
# (e.g., stemmed text to unstemmed for exact phrase matching)
exact_copy = CopyField(source="description", dest="description_exact")
# Copy multiple dynamic fields
multi_lang_copy = CopyField(source="title_*", dest="title_all")
Note
- Copies happen before analysis of the destination field
- Destination field must be defined (cannot be dynamic)
- Wildcards only work in source, not destination
- maxChars truncates at character boundary, may split words
- Copying is one-way; changes to dest don't affect source
Reference
https://solr.apache.org/guide/solr/latest/indexing-guide/copy-fields.html
build(format='xml', indent='')
Serialize the copyField directive to the requested format.
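A quick usage sketch; the XML in the comment is an assumed output shape for orientation, not a guaranteed rendering:

from taiyo.schema import CopyField

directive = CopyField(source="content", dest="content_summary", maxChars=500)
print(directive.build(format="xml"))
# expected shape: <copyField source="content" dest="content_summary" maxChars="500"/>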
Filter
Bases: BaseModel
Token filter for text analysis.
Applied after tokenization to transform tokens.
Example
from taiyo.schema.enums import SolrFilterFactory
# Using enum
filter = Filter(solr_class=SolrFilterFactory.LOWER_CASE)
# Using name
filter = Filter(name="lowercase")
# With parameters using enum
filter = Filter(
solr_class=SolrFilterFactory.STOP,
ignore_case=True,
words="stopwords.txt"
)
# With parameters using name
filter = Filter(
name="stop",
ignore_case=True,
words="stopwords.txt"
)
Schema
Bases: BaseModel
Complete Solr schema definition with all components.
A Schema represents the complete structure of a Solr collection, defining how documents are indexed, stored, and searched. It combines field definitions, field types, copy field directives, and configuration into a single model that can be serialized to XML or JSON.
The Schema supports both:
- Classic schema.xml format (XML serialization)
- Schema API format (JSON serialization)
Attributes:

| Name | Type | Description |
|---|---|---|
| name | Optional[str] | Optional schema name identifier |
| version | Optional[float] | Schema version number (typically 1.6 for modern Solr) |
| uniqueKey | Optional[str] | Field name to use as unique document identifier (commonly 'id') |
| fields | List[SolrField] | List of field definitions |
| dynamicFields | List[SolrDynamicField] | List of dynamic field patterns |
| fieldTypes | List[SolrFieldType] | List of field type definitions |
| copyFields | List[CopyField] | List of copy field directives |
Example
from taiyo.schema import (
Schema, SolrField, SolrDynamicField,
SolrFieldType, SolrFieldClass, CopyField
)
from taiyo.schema.field_type import Analyzer, Tokenizer, Filter
# Define field types
text_type = SolrFieldType(
name="text_general",
solr_class=SolrFieldClass.TEXT,
position_increment_gap=100,
analyzer=Analyzer(
tokenizer=Tokenizer(name="standard"),
filters=[
Filter(name="lowercase"),
Filter(name="stop", words="stopwords.txt")
]
)
)
# Define fields
id_field = SolrField(
name="id",
type="string",
indexed=True,
stored=True,
required=True
)
title_field = SolrField(
name="title",
type="text_general",
indexed=True,
stored=True
)
# Define dynamic fields
text_dynamic = SolrDynamicField(
name="*_txt",
type="text_general",
indexed=True,
stored=True
)
# Define copy fields
title_copy = CopyField(source="title", dest="text")
# Build complete schema
schema = Schema(
name="my_collection",
version=1.6,
uniqueKey="id",
fields=[id_field, title_field],
dynamicFields=[text_dynamic],
fieldTypes=[text_type],
copyFields=[title_copy]
)
# Serialize to XML for schema.xml
xml_output = schema.build(format="xml")
with open("schema.xml", "w") as f:
f.write(xml_output)
# Serialize to JSON for Schema API
json_output = schema.build(format="json")
# Use builder pattern
schema = (
Schema(name="my_schema", version=1.6, uniqueKey="id")
.add_field_type(text_type)
.add_field(id_field)
.add_field(title_field)
.add_dynamic_field(text_dynamic)
.add_copy_field(title_copy)
)
Reference
https://solr.apache.org/guide/solr/latest/indexing-guide/schema-elements.html
https://solr.apache.org/guide/solr/latest/indexing-guide/schema-api.html
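The JSON output targets Solr's Schema API. Below is a minimal sketch of pushing it to a running core with requests; the endpoint is the standard Schema API path, but whether build(format="json") emits a payload the API accepts as-is is an assumption to verify against your Solr version:

import requests

# Reuses the `schema` object built in the example above.
json_output = schema.build(format="json")
response = requests.post(
    "http://localhost:8983/solr/my_collection/schema",
    data=json_output,
    headers={"Content-Type": "application/json"},
)
response.raise_for_status()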
add_copy_field(copy_field)
Add a copy field to the schema (builder pattern).
add_dynamic_field(field)
Add a dynamic field to the schema (builder pattern).
add_field(field)
Add a field to the schema (builder pattern).
add_field_type(field_type)
Add a field type to the schema (builder pattern).
SolrCharFilterFactory
Type-safe enum for Solr/Lucene char filter factory classes.
Char filters process the raw character stream before tokenization, performing
operations like HTML stripping or character mapping.
Each enum member maps to a char filter factory class name using the short notation
(e.g., solr.HTMLStripCharFilterFactory).
Categories
- HTML/Markup: HTML_STRIP
- Pattern-based: PATTERN_REPLACE
- Mapping: MAPPING
- ICU: ICU_NORMALIZER2
Example
from taiyo.schema.enums import SolrCharFilterFactory
analyzer_config = {
"charFilters": [
{"class": SolrCharFilterFactory.HTML_STRIP},
{"class": SolrCharFilterFactory.MAPPING, "mapping": "mapping-FoldToASCII.txt"}
],
"tokenizer": {"class": "solr.StandardTokenizerFactory"}
}
Reference
https://solr.apache.org/guide/solr/latest/indexing-guide/charfilters.html
SolrDynamicField
Bases: SolrField
Dynamic field pattern for automatic field creation.
Dynamic fields use wildcard patterns (* prefix or suffix) to automatically configure fields that match the pattern. When a document contains a field name matching a dynamic field pattern, Solr automatically creates and configures that field using the dynamic field's settings.
Dynamic fields are particularly useful for:
- Handling varying field names in semi-structured data
- Supporting multi-language fields (e.g., title_en, title_fr)
- Creating type-specific field groups (e.g., *_txt, *_i, *_dt)
Attributes:

| Name | Type | Description |
|---|---|---|
| name | str | Dynamic field pattern (e.g., '*_txt', '*_s', 'attr_*') |
| type | str | Field type name (must reference a defined fieldType) |
Example
from taiyo.schema import SolrDynamicField
# Text fields with suffix pattern
text_dynamic = SolrDynamicField(
name="*_txt",
type="text_general",
indexed=True,
stored=True,
multi_valued=True
)
# String fields with suffix pattern
string_dynamic = SolrDynamicField(
name="*_s",
type="string",
indexed=True,
stored=True,
doc_values=True
)
# Integer fields with suffix pattern
int_dynamic = SolrDynamicField(
name="*_i",
type="pint",
indexed=True,
stored=True,
doc_values=True
)
# Date fields with suffix pattern
date_dynamic = SolrDynamicField(
name="*_dt",
type="pdate",
indexed=True,
stored=True,
doc_values=True
)
# Attribute fields with prefix pattern
attr_dynamic = SolrDynamicField(
name="attr_*",
type="text_general",
indexed=True,
stored=True
)
# Ignored fields (no indexing or storage)
ignored_dynamic = SolrDynamicField(
name="ignored_*",
type="string",
indexed=False,
stored=False
)
Note
- Patterns must contain exactly one asterisk (*)
- Asterisk can be at beginning or end only
- More specific patterns take precedence over less specific ones
- If multiple patterns match, the longest match wins (see the sketch below)
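For example (a hypothetical sketch; the text_en type name is assumed for illustration, not part of the library):

from taiyo.schema import SolrDynamicField

generic = SolrDynamicField(name="*_txt", type="text_general")
english = SolrDynamicField(name="*_en_txt", type="text_en")  # hypothetical type
# A document field named "title_en_txt" matches both patterns;
# Solr applies "*_en_txt" because it is the longer (more specific) match.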
Reference
https://solr.apache.org/guide/solr/latest/indexing-guide/dynamic-fields.html
build(format='xml', indent='')
Serialize the dynamic field definition to the requested format.
SolrField
Bases: BaseModel
Solr field definition specifying data indexing and storage behavior.
Fields are named data containers that reference a field type to determine how their values are analyzed, indexed, and stored. Each field can override default behaviors inherited from its field type.
Attributes:

| Name | Type | Description |
|---|---|---|
| name | str | Field identifier (alphanumeric/underscore, no leading digit) |
| type | str | Field type name (must reference a defined fieldType) |
| default | Optional[Any] | Default value when not provided in documents |
| indexed | Optional[bool] | Enable field in queries and sorting (default: true) |
| stored | Optional[bool] | Store original value for retrieval (default: true) |
| doc_values | Optional[bool] | Column-oriented storage for sorting/faceting (default: true) |
| multi_valued | Optional[bool] | Allow multiple values per field (default: false) |
| required | Optional[bool] | Reject documents missing this field (default: false) |
| omit_norms | Optional[bool] | Disable length normalization (default: true for non-analyzed) |
| omit_term_freq_and_positions | Optional[bool] | Omit term frequency and positions |
| omit_positions | Optional[bool] | Omit positions but keep term frequency |
| term_vectors | Optional[bool] | Store term vectors for highlighting (default: false) |
| term_positions | Optional[bool] | Store term positions in vectors |
| term_offsets | Optional[bool] | Store term offsets in vectors |
| term_payloads | Optional[bool] | Store term payloads in vectors |
| sort_missing_first | Optional[bool] | Sort docs without this field first |
| sort_missing_last | Optional[bool] | Sort docs without this field last |
| uninvertible | Optional[bool] | Allow un-inverting when indexed=true, docValues=false |
| use_doc_values_as_stored | Optional[bool] | Return docValues when using '*' in fl param |
| large | Optional[bool] | Lazy load values >512KB (requires stored=true, multiValued=false) |
Example
from taiyo.schema import SolrField
# Unique ID field
id_field = SolrField(
name="id",
type="string",
indexed=True,
stored=True,
required=True
)
# Text field for full-text search
title_field = SolrField(
name="title",
type="text_general",
indexed=True,
stored=True
)
# Multi-valued field
tags_field = SolrField(
name="tags",
type="string",
indexed=True,
stored=True,
multi_valued=True
)
# Field with docValues for faceting
category_field = SolrField(
name="category",
type="string",
indexed=True,
stored=True,
doc_values=True
)
# Numeric field with default value
price_field = SolrField(
name="price",
type="pdouble",
indexed=True,
stored=True,
default=0.0
)
# Version field (internal)
version_field = SolrField(
name="_version_",
type="plong",
indexed=False,
stored=False
)
Reference
https://solr.apache.org/guide/solr/latest/indexing-guide/fields.html
build(format='xml', indent='')
Serialize the field definition to the requested format.
SolrFieldClass
Type-safe enum for Solr field type implementation classes.
Each enum member maps to a Solr field type class name using the short notation
(e.g., solr.TextField). These can be used when defining SolrFieldType instances
to ensure correct class names and enable IDE autocompletion.
Categories
- Text/String: TEXT, STR, SORTABLE_TEXT, COLLATION, ICU_COLLATION, ENUM
- Boolean/Binary: BOOL, BINARY, UUID
- Numeric: INT_POINT, LONG_POINT, FLOAT_POINT, DOUBLE_POINT, DATE_POINT
- Date/Currency: DATE_RANGE, CURRENCY
- Spatial: LATLON_POINT_SPATIAL, BBOX, SPATIAL_RPT, RPT_WITH_GEOMETRY, POINT
- Special: RANDOM_SORT, RANK, NEST_PATH, PRE_ANALYZED
- Vector/ML: DENSE_VECTOR
Example
from taiyo.schema import SolrFieldType, SolrFieldClass
# Text field type
text_type = SolrFieldType(
name="text_en",
solr_class=SolrFieldClass.TEXT,
analyzer=...
)
# Numeric field type
int_type = SolrFieldType(
name="pint",
solr_class=SolrFieldClass.INT_POINT
)
# Vector field type
vector_type = SolrFieldType(
name="vector",
solr_class=SolrFieldClass.DENSE_VECTOR,
vectorDimension=768
)
Reference
https://solr.apache.org/guide/solr/latest/indexing-guide/field-types-included-with-solr.html
SolrFieldType
Bases: BaseModel
Represents a Solr field type definition.
Field types define how data is analyzed and stored.
Example
from taiyo.schema import SolrFieldType, Analyzer, Tokenizer, Filter
from taiyo.schema.enums import (
SolrFieldClass,
SolrTokenizerFactory,
SolrFilterFactory
)
# Using enums (recommended)
field_type = SolrFieldType(
name="text_general",
solr_class=SolrFieldClass.TEXT,
position_increment_gap=100,
analyzer=Analyzer(
tokenizer=Tokenizer(solr_class=SolrTokenizerFactory.STANDARD),
filters=[
Filter(solr_class=SolrFilterFactory.LOWER_CASE),
Filter(solr_class=SolrFilterFactory.STOP, ignore_case=True, words="stopwords.txt")
]
)
)
# Or using strings (also supported)
field_type = SolrFieldType(
name="text_general",
solr_class="solr.TextField",
position_increment_gap=100,
analyzer=Analyzer(
tokenizer=Tokenizer(name="standard"),
filters=[
Filter(name="lowercase"),
Filter(name="stop", ignore_case=True, words="stopwords.txt")
]
)
)
Reference
https://solr.apache.org/guide/solr/latest/indexing-guide/field-type-definitions-and-properties.html
build(format='xml', indent='')
Serialize the field type definition to the requested format.
validate_field_class(v)
classmethod
Accept both enum and string values.
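The behavior can be pictured with a minimal pydantic sketch; the wiring below is an illustrative assumption, not the library's actual implementation:

from pydantic import BaseModel, field_validator
from taiyo.schema import SolrFieldClass

class EnumOrString(BaseModel):
    solr_class: str

    @field_validator("solr_class", mode="before")
    @classmethod
    def validate_field_class(cls, v):
        # Enum members carry the class name in .value; plain strings pass through.
        return v.value if hasattr(v, "value") else v

EnumOrString(solr_class=SolrFieldClass.TEXT)   # enum accepted
EnumOrString(solr_class="solr.TextField")      # string accepted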
SolrFilterFactory
Type-safe enum for Solr/Lucene filter factory classes.
Filters modify, remove, or add tokens in the token stream produced by tokenizers.
Each enum member maps to a filter factory class name using the short notation
(e.g., solr.LowerCaseFilterFactory).
Categories
- Case: LOWER_CASE, UPPER_CASE, TURKISH_LOWER_CASE
- Stemming: PORTER_STEM, SNOWBALL_PORTER, KS_STEM, various language stems
- Stop words: STOP, SUGGEST_STOP
- Synonyms: SYNONYM_GRAPH, SYNONYM (deprecated)
- N-grams: EDGE_NGRAM, NGRAM, SHINGLE
- Phonetic: BEIDER_MORSE, DAITCH_MOKOTOFF_SOUNDEX, DOUBLE_METAPHONE, METAPHONE, PHONEX, REFINED_SOUNDEX, SOUNDEX
- Word analysis: WORD_DELIMITER_GRAPH, WORD_DELIMITER (deprecated)
- Language-specific: Multiple for various languages
- Special purpose: ASCII_FOLDING, TRUNCATE, REVERSE, TRIM, PROTECTED_TERM
Example
from taiyo.schema.enums import SolrFilterFactory
analyzer_config = {
"tokenizer": {"class": "solr.StandardTokenizerFactory"},
"filters": [
{"class": SolrFilterFactory.LOWER_CASE},
{"class": SolrFilterFactory.STOP, "words": "stopwords.txt"},
{"class": SolrFilterFactory.PORTER_STEM}
]
}
Reference
https://solr.apache.org/guide/solr/latest/indexing-guide/filters.html
SolrTokenizerFactory
Type-safe enum for Solr/Lucene tokenizer factory classes.
Tokenizers break text into tokens (words) that can be further processed by filters.
Each enum member maps to a tokenizer factory class name using the short notation
(e.g., solr.StandardTokenizerFactory).
Categories
- Standard: STANDARD, CLASSIC, WHITESPACE, KEYWORD
- Pattern-based: PATTERN, SIMPLE_PATTERN, SIMPLE_PATTERN_SPLIT
- Path/Email: PATH_HIERARCHY, UAX29_URL_EMAIL
- Language-specific: THAI, KOREAN, JAPANESE, ICU
- N-gram: EDGE_NGRAM, NGRAM
- OpenNLP: OPENNLP
Example
from taiyo.schema import SolrFieldType, Analyzer, Tokenizer
from taiyo.schema.enums import SolrTokenizerFactory
field_type = SolrFieldType(
    name="text_standard",
    solr_class="solr.TextField",
    analyzer=Analyzer(
        tokenizer=Tokenizer(solr_class=SolrTokenizerFactory.STANDARD)
    )
)
Reference
https://solr.apache.org/guide/solr/latest/indexing-guide/tokenizers.html
Tokenizer
Bases: BaseModel
Tokenizer for text analysis.
Splits text into tokens.
Example
from taiyo.schema.enums import SolrTokenizerFactory
# Using enum
tokenizer = Tokenizer(solr_class=SolrTokenizerFactory.STANDARD)
# Or using name
tokenizer = Tokenizer(name="standard")
# Or using string class
tokenizer = Tokenizer(solr_class="solr.StandardTokenizerFactory")