Support for Solr's new character filters (Gustavo Fernandes)

Description

Solr 1.4 introduced CharacterFilters [1], which are based on Lucene's CharStream. Those filters are currently incompatible with the annotation @TokenFilterDef, which accepts only TokenFilterFactories:

One idea is to keep the same annotation, "generalize" the token filter factory type in the annotation, and have SolrAnalyzerBuilder construct a TokenizerChain, which accepts both types of filters [2].

[1] http://lucene.apache.org/solr/api/org/apache/solr/analysis/CharFilterFactory.html
[2] http://lucene.apache.org/solr/api/org/apache/solr/analysis/TokenizerChain.html

Attachments

2

Activity

Gustavo Fernandes April 5, 2010 at 12:58 PM

Enriched patch, with documentation changes, corrected styling, and a modified declaration order in the @AnalyzerDef.

Sanne Grinovero April 1, 2010 at 11:37 PM

ah that makes a lot of sense

Gustavo Fernandes April 1, 2010 at 10:32 PM

CharFilters sit between the Reader and the Tokenizers [1], so they filter the character stream produced by the Reader before tokenization.
For an illustration of how the CharFilters are used in Solr, please refer to [2].

[1] http://issues.apache.org/jira/browse/LUCENE-1466
[2] http://issues.apache.org/jira/browse/SOLR-822

The order of application would be first the charFilters in their declaration order, and then all the tokenFilters also in their own order. Probably the @AnalyzerDef is better represented this way:

Thoughts?
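
To make the ordering described above concrete, here is a minimal plain-Java sketch. It is only a model of the idea, not the patch and not Solr's or Hibernate Search's API: the class `AnalysisChainSketch`, the method `analyze`, and the use of `UnaryOperator<String>` as a stand-in for both filter kinds are all hypothetical, and tokenization is simplified to a whitespace split. Char filters rewrite the raw text in declaration order, then the text is tokenized, then token filters run on each token in their own order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical model of the analysis chain; none of these names come from
// Solr or Hibernate Search.
public class AnalysisChainSketch {

    // Applies the char filters to the raw text in declaration order,
    // splits the result on whitespace (a stand-in for a real tokenizer),
    // then applies the token filters to each token in declaration order.
    public static List<String> analyze(String text,
                                       List<UnaryOperator<String>> charFilters,
                                       List<UnaryOperator<String>> tokenFilters) {
        String filtered = text;
        for (UnaryOperator<String> charFilter : charFilters) {
            filtered = charFilter.apply(filtered);
        }
        List<String> tokens = new ArrayList<>();
        for (String token : filtered.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // empty input yields a single empty string; skip it
            }
            for (UnaryOperator<String> tokenFilter : tokenFilters) {
                token = tokenFilter.apply(token);
            }
            tokens.add(token);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Two char-level mappings, similar in spirit to a mapping char filter.
        List<UnaryOperator<String>> charFilters = Arrays.asList(
                s -> s.replace("\u00e9", "e"),
                s -> s.replace("-", " "));
        // One token-level filter that lowercases each token.
        List<UnaryOperator<String>> tokenFilters = Arrays.asList(
                s -> s.toLowerCase(Locale.ROOT));
        System.out.println(analyze("Caf\u00e9-Ol\u00e9", charFilters, tokenFilters));
        // prints [cafe, ole]
    }
}
```

Because the hyphen is replaced before the split, the char filter changes where the token boundaries fall, which a filter running after tokenization could not do.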

Sanne Grinovero April 1, 2010 at 10:08 AM

I assume there might be a need to define the order in which TokenFilter(s) and CharFilter(s) are applied?
Maybe filters could be made of type Object so that it could contain both types, although that is not nice for type safety and self-documentation.

Gustavo Fernandes April 1, 2010 at 1:29 AM

Attached is a patch to support Solr's CharStream. A new kind of filter factory was introduced to AnalyzerDef:

Being a new annotation defined as:

That will allow the usage of MappingCharFilters as requested by the users:

https://forum.hibernate.org/viewtopic.php?f=9&t=1002465

Fixed

Created March 29, 2010 at 12:13 AM
Updated April 2, 2016 at 8:05 PM
Resolved April 5, 2010 at 5:24 PM