Range faceting on multiple numeric values does not work

Description

I have a mapping in my indexed entity which looks like this:

Phrase.java

I was using faceting on tag Ids in version 5.0.1.Final. Nothing is indexed in the Tag entity except for its Id. After switching to version 5.3.0.Final I added @Facet(forField = "tags.id") to the mapping. However, I keep getting the HSEARCH000268 error saying that "Facet request 'tagsFacetRequest' tries to facet on field 'tags.id' which either does not exists or is not configured for faceting".

Environment

hibernate 4.3.10.Final,

Activity

Hardy Ferentschik
July 9, 2015, 8:16 AM

The error I was getting is described in HSEARCH-1929. I couldn't create an index.

Got you. I linked the two issues.

The workaround I had to implement seems kinda ugly to me: I introduced a transient method which calls the superclass getId method only for the purpose of indexing. The getId method usually lives in some kind of superclass of all entities, so it's not cool to add search annotations there.

Hmm, I am not sure I am following.

Meanwhile in 5.2.0.Final which uses the same Lucene version but doesn't have a @Facet annotation my old code works.

Search 5.2 and 5.3 are using the same Lucene version, but the implementation of faceting has changed. 5.2 was based on some custom FieldCache approach whereas 5.3 is using some of Lucene's built-in support for faceting. Partly because FieldCache will be removed in upcoming versions of Lucene and partly because you get a real performance boost using the Lucene provided classes. See also this blog post

Hardy Ferentschik
July 9, 2015, 11:31 AM

Actually the problem is in DocumentBuilderIndexedEntity where the wrong doc values field is used (NumericDocValuesField instead of SortedNumericDocValuesField). Should be easy to fix.

Hardy Ferentschik
July 9, 2015, 12:06 PM

Hmm, SortedNumericDocValuesField gets us past the indexing, but then at search time org.apache.lucene.facet.range.LongRangeFacetCounts throws an exception. It looks like multivalued numeric fields are indeed a limitation with this approach. This will need some more investigation.

Guillaume Smet
February 23, 2016, 4:56 PM

Lucene ticket created here: https://issues.apache.org/jira/browse/LUCENE-7044 .

There's definitely something missing on the Lucene side.

Yoann Rodière
January 2, 2017, 6:10 PM

I dug a bit into the Elasticsearch implementation, and it seems they indeed use SortedNumericDocValuesField when indexing:

NumberFieldMapper.java, line 707, commit 27496d6b925d8900b3357ff5672cbafa5ef2b154

On the querying side (range aggregations), the implementation seems to be fully specific (they didn't use any Lucene feature):

RangeAggregator.java, line 242, commit 27496d6b925d8900b3357ff5672cbafa5ef2b154

Solr, on the other hand, seems to use SortedSetDocValues for numeric multi-valued fields, storing the numeric values as BytesRef.
I only checked the faceting query part, which looks like this:

IntervalFacets.java, line 176, commit 93562da610bf8756351be7720c69872bc1cea727

IntervalFacets.java, line 246, commit 93562da610bf8756351be7720c69872bc1cea727

See how multi-valued fields are handled by getCountString().
The full code is in IntervalFacets.java: it's a bit dense, but it really seems to be designed to work with numeric values.
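
To illustrate why storing numbers as bytes can still support range checks, here is a minimal sketch of an order-preserving encoding: the long is written as 8 big-endian bytes with the sign bit flipped, so that unsigned byte-wise comparison matches numeric order. This is not Solr's actual encoding (I have not checked which one it uses); the class and method names are hypothetical.

```java
/** Illustrative only: order-preserving byte encoding for long values. */
final class SortableLongBytes {

    /** Encodes a long as 8 big-endian bytes with the sign bit flipped. */
    static byte[] encode(long value) {
        // Flipping the sign bit makes negative values sort before positive ones.
        long flipped = value ^ 0x8000000000000000L;
        byte[] bytes = new byte[8];
        for (int i = 7; i >= 0; i--) {
            bytes[i] = (byte) flipped;
            flipped >>>= 8;
        }
        return bytes;
    }

    /** Reverses encode(). */
    static long decode(byte[] bytes) {
        long flipped = 0;
        for (int i = 0; i < 8; i++) {
            flipped = (flipped << 8) | (bytes[i] & 0xFF);
        }
        return flipped ^ 0x8000000000000000L;
    }

    /** Unsigned lexicographic comparison of two 8-byte encodings. */
    static int compare(byte[] a, byte[] b) {
        for (int i = 0; i < 8; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return 0;
    }
}
```

With such an encoding, a range check on the numeric value becomes two byte-wise comparisons against the encoded bounds, which is the kind of operation a sorted set of byte terms can support.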

This all tends to prove that Lucene itself is not capable of doing what we want to do without some additional work.
Judging by how little attention the Lucene ticket (https://issues.apache.org/jira/browse/LUCENE-7044) has received, I'd say we'll have to do it ourselves.
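
For reference, the counting semantics we would need can be sketched in plain Java, independent of any Lucene API: a document with several values is counted in every range that contains at least one of its values, but at most once per range. The class name, method name, and the array-based representation are assumptions made for illustration only, not a proposed implementation.

```java
/**
 * Illustrative only: range facet counts over multi-valued numeric fields.
 * Not Lucene or Hibernate Search code.
 */
final class MultiValuedRangeCounter {

    /**
     * @param docValues docValues[d] holds all values of document d (may be empty)
     * @param ranges    ranges[r] = {min, max}, both bounds inclusive
     * @return counts[r] = number of documents with at least one value in ranges[r]
     */
    static int[] count(long[][] docValues, long[][] ranges) {
        int[] counts = new int[ranges.length];
        for (long[] values : docValues) {
            for (int r = 0; r < ranges.length; r++) {
                long min = ranges[r][0];
                long max = ranges[r][1];
                for (long value : values) {
                    if (value >= min && value <= max) {
                        counts[r]++;
                        break; // count each document at most once per range
                    }
                }
            }
        }
        return counts;
    }
}
```

For example, with documents holding the values {1, 5, 12}, {3}, {} and {20, 25}, and the ranges 0 to 9, 10 to 19 and 20 to 29, this yields the counts 2, 1 and 1: the first document contributes to both the first and the second range, but only once to each.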

Assignee

Yoann Rodière

Reporter

Ashot Golovenko

Labels

None

Suitable for new contributors

None

Feedback Requested

None

Priority

Critical