Incorrect mapping for 'java.lang.Short and java.lang.Byte' fields with Elasticsearch.

Description

Hi,

I recently came across a case where some of my attributes are not mapped correctly, specifically those of type byte and short.

Maybe this is because Lucene does not have these types, but Elasticsearch does.

It would be nice to use these field types, because they are optimized compared to text / keyword.

I created a test case to document this here; maybe in Elasticsearch 6.x or before, we'll be able to use these types.

Example class:

@Entity
@Indexed(index = "my-index")
public class MyTestEntity {

    @Id
    @DocumentId
    private Long id;

    @Field(analyze = Analyze.YES)
    private String stringWithAnalyze;
    @Field(analyze = Analyze.NO)
    private String stringWithoutAnalyze;

    @Field(analyze = Analyze.YES)
    private Boolean booleanWrapWithAnalyze;
    @Field(analyze = Analyze.NO)
    private Boolean booleanWrapWithoutAnalyze;
    @Field(analyze = Analyze.YES)
    private boolean booleanPrimitiveWithAnalyze;
    @Field(analyze = Analyze.NO)
    private boolean booleanPrimitiveWithoutAnalyze;

    @Field(analyze = Analyze.YES)
    private Byte byteWrapWithAnalyze;
    @Field(analyze = Analyze.NO)
    private Byte byteWrapWithoutAnalyze;
    @Field(analyze = Analyze.YES)
    private byte bytePrimitiveWithAnalyze;
    @Field(analyze = Analyze.NO)
    private byte bytePrimitiveWithoutAnalyze;

    @Field(analyze = Analyze.YES)
    private Integer integerWrapWithAnalyze;
    @Field(analyze = Analyze.NO)
    private Integer integerWrapWithoutAnalyze;
    @Field(analyze = Analyze.YES)
    private int integerPrimitiveWithAnalyze;
    @Field(analyze = Analyze.NO)
    private int integerPrimitiveWithoutAnalyze;

    @Field(analyze = Analyze.YES)
    private Long longWrapWithAnalyze;
    @Field(analyze = Analyze.NO)
    private Long longWrapWithoutAnalyze;
    @Field(analyze = Analyze.YES)
    private long longPrimitiveWithAnalyze;
    @Field(analyze = Analyze.NO)
    private long longPrimitiveWithoutAnalyze;

    @Field(analyze = Analyze.YES)
    private Float floatWrapWithAnalyze;
    @Field(analyze = Analyze.NO)
    private Float floatWrapWithoutAnalyze;
    @Field(analyze = Analyze.YES)
    private float floatPrimitiveWithAnalyze;
    @Field(analyze = Analyze.NO)
    private float floatPrimitiveWithoutAnalyze;

    @Field(analyze = Analyze.YES)
    private Double doubleWrapWithAnalyze;
    @Field(analyze = Analyze.NO)
    private Double doubleWrapWithoutAnalyze;
    @Field(analyze = Analyze.YES)
    private double doublePrimitiveWithAnalyze;
    @Field(analyze = Analyze.NO)
    private double doublePrimitiveWithoutAnalyze;

    @Field(analyze = Analyze.YES)
    private Short shortWrapWithAnalyze;
    @Field(analyze = Analyze.NO)
    private Short shortWrapWithoutAnalyze;
    @Field(analyze = Analyze.YES)
    private short shortPrimitiveWithAnalyze;
    @Field(analyze = Analyze.NO)
    private short shortPrimitiveWithoutAnalyze;

    protected MyTestEntity() {
    }

    .......

The following mapping is created:

{
  "my-index" : {
    "mappings" : {
      "org.hibernate.search.bugs.MyTestEntity" : {
        "dynamic" : "strict",
        "properties" : {
          "booleanPrimitiveWithAnalyze" : { "type" : "boolean" },
          "booleanPrimitiveWithoutAnalyze" : { "type" : "boolean" },
          "booleanWrapWithAnalyze" : { "type" : "boolean" },
          "booleanWrapWithoutAnalyze" : { "type" : "boolean" },
          "bytePrimitiveWithAnalyze" : { "type" : "text" },
          "bytePrimitiveWithoutAnalyze" : { "type" : "keyword", "norms" : true },
          "byteWrapWithAnalyze" : { "type" : "text" },
          "byteWrapWithoutAnalyze" : { "type" : "keyword", "norms" : true },
          "doublePrimitiveWithAnalyze" : { "type" : "double" },
          "doublePrimitiveWithoutAnalyze" : { "type" : "double" },
          "doubleWrapWithAnalyze" : { "type" : "double" },
          "doubleWrapWithoutAnalyze" : { "type" : "double" },
          "floatPrimitiveWithAnalyze" : { "type" : "float" },
          "floatPrimitiveWithoutAnalyze" : { "type" : "float" },
          "floatWrapWithAnalyze" : { "type" : "float" },
          "floatWrapWithoutAnalyze" : { "type" : "float" },
          "id" : { "type" : "keyword", "store" : true },
          "integerPrimitiveWithAnalyze" : { "type" : "integer" },
          "integerPrimitiveWithoutAnalyze" : { "type" : "integer" },
          "integerWrapWithAnalyze" : { "type" : "integer" },
          "integerWrapWithoutAnalyze" : { "type" : "integer" },
          "longPrimitiveWithAnalyze" : { "type" : "long" },
          "longPrimitiveWithoutAnalyze" : { "type" : "long" },
          "longWrapWithAnalyze" : { "type" : "long" },
          "longWrapWithoutAnalyze" : { "type" : "long" },
          "shortPrimitiveWithAnalyze" : { "type" : "text" },
          "shortPrimitiveWithoutAnalyze" : { "type" : "keyword", "norms" : true },
          "shortWrapWithAnalyze" : { "type" : "text" },
          "shortWrapWithoutAnalyze" : { "type" : "keyword", "norms" : true },
          "stringWithAnalyze" : { "type" : "text" },
          "stringWithoutAnalyze" : { "type" : "keyword", "norms" : true }
        }
      }
    }
  }
}

Test Case: https://github.com/frekele/hibernate-search-elasticsearch-test-case/tree/HSEARCH-2908
Travis CI log: https://travis-ci.org/frekele/hibernate-search-elasticsearch-test-case

Attachments

1
  • 04 Oct 2017, 02:58 AM

Activity

Fabio Massimo Ercoli February 15, 2019 at 3:49 PM

Done. The pull request contains a test to probe all backend types.

Fabio Massimo Ercoli February 15, 2019 at 1:22 PM

Yes, @Yoann Rodière, it has been fixed by https://hibernate.atlassian.net/browse/HSEARCH-3424.

There, all numeric types:

  • java.lang.Byte
  • java.lang.Short
  • java.lang.Double
  • java.lang.Float

are passed as they are to the backend API.

The Elasticsearch backend supports all of them natively, so we use all of them.
In Lucene we use IntPoint for byte and short, DoublePoint for double and FloatPoint for float.
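
For readers who want the correspondence at a glance, here is a minimal sketch of the type mapping described above, written as a plain lookup table. The class NumericTypeMapping and its methods are hypothetical names used only for illustration; this is not Hibernate Search code. The Elasticsearch names are the standard numeric field datatypes, and the Lucene entries are the point classes named in this comment.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative lookup table only; hypothetical class, not Hibernate Search code.
final class NumericTypeMapping {

    // Java wrapper type -> Elasticsearch numeric field datatype name.
    private static final Map<Class<?>, String> ELASTICSEARCH_TYPES = new LinkedHashMap<>();

    // Java wrapper type -> Lucene point field class name, as listed in the comment above.
    private static final Map<Class<?>, String> LUCENE_POINTS = new LinkedHashMap<>();

    static {
        ELASTICSEARCH_TYPES.put(Byte.class, "byte");
        ELASTICSEARCH_TYPES.put(Short.class, "short");
        ELASTICSEARCH_TYPES.put(Double.class, "double");
        ELASTICSEARCH_TYPES.put(Float.class, "float");

        // Lucene has no byte or short point type, so both are widened to int.
        LUCENE_POINTS.put(Byte.class, "IntPoint");
        LUCENE_POINTS.put(Short.class, "IntPoint");
        LUCENE_POINTS.put(Double.class, "DoublePoint");
        LUCENE_POINTS.put(Float.class, "FloatPoint");
    }

    // Returns the Elasticsearch datatype name, or null for types not listed here.
    static String elasticsearchType(Class<?> javaType) {
        return ELASTICSEARCH_TYPES.get(javaType);
    }

    // Returns the Lucene point class name, or null for types not listed here.
    static String lucenePoint(Class<?> javaType) {
        return LUCENE_POINTS.get(javaType);
    }

    public static void main(String[] args) {
        for (Class<?> type : ELASTICSEARCH_TYPES.keySet()) {
            System.out.println(type.getName()
                    + " -> Elasticsearch: " + elasticsearchType(type)
                    + ", Lucene: " + lucenePoint(type));
        }
    }
}
```

The table only covers the four types listed in this comment; other types return null.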

We need to add some tests to probe the effective type used by the backend. Currently we probe only the behaviour.

Yoann Rodière February 5, 2019 at 8:59 AM

Will probably be fixed as part of HSEARCH-1779. Please just make sure we have proper tests.

I think we don't have any test that checks the exact mapping of a particular type currently; we check that predicates/sorts/projections work fine, but not that the mapping matches what we expect.
We should probably add something, but let's avoid spending too much time on it; I think we could just write one very simple test.

Let's talk about it before you start?
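
As a rough illustration of the "one very simple test" idea, the sketch below compares the expected "type" of each property against the type found in a mapping. It assumes the mapping has already been fetched and flattened into a property-name-to-type map; MappingTypeCheck and findMismatches are hypothetical names, not part of the Hibernate Search test suite. The "actual" values are taken from the mapping reported in this issue.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of Hibernate Search or its tests.
final class MappingTypeCheck {

    // Returns one message per property whose actual type differs from,
    // or is missing compared to, the expected type.
    static List<String> findMismatches(Map<String, String> expected, Map<String, String> actual) {
        List<String> mismatches = new ArrayList<>();
        for (Map.Entry<String, String> entry : expected.entrySet()) {
            String actualType = actual.get(entry.getKey());
            if (!entry.getValue().equals(actualType)) {
                mismatches.add(entry.getKey() + ": expected " + entry.getValue()
                        + " but was " + actualType);
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // Expected types, assuming byte and short should map to their numeric datatypes.
        Map<String, String> expected = new LinkedHashMap<>();
        expected.put("byteWrapWithoutAnalyze", "byte");
        expected.put("shortWrapWithoutAnalyze", "short");
        expected.put("integerWrapWithoutAnalyze", "integer");

        // Types as they appear in the mapping reported in this issue.
        Map<String, String> actual = new LinkedHashMap<>();
        actual.put("byteWrapWithoutAnalyze", "keyword");
        actual.put("shortWrapWithoutAnalyze", "keyword");
        actual.put("integerWrapWithoutAnalyze", "integer");

        for (String mismatch : findMismatches(expected, actual)) {
            System.out.println(mismatch);
        }
    }
}
```

A real test would read the mapping from the backend rather than from a hard-coded map; the comparison step would stay the same.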

Sanne Grinovero October 6, 2017 at 9:45 AM

You mention this as an optimisation, but remember that in practice there isn't any significant benefit in mapping such short-ranged numbers as numeric fields. That's precisely why Lucene doesn't support them.

Elasticsearch does support it so that the API is consistent with other numbers, and so will we, but it's just a matter of consistency to treat the different numeric types in the same way.

Yoann Rodière October 4, 2017 at 6:38 AM

Thanks! That's right, the default mappings for byte and short target text/keyword fields because of Lucene. We can't reasonably change that in a minor, but we will definitely do something in 6.

Fixed

Details

Created October 3, 2017 at 11:53 PM
Updated March 21, 2019 at 5:27 PM
Resolved February 20, 2019 at 10:32 AM