Envers can't start when some audited field has accented letters

Description

This bug prevents some valid Java identifiers from being used as persisted field names, and only when Envers is used. I marked it as a CRITICAL bug because it may prevent someone from upgrading, or force them to rename lots of fields, especially for non-English users of Hibernate.

I am using Hibernate 5.2.0. This problem did NOT exist on 4.3.11.Final.

When I turn Envers on:

Then Hibernate bootstrap fails when the following line builds the metadata:

An example problem is an audited class (source code in UTF-8) containing a boolean field called "seÉfinal", since the character "É", an accented version of character "E", results in a parsing error: "Invalid byte 2 of 2-byte UTF-8 sequence".

Many characters will fail, for example: áéíóúãõñàèçÁÉÃÇ etc.

Most commonly this happens when the input is encoded in ISO-8859-x (such as Latin-1) but the XML parser assumes it is reading UTF-8 (or vice versa). For example, certain sequences of Latin-1 characters (two consecutive characters with accents or umlauts) form byte sequences that are invalid as UTF-8: given the first byte, the second byte has unexpected high-order bits.
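To illustrate the mismatch described above, here is a minimal sketch (the class and method names are made up for this example and are not part of Hibernate or Envers). It encodes the field name as ISO-8859-1 and then checks whether those bytes are well-formed UTF-8. In Latin-1, "É" is the single byte 0xC9, which UTF-8 treats as the lead byte of a 2-byte sequence; the following "f" (0x66) is not a valid continuation byte.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {

    // Returns true only if the bytes decode as well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // \u00C9 is "É"; the escape keeps this file independent of the compiler's source encoding.
        String fieldName = "se\u00C9final";

        byte[] latin1 = fieldName.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = fieldName.getBytes(StandardCharsets.UTF_8);

        System.out.println("Latin-1 bytes valid as UTF-8: " + isValidUtf8(latin1));
        System.out.println("UTF-8 bytes valid as UTF-8:   " + isValidUtf8(utf8));
    }
}
```

The Latin-1 bytes are rejected and the UTF-8 bytes are accepted, which is consistent with the "Invalid byte 2 of 2-byte UTF-8 sequence" message.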

Maybe the fix is as simple as replacing a call that uses the default charset, such as String.getBytes(), with one that names the charset, such as String.getBytes("utf-8"). Please note that forms like String.getBytes() should be avoided, since they use the platform's default charset, which may result in code that works on some platforms and fails on others (it could even pass tests on platforms that happen to be configured the way the code expects).
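A rough sketch of what the suggestion means in practice, using the JDK's StAX API. This is not the actual Envers code: the class name, the `parses` helper, and the `<property>` element are assumptions made for illustration. The XML text is turned into bytes with a given charset and then parsed; because the text carries no XML declaration, the parser assumes UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class ExplicitCharsetDemo {

    // Encodes the XML text with the given charset and runs a StAX parser over the bytes.
    // Returns false if the parser reports an error.
    static boolean parses(String xml, Charset charset) {
        byte[] bytes = xml.getBytes(charset);
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new ByteArrayInputStream(bytes));
            while (reader.hasNext()) {
                reader.next();
            }
            reader.close();
            return true;
        } catch (XMLStreamException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical fragment; the real document generated by Envers is larger.
        String xml = "<property name=\"se\u00C9final\" type=\"boolean\"/>";

        // Explicit UTF-8 behaves the same on every platform.
        System.out.println("UTF-8 bytes parse:   " + parses(xml, StandardCharsets.UTF_8));
        // Stands in for a platform whose default charset is Latin-1.
        System.out.println("Latin-1 bytes parse: " + parses(xml, StandardCharsets.ISO_8859_1));
    }
}
```

Passing `StandardCharsets.UTF_8` rather than the string "utf-8" also avoids the checked `UnsupportedEncodingException`.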

The following is an offending XML Document, containing name="seÉFinal", created internally, in memory, by Envers:

And this is the complete stacktrace:

I would also suggest that the bootstrapping of Envers should issue better error messages. In this case it could have warned something along the lines of "Envers bootstrapping failed when processing Users.class#seÉfinal. Caused by: javax.xml.stream.XMLStreamException: ParseError at row,col:67,50 Message: Invalid byte 2 of 2-byte UTF-8 sequence...".

Environment

None

Activity

Chris Cranford
June 10, 2016, 9:55 PM
Edited

Please provide a runnable test case. I was unable to reproduce this with the described steps on either Windows or Linux.

Marcelo Glasberg
June 14, 2016, 9:35 AM

Chris,

In any software using Hibernate with Envers, simply create a persisted field with this name: "aáçãéèíõÃÕÇñÑ", and then try to start it.

If you can't reproduce it, a test case will do no good. You probably have your platform's default charset set up in such a way that it won't fail. See what I wrote above: "...the platform's default charset, ... may result in code that works on some platforms and fails on others..."

So you most likely need to change the default charset of your JVM to see the bug. See here:

http://stackoverflow.com/questions/361975/setting-the-default-java-character-encoding

Marcelo Glasberg
June 14, 2016, 9:38 AM

Some encodings you may try: windows-1252, UTF-16, ISO-8859-1
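As a quick way to see which charset a given JVM is using, and whether it would encode an accented name differently from UTF-8, something like the following sketch may help. The class and the `differs` helper are hypothetical names for this example. The default can usually be changed at startup with the `-Dfile.encoding=...` system property, for example `java -Dfile.encoding=ISO-8859-1 DefaultCharsetCheck`, although newer JDKs may handle this property differently.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class DefaultCharsetCheck {

    // Returns true if the two charsets encode the text to different byte sequences.
    static boolean differs(String text, Charset a, Charset b) {
        return !Arrays.equals(text.getBytes(a), text.getBytes(b));
    }

    public static void main(String[] args) {
        // \u00C9 is "É".
        String fieldName = "se\u00C9final";
        Charset platformDefault = Charset.defaultCharset();

        System.out.println("Default charset: " + platformDefault);
        // If this prints true, any code that calls getBytes() without a charset
        // and later reads the bytes as UTF-8 is at risk on this JVM.
        System.out.println("Default encoding differs from UTF-8: "
                + differs(fieldName, platformDefault, StandardCharsets.UTF_8));
    }
}
```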

Chris Cranford
June 14, 2016, 12:51 PM

Thanks, I'll give those a try.

Assignee

Chris Cranford

Reporter

Marcelo Glasberg

Fix versions

backPortable

None

Suitable for new contributors

Yes, likely

Requires Release Note

Affirmative

Pull Request

None

backportDecision

None

Components

Affects versions

Priority

Critical