Use of BufferedWriter in GsonEntity may lead to MalformedInputException when input contains 4-byte unicode characters

Description

GsonEntity is designed to behave well with "reactive" I/O, and as such it must handle overflow (too much input when the consumer is not ready). It currently does so by storing "overflowing writes" in char buffers.
In order to avoid creating too many small buffers, we made sure to wrap the writer collecting input in a BufferedWriter, whose buffer size was set to 1024.
So far, so good.
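
To make the chunking concrete, here is a minimal sketch of how a BufferedWriter with a 1024-char buffer hands text to the writer it wraps. `ChunkRecorder` and `ChunkingDemo` are hypothetical names used only for illustration; they stand in for the collecting writer and are not GsonEntity's actual code.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

final class ChunkingDemo {

    /** Hypothetical stand-in for the collecting writer: keeps every write as its own chunk. */
    static final class ChunkRecorder extends Writer {
        final List<String> chunks = new ArrayList<>();

        @Override
        public void write(char[] cbuf, int off, int len) {
            chunks.add(new String(cbuf, off, len));
        }

        @Override
        public void flush() { }

        @Override
        public void close() { }
    }

    /** Writes the text through a BufferedWriter and returns the chunks seen downstream. */
    static List<String> chunksFor(String text, int bufferSize) {
        ChunkRecorder recorder = new ChunkRecorder();
        try (BufferedWriter writer = new BufferedWriter(recorder, bufferSize)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return recorder.chunks;
    }

    /** 1023 ASCII chars followed by one supplementary character (U+1F609, two chars). */
    static String sampleText() {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < 1023; i++) {
            text.append('a');
        }
        return text.append("\uD83D\uDE09").toString();
    }

    public static void main(String[] args) {
        for (String chunk : chunksFor(sampleText(), 1024)) {
            System.out.println(chunk.length() + " chars");
        }
    }
}
```

The sample text is 1025 chars long, so the downstream writer receives one chunk of 1024 chars and one chunk of 1 char. The cut is decided by char count alone, whatever the chars happen to be.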
But... it turns out that encoding arbitrarily split chunks of characters does not work well. Specifically, when a Unicode character is encoded on 4 bytes (i.e. as a surrogate pair of two 16-bit chars), and when the left (high) and right (low) chars are not written to the same char buffer:

  • the left char at the end of the "left" buffer may be silently discarded by the encoder

  • the right char at the start of the "right" buffer may lead to a MalformedInputException, because the encoder keeps no state between buffers and does not remember the left char
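
Both symptoms can be reproduced with the JDK's CharsetEncoder alone. The sketch below assumes UTF-8 as the target charset and a fresh encoder per chunk; how GsonEntity actually drives its encoder is not shown in this ticket, and `SplitSurrogateDemo` and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;

final class SplitSurrogateDemo {

    /**
     * Encodes a small chunk while telling the encoder that more input may follow,
     * and returns how many chars the encoder left unconsumed in the input buffer.
     */
    static int unconsumedChars(String chunk) {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        CharBuffer in = CharBuffer.wrap(chunk);
        ByteBuffer out = ByteBuffer.allocate(16); // large enough for the short chunks used here
        encoder.encode(in, out, false);
        return in.remaining();
    }

    /** Returns true if encoding the chunk as a complete text fails with MalformedInputException. */
    static boolean isMalformedOnItsOwn(String chunk) {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        try {
            encoder.encode(CharBuffer.wrap(chunk));
            return false;
        } catch (MalformedInputException e) {
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String left = "a\uD83D";  // chunk ending with the high surrogate of U+1F609
        String right = "\uDE09";  // chunk starting with the matching low surrogate

        // The encoder waits for the rest of the pair and leaves the high surrogate
        // in the input buffer; if the caller drops that buffer, the char is lost.
        System.out.println("unconsumed in left chunk: " + unconsumedChars(left));

        // A low surrogate with nothing before it is malformed input.
        System.out.println("right chunk malformed: " + isMalformedOnItsOwn(right));

        // The unsplit pair encodes without error.
        System.out.println("whole pair malformed: " + isMalformedOnItsOwn("\uD83D\uDE09"));
    }
}
```

In this sketch nothing is wrong with the text itself: the same two chars encode cleanly when they reach the encoder in one buffer.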

I experienced the issue first-hand while playing on a demo, so don't tell me it's a rare and insignificant occurrence 😉

Test case and solution coming in a PR.

Fixed

Created September 20, 2017 at 2:48 PM
Updated November 27, 2017 at 2:01 PM
Resolved November 27, 2017 at 2:01 PM