CollectionHelper.newHashMap(int) is counterproductive

Description

In OGM, we usually use CollectionHelper.newHashMap(int initialCapacity) to avoid resizing the HashMap after it has been initialized.

The issue is that it does more harm than good: the default load factor is 0.75, so the initial capacity of the map is always smaller than the number of elements we expect to store, and the map is always resized anyway.

We should rename the parameter to expectedSize instead of initialCapacity and calculate the initial capacity taking the default load factor into account.
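A minimal sketch of what such a helper could look like (the helper name and the capacity formula here are illustrative, not the actual OGM code; the formula mirrors the one HashMap itself uses when sizing its table in putAll):

```java
import java.util.HashMap;
import java.util.Map;

public class CollectionHelperSketch {

    private static final float DEFAULT_LOAD_FACTOR = 0.75f;

    // Hypothetical version of the helper: the parameter is the number of
    // entries we expect to store, not the raw initial capacity.
    public static <K, V> Map<K, V> newHashMap(int expectedSize) {
        // Pick a capacity large enough that expectedSize entries fit
        // without exceeding the resize threshold (capacity * loadFactor).
        int initialCapacity = (int) (expectedSize / DEFAULT_LOAD_FACTOR) + 1;
        return new HashMap<>(initialCapacity);
    }

    public static void main(String[] args) {
        // For 12 expected entries we request a capacity of 17; HashMap
        // rounds that up to a 32-slot table, whose threshold (24) is >= 12,
        // so no resize happens while the 12 entries are inserted.
        System.out.println((int) (12 / DEFAULT_LOAD_FACTOR) + 1); // 17
    }
}
```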

Typically, we use the following pattern to initialize a correctly sized Map:
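The original snippet was not preserved in this export; a representative reconstruction of the pattern, based on the description below (the myList name comes from that description, and the helper is a stand-in for OGM's CollectionHelper), would be:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class InitPattern {

    // Stand-in for OGM's CollectionHelper: today the parameter is passed
    // straight through to HashMap as the initial capacity.
    static <K, V> Map<K, V> newHashMap(int initialCapacity) {
        return new HashMap<>(initialCapacity);
    }

    public static void main(String[] args) {
        List<String> myList = List.of("a", "b", "c");

        // The usual pattern: size the map from the list we are about to copy.
        Map<String, Integer> map = newHashMap(myList.size());
        for (String element : myList) {
            map.put(element, element.length());
        }
        System.out.println(map.size()); // 3
    }
}
```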

Currently, the Map is created with an initial capacity of myList.size(), but as the load factor is 0.75, when we try to put myList.size() elements into it, it is systematically resized. This is totally counterproductive, as the whole purpose was to avoid resizing the Map.
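To make the arithmetic concrete (assuming the default 0.75 load factor and HashMap's power-of-two table sizes; the numbers below are an illustrative case, not taken from OGM code):

```java
public class ResizeMath {
    public static void main(String[] args) {
        int expected = 16;                        // e.g. myList.size()
        int capacity = 16;                        // HashMap keeps a power-of-two table
        int threshold = (int) (capacity * 0.75f); // 12: the map resizes past this
        System.out.println(expected > threshold); // true: putting 16 entries resizes
    }
}
```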

See, for instance, Guava's Lists.newArrayListWithExpectedSize().

Environment

None

Activity

Sanne Grinovero
May 27, 2016, 10:58 AM

> The issue is not so much one of wasting space but that we allocate the map in a size that is "too small" to hold all the elements we want it to hold, so it gets resized right away instead of being allocated with the right size from the beginning.

Yes, I had understood that; sorry, I just wrote the wrong thing as I was getting ahead of myself on a related matter.

> One can have "each code path take responsibility", but as shown it's very easy to allocate maps too small when e.g. driving the initialization from a list. So having something like Guava's newArrayListWithExpectedSize() would be nice.

The analogy with a List implementation isn't a good argument to do this on hashmaps, as lists don't use buckets and don't aim to reuse buckets.

> Whether it makes any difference in the larger scheme of things? Probably not. So I'd rather shave different yaks.

Sounds reasonable. I'd rather not merge this either, then: it adds some more math computation of questionable value. Merge it if you have proof with e.g. JMH, but then again this seems more like a potential optimisation to suggest to the OpenJDK team (if it's proven valid).

Sanne Grinovero
May 27, 2016, 11:00 AM

P.S. I don't mean to sound too negative. Feel free to merge it, but I'd rather do so only with better proof to back the change - lacking that, I'd rather keep the code simple.

Davide D'Alto
May 27, 2016, 11:04 AM

> P.S. I don't mean to sound too negative. Feel free to merge it, but I'd rather do so only with better proof to back the change - lacking that, I'd rather keep the code simple.

+1

Gunnar Morling
May 27, 2016, 11:19 AM

> The analogy with a List implementation isn't a good argument to do this on hashmaps as lists don't use buckets and don't aim to reuse buckets.

What I mean is driving the map allocation based on the size of a given list. Say you have a list of 12 things and want to allocate a map from name to "thing" for those 12. Then allocating the map with list.size() as initial capacity is, most of the time, not what you wanted.

Guillaume Smet
May 27, 2016, 12:21 PM

Didn't think it would be that controversial, considering it was mentioned to me in a review.

By the way, that's the logic used in HashMap#putAll when the initial table is empty, so that sounds like something we should do.

Anyway, closing.

Assignee

Guillaume Smet

Reporter

Guillaume Smet

Labels

None

Feedback Requested

None

Feedback Requested By

None

backPortable

None

Suitable for new contributors

None

backportDecision

None

backportReEvaluate

None

Fix versions

Priority

Minor