Add support for sharded collections in MongoDB

Description

Calling getEntityManager().merge(entity); results in an error in a cluster environment, because the update statement is sent with { upsert: true }.
With sharded collections, the whole shard key must be specified in order to update a record; specifying only the primary key is not enough.
Perhaps there is already an option for this situation, but given the lack of documentation I wasn't able to find it.
It may be necessary to create a new annotation that defines the shard key for an entity, or to add a persistence context parameter that disables upsert: true. Something needs to be done.
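
To make the constraint concrete, here is a minimal sketch in plain Java. It does not use the MongoDB driver or Hibernate OGM; documents are modelled as maps, and the class ShardedUpsertFilter, its methods, and the field name "region" are hypothetical names chosen for illustration only. It checks whether an update filter names every shard key field, and builds a filter that does.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hibernate OGM or the MongoDB driver.
// Documents are modelled as plain maps to keep the sketch dependency-free.
class ShardedUpsertFilter {

    // True if the filter contains a condition on every shard key field.
    static boolean coversShardKey(Map<String, ?> filter, List<String> shardKey) {
        return filter.keySet().containsAll(shardKey);
    }

    // Builds a filter from the id plus the shard key values found in the entity document.
    static Map<String, Object> filterWithShardKey(Map<String, ?> entity, List<String> shardKey) {
        Map<String, Object> filter = new LinkedHashMap<>();
        filter.put("_id", entity.get("_id"));
        for (String field : shardKey) {
            if (!entity.containsKey(field)) {
                throw new IllegalArgumentException("Entity has no value for shard key field: " + field);
            }
            filter.put(field, entity.get(field));
        }
        return filter;
    }
}
```

A filter that only carries _id fails the check for a collection sharded on "region", while the filter produced by filterWithShardKey passes it.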

Environment

Glassfish 4.0 (Windows8), Mongo 2.6.4 (Fedora)

Activity

Gunnar Morling
October 20, 2014, 9:17 AM

Hi Vadim, thanks for opening this issue!

At the moment, Hibernate OGM does not explicitly support sharded collections in MongoDB, and we don't have any specific tests for that either. But it's definitely something we should add. An annotation (or more generically, an "option" which may be given via an annotation or via the fluent configuration API) for specifying the shard key of an entity seems very reasonable. It would also allow us to enable sharding of the collection based on that key during schema initialization. Would you be interested in helping out with implementing such a feature?
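
As a rough illustration of what such an annotation could look like, here is a sketch in plain Java. The @ShardKey annotation, the Order entity, and the ShardKeyReader helper are all hypothetical; none of them exists in Hibernate OGM as discussed here, and the JPA annotations are left out so the example has no dependencies. It only shows how a shard key declared on an entity type could be read back at runtime.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical annotation: not provided by Hibernate OGM.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface ShardKey {
    // Names of the fields that make up the shard key, in order.
    String[] value();
}

// Example entity; field names are made up for illustration.
@ShardKey({"region", "customerId"})
class Order {
    long id;
    String region;
    long customerId;
}

class ShardKeyReader {
    // Returns the declared shard key fields, or an empty list if the type is not annotated.
    static List<String> shardKeyOf(Class<?> entityType) {
        ShardKey key = entityType.getAnnotation(ShardKey.class);
        return key == null ? Collections.<String>emptyList() : Arrays.asList(key.value());
    }
}
```

A datastore dialect could presumably use the returned field names both to extend the filter of update statements and to shard the collection during schema initialization, but that wiring is beyond this sketch.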

Btw. as a workaround (whether it works or not may depend on your id generation strategy) you may try to use a compound id for your entity which comprises the actual id as well as your shard key. This should add the shard key to the "where clause" of updates.

Gunnar Morling
October 21, 2014, 7:57 AM

> Such a workaround should work, provided @EmbeddedId or @IdClass is already implemented in OGM

Yes, these are supported already.

> May I also state my opinion about support of sharded collections?

Absolutely, any feedback is more than welcome!

> Without sharded collections, MongoDB and others such as Cassandra are almost useless.

I wouldn't exactly say that these stores are useless without sharded collections. MongoDB, for example, adds much value through its support for nested/structured data, the absence of any need for a physical schema in the datastore, etc. But of course I understand that sharding is an important feature for many, and we will surely add proper support for it. Really it's just a matter of resources and prioritizing. But the good thing is, it's all open source, so you can push for a certain feature just by sending a pull request.

Assignee

Unassigned

Reporter

VadimsZ

Priority

Major