Consider using _meta field to store the fully qualified class name

Description

To make documents smaller, we could use a shorter type name and store the fully qualified class name (FQCN) in the _meta field of the mapping.

The main drawback is that we need the user to provide the shorter type name (it needs to be unique).
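
A minimal sketch of what this proposal could look like, assuming a hypothetical registry class (ShortTypeNameRegistry is not part of Hibernate Search) and an assumed "fqcn" key inside _meta. It only illustrates the uniqueness constraint on user-provided short names and the shape of the metadata that would live on the type mapping:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only: maps user-provided short type
// names to fully qualified class names and rejects ambiguous short names.
class ShortTypeNameRegistry {

    private final Map<String, String> fqcnByShortName = new LinkedHashMap<>();

    // Registers a short name for a class; the same short name may not be
    // reused for a different class.
    void register(String shortName, Class<?> type) {
        String fqcn = type.getName();
        String existing = fqcnByShortName.putIfAbsent(shortName, fqcn);
        if (existing != null && !existing.equals(fqcn)) {
            throw new IllegalArgumentException(
                    "Short type name '" + shortName + "' is already used by " + existing);
        }
    }

    // Renders the _meta fragment that would be stored once per type mapping.
    // The "fqcn" key is an assumption, not an established convention.
    String metaFragment(String shortName) {
        String fqcn = fqcnByShortName.get(shortName);
        if (fqcn == null) {
            throw new IllegalArgumentException("Unknown short type name: " + shortName);
        }
        return "{\"_meta\":{\"fqcn\":\"" + fqcn + "\"}}";
    }
}
```

Registering the same short name for two different classes fails fast, which is the uniqueness requirement mentioned above.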

Worth it?

Environment

None

Activity

Sanne Grinovero
April 18, 2016, 12:01 PM

The query translator would take a term query targeting the special class field and convert it into a query using the name as given through the user's strategy. Do you see any general issue stopping us from doing so?

I don't see any specific issue, just pointing out that you'd need to store the type field both within _meta and also somewhere else to make it possible to filter on the type. If the goal of this proposal is to make the JSON document smaller, as the JIRA description suggests, then the size benefit is moot as we'd actually be adding the type twice. I could be mistaken about what the goal is though; some clarification would help.

BTW in some cases the backend will need to add this restriction; see for example org.hibernate.search.backend.impl.lucene.works.DeleteWorkExecutor... not sure if the "query translator" is the right level to implement this.

Gunnar Morling
April 18, 2016, 12:17 PM

The _meta field lives at the level of the type mapping, so it'd be once the size of that plus n * size of the shorter name, where n is the number of the documents of that type.
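
The arithmetic above can be sketched as follows. The class and method names are hypothetical, and the sketch counts only the UTF-8 bytes of the type name itself, ignoring JSON overhead and any compression the index applies:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical back-of-the-envelope estimate of the bytes spent on type names.
class TypeNameSizeEstimate {

    static long utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Current approach: the FQCN is repeated in each of the n documents.
    static long fqcnInEveryDocument(String fqcn, long n) {
        return n * utf8Length(fqcn);
    }

    // Proposed approach: the FQCN is stored once in _meta, and each of the
    // n documents carries only the shorter name.
    static long shortNameWithMeta(String fqcn, String shortName, long n) {
        return utf8Length(fqcn) + n * utf8Length(shortName);
    }
}
```

For an illustrative class name such as com.example.model.Book with the short name Book, the difference is 18 bytes per document, minus the one-off cost of the _meta entry.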

BTW in some cases the backend will need to add this restriction...

Yes, it'd have to be taken into account wherever a restriction on the type is applied, e.g. in ElasticsearchIndexWorkVisitor#visitDeleteWork() etc.
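
As a rough illustration of that translation step, here is a sketch that assumes a hypothetical TypeRestrictionTranslator (not an existing Hibernate Search class) and assumes the restriction is expressed as a term query on the _type field. Every place that restricts on the type, whether a query or a delete, would have to go through something like it:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: turns a restriction on the fully qualified class name
// into a restriction on the user-provided short type name.
class TypeRestrictionTranslator {

    private final Map<String, String> shortNameByFqcn;

    TypeRestrictionTranslator(Map<String, String> shortNameByFqcn) {
        this.shortNameByFqcn = new HashMap<>(shortNameByFqcn);
    }

    // Builds a term query on the type; assumes short names contain no
    // characters that need JSON escaping.
    String toTermQuery(String fqcn) {
        String shortName = shortNameByFqcn.get(fqcn);
        if (shortName == null) {
            throw new IllegalArgumentException("No short type name registered for " + fqcn);
        }
        return "{\"term\":{\"_type\":\"" + shortName + "\"}}";
    }
}
```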

Sanne Grinovero
April 18, 2016, 12:34 PM

OK, I understand the intent better now. I don't like the idea of automatically "shortening" the names we need; that introduces several opportunities for bugs and quite some trouble with schema evolution over time.

+1 to allow the user to explicitly customize the user name, -1 to try such an optimisation behind the user's back; we'd only save a handful of bytes.

Both Infinispan and JGroups use "type names" and "cluster names" that have to be sent over the network repeatedly to tag each and every UDP packet. The idea of setting up an initial handshake to agree on an encoding and then switching to transmitting a single byte (or just a couple) is a recurrent topic when it comes to performance optimisation, but the many complexities involved make it not worth it, not least because it becomes a nightmare to identify when such a complexity is biting (vs. something else going wrong).
There's a similar issue with "table name" strings in OGM's keys.

We can discuss this at the next meeting if someone thinks it can be done safely in the case of Search, but trust me that it's complex, so I'm not sure attempting it is worth our time.

Emmanuel Bernard
April 19, 2016, 6:01 AM

To cut short endless arguments, hesitations and revisiting, I'll close this one. I have opened a follow-up focused on the TypeNamingStrategy.

Yoann Rodière
May 27, 2020, 9:06 AM

No longer relevant in Search 6: we store the entity name, which is shorter, and there's even an option to not store anything at all and rely on index names (see type name mapping).
Closing.

Assignee

Yoann Rodière

Reporter

Emmanuel Bernard

Labels

None

Suitable for new contributors

None

Pull Request

None

Feedback Requested

None

Priority

Major