To make documents smaller, we could use a shorter type name and store the FQCN in the _meta field of the mapping.
The main drawback is that the user needs to provide the shorter type name, and it needs to be unique.
The query translator would take a term query targeting the special class field and convert it into a query using the name provided by the user's strategy. Do you see any general issue stopping us from doing so?
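To make the proposal concrete, here is a rough sketch of the idea. Everything in it is hypothetical and not existing Hibernate Search API: the `TypeNamingStrategy` interface shape, the `_type_name` field name, the `fqcn` key under `_meta`, and the use of plain maps to stand in for the JSON the Elasticsearch backend would produce.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch only; none of these names are existing Hibernate Search API. */
final class ShortTypeNames {

    /** User-supplied strategy; the returned name must be unique within the index. */
    interface TypeNamingStrategy {
        String shortName(Class<?> entityType);
    }

    /** Assumed name of the per-document field carrying the short type name. */
    static final String TYPE_FIELD = "_type_name";

    /** Builds the mapping-level _meta content, which records the FQCN once per type. */
    static Map<String, Object> metaFor(Class<?> entityType) {
        Map<String, Object> meta = new LinkedHashMap<>();
        meta.put("fqcn", entityType.getName());
        return meta;
    }

    /** Rewrites a term query on the class field into a term query on the short name. */
    static Map<String, Object> translateClassTerm(Class<?> entityType, TypeNamingStrategy strategy) {
        Map<String, Object> term = new LinkedHashMap<>();
        term.put(TYPE_FIELD, strategy.shortName(entityType));
        Map<String, Object> query = new LinkedHashMap<>();
        query.put("term", term);
        return query;
    }
}
```

The same translation would have to be applied at every place where a type restriction is added, not only in the query translator.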
I don't see any specific issue, just pointing out that you'd need to store the type both within _meta and somewhere else to make it possible to filter on the type. If the goal of this proposal is to make the JSON document smaller, as the JIRA description suggests, then the size benefit is moot as we'd actually be adding the type twice. I could be mistaken about what the goal is though; some clarification would help.
BTW in some cases the backend will need to add this restriction; see for example org.hibernate.search.backend.impl.lucene.works.DeleteWorkExecutor... not sure if the "query translator" is the right level to implement this.
The _meta field lives at the level of the type mapping, so the FQCN would be stored only once per type. The total cost would be the size of the FQCN once, plus n * the size of the shorter name, where n is the number of documents of that type.
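As a back-of-the-envelope illustration of that arithmetic, the sketch below compares the two layouts. It only counts the UTF-8 bytes of the names themselves and ignores JSON overhead and any compression applied by the index, so it is an upper bound on the saving rather than a measurement. The class and method names are made up for this example.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for estimating the raw bytes spent on type names. */
final class TypeNameSizeEstimate {

    /** Bytes spent when every document carries the full FQCN. */
    static long fqcnPerDocument(String fqcn, long documents) {
        return utf8Length(fqcn) * documents;
    }

    /** Bytes spent when the FQCN is stored once in _meta and each document carries the short name. */
    static long shortNamePerDocument(String fqcn, String shortName, long documents) {
        return utf8Length(fqcn) + utf8Length(shortName) * documents;
    }

    /** Length of the string once encoded as UTF-8. */
    static long utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }
}
```

For a made-up type `com.example.Book` (16 bytes) with short name `b` and 1000 documents, this gives 16000 bytes for the first layout and 1016 bytes for the second.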
BTW in some cases the backend will need to add this restriction...
Yes, it'd have to be taken into account wherever a restriction on the type is applied, e.g. in ElasticsearchIndexWorkVisitor#visitDeleteWork() etc.
OK, I understand the intent better now. I don't like the idea of automatically "shortening" the names we need; that introduces several opportunities for bugs and quite some trouble with schema evolution over time.
+1 to allowing the user to explicitly customize the type name, -1 to trying such an optimisation behind the user's back: we'd only save a handful of bytes.
Both Infinispan and JGroups use "type names" and "cluster names" which have to be sent over the network repeatedly to tag each and every UDP packet. The idea of setting up an initial handshake to agree on an encoding, and then transmitting only a single byte (or just a couple), is a recurrent topic when it comes to performance optimisation. But there are too many complexities for it to be worth it, not least that it becomes a nightmare to tell when such a complexity is biting (vs. something else going wrong).
There's a similar issue with "table name" strings in OGM's keys.
We can discuss this at the next meeting if someone thinks it can be done safely in the case of Search, but trust me that it's complex, so I'm not sure attempting it is worth our time.
To cut short endless arguments, hesitations and revisiting, I'm closing this one and have opened a follow-up focused on the TypeNamingStrategy.