HSEARCH-2725

Search queries on multiple-node setup with Elasticsearch + dynamic sharding may ignore some indexes

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 5.6.0.Final, 5.6.1.Final, 5.7.0.Final, 5.8.0.Beta1, 5.8.0.Beta2
    • Fix Version/s: 6.0.0.Beta-backlog
    • Component/s: backend-elasticsearch
    • Labels:
      None

      Description

      See HSEARCH-2674: it's approximately the same problem.
      When you have multiple Hibernate Search nodes, all of them targeting the same indexes in a single Elasticsearch cluster, you may end up having some "dynamic shards" (i.e. ES indexes) ignored when querying, simply because each Hibernate Search node is only aware of the shards it created itself.
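
      To make the failure mode concrete, here is a minimal, self-contained sketch. It does not use any Hibernate Search API: the `Node` class, the `book.<shardId>` index naming and the shared `cluster` set are all hypothetical stand-ins, assuming only that each node records the dynamic shards it created itself.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model, NOT the Hibernate Search API: each node remembers
// only the dynamic shards (Elasticsearch indexes) it created itself.
class DynamicShardVisibilityDemo {

    static final class Node {
        // Indexes this node knows about (local state only).
        private final Set<String> knownIndexes = new TreeSet<>();
        // Stand-in for the indexes that really exist in the shared cluster.
        private final Set<String> cluster;

        Node(Set<String> cluster) {
            this.cluster = cluster;
        }

        // Indexing creates the shard's index in the cluster if needed,
        // but records it in this node's local view only.
        void index(String shardId) {
            String indexName = "book." + shardId; // hypothetical naming scheme
            cluster.add(indexName);
            knownIndexes.add(indexName);
        }

        // A query targets only the indexes this node is aware of.
        Set<String> queryTargets() {
            return new TreeSet<>(knownIndexes);
        }
    }

    // Indexes that exist in the cluster but that the node would not query.
    static Set<String> missedBy(Node node, Set<String> cluster) {
        Set<String> missed = new TreeSet<>(cluster);
        missed.removeAll(node.queryTargets());
        return missed;
    }

    public static void main(String[] args) {
        Set<String> cluster = new TreeSet<>();
        Node a = new Node(cluster);
        Node b = new Node(cluster);
        a.index("en");
        b.index("fr");
        System.out.println("Node A queries: " + a.queryTargets()); // [book.en]
        System.out.println("Node A misses:  " + missedBy(a, cluster)); // [book.fr]
    }
}
```

      In this model, node A never learns about `book.fr`, so documents indexed through node B are invisible to queries issued from node A.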

      One solution would be to never mention the index names when querying (except when using explicit shard filtering), and to filter by type only.
      We'd have to check how well this performs, though: ideally Elasticsearch would first narrow down the list of indexes to query based on all the indexes that mention the targeted types in their metadata, but I'm not sure it does.
      => Actually no, we cannot do this: the list of targeted indexes may or may not have been filtered by the ShardIdentifierProvider depending on filters, and we have no way of knowing whether any filtering took place. So we cannot arbitrarily decide to make the query target every index... I'll mention this issue in HSEARCH-2674, which I'm afraid will have to be fixed first.
      => Also, I experimented with executing queries against all indexes but restricted to a specific set of types, and it seems Elasticsearch actually executes the request against all indexes. The performance gain (which was the point of dynamic sharding in the first place) is therefore lost, and we may even get worse performance than without sharding... Maybe we should consider using Elasticsearch shards + custom routing instead of creating multiple indices when doing dynamic sharding in the Elasticsearch case? (see HSEARCH-2634)
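
      For comparison, the three query-targeting strategies discussed above can be sketched as the search paths they would produce. This is only an illustration, assuming the typed `/{index}/{type}/_search` endpoint and the `routing` query parameter of the Elasticsearch versions contemporary with this issue; the class, method names and index names are hypothetical, and values are not URL-encoded.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, NOT part of Hibernate Search: builds the search
// path each strategy would send to Elasticsearch.
class SearchTargetSketch {

    // Current behavior: one index per dynamic shard, listed explicitly.
    // Indexes unknown to the node are silently left out.
    static String explicitIndexes(List<String> indexNames, String type) {
        return "/" + String.join(",", indexNames) + "/" + type + "/_search";
    }

    // First idea: target every index and filter by type only.
    // Per the observations above, this hits all indexes.
    static String allIndexes(String type) {
        return "/_all/" + type + "/_search";
    }

    // Suggested alternative: a single index, with shard identifiers
    // passed as custom routing keys (comma-separated).
    static String routed(String indexName, String type, List<String> routingKeys) {
        return "/" + indexName + "/" + type + "/_search?routing="
                + String.join(",", routingKeys);
    }

    public static void main(String[] args) {
        System.out.println(explicitIndexes(Arrays.asList("book.en", "book.fr"), "book"));
        System.out.println(allIndexes("book"));
        System.out.println(routed("book", "book", Arrays.asList("en", "fr")));
    }
}
```

      With the routed variant, no node needs to know which indexes other nodes created, since there is only one index. Whether it preserves the performance benefit of dynamic sharding would still have to be measured.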

              People

              • Assignee:
                Unassigned
                Reporter:
                yrodiere Yoann Rodière
              • Votes:
                0
                Watchers:
                1
