How to query in "Hibernate Search" fuzzy mode with FuzzyContext.withEditDistanceUpTo() with more than 2 "changes"?

Description

org.hibernate.search.query.dsl.FuzzyContext.withEditDistanceUpTo() only allows a maximum of 2 “changes”. How can I run a query that allows more “changes”, for example 10?

Environment

None

Activity

Yoann Rodière
April 8, 2020, 3:39 PM

Please use Discourse or Stack Overflow for usage questions, and only create tickets for bug reports or for feature requests with documented use cases.

As to your question: it's not possible. Lucene only supports a Levenshtein distance of at most 2. That's hardcoded into the algorithm. See the javadoc of org.apache.lucene.search.FuzzyQuery#FuzzyQuery(org.apache.lucene.index.Term, int, int, int, boolean) and org.apache.lucene.util.automaton.LevenshteinAutomata#MAXIMUM_SUPPORTED_DISTANCE.
There may be other ways to achieve what you are trying to do, however. Post a question explaining your problem in more detail, and maybe someone will suggest a solution.
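
To make the "distance of at most 2" limit concrete, here is a minimal, dependency-free sketch of what an edit distance is and of a brute-force filter that accepts any maximum distance. It is not Lucene's implementation: Lucene relies on Levenshtein automata and, by default, also counts a transposition as a single edit, whereas this sketch uses the plain dynamic-programming algorithm over Java chars. The class and method names (EditDistanceSketch, levenshtein, withinDistance) are hypothetical and only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene or Hibernate Search.
final class EditDistanceSketch {

    // Plain Levenshtein distance: the minimum number of single-character
    // insertions, deletions and substitutions needed to turn a into b.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Brute-force filter: keeps every candidate within maxEdits of the query.
    // Unlike an automaton-based query, it has to look at every candidate term.
    static List<String> withinDistance(String query, List<String> candidates, int maxEdits) {
        List<String> matches = new ArrayList<>();
        for (String candidate : candidates) {
            if (levenshtein(query, candidate) <= maxEdits) {
                matches.add(candidate);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("sitting", "kitten", "mitten", "dog");
        System.out.println(levenshtein("kitten", "sitting"));
        System.out.println(withinDistance("kitten", terms, 2));
        System.out.println(withinDistance("kitten", terms, 10));
    }
}
```

A filter like this works for any maxEdits, but it compares the query against every candidate term, which is the kind of cost the automaton-based approach avoids by capping the distance.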

TNT2k
April 23, 2020, 10:28 AM

I searched around a bit and found a “SlowFuzzyQuery” in a sandbox version of Lucene (maven, javadoc).

This SlowFuzzyQuery can do what I need, but is a bit slower (which is not an issue for me). Could you add support for this class to your QueryBuilder (org.hibernate.search.query.dsl.TermContext.QueryBuilder)? I would highly appreciate it, because it would let me use custom analyzers, which I cannot do without the QueryBuilder.
Example code for current usage:

 

Yoann Rodière
April 23, 2020, 10:46 AM

I don't think we will add this to the DSL, as I doubt we can rely on the stability and overall quality of "sandbox" code. We actually explicitly exclude this artifact from our transitive dependencies.

That being said, if you want to rely on that code, you can still roll your own "slow fuzzy query parser":

And get the custom analyzer from Hibernate Search:

(Or re-build the analyzer manually using org.apache.lucene.analysis.custom.CustomAnalyzer#builder())

Then just do this to create the Lucene query:

Assignee

Yoann Rodière

Reporter

TNT2k

Labels

None

Suitable for new contributors

None

Pull Request

None

Feedback Requested

None

Components

Affects versions

Priority

Minor