Facet Use Cases

Description

I saw the request for Facet Use Cases and thought it might be nice to have a container for just that purpose, so that everyone can contribute their use cases.

The types of facets we are using (through Bobo Browse, though we are happy to switch to HSearch when it is ready):

Simple

  • Facet on a single value.

  • Ability to specify minimum number of keys for facet to show up in the facet result. Use case: if you have a faceted field that can have only two values: true or false, it doesn't make any sense for the facet to be retrieved if all values are true.

  • Ability to specify minimum number of documents for a facet key to show up in the facet result. Use case: you sometimes want to avoid long sets of facet values with only 1 or 2 related documents when the rest have, e.g., thousands.
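The two thresholds above can be sketched in plain Java. This is only an illustration and not Bobo Browse or Hibernate Search API: the class name `SimpleFacetSketch` and the `minKeys`/`minDocs` parameters are hypothetical, and it assumes the key threshold is applied after keys with too few documents have been dropped.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class SimpleFacetSketch {

    /**
     * Counts documents per facet key, drops keys with fewer than minDocs documents,
     * and returns an empty result if fewer than minKeys keys remain.
     * fieldValues holds one value per document (single-valued field).
     */
    static Map<String, Integer> facet(List<String> fieldValues, int minKeys, int minDocs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String value : fieldValues) {
            counts.merge(value, 1, Integer::sum);
        }
        // Assumption: the document threshold is applied before the key threshold.
        counts.values().removeIf(count -> count < minDocs);
        return counts.size() < minKeys ? new TreeMap<>() : counts;
    }

    public static void main(String[] args) {
        // Every document has the value "true", so with minKeys = 2 the facet is suppressed.
        System.out.println(facet(List.of("true", "true", "true"), 2, 1));
        // "scala" has only one document, so with minDocs = 2 it is dropped.
        System.out.println(facet(List.of("java", "java", "java", "scala"), 1, 2));
    }
}
```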

Drilldown

E.g. /United States/Texas/Dallas

  • The ability for a facet value to contain a tree branch of values.
    Use case:

  • Ability to select United States and have the facet result show all the states of the US (assuming there are documents for all states)

  • Ability to select Texas and have the facet result show all the cities of Texas (assuming there are documents for all cities)
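As a rough sketch of the drilldown behaviour described above, assuming each document stores exactly one path in the form "/United States/Texas/Dallas". The class and method names are hypothetical and the counting is done in memory, not against an index.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class DrilldownFacetSketch {

    /**
     * Returns the direct children of selectedPath, each with the number of
     * documents found below it. An empty selectedPath means the root.
     */
    static Map<String, Integer> children(List<String> documentPaths, String selectedPath) {
        String prefix = selectedPath.endsWith("/") ? selectedPath : selectedPath + "/";
        Map<String, Integer> counts = new TreeMap<>();
        for (String path : documentPaths) {
            if (!path.startsWith(prefix)) {
                continue; // document is outside the selected branch
            }
            String rest = path.substring(prefix.length());
            if (rest.isEmpty()) {
                continue; // nothing below the selected node
            }
            int slash = rest.indexOf('/');
            String child = slash < 0 ? rest : rest.substring(0, slash);
            counts.merge(child, 1, Integer::sum);
        }
        return counts;
    }
}
```

Selecting "/United States" would then list the states that have documents, and selecting "/United States/Texas" the cities.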

Collection

  • Ability to facet on a field that is a collection and can have multiple values

  • Ability to use the selected facets as AND: only show facets that match documents that have all values

  • Ability to use the selected facets as OR: only show facets that match documents that have one or more of the values

  • Ability to specify minimum number of keys for facet to show up in the facet result.

  • Ability to specify minimum number of documents for a facet key to show up in the facet result
    Use case: a user selects x skills, and documents are then selected either with OR (ranked by best match) or with AND (exact matches only).
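The AND/OR selection on a multi-valued field could look roughly like this. It is a sketch only: each document is modelled as a set of values (for example skills), the names `CollectionFacetSketch`, `select` and `facet` are hypothetical, and ranking by best match is left out.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class CollectionFacetSketch {

    /** Keeps documents that carry all selected values (AND) or at least one of them (OR). */
    static List<Set<String>> select(List<Set<String>> documents, Set<String> selected, boolean matchAll) {
        return documents.stream()
                .filter(doc -> matchAll
                        ? doc.containsAll(selected)
                        : selected.stream().anyMatch(doc::contains))
                .collect(Collectors.toList());
    }

    /** Counts, per value, how many of the given documents carry it. */
    static Map<String, Integer> facet(List<Set<String>> documents) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Set<String> doc : documents) {
            for (String value : doc) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Calling `facet(select(documents, selected, true))` gives the facet counts for the AND case, and passing `false` gives them for the OR case.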

PredefinedRange

  • Ability to get facet keys back that match a predefined range.
    Use case: documents with salary x show up as facets with the predefined ranges and a document count per range.
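A minimal sketch of counting documents per predefined range, assuming half-open ranges [from, to) and salaries stored as whole numbers. The `Range` class and the labels are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RangeFacetSketch {

    /** A half-open range [from, to) with a display label. */
    static final class Range {
        final String label;
        final long from;
        final long to;

        Range(String label, long from, long to) {
            this.label = label;
            this.from = from;
            this.to = to;
        }

        boolean contains(long value) {
            return value >= from && value < to;
        }
    }

    /** Returns a document count per range, in the order the ranges were given. */
    static Map<String, Integer> facet(List<Long> salaries, List<Range> ranges) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Range range : ranges) {
            counts.put(range.label, 0);
        }
        for (long salary : salaries) {
            for (Range range : ranges) {
                if (range.contains(salary)) {
                    counts.merge(range.label, 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```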

Geolocation

Use case: a user enters a longitude and latitude and the documents are selected based on a radius. The facets show various options to expand or contract the radius: < 5km, 5-10km, > 10km
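The radius buckets could be computed along these lines. This sketch assumes great-circle distance via the haversine formula with a mean Earth radius of 6371 km, and treats exactly 10km as part of the middle bucket; the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GeoFacetSketch {

    private static final double EARTH_RADIUS_KM = 6371.0;

    /** Great-circle distance in kilometres between two points given in degrees (haversine). */
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /** Maps a distance to one of the three facet keys from the use case. */
    static String bucket(double km) {
        if (km < 5) {
            return "< 5km";
        }
        return km <= 10 ? "5-10km" : "> 10km";
    }

    /** Counts documents per distance bucket; each location is a {latitude, longitude} pair. */
    static Map<String, Integer> facet(double lat, double lon, List<double[]> documentLocations) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("< 5km", 0);
        counts.put("5-10km", 0);
        counts.put("> 10km", 0);
        for (double[] location : documentLocations) {
            counts.merge(bucket(distanceKm(lat, lon, location[0], location[1])), 1, Integer::sum);
        }
        return counts;
    }
}
```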

Environment

None

Activity

Hardy Ferentschik
March 5, 2014, 9:42 PM

Thanks for this nice feature breakdown. We will definitely take it into consideration.

Hardy Ferentschik
January 30, 2015, 2:50 PM

Hi, I finally got around to working a bit more with facets and would like to get some feedback. I am working on making use of the native Lucene faceting capabilities instead of the home-grown FieldCache collector approach we have now.

With native Lucene faceting I basically have two options: either dynamic faceting (just using SortedSetDocValuesFacetField and NumericDocValuesField; see also [1],[2]) or faceting via the taxonomy index. Even though the dynamic approach is slower (~25%), it has the advantage that we don't have to maintain a second index. I am working on a change to implement faceting based on dynamic Lucene faceting as a first step. Most of your points above can be covered this way.

However, there is one use case which is not solvable via dynamic indexing, namely category trees (which you called drill down above). For that you need to use the taxonomy index. That said, I am wondering how you would configure the required category tree. For example, using the pure Lucene API you would do:


In this case the publish date is broken down into year, month and day. How do you configure that if your entity has:

Somehow you would need a contract which tells you how to build the tree given a property or, more likely, given a whole instance (in case multiple entity properties contribute to the path). You need something like:

In the example above you would probably like to place @FacetCategoryBuilder directly on publishDate and default the name, but in a more general case you would need to be able to place it at the type level as well.
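A minimal sketch of the kind of contract meant here, in plain Java. The `CategoryPathBuilder` interface and the `DatePathBuilder` class are hypothetical, as is the use of `java.time.LocalDate` for publishDate; none of this is existing Hibernate Search or Lucene API, and it only shows how a value could be turned into a year/month/day path.

```java
import java.time.LocalDate;
import java.util.List;

final class CategoryPathSketch {

    /** Hypothetical contract: turns a property value (or a whole entity) into a category path. */
    interface CategoryPathBuilder<T> {
        List<String> buildPath(T value);
    }

    /** Breaks a date down into year, month and day. */
    static final class DatePathBuilder implements CategoryPathBuilder<LocalDate> {
        @Override
        public List<String> buildPath(LocalDate date) {
            return List.of(
                    String.valueOf(date.getYear()),
                    String.valueOf(date.getMonthValue()),
                    String.valueOf(date.getDayOfMonth()));
        }
    }
}
```

A builder for a whole entity would implement the same interface with the entity type as `T`, which is what placing the annotation at the type level would amount to.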

Anyway, these are just some initial thoughts of mine around this. I was wondering whether you have thought about this as well? (Mind you, with my current work we won't be able to do this anyway, but I want to see if and how it would be possible if we had the taxonomy index.)

[1] http://blog.mikemccandless.com/2013/05/dynamic-faceting-with-lucene.html
[2] http://www.norconex.com/facets-with-lucene/

Marc Schipperheyn
January 30, 2015, 7:01 PM

Ok, yeah. We don't actually use it that much although I did implement it (using Bobo). Bobo is also dynamic actually, so I'm wondering how they did it if you say that it's not possible dynamically. Perhaps they build some caches in memory.

Anyways. I would expect to configure the cache at field level by specifying a separator, defaulting perhaps to "/".
Storing it in the index would be simply:

California/San Francisco

The way we did it was to store IDs that were converted back to labels during the query process. Lucene has native grouping functionality that could perhaps be useful for grouping values.
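Roughly, the ID-based variant could be sketched like this. The names are hypothetical and the ID-to-label map stands in for whatever lookup is available at query time:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class SeparatorPathSketch {

    /**
     * Splits a stored value such as "5/42" on the configured separator and maps
     * each ID back to its label. Unknown IDs are returned unchanged.
     */
    static List<String> toLabels(String stored, String separator, Map<String, String> labelsById) {
        return Arrays.stream(stored.split(Pattern.quote(separator)))
                .map(id -> labelsById.getOrDefault(id, id))
                .collect(Collectors.toList());
    }
}
```

With the labels "California" for ID 5 and "San Francisco" for ID 42, the stored value "5/42" resolves to California/San Francisco.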

Marc Schipperheyn
February 4, 2015, 3:33 PM

Assignee

Unassigned

Reporter

Marc Schipperheyn

Labels

Suitable for new contributors

Yes, likely

Pull Request

None

Feedback Requested

None

Components

Fix versions

Priority

Minor