Update documentation regarding the indexing of e-mail addresses

Description

The reference guide (and the website, which copies this passage) says:

The standard tokenizer splits words at punctuation characters and hyphens while keeping email addresses and internet hostnames intact.

That was the traditional behavior, but it has changed on the Lucene side: e-mail addresses are now split into several tokens by the standard tokenizer. In the Stack Overflow answer I recommended using ClassicTokenizer (which retains the traditional behavior). We should either recommend that in the documentation or show a custom tokenizer with the required behavior.
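As a rough sketch of what the updated documentation could show, the following analyzer definition swaps in ClassicTokenizerFactory so that e-mail addresses should stay intact. The entity `Contact`, its fields, and the analyzer name `emailPreserving` are hypothetical and chosen only for illustration; the lower-case filter is an optional extra, not something the issue asks for. It assumes the Hibernate Search 5.x annotations and the Lucene 5.x package layout.

```java
import javax.persistence.Entity;
import javax.persistence.Id;

import org.apache.lucene.analysis.core.LowerCaseFilterFactory;
import org.apache.lucene.analysis.standard.ClassicTokenizerFactory;
import org.hibernate.search.annotations.Analyzer;
import org.hibernate.search.annotations.AnalyzerDef;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.annotations.TokenFilterDef;
import org.hibernate.search.annotations.TokenizerDef;

// Hypothetical entity used only to illustrate the analyzer definition.
@Entity
@Indexed
@AnalyzerDef(
    name = "emailPreserving", // hypothetical analyzer name
    // ClassicTokenizer is documented to recognize e-mail addresses
    // and internet hostnames as single tokens.
    tokenizer = @TokenizerDef(factory = ClassicTokenizerFactory.class),
    // Optional: lower-case tokens so matching is case-insensitive.
    filters = @TokenFilterDef(factory = LowerCaseFilterFactory.class)
)
public class Contact {

    @Id
    private Long id;

    // Index the address with the analyzer defined above instead of the default one.
    @Field
    @Analyzer(definition = "emailPreserving")
    private String email;

    protected Contact() {
        // no-arg constructor required by JPA
    }

    public Contact(Long id, String email) {
        this.id = id;
        this.email = email;
    }

    public Long getId() {
        return id;
    }

    public String getEmail() {
        return email;
    }
}
```

Queries against the `email` field would need to go through the same analyzer so that the search term is tokenized the same way as the indexed value.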

Environment

None

Status

Assignee

Gunnar Morling

Reporter

Gunnar Morling

Labels

None

Suitable for new contributors

None

Feedback Requested

None

Components

Fix versions

Affects versions

5.5.0.Final

Priority

Minor