Defining analyzers or any index.* properties in elasticsearch.yml is deprecated and will no longer work in Elasticsearch 5 (the next major version after 2.x).
We should move away from this approach and instead include the analyzer definitions in the index settings when the index is created.
Here is the API for it: https://www.elastic.co/guide/en/elasticsearch/reference/2.3/indices-update-settings.html#update-settings-analysis
This will work for all analyzer definitions built on the default implementations shipped with Lucene / Elasticsearch. Each tokenizer, token filter and char filter can be given a name.
One may also be able to pass a fully qualified class name instead of the short name (to be verified).
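As a rough illustration of the settings described above, here is a minimal sketch that builds the JSON body one would send when creating the index. The names my_analyzer, my_tokenizer, my_stop and my_char_filter are hypothetical; the component types (standard, stop, lowercase, html_strip) are built-ins of Elasticsearch 2.x. The sketch only builds and prints the body; it does not contact a cluster.

```python
import json


def build_create_index_body(analyzer_name):
    """Build an index-creation body that defines one custom analyzer.

    All component names below (my_*) are hypothetical, chosen for illustration.
    """
    return {
        "settings": {
            "analysis": {
                # Each component is declared under a name of our choosing.
                "char_filter": {
                    "my_char_filter": {"type": "html_strip"}
                },
                "tokenizer": {
                    "my_tokenizer": {"type": "standard", "max_token_length": 255}
                },
                "filter": {
                    "my_stop": {"type": "stop", "stopwords": ["the", "a"]}
                },
                # The analyzer then refers to its components by name.
                "analyzer": {
                    analyzer_name: {
                        "type": "custom",
                        "char_filter": ["my_char_filter"],
                        "tokenizer": "my_tokenizer",
                        "filter": ["lowercase", "my_stop"],
                    }
                },
            }
        }
    }


body = build_create_index_body("my_analyzer")
print(json.dumps(body, indent=2, sort_keys=True))
```

The resulting document would be used as the body of the index creation request (PUT on the index name). Defining the analysis settings at creation time avoids having to close and reopen an existing index to change them.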
What about custom implementations of Tokenizer / Filter? The natural way in Elasticsearch is to write and deploy a plugin: a small piece of code that registers the tokenizers or filters by name, packaged with the actual implementations in a JAR. The main gotcha is that the implementation classes must implement Elasticsearch interfaces.
How far should we go in helping users deploy their custom analyzer implementations:
build the plugin distribution?
check the presence of the named analyzers or components (which ES API)?
change AnalyzerDef to adopt a string-based naming solution like Elasticsearch?
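For the presence check, one possible approach (an assumption, not a settled answer to the question above) is to read the index settings and look for the expected analyzer names. The sketch below works on an already-parsed response shaped like what GET /<index>/_settings returns; the sample data, the index name my_index and the helper missing_analyzers are hypothetical. It only sees analyzers declared in the index settings, so built-in analyzers or ones contributed by a plugin would need a different check.

```python
def missing_analyzers(settings_response, index_name, required):
    """Return the required analyzer names not declared in the index settings."""
    index_settings = (
        settings_response.get(index_name, {})
        .get("settings", {})
        .get("index", {})
    )
    defined = index_settings.get("analysis", {}).get("analyzer", {})
    return sorted(set(required) - set(defined))


# Hand-written sample shaped like a settings response (assumption, not real output).
sample = {
    "my_index": {
        "settings": {
            "index": {
                "analysis": {
                    "analyzer": {
                        "my_analyzer": {"type": "custom", "tokenizer": "standard"}
                    }
                }
            }
        }
    }
}

print(missing_analyzers(sample, "my_index", ["my_analyzer", "other_analyzer"]))
# prints ['other_analyzer']
```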
Continuing the discussion from (which is a duplicate): this issue will be addressed in 5.6.0-CR1 only if we have enough time. It's a low-priority issue for now.