elasticsearch - guide - Compound Word Token Filter

Plugin

Token filters that decompose compound words. Two types are available: `dictionary_decompounder` and `hyphenation_decompounder`.
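Here is a minimal Python sketch of what dictionary-based decompounding does. It is modeled loosely on the brute-force approach of Lucene's `DictionaryCompoundWordTokenFilter` and is not the filter's actual implementation. The sketch keeps the original token, scans every substring within a size range, and emits each substring found in the word list. The size thresholds, the case-insensitive lookup, and the `only_longest_match` flag are hypothetical choices made for illustration, not settings documented on this page.

```python
from typing import Iterable, List, Optional

# Hypothetical thresholds chosen for illustration only.
MIN_WORD_SIZE = 5      # tokens shorter than this are never decompounded
MIN_SUBWORD_SIZE = 2   # shortest substring that is looked up
MAX_SUBWORD_SIZE = 15  # longest substring that is looked up


def decompound(token: str, word_list: Iterable[str],
               only_longest_match: bool = False) -> List[str]:
    """Return the token followed by every word-list entry found inside it."""
    words = {w.lower() for w in word_list}  # assumption: case-insensitive lookup
    out = [token]  # the original compound token is always kept
    if len(token) < MIN_WORD_SIZE:
        return out
    lowered = token.lower()
    for start in range(len(token) - MIN_SUBWORD_SIZE + 1):
        longest: Optional[str] = None
        for length in range(MIN_SUBWORD_SIZE, MAX_SUBWORD_SIZE + 1):
            end = start + length
            if end > len(token):
                break
            if lowered[start:end] in words:
                if only_longest_match:
                    longest = token[start:end]  # later matches are longer
                else:
                    out.append(token[start:end])
        if longest is not None:
            out.append(longest)
    return out


if __name__ == "__main__":
    print(decompound("onetwothree", ["one", "two", "three"]))
```

For example, `decompound("onetwothree", ["one", "two", "three"])` returns `['onetwothree', 'one', 'two', 'three']`. The `hyphenation_decompounder` narrows the candidate substrings using hyphenation rules rather than trying every one. This sketch does not model that.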

The following settings can be set for either compound word token filter type:

Setting	Description
`word_list`	A list of words to use.
`word_list_path`	A path (either relative to the `config` location, or absolute) to a file containing a list of words.
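The path rule for `word_list_path` (relative to the `config` location, or absolute) can be sketched in Python as follows. The helper names and the `config_dir` argument are hypothetical. The file format is also an assumption: one word per line, with blank lines and lines starting with `#` skipped.

```python
import os
from typing import List


def resolve_word_list_path(config_dir: str, word_list_path: str) -> str:
    # os.path.join discards config_dir when word_list_path is absolute,
    # so relative paths land under the config location and absolute ones win.
    return os.path.join(config_dir, word_list_path)


def load_word_list(path: str) -> List[str]:
    # Assumed format: one word per line; blank lines and '#' comments skipped.
    words = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            word = line.strip()
            if word and not word.startswith("#"):
                words.append(word)
    return words
```

Relying on `os.path.join` keeps the relative-or-absolute decision in one standard-library call instead of an explicit `os.path.isabs` branch.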

Here is an example:

```yaml
index :
    analysis :
        analyzer :
            myAnalyzer2 :
                type : custom
                tokenizer : standard
                filter : [myTokenFilter1, myTokenFilter2]
        filter :
            myTokenFilter1 :
                type : dictionary_decompounder
                word_list: [one, two, three]
            myTokenFilter2 :
                type : hyphenation_decompounder
                word_list_path: path/to/words.txt
```
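The same settings can also be written as nested JSON objects. The Python sketch below mirrors the YAML example as a dictionary and serializes it. It assumes you want a JSON body for an index-creation request. How that body is wrapped (for example, under a top-level `settings` key) depends on the Elasticsearch version and is not covered here.

```python
import json

# Mirrors the YAML example: one custom analyzer chaining two decompounders.
settings = {
    "index": {
        "analysis": {
            "analyzer": {
                "myAnalyzer2": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["myTokenFilter1", "myTokenFilter2"],
                }
            },
            "filter": {
                "myTokenFilter1": {
                    "type": "dictionary_decompounder",
                    "word_list": ["one", "two", "three"],   # inline word list
                },
                "myTokenFilter2": {
                    "type": "hyphenation_decompounder",
                    "word_list_path": "path/to/words.txt",  # file-based word list
                },
            },
        }
    }
}

body = json.dumps(settings, indent=2)
print(body)
```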
