Compound Word Tokenfilter

Token filters that allow to decompose compound words. There are two types available: dictionary_decompounder and hyphenation_decompounder.

The following are settings that can be set for a compound word token filter type:

Setting Description
word_list A list of words to use.
word_list_path A path (either relative to config location, or absolute) to a list of words.

Here is an example:

index :
    analysis :
        analyzer :·
            myAnalyzer2 :
                type : custom
                tokenizer : standard
                filter : [myTokenFilter1, myTokenFilter2]
        filter :
            myTokenFilter1 :
                type : dictionary_decompounder
                word_list: [one, two, three]
            myTokenFilter2 :
                type : hyphenation_decompounder
                word_list_path: path/to/words.txt