Friday 14 July 2017

Endeca | Configure Stemming in CAS based application

Stemming : Stemming broadens search results to include root words and variants of root words.

For example, a search for the word truck also returns results containing the derived form trucks, and a search for trucks also returns results containing its root, truck.

Steps to Configure Stemming : You configure stemming by manually editing the stemming.xml file, which is located in the <Endeca_App_Directory>/config/mdex directory. If stemming.xml does not contain an entry for a language, stemming is not enabled for that language and the default analysis is applied to it.

In stemming.xml, the entry for a language is contained in a separate <STEMMING> element. Each subelement in the <STEMMING> element begins with STEM_language-code, where language-code identifies the language; for example, STEM_DE for German.

The subelements specify the following :
  • Whether stemming is to be performed on that language.
  • Whether a static wordforms file is to be used.
  • Whether compound matching is to be performed.
For example, the following entry, for American English, specifies that stemming is to be performed using a static wordforms file, and that compound matching is not to be performed.
================================================================

<STEMMING>
<STEM_EN_US ENABLE="TRUE"
USE_COMPOUND_MATCHING="FALSE"
USE_STATIC_WORDFORMS="TRUE"/>
</STEMMING>

================================================================
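
As a further sketch, an entry for German (the STEM_DE code mentioned earlier) might look like the following. The attribute names are the same ones used in the American English entry. The values, in particular turning compound matching on and the static wordforms file off, are assumptions chosen only for illustration, so verify which combinations your MDEX version supports for a given language before using them.
================================================================

<STEMMING>
<!-- Illustrative values only (assumption): German with compound matching enabled -->
<STEM_DE ENABLE="TRUE"
USE_COMPOUND_MATCHING="TRUE"
USE_STATIC_WORDFORMS="FALSE"/>
</STEMMING>

================================================================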
