Wednesday 28 June 2017

Endeca | Configure stop words

What is a stop word in Endeca?

Stop words are words that are ignored if an application user includes them as part of a search. Typically, common words such as "the", "and", and "a" are included in the stop word list.
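As a rough illustration of the idea (not how the MDEX Engine is actually implemented), the sketch below drops stop words from a query before it is matched. The function name strip_stop_words, the whitespace tokenization, and the lowercase comparison are all assumptions made for this example.

```python
# Illustration only: a conceptual model of stop word filtering,
# not the MDEX Engine's real query processing.
STOP_WORDS = {"of", "the", "how", "when"}  # example stop word list


def strip_stop_words(query, stop_words=STOP_WORDS):
    """Return the query terms that remain after stop words are dropped.

    Assumptions: terms are separated by whitespace and compared in lowercase.
    """
    terms = query.lower().split()
    return [term for term in terms if term not in stop_words]


print(strip_stop_words("the history of the bicycle"))
# prints ['history', 'bicycle']
```

With this list, a search for "the history of the bicycle" is treated as a search for "history bicycle".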

How do you add stop words in a CAS-based application?

Step 1 : Open the application-specific stop word configuration file.

This file is located at <Application Directory>/config/mdex/<Application_Name>.stop_words.xml

For example : for a Store application installed under /opt/app/endeca/apps/, the file is /opt/app/endeca/apps/Store/config/mdex/Store.stop_words.xml

Step 2 : Add stop words.
By default, no stop words are configured. The example below adds four:
============================================================
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE STOP_WORDS SYSTEM "stop_words.dtd">
<STOP_WORDS>

  <STOP_WORD>of</STOP_WORD>
  <STOP_WORD>the</STOP_WORD>
  <STOP_WORD>how</STOP_WORD>
  <STOP_WORD>when</STOP_WORD>
</STOP_WORDS> 
============================================================
Step 3 : Run a baseline update so that the new stop words take effect.

Important Points :
1. Words added to the stop word list are not expanded by other Endeca features such as stemming and the thesaurus. This means that if you add the word "item" as a stop word, its plural form "items" is not automatically treated as a stop word. If you want both forms on the stop word list, you must add them individually.

2. Stop words must be single words only, and cannot contain any non-searchable characters. If more than one word is entered as a stop word, neither the individual words nor the combined phrase will act as a stop word. Non-searchable characters within a stop word will also cause this behavior. Entering “full-book” as a stop word acts just as if you had entered “full book”, and does not have any effect on searches.
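The two points above can be checked before running a baseline. The following is a minimal sketch, not an Endeca tool: the helper check_stop_words is hypothetical, and it assumes that only letters and digits are searchable (that is, no additional search characters are configured for the application).

```python
import xml.etree.ElementTree as ET


def check_stop_words(xml_text):
    """Split the STOP_WORD entries of a stop_words.xml document into
    (valid, rejected) lists.

    Assumption: an entry is usable only if it is a single token made of
    letters and digits; spaces, hyphens and other characters reject it.
    """
    root = ET.fromstring(xml_text)
    valid, rejected = [], []
    for node in root.iter("STOP_WORD"):
        word = (node.text or "").strip()
        if word.isalnum():
            valid.append(word)
        else:
            rejected.append(word)
    return valid, rejected


SAMPLE = """<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE STOP_WORDS SYSTEM "stop_words.dtd">
<STOP_WORDS>
  <STOP_WORD>item</STOP_WORD>
  <STOP_WORD>items</STOP_WORD>
  <STOP_WORD>full-book</STOP_WORD>
  <STOP_WORD>full book</STOP_WORD>
</STOP_WORDS>"""

valid, rejected = check_stop_words(SAMPLE)
print(valid)     # prints ['item', 'items']
print(rejected)  # prints ['full-book', 'full book']
```

Here "item" and "items" are listed separately because one does not imply the other, while "full-book" and "full book" are flagged because neither would act as a stop word.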
