xml - Solr filters: are they good? -


Do you think these filters are good for French search?

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal.
         add enablePositionIncrements=true in both the index and query
         analyzers to leave a 'gap' for more accurate phrase queries.
    -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="French" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.ElisionFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.SnowballPorterFilterFactory" language="French" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.ElisionFilterFactory"/>
  </analyzer>
</fieldType>

I have a problem: "electricitré" returns 6 occurrences, whereas "electricite" returns 9 occurrences.

  1. You can use the Solr admin analysis page to understand why "electricitré" and "electricite" don't give the same results:

http://exemple.com:8983/solr/#/yourcorename/analysis?analysis.fieldvalue=electricit%c3%a9+electricite&analysis.query=electricit%c3%a9+electricite&analysis.fieldtype=text&verbose_output=1

Here I suppose the difference is due to a typo: "electricitré" instead of "electricité", without the "r"?

  2. Solr advises using synonyms at index time:

Keep in mind that while the SynonymFilter will happily work with synonyms containing multiple words (i.e. "sea biscuit, sea biscit, seabiscuit"), the recommended approach for dealing with synonyms like this is to expand the synonym when indexing. This is because there are two potential issues that can arise at query time:

  1. The Lucene QueryParser tokenizes on white space before giving any text to the Analyzer, so if a person searches for the words sea biscit, the analyzer will be given the words "sea" and "biscit" separately, and will not know that they match a synonym.
  2. Phrase searching (i.e. "sea biscit") will cause the QueryParser to pass the entire string to the analyzer, but if the SynonymFilter is configured to expand the synonyms, then when the QueryParser gets the resulting list of tokens back from the Analyzer, it will construct a MultiPhraseQuery that will not have the desired effect. This is because of the limited mechanism available for the Analyzer to indicate that two terms occupy the same position: there is no way to indicate that a "phrase" occupies the same position as a term. For our example, the resulting MultiPhraseQuery would be "(sea | sea | seabiscuit) (biscuit | biscit)", which would not match the simple case of "seabiscuit" occurring in a document.

Even when you aren't worried about multi-word synonyms, idf differences still make index-time synonyms a good idea. Consider the following scenario:

  • An index with a "text" field, which at query time uses the SynonymFilter with the synonym "TV, Television" and expand="true"
  • Many thousands of documents containing the term "text:tv"
  • A few hundred documents containing the term "text:television"

A query for text:TV will expand into (text:tv text:television), and the lower docFreq for text:television will give the documents that match "television" a much higher score than comparable docs that match "tv", which may be somewhat counterintuitive to the client. Index-time expansion (or reduction) will result in the same idf for all documents, regardless of which term the original text contained.
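As a rough sketch of what index-time expansion could look like, the synonym filter moves from the query analyzer to the index analyzer. The field type name "text_syn" is hypothetical, and synonyms.txt is assumed to contain lines such as "tv, television"; the other French filters from the question are left out to keep the fragment short.

```xml
<!-- Sketch only: "text_syn" is a hypothetical name, not part of the question's schema. -->
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- expand="true": every synonym in a group is indexed at the same position -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- no synonym filter here: the index already holds the expanded terms -->
  </analyzer>
</fieldType>
```

Existing documents have to be reindexed after a change like this, because index-time analysis only applies to documents indexed after the change.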

  3. Solr advises using the ElisionFilter before the WordDelimiterFilter:

Note: it is best to use the ElisionFilter before the WordDelimiterFilter. This will prevent very slow phrase queries.
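Applied to the index analyzer from the question, that ordering might look like the sketch below. Only the position of the ElisionFilterFactory changes; whether the rest of the chain suits your data is something to check on the analysis page. The same reordering would apply to the query analyzer.

```xml
<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <!-- moved up: strip elided articles (l', d', qu') before any word splitting -->
  <filter class="solr.ElisionFilterFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.ASCIIFoldingFilterFactory"/>
  <filter class="solr.SnowballPorterFilterFactory" language="French" protected="protwords.txt"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
```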

