Retrieve n-grams in Solr for a particular word

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="5" minShingleSize="2" outputUnigrams="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="5" minShingleSize="2" outputUnigrams="true"/>
    <solrQueryParser defaultOperator="OR" />
  </analyzer>
</fieldType>
I am using ShingleFilterFactory to create n-grams. Now I want to retrieve all the n-grams that contain a particular word: for example, if I enter "night", I want every n-gram that includes the word "night".
Right now I only get the top terms across all the n-grams in my documents, using the query below:
http://localhost/solr/admin/luke?fl=text&numTerms=50000&wt=json
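Instead of listing the top terms with Luke, one option is the TermsComponent, which lists the indexed terms of a field and can filter them with a regular expression. Because the shingles are indexed as ordinary terms of the text field, a regex on "night" should return every shingle that contains it. The sketch below registers the component in solrconfig.xml; the /terms handler path and the core-less URL in the comment are assumptions modeled on the Luke URL above, so adjust them to your setup.

```xml
<!-- solrconfig.xml sketch: expose TermsComponent at /terms so the indexed
     terms of a field (including ShingleFilterFactory output) can be listed. -->
<searchComponent name="terms" class="solr.TermsComponent"/>

<requestHandler name="/terms" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <!-- enable the component for every request to this handler -->
    <bool name="terms">true</bool>
  </lst>
  <arr name="components">
    <str>terms</str>
  </arr>
</requestHandler>

<!-- Example request (field "text", as in the Luke query); terms.limit=-1
     removes the default limit of 10 terms:
     http://localhost/solr/terms?terms.fl=text&terms.regex=.*night.*&terms.limit=-1&wt=json
-->
```

The regex has to match the whole term, hence the leading and trailing `.*`. That pattern also matches terms such as "knight" or "nightly"; with the default single-space separator between shingle tokens, a stricter pattern such as `(.* )?night( .*)?` keeps only whole-word matches (the spaces must be URL-encoded in the request).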

Related

Solr filter for apostrophes: allow search both with and without the apostrophe

Using Solr 9.
I'd like the terms "Lowe's" and "Lowes" to return the same results, but I can't seem to find the correct combination of filters in this field type:
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.KStemFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.KStemFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
    <filter class="solr.LowerCaseFilterFactory" />
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
  </analyzer>
</fieldType>
When testing in Solr's Analysis screen, I would expect <filter class="solr.KStemFilterFactory"/> to remove the "s" from "Lowes" in the query, so that it matches "Lowe" produced at index time.
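One possible approach, sketched under the assumption that dropping apostrophes entirely is acceptable for this field: a PatternReplaceCharFilterFactory deletes straight and curly apostrophes before tokenization, so "Lowe's" and "Lowes" reach the tokenizer as the same string and end up as the same token at both index and query time. Since the apostrophe is already gone, EnglishPossessiveFilterFactory is left out of this version. This is a sketch, not a verified fix; reindex after changing the field type.

```xml
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- delete ' and ’ so "Lowe's" becomes "Lowes" before tokenizing -->
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="['’]" replacement=""/>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <!-- lowercase first; the stemmer then sees the same input on both sides -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- the same charFilter must run at query time, or the forms diverge -->
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="['’]" replacement=""/>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
</fieldType>
```

Whatever KStem does with "lowes", both spellings pass through the identical chain, so they should produce the same indexed and query term; the Analysis screen is a quick way to confirm this.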

Solr synonym search not working when synonyms contain spaces

I set up synonyms as described in http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#SynonymFilter, but synonym search does not work when a synonym contains whitespace. For example, the indexed word is "marketing" and the synonyms are:
abc, abc xyz, marketing
My schema is as follows:
<fieldType name="String" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
I also tried adding <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/> to <analyzer type="query">, but it still does not work.
Please suggest.
Thanks & Many Regards,
Lalit Joshi
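A likely explanation, offered as a hedge rather than a diagnosis: the classic SynonymFilterFactory does not handle multi-word synonyms correctly at query time, and the query parser has traditionally split the input on whitespace before analysis, so "abc xyz" never reaches the synonym filter as one unit. The sketch below assumes Solr 6.4 or later, where the graph-aware SynonymGraphFilterFactory is available, and applies the synonyms at query time only.

```xml
<!-- Sketch assuming Solr 6.4+. synonyms.txt line, unchanged:
     abc, abc xyz, marketing -->
<fieldType name="String" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- no synonyms at index time: documents keep their original terms -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- charFilters run before the tokenizer -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- graph-aware filter: "abc xyz" is expanded as a single multi-word unit -->
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
```

The query parser also has to pass the whole phrase to the analyzer: with the standard and eDisMax parsers this is the sow (split on whitespace) parameter, which must be false (the default since Solr 7). Reindex after removing the index-time synonym filter.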

Solr Porter stemming not returning results

My schema is below. I added PorterStemFilterFactory to schema.xml, then restarted Solr and re-imported the data, but it still does not work:
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" tokenizerFactory="solr.StandardTokenizerFactory" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
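One thing worth checking, offered as an assumption because the question does not show the field definitions: the stemmer only runs for fields whose type is text_general. If the query targets a field of another type (for example a string field, or a default search field that is not text_general), stemmed forms will not match. The field names below (title, text) are hypothetical.

```xml
<!-- Hypothetical field names. Stemming applies only to fields of the
     text_general type, at both index and query time. -->
<field name="title" type="text_general" indexed="true" stored="true"/>

<!-- If searches go to a catch-all field, it must also use text_general;
     copyField sends the raw title text through that field's analyzer. -->
<field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
<copyField source="title" dest="text"/>
```

With this chain, a query such as title:running should also match a document whose title contains "runs", since the Porter stemmer reduces both to "run".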

I added a document to the Solr index, but the stopwords are not applied

I have configured a Solr 4.10.3 installation with the following lines in schema.xml:
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
I indexed the text field and added some stopwords to the stopword.txt file, but the configuration gives an error when I restart Solr.
How can I make the stopwords work?
Thank you
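A likely cause, hedged because the error message is not shown: KeywordTokenizerFactory emits the entire field value as a single token, so StopFilterFactory can only remove a value that is, in its entirety, a stopword. The sketch below swaps in StandardTokenizerFactory so that each word is checked individually. Separately, the text mentions stopword.txt while the schema references stopwords.txt; the file named in words= must exist in the core's conf directory, and a name mismatch is one possible source of the startup error.

```xml
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- split into words so StopFilterFactory sees each word separately -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- conf/stopwords.txt must exist under exactly this name -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Analysis only changes the indexed terms: the stored value returned in search results still contains the original text, stopwords included. Check the effect in the Analysis screen or by searching for a stopword, and reindex after changing the field type.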

How to make Solr synonyms return the same results (same number and ordering) when searching with both abbreviations and their full names

I have added the following to schema.xml:
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>-->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <!--<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>-->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
Synonym.txt:
Attention deficit hyperactivity disorder,ADHD
I then re-indexed Solr.
However, the number of results and their ordering differ when I search for 'ADHD' and for 'Attention deficit hyperactivity disorder'. Is any further configuration needed for Solr to pick up synonym.txt and return the same results for both searches?
@dwhelan: the query for a search for "ms" looks like '((drug_facet_auto:((ms*)))OR(company_facet_auto:((ms*)))OR(disease_facet_auto:((ms*))))' and the QueryResponse is: {responseHeader={status=0,QTime=15,params={facet=true,q=((drug_facet_auto:((ms*)))OR(company_facet_auto:((ms*)))OR(disease_facet_auto:((ms*)))),facet.limit=100,facet.field=[drug_facet, company_facet, disease_facet],wt=javabin,rows=0,version=2}},response={numFound=0,start=0,docs=[]},facet_counts={facet_queries={{!label='Last 24 hours'}publishdate:[NOW/HOUR-24HOURS TO NOW/HOUR+1HOUR]=0,{!label='Last 7 days'}publishdate:[NOW/DAY-7DAYS TO NOW/DAY+1DAY]=0,{!label='Last 30 days'}publishdate:[NOW/DAY-1MONTH TO NOW/DAY+1DAY]=0,{!label='Last year'}publishdate:[NOW/DAY-1YEAR TO NOW/DAY+1DAY]=0},facet_fields={drug_facet={},company_facet={},disease_facet={}},facet_dates={},facet_ranges={}},highlighting={},spellcheck={suggestions={correctlySpelled=false}}}
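Two factors probably contribute, stated as hedged observations rather than a confirmed diagnosis. First, an unquoted multi-word query is usually parsed into separate term queries, so "Attention deficit hyperactivity disorder" can also match documents that contain only "attention" or "disorder", while "ADHD" cannot; searching the full name as a quoted phrase narrows that gap. Second, relevance scores depend on which terms appear in the query, so identical ordering is not guaranteed even when the matching sets agree. For index-time expansion of multi-word synonyms, the sketch below assumes Solr 6.4 or later and uses the graph-aware filter followed by FlattenGraphFilterFactory, which is required after a graph filter in an index analyzer.

```xml
<!-- Sketch assuming Solr 6.4+; reindex after changing it. -->
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- expands "adhd" <-> "attention deficit hyperactivity disorder" -->
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <!-- index analyzers cannot store a token graph, so flatten it -->
    <filter class="solr.FlattenGraphFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```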
