Solr not returning the exact element

Using Solr 7.7.3.
I have an element with the label:"alpha-ravi".
When I search in Solr for label:"alpha", it returns the element with the label "alpha-ravi".
According to the Solr documentation, it should not return this element.
Can anyone explain this behavior?

If you want exact results only (i.e. return documents with "alpha-ravi" only when the user types exactly "alpha-ravi" in the search), I would suggest the keyword tokenizer (solr.KeywordTokenizerFactory). This tokenizer treats the entire "alpha-ravi" as a single token, so a search for just "alpha" or "ravi" will not return it as a partial match.
For example, in your schema.xml file you could add something like this (configure the filter chains as needed):
<fieldType name="single_token_string" class="solr.TextField" sortMissingLast="true">
<analyzer type="index">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.ASCIIFoldingFilterFactory"/>
<filter class="solr.TrimFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.ASCIIFoldingFilterFactory"/>
<filter class="solr.TrimFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
</fieldType>
You can then use this fieldType (which is built on the KeywordTokenizer) for a field in the same schema.xml:
<field name="myField" type="single_token_string" indexed="true" stored="true" />
By default, Solr's text field types typically use the StandardTokenizer, which splits "alpha-ravi" at the hyphen into the two tokens "alpha" and "ravi". That is why a search for "alpha" (or "ravi") matches.
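The difference between the two tokenizers can be illustrated with a small Python simulation. This is only a rough sketch, not Solr's actual implementation: the function names are hypothetical, and a regex split on non-alphanumeric characters stands in for the StandardTokenizer.

```python
import re

def standard_like_tokens(text):
    # Rough stand-in for StandardTokenizer + LowerCaseFilter: every
    # character that is not a letter or digit (such as "-") is a boundary.
    return [t for t in re.split(r"[^0-9A-Za-z]+", text.lower()) if t]

def keyword_tokens(text):
    # Stand-in for KeywordTokenizer + TrimFilter + LowerCaseFilter:
    # the whole value stays a single token.
    return [text.strip().lower()]

def matches(index_tokens, query_tokens):
    # The query matches when every query token is among the indexed tokens.
    return all(q in index_tokens for q in query_tokens)

value = "alpha-ravi"
std_hit = matches(standard_like_tokens(value), standard_like_tokens("alpha"))
kw_hit = matches(keyword_tokens(value), keyword_tokens("alpha"))
kw_exact = matches(keyword_tokens(value), keyword_tokens("alpha-ravi"))
print(std_hit, kw_hit, kw_exact)
```

With the standard-like split, "alpha" finds the document; with the keyword-style chain, only the full "alpha-ravi" does.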
As an alternative, you could also run a phrase query (the quoted text is still analyzed, but all of the resulting tokens must then match in order). Possibly something like http://localhost:8983/solr/...fq=label:"alpha-ravi"
Hope that helps. All the best!

Related

How to get an "ends with" search in Solr 4.8.1?

I have a document, indexed on Solr, which contains this field:
{
"manufacturerSkuEndsWith": [
"DU351118DR0"
]
}
My goal is to get an "ends with" search on the manufacturerSkuEndsWith field. For example, the following queries should match the value above: DR0, 8DR0, 18DR0, 118DR0... but these queries should NOT match: DU35, 118DR, 118...
My problem is that the query 118 matches that document, even though DU351118DR0 does not end with 118.
My Solr & Lucene version is 4.8.1. I've found out that in this version the side="back" for the EdgeNGramTokenizer is not supported anymore: LUCENE-3907. In this thread, they are suggesting to use a ReverseStringFilter to get a behaviour similar to an EdgeNGramTokenizer with side="back", so this is how I configured the manufacturerSkuEndsWith field in my schema.xml:
<field indexed="true" multiValued="true" name="manufacturerSkuEndsWith" stored="true" type="smccTextReversedNGram"/>
<copyField dest="manufacturerSkuEndsWith" source="ManufacturerSku"/>
<fieldType class="solr.TextField" name="smccTextReversedNGram" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.NGramTokenizerFactory" maxGramSize="10" minGramSize="3"/>
<filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ReverseStringFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ReverseStringFilterFactory"/>
</analyzer>
</fieldType>
However, this configuration does not perform an "ends with" search.
How can I get that type of search instead?
You're using the NGramTokenizer and not the EdgeNGramFilter as shown in the examples. The NGramTokenizer generates tokens from inside the string as well, not just from the edge.
To get the behavior you're looking for, use a KeywordTokenizer (which keeps the input as a single token), then the ReverseStringFilter to reverse it, and finally the EdgeNGramFilter to generate strings from the start of the now reversed string:
foo -> oof -> o, oo, oof
You can then either run these through the reversed string filter again to get the "correct" versions indexed:
-> o, oo, foo
.. or you can do as you've done in your field, and reverse the query string instead:
foo -> oof -> matches the oof token
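The reverse-then-edge-n-gram idea can be checked with a short Python simulation. This is a sketch under assumptions, not Solr code: the helper names are hypothetical, and the gram sizes 3 and 10 are taken from the field definition in the question.

```python
def edge_ngrams(token, min_gram, max_gram):
    # Prefixes of the token whose length lies in [min_gram, max_gram].
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]

def index_tokens(value, min_gram=3, max_gram=10):
    # Index side: keyword token -> lowercase -> reverse -> front edge n-grams.
    return set(edge_ngrams(value.lower()[::-1], min_gram, max_gram))

def query_token(q):
    # Query side: keyword token -> lowercase -> reverse (no n-grams).
    return q.lower()[::-1]

def ends_with_match(value, q):
    return query_token(q) in index_tokens(value)

sku = "DU351118DR0"
for q in ["DR0", "8DR0", "118DR0", "DU35", "118DR", "118"]:
    print(q, ends_with_match(sku, q))
```

In this simulation only the first three queries match, which is the "ends with" behavior asked for. Note that with maxGramSize="10" a query longer than ten characters would not match.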

Solr suggest exact match

I am trying to make Solr return the exact match in suggestions, e.g.:
spellcheck.q=tota does return total in results but
spellcheck.q=total does not return total in results.
I am using this field for suggestions:
<fieldType name="textSpellShingle" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ShingleFilterFactory" maxShingleSize="3" outputUnigrams="true"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
</fieldType>
Any idea how to make Solr return exact matches on suggest?
You are using the SpellCheck component, which, as the name indicates, is meant for spellchecking. It returns suggestions for how the entry should be spelled. When the word is spelled correctly (which is the case for an exact match), it returns nothing, which is why you don't see the word in the list.
Since Solr 4.7 there is a new Suggest component, which is actually built for autosuggestion and yields the results you expect.
Can you try this:
<fieldType name="textSpellShingle" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="50" side="front"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="50" side="back"/>
</analyzer>
</fieldType>
As mentioned on this wiki page: https://cwiki.apache.org/confluence/display/solr/Suggester
To be used as the basis for a suggestion, the field must be stored.
Make sure your field is stored.
Your field isn't stored, so it is returning the data as processed by your indexer.
Your problem arises because you used the old suggest component based on the spellcheck component (I suppose you are using a version of Solr before 5).
With the old spellcheck/suggest, if the word matches, it is not returned in the response!
Test with solr.SuggestComponent (if present in your version).
See: https://cwiki.apache.org/confluence/display/solr/Suggester
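For comparison with the spellcheck.q requests above, a request against the Suggest component might be built as in the following Python sketch. The parameters suggest, suggest.q and suggest.dictionary are the suggester's request parameters; the /suggest handler path, the core name collection1 and the dictionary name mySuggester are assumptions that depend on what is configured in solrconfig.xml.

```python
from urllib.parse import urlencode

def suggest_url(base, term, dictionary):
    # Build a request URL for a suggest handler.
    # The "/suggest" path and the dictionary name are assumed to be
    # configured in solrconfig.xml; adjust them to your setup.
    params = {
        "suggest": "true",
        "suggest.dictionary": dictionary,
        "suggest.q": term,
        "wt": "json",
    }
    return f"{base}/suggest?{urlencode(params)}"

url = suggest_url("http://localhost:8983/solr/collection1", "total", "mySuggester")
print(url)
```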

Solr difficulty getting fuzzy search with multiple terms

Suppose someone's name is Alessia Keeling. I'm having difficulty getting the following queries to work:
q=Alessia Keeling returns a result
q=Alessia returns a result
q=Alessia Keel returns a result
however,
q=Alessia Keeli and q=Alessia Keelin return no results
I've tried quite a few things here in my schema.xml file, but I don't have much METHOD to my MADNESS.
<fieldType name="text" class="solr.TextField" omitNorms="false">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ReversedWildcardFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="20" side="front"/>
</analyzer>
</fieldType>
Solr Admin Analyzer shows that it will match both "Alessia" and various forms of "Keeling", but Sunspot is still returning no results.
Edit 1
Here is console testing
(byebug) Sunspot.commit
(byebug) Sunspot.index
(byebug) User.search {|q| q.fulltext "Alessia Keeling" }.hits
[#<Sunspot::Search::Hit:User 4>]
(byebug) User.search {|q| q.fulltext "Alessia Keelin" }.hits
[]
Edit 2
I was finally able to get somewhere. I looked in some of my log files and noticed that the call my app was making to Solr used this query string:
http://localhost:8981/solr/select?fq=type%3AUser&q=Eli+Donnelly+I&fl=%2A+score&qf=email+first_name_text+last_name_text+username_text+name_text+description_text&defType=dismax&start=0&rows=30&debugQuery=true
This printed out some useful information, the most useful being "parsedQuery", where I was able to see that another field was conflicting. I have another field that handles emails, and in this latter case, where my query string was "Eli Donnelly I", the single-letter token "I" was breaking the query because of the email field. Adding a length filter fixed it.
I've tried this from the Solr side with an example core, and the match is returned with the NGram filter while indexing. You might want to check the server side logs to see that you're actually reindexing, at least.
The field definition is as follows:
<fieldType name="text" class="solr.TextField" omitNorms="false">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ReversedWildcardFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="20" side="front"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
data.json:
[{"id": 1, "text": "Alessia Keeling"}, {"id": 2, "text": "Alessia Fubar"}]
Populate it with data:
curl http://localhost:8983/solr/collection1/update\?commit\=true --data-binary @data.json -H 'Content-type:application/json'
Searching:
GET http://localhost:8983/solr/collection1/select\?q\=alessia%20keelin\&q.op\=AND
[..]
<result name="response" numFound="1" start="0"><doc><int name="id">1</int><str name="text">Alessia Keeling</str><long name="_version_">1473248002863792128</long></doc> </result>
.. which returns the promised document, while keeping the non-matching document out of the result.
As mentioned in the comment, you need to move the EdgeNGramFilterFactory from the query analyzer to the index analyzer.
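Why the n-grams belong on the index side can be seen in a small Python simulation. It is a simplified sketch with hypothetical helper names: whitespace splitting stands in for the StandardTokenizer, the ReversedWildcardFilter is ignored, and every query term must match, as with q.op=AND.

```python
def index_side(text, min_gram=2, max_gram=20):
    # Index analyzer: lowercase, split into words, emit front edge n-grams.
    grams = set()
    for tok in text.lower().split():
        for n in range(min_gram, min(max_gram, len(tok)) + 1):
            grams.add(tok[:n])
    return grams

def query_side(text):
    # Query analyzer: lowercase and split only, no n-grams.
    return text.lower().split()

def matches(doc, query):
    # AND semantics: every query term must be one of the indexed grams.
    indexed = index_side(doc)
    return all(term in indexed for term in query_side(query))

docs = {1: "Alessia Keeling", 2: "Alessia Fubar"}
hits = [doc_id for doc_id, text in docs.items() if matches(text, "alessia keelin")]
print(hits)
```

Because "keelin" is one of the prefixes stored for "keeling", only document 1 is returned in this simulation.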
For me, this has apparently worked. Edit the file schema.xml as below:
<?xml version="1.0" encoding="UTF-8"?>
<schema name="sunspot" version="1.0">
... (other stuff)
<solrQueryParser defaultOperator="AND|OR"/>
... (other stuff)
</schema>
Before, I had the defaultOperator configured as AND only; after I changed it, searches became more flexible.
Also, I'd suggest giving a look at this page.

Partial word search in Solr, example: sarvesh, I want to search like rves

Example: Beautiful
Search term: auti...
I would like to search with only part of a word, not the whole word.
For example, when I search for auti (only the middle letters, not the whole word), I am not getting any results. For the moment I am using the Search API with Apache Solr (and perhaps Views).
Any suggestions please?
I am using this one:
<fieldType name="string_ci" class="solr.TextField" sortMissingLast="true" omitNorms="true">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="10"/>
</analyzer>
</fieldType>
You can use a wildcard query.
In your example above, you should prepend and append an asterisk to your search term, so if someone searches for auti, the query you send to the server will be *auti*
This should return all results with words that contain the string auti.
http://www.solrtutorial.com/solr-query-syntax.html
Since you want to search for substrings inside words, you can also add side="back" to your definition, which should help you achieve your goal.
Your fieldType definition will then look like this:
<fieldType name="string_ci" class="solr.TextField" sortMissingLast="true" omitNorms="true">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="10" side="front" />
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="10" side="back" />
</analyzer>
</fieldType>
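To see why the original definition with front edge n-grams alone cannot match auti, the following Python sketch compares front edge n-grams with n-grams taken from every position (the kind of tokens a filter such as solr.NGramFilterFactory produces). The helper names are hypothetical and this only simulates the tokens; it is not Solr code.

```python
def front_edge_ngrams(token, min_gram, max_gram):
    # Prefixes only: b, be, bea, ...
    return {token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)}

def ngrams(token, min_gram, max_gram):
    # Every substring whose length lies in [min_gram, max_gram].
    return {
        token[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(token) - n + 1)
    }

word = "beautiful"
in_edge = "auti" in front_edge_ngrams(word, 1, 10)
in_all = "auti" in ngrams(word, 1, 10)
print(in_edge, in_all)
```

Only the full n-gram set contains auti, so a plain (non-wildcard) query for auti needs tokens generated from inside the word, not just from its start.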

Solr filter factory syntax not working

So I am attempting to have a custom field in my Solr schema that is filtered and processed a certain way but it doesn't seem to be working.
<fieldType name="removeWhitespace" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory" />
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.TrimFilterFactory" />
<filter class="solr.PatternReplaceFilterFactory" pattern="\s" replacement="" replace="all" />
</analyzer>
</fieldType>
<field name="whiteSpaceRmved" type="removeWhitespace" stored="true" indexed="true"/>
<copyField source="original" dest="whiteSpaceRmved"/>
Basically, if I have a field like,
Hello World
I want to have that field, and a new field name that looks like,
HelloWorld
But when I try it, it copies the field, but doesn't change it in any way. Any ideas?
You need to move the tokenizer <tokenizer class="solr.StandardTokenizerFactory" /> to the end of your analyzer chain. Currently, it breaks the field value into tokens before you remove the whitespace. And since you are removing whitespace anyway, you might not even need a tokenizer, as it looks like you really want to store the values as strings.
You should use the KeywordTokenizer, which does no actual tokenizing, so the entire input string is preserved as a single token:
<fieldType name="removeWhitespace" class="solr.TextField" sortMissingLast="true" omitNorms="true">
<analyzer>
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.TrimFilterFactory" />
<filter class="solr.PatternReplaceFilterFactory"
pattern="(\s)" replacement="" replace="all"
/>
</analyzer>
</fieldType>
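The effect of the tokenizer choice on the whitespace replacement can be sketched in Python. This is a simulation with hypothetical function names, where a regex for word characters stands in for the StandardTokenizer; it shows the tokens that would be indexed, whereas the stored value Solr returns remains the original text.

```python
import re

def standard_chain(value):
    # StandardTokenizer stand-in: whitespace is consumed as a token
    # boundary, so the \s pattern never finds anything to replace.
    tokens = re.findall(r"\w+", value)
    return [re.sub(r"\s", "", t.lower().strip()) for t in tokens]

def keyword_chain(value):
    # KeywordTokenizer stand-in: the whole value is one token, so the
    # pattern replacement can remove the inner whitespace.
    return [re.sub(r"\s", "", value.lower().strip())]

std_tokens = standard_chain("Hello World")
kw_tokens = keyword_chain("Hello World")
print(std_tokens, kw_tokens)
```

The keyword-style chain yields the single token helloworld (lowercased because of the LowerCaseFilter), while the standard-style chain still yields two separate tokens.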
