Solr suggest exact match - solr

I am trying to make Solr return exact matches in suggestions. For example:
spellcheck.q=tota returns total in the results, but
spellcheck.q=total does not return total in the results.
I am using this field for suggestions:
<fieldType name="textSpellShingle" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ShingleFilterFactory" maxShingleSize="3" outputUnigrams="true"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
</fieldType>
Any idea how to make Solr return exact matches in suggestions?

You are using the SpellCheck component, which, as the name indicates, is meant for spellchecking. It returns suggestions for how the entry should be spelled. When the word is spelled correctly (that is, it is an exact match), it returns nothing, which is why you don't see the word in the list.
Since Solr 4.7, a new Suggester component (solr.SuggestComponent) is available. It is built specifically for autosuggestion and should yield the results you expect.
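As a rough illustration, a suggester setup in solrconfig.xml might look like the sketch below. The suggester name mySuggester, the field name suggest_text, the field type text_general and the /suggest handler path are placeholders, not taken from the question; adjust them to your own schema.

```xml
<!-- Sketch only: names marked "placeholder" are assumptions, not from the question -->
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>                      <!-- placeholder -->
    <str name="lookupImpl">AnalyzingLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">suggest_text</str>                    <!-- placeholder; must be a stored field -->
    <str name="suggestAnalyzerFieldType">text_general</str> <!-- placeholder -->
    <str name="buildOnStartup">false</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.dictionary">mySuggester</str>
    <str name="suggest.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
```

With a setup along these lines, a request such as /suggest?suggest.q=total&suggest.build=true builds the dictionary and asks for suggestions starting with "total". Unlike the spellcheck-based approach, a fully typed word is then a valid suggestion.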

Can you try this?
<fieldType name="textSpellShingle" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="50" side="front"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="50" side="back"/>
</analyzer>
</fieldType>

As mentioned on this wiki page: https://cwiki.apache.org/confluence/display/solr/Suggester
"To be used as the basis for a suggestion, the field must be stored."
Make sure your field is stored.
Your field isn't stored, so Solr is returning the terms produced by your index-time analysis instead of the original values.

Your problem arises because you are using the old suggester, which is based on the spellcheck component (I assume you are on a Solr version before 5).
With the old spellcheck-based suggester, a word that matches exactly is not returned in the response.
Try solr.SuggestComponent instead (if it is available in your version).
See: https://cwiki.apache.org/confluence/display/solr/Suggester

Related

Solr not returning the exact element

Using Solr 7.7.3.
I have an element with the label "alpha-ravi".
When I search in Solr for label:"alpha", it returns the element with the label "alpha-ravi".
According to the Solr documentation, it should not return this element.
Can anyone explain this behavior?
If you want exact results only (i.e. return docs with "alpha-ravi" only if the user types exactly "alpha-ravi" in the search), then I would suggest the Keyword tokenizer (solr.KeywordTokenizerFactory). This tokenizer treats the entire "alpha-ravi" as a single token and thus will not return partial results when there is a match for only "alpha" or "ravi".
For example, in your schema.xml file you could add something like this (configure the filter chain as you need):
<fieldType name="single_token_string" class="solr.TextField" sortMissingLast="true">
<analyzer type="index">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.ASCIIFoldingFilterFactory"/>
<filter class="solr.TrimFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.ASCIIFoldingFilterFactory"/>
<filter class="solr.TrimFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
</fieldType>
You can then use this fieldType in the same schema.xml (referencing the field type we just defined):
<field name="myField" type="single_token_string" indexed="true" stored="true" />
The default text field types use the StandardTokenizer, which splits "alpha-ravi" on the hyphen into multiple tokens (thus matching "alpha" and "ravi").
As an alternative, you could also run a phrase query, in which the analyzed terms must match adjacent to each other and in order. Possibly something like http://localhost:8983/solr/...fq=label:"alpha-ravi"
Hope that helps. All the best!

No response with query string containing whitespace in SOLR autocomplete

I am trying to use the Solr autocomplete feature. Basically, once a user has typed 3 characters, I want to show a response for every further character typed. The Solr version is 6.5.1. Below is the configuration I am using.
<fieldType name="searchFieldType" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="50" />
</analyzer>
<analyzer type="query">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
I have a sample index with the following field values:
"ta", "taj", "tajbacd", "tajabcd", "taj cbad","taj abcd", "taj bcad","taj abcd cbad", "taj abcd abcd","taj abcd bacd", "abcd taj","abcd ta", "random string"
When I search for "taj", I get the expected results. But if I search for "taj " or "taj ab", Solr does not return any results. Can you help me here? I tried the Analysis screen, which shows that the n-gram is found.
So, I read your question too fast... my bad.
Can you show us the requests you are using to verify this? Both the one that works and the one that does not.
By the way, one thing you can already fix: if you only send 3 or more characters, you can change minGramSize="1" to minGramSize="3".
Well, you can simply use a wildcard/partial match in this case:
q={!complexphrase inOrder=true}YourField:"taj ab*"

Solr substring search yields all indexed results

To do a substring search, I have added a new fieldType, "text", with an NGram filter.
It works fine, but it has the following downside.
Example:
name = ['Apple','Samy','And','a']
When I search for name:a, all of the above items are returned. Even when the search changes to "App", all of the above items are returned. How can I fix this?
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="100" />
</analyzer>
</fieldType>
As you can see in the analysis, both the indexed value and the query value are processed by the EdgeNGramFilter, meaning that the query will match anything that shares a prefix n-gram with it. Add a simpler analyzer chain for querying the field, and you should be good to go.
The example from the Wiki should be usable by just copying and pasting it:
<fieldType name="text_general_edge_ngram" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.LowerCaseTokenizerFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15" side="front"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.LowerCaseTokenizerFactory"/>
</analyzer>
</fieldType>
My initial guess was that since you don't provide two separate definitions, Solr will use the same chain for both indexing and querying. Your analysis output confirms that suspicion. Try adding an analyzer with type="query" to get a specific chain for querying the field (you do not want EdgeNGram in both places).
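Applied to the field type from the question, the change could look roughly like the sketch below. The LowerCaseFilterFactory lines are an addition that is not in the original definition (an assumption, so that "App" and "apple" compare equal); everything else mirrors the asker's settings.

```xml
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <!-- Index time: emit edge n-grams so every prefix of a word becomes a term -->
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/> <!-- assumption, not in the question -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="100"/>
  </analyzer>
  <!-- Query time: no n-grams, so the typed text stays a single term -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/> <!-- assumption, not in the question -->
  </analyzer>
</fieldType>
```

With this split, a query for name:App is analyzed to the single term "app", which only matches values containing a word that starts with those letters. The field has to be reindexed after the change.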

Solr difficulty getting fuzzy search with multiple terms

Suppose someone's name is Alessia Keeling. I'm having difficulty getting the following queries to work
q=Alessia Keeling returns a result
q=Alessia returns a result
q=Alessia Keel returns a result
however,
q=Alessia Keeli and q=Alessia Keelin return no results
I've tried quite a few things here in my schema.xml file, but I don't have much METHOD to my MADNESS.
<fieldType name="text" class="solr.TextField" omitNorms="false">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ReversedWildcardFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="20" side="front"/>
</analyzer>
</fieldType>
Solr Admin Analyzer shows that it will match both "Alessia" and various forms of "Keeling", but Sunspot is still returning no results.
Edit 1
Here is the console test:
(byebug) Sunspot.commit
(byebug) Sunspot.index
(byebug) User.search {|q| q.fulltext "Alessia Keeling" }.hits
[#<Sunspot::Search::Hit:User 4>]
(byebug) User.search {|q| q.fulltext "Alessia Keelin" }.hits
[]
Edit 2
I was finally able to get somewhere. I looked in some of my log files and noticed that the call my app was making to Solr used this query string:
http://localhost:8981/solr/select?fq=type%3AUser&q=Eli+Donnelly+I&fl=%2A+score&qf=email+first_name_text+last_name_text+username_text+name_text+description_text&defType=dismax&start=0&rows=30&debugQuery=true
This printed out some useful information, the most useful being "parsedQuery". I was able to see that another field was conflicting. I have another field that handles emails, and in this latter case, where my query string was "Eli Donnelly I", the single-letter token "I" was breaking the query because of the email field. Adding a length filter fixed it.
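The post does not show where the length filter was added. One plausible arrangement is sketched below, assuming a dedicated field type for the email field; the type name text_email, the tokenizer choice and the min/max values are all hypothetical.

```xml
<!-- Hypothetical field type for the email field; not taken from the post -->
<fieldType name="text_email" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Drop tokens shorter than 2 characters, such as a lone "I" -->
    <filter class="solr.LengthFilterFactory" min="2" max="255"/>
  </analyzer>
</fieldType>
```

LengthFilterFactory keeps only tokens whose length lies between min and max, so a one-letter query token no longer produces a term for this field.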
I've tried this from the Solr side with an example core, and the match is returned when the EdgeNGram filter is applied at index time. You might want to check the server-side logs to make sure that you are actually reindexing, at least.
The field definition is as follows:
<fieldType name="text" class="solr.TextField" omitNorms="false">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ReversedWildcardFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="20" side="front"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
data.json:
[{"id": 1, "text": "Alessia Keeling"}, {"id": 2, "text": "Alessia Fubar"}]
Populate it with data:
curl http://localhost:8983/solr/collection1/update\?commit\=true --data-binary @data.json -H 'Content-type:application/json'
Searching:
GET http://localhost:8983/solr/collection1/select\?q\=alessia%20keelin\&q.op\=AND
[..]
<result name="response" numFound="1" start="0"><doc><int name="id">1</int><str name="text">Alessia Keeling</str><long name="_version_">1473248002863792128</long></doc> </result>
.. which returns the expected document, while keeping the non-matching document out of the result.
As mentioned in the comment, you need to move the EdgeNGramFilterFactory from the query analyzer to the index analyzer.
For me, the following apparently worked. Edit the file schema.xml as below:
<?xml version="1.0" encoding="UTF-8"?>
<schema name="sunspot" version="1.0">
... (other stuff)
<solrQueryParser defaultOperator="AND|OR"/>
... (other stuff)
</schema>
Before, I had the defaultOperator configured as AND only; after I changed it, searches became more flexible.
Also, I'd suggest taking a look at this page.

Autocomplete in Solr with Case-insensitive feature

I have been trying out the autocomplete feature in Solr 4.7.1 using the Suggester. I have configured it to display phrase suggestions as well. The problem is that if I type "game", I get "game" or phrases containing "game" as suggestions.
But if I type "Game", no suggestion is displayed at all. How can I make suggestions case-insensitive?
I have configured the fields in schema.xml like this:
<fieldType name="text_auto" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ShingleFilterFactory"
minShingleSize="2"
maxShingleSize="4"
outputUnigrams="true"
outputUnigramsIfNoShingles="true"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
What worked for me was tweaking some code in the Velocity file head.vm. I changed it to 'terms.prefix': function() { return $("#q").val().toLowerCase();},
which solved my issue, as I am using the terms component for suggestions.
I tried the same schema in the Solr Admin Analysis view. You can provide the index and query values there to see how the tokens are matched.
I tried your schema in my local Solr instance and it seems to work fine, i.e. "Game" and "game" are considered equal and matched.
I would urge you to post the request query and/or the Suggester configuration (if you are using the Suggester).
