Solr substring search yields all indexed results

To do a substring search, I added a new fieldType, "text", with an n-gram filter.
It works, but it has one downside.
Example:
name = ['Apple','Samy','And','a']
When I search for name:a, all of the above items are returned. Even when the search changes to "App", all of the above items are still returned. How can I fix this?
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="100" />
</analyzer>
</fieldType>

As you can see in the analysis, both the indexed value and the query value get passed through the EdgeNGramFilter, meaning that a document matches whenever any gram of the query equals any gram of an indexed value. Define a simpler analyzer for querying the field, and you should be good to go.
The example from the Wiki should be usable by just copying and pasting it:
<fieldType name="text_general_edge_ngram" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.LowerCaseTokenizerFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15" side="front"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.LowerCaseTokenizerFactory"/>
</analyzer>
</fieldType>
My initial guess was that, since you don't provide two separate definitions, Solr uses the same chain for both indexing and querying. Your analysis output confirms that suspicion. Try adding an analyzer with type="query" to get a specific chain for querying the field (you do not want EdgeNGram in both places).
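Applied to the asker's own "text" type, the advice above might look like the following sketch. The LowerCaseFilterFactory lines are an assumption added here so that "App" and "app" are treated alike; they are not part of the original question's configuration.

```xml
<!-- Sketch only: the asker's "text" type with a separate query analyzer. -->
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Assumption: lowercasing added so that case does not affect matching -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Prefixes are generated at index time only -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="100"/>
  </analyzer>
  <analyzer type="query">
    <!-- No n-gram filter here: the query term is matched as typed -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a chain like this, a query for "App" is analyzed to the single term "app", which only matches values that have a token starting with "app". Note that edge n-grams give prefix matching; matching in the middle of a word would need NGramFilterFactory at index time instead. Because the index analyzer changes, the documents have to be re-indexed.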

Related

No response with query string containing whitespace in SOLR autocomplete

I am trying to use the Solr autocomplete feature. Basically, once a user has typed 3 characters, I want to show results that update with every further character typed. The Solr version is 6.5.1. Below is the configuration I am using.
<fieldType name="searchFieldType" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="50" />
</analyzer>
<analyzer type="query">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
I have a sample index with the field values below.
"ta", "taj", "tajbacd", "tajabcd", "taj cbad","taj abcd", "taj bcad","taj abcd cbad", "taj abcd abcd","taj abcd bacd", "abcd taj","abcd ta", "random string"
When I search for "taj", I get the expected results. But if I search for "taj " or "taj ab", Solr does not return any results. Can you help me here? I tried the Analysis screen, which shows that the n-gram is found; below is a screenshot of it.

So, I read your question too fast... my bad.
Can you show us the requests you are using to verify this? Both the one that works and the one that doesn't.
By the way, one thing you can already fix: if you only send queries of 3 characters or more, you can change minGramSize="1" to minGramSize="3".
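As a sketch, the index analyzer from the question with that single change would look like this (everything else is copied from the asker's configuration):

```xml
<analyzer type="index">
  <tokenizer class="solr.KeywordTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <!-- Prefixes shorter than 3 characters are never queried, so they are not indexed -->
  <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="50"/>
</analyzer>
```

One side effect to be aware of: a value shorter than 3 characters, such as "ta" in the sample data, produces no grams at all with this setting and so can no longer be found through this field. The change only takes effect after re-indexing.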

You can simply use a wildcard/partial match in this case:
q={!complexphrase inOrder=true}YourField:"taj ab*"

Solr 6.6.0 Case Insensitive Query Not Working

Case-insensitive queries are not working for me in Solr 6.6.0.
I have tried every other option and answer I could find on the internet.
I tried
<tokenizer class="solr.LowerCaseTokenizerFactory"/>
but it's not working.
I tried
<filter class="solr.LowerCaseFilterFactory"/>
but it's not working.
I tried many different ways, but none of them works.
That is, I want the same results when searching with title_s:iPhone and title_s:iphone.
I am not sure what is causing the problem.

If case-insensitive search were broken in a Solr release, you would see much more noise than a single Stack Overflow question.
Let's use this question to illustrate the approach everyone should follow for basic Solr usage:
1) Refer to the documentation. Solr has good, free online documentation,
which specifically describes how to configure schema.xml and its various aspects [1].
From there you can learn that it is quite simple to configure a field to be case insensitive:
<field name="title" type="text_case_insensitive" indexed="true" stored="true"/>
<fieldType name="text_case_insensitive" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
N.B. If you had a previous configuration in the schema for the title field, you need to re-index.
[1] https://lucene.apache.org/solr/guide/6_6/field-type-definitions-and-properties.html

I tried many different ways, but none of them worked.
Then I implemented the following, and it works fine.
Let me know whether the method below is correct, but it works fine for me :)
I removed the following from the schema:
<fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true"/>
And added the following in its place:
<fieldType name="string" class="solr.TextField">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
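Redefining "string" changes every field declared with that type, not just title_s. A less invasive alternative, sketched below, leaves the built-in type alone and copies the value into a dedicated case-insensitive field. The names string_ci and title_ci are hypothetical and chosen here only for illustration.

```xml
<!-- Hypothetical type: keeps the whole value as one token, lowercased -->
<fieldType name="string_ci" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <!-- A single analyzer without a type applies to both indexing and querying -->
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field that receives a copy of title_s -->
<field name="title_ci" type="string_ci" indexed="true" stored="false"/>
<copyField source="title_s" dest="title_ci"/>
```

After re-indexing, title_ci:iPhone and title_ci:iphone are both analyzed to the term "iphone" and so return the same documents, while title_s keeps its exact-match behaviour.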

Solr suggest exact match

I am trying to make solr return exact match on suggestion, ex:
spellcheck.q=tota does return total in results but
spellcheck.q=total does not return total in results.
I am using this field for suggestions:
<fieldType name="textSpellShingle" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ShingleFilterFactory" maxShingleSize="3" outputUnigrams="true"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
</fieldType>
Any idea how to make Solr return exact matches on suggest?

You are using the SpellChecker component, which, as the name indicates, is meant for spellchecking. It returns suggestions for how the entry should be spelled. When the word is spelled correctly (which amounts to an exact match), it returns nothing, which is why you don't see the word in the list.
Since Solr 4.7 there is a new Suggester component, which is built specifically for autosuggestion and yields the results you expect.

Can you try this?
<fieldType name="textSpellShingle" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="50" side="front"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="50" side="back"/>
</analyzer>
</fieldType>

As mentioned on this wiki page: https://cwiki.apache.org/confluence/display/solr/Suggester
To be used as the basis for a suggestion, the field must be stored.
Make sure your field is stored.
Your field isn't stored, so it is returning the data crunched by your indexer.

Your problem arises because you used the old suggest component, which is based on the spellcheck component (I suppose you used a Solr version before 5).
With the old spellcheck/suggest, if the word matches, it is not returned in the response!
Test with solr.SuggestComponent (if present in your version).
See: https://cwiki.apache.org/confluence/display/solr/Suggester
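For orientation, a minimal solrconfig.xml sketch of the newer component follows. The suggester name "mySuggester", the source field "title", and the analyzer type "text_general" are placeholders, not values from the question, and the available lookup implementations depend on the Solr version, so check the Suggester page linked above before relying on it.

```xml
<!-- solrconfig.xml sketch; "mySuggester", "title" and "text_general" are placeholders -->
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">AnalyzingLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <!-- Source field for suggestions; it must be stored -->
    <str name="field">title</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.dictionary">mySuggester</str>
    <str name="suggest.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
```

A request such as /suggest?suggest.build=true&suggest.q=total would build the dictionary and then ask for suggestions; suggest.build only needs to be sent when the dictionary has to be (re)built.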

Solr difficulty getting fuzzy search with multiple terms

Suppose someone's name is Alessia Keeling. I'm having difficulty getting the following queries to work
q=Alessia Keeling returns a result
q=Alessia returns a result
q=Alessia Keel returns a result
however,
q=Alessia Keeli and q=Alessia Keelin return no results
I've tried quite a few things here in my schema.xml file, but I don't have much METHOD to my MADNESS.
<fieldType name="text" class="solr.TextField" omitNorms="false">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ReversedWildcardFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="20" side="front"/>
</analyzer>
</fieldType>
Solr Admin Analyzer shows that it will match both "Alessia" and various forms of "Keeling", but Sunspot is still returning no results.
Edit 1
Here is console testing
(byebug) Sunspot.commit
(byebug) Sunspot.index
(byebug) User.search {|q| q.fulltext "Alessia Keeling" }.hits
[#<Sunspot::Search::Hit:User 4>]
(byebug) User.search {|q| q.fulltext "Alessia Keelin" }.hits
[]
Edit 2
I was finally able to get somewhere. I looked in some of my log files and noticed that the call my app was making to Solr used the query string
http://localhost:8981/solr/select?fq=type%3AUser&q=Eli+Donnelly+I&fl=%2A+score&qf=email+first_name_text+last_name_text+username_text+name_text+description_text&defType=dismax&start=0&rows=30&debugQuery=true
This printed out some useful information, the most useful being "parsedQuery". I was able to see that another field was conflicting. I have another field that handles emails, and in this latter case, where my query string was "Eli Donnelly I", the single-letter token "I" was breaking the query because of the email field. Adding a length filter fixed it.
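The asker does not show the email field type or say where the length filter was added, so the following is only a sketch of what such a filter looks like. The type name email_text, the tokenizer choice, and the min/max values are assumptions.

```xml
<!-- Hypothetical type name; the asker's real email field type is not shown -->
<fieldType name="email_text" class="solr.TextField">
  <analyzer>
    <!-- Assumption: a tokenizer that keeps e-mail addresses as single tokens -->
    <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Drops tokens shorter than 2 characters, such as the lone "I" -->
    <filter class="solr.LengthFilterFactory" min="2" max="255"/>
  </analyzer>
</fieldType>
```

LengthFilterFactory keeps only tokens whose length lies between min and max (inclusive), so single-character tokens never reach this field's query.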

I've tried this from the Solr side with an example core, and the match is returned when the EdgeNGram filter is applied while indexing. You might want to check the server-side logs to see that you're actually reindexing, at least.
The field definition is as follows:
<fieldType name="text" class="solr.TextField" omitNorms="false">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ReversedWildcardFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="20" side="front"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
data.json:
[{"id": 1, "text": "Alessia Keeling"}, {"id": 2, "text": "Alessia Fubar"}]
Populate it with data:
curl http://localhost:8983/solr/collection1/update\?commit\=true --data-binary @data.json -H 'Content-type:application/json'
Searching:
GET http://localhost:8983/solr/collection1/select\?q\=alessia%20keelin\&q.op\=AND
[..]
<result name="response" numFound="1" start="0"><doc><int name="id">1</int><str name="text">Alessia Keeling</str><long name="_version_">1473248002863792128</long></doc> </result>
.. which returns the promised document, while keeping the non-matching document out of the result.

As mentioned in the comment, you need to move the EdgeNGramFilterFactory from the query analyzer to the index analyzer.

For me, this has apparently worked. Edit the file schema.xml as below:
<?xml version="1.0" encoding="UTF-8"?>
<schema name="sunspot" version="1.0">
... (other stuff)
<solrQueryParser defaultOperator="AND|OR"/>
... (other stuff)
</schema>
Before, I had defaultOperator configured as AND only; after I changed it, searches became more flexible.
Also, I'd suggest taking a look at this page.

Autocomplete in Solr with Case-insensitive feature

I have been trying out the autocomplete feature in Solr 4.7.1 using the Suggester. I have configured it to display phrase suggestions as well. The problem is that if I type "game", I get suggestions such as "game" or phrases containing "game".
But if I type "Game", no suggestion is displayed at all. How can I make suggestions case-insensitive?
I have configured the fields in schema.xml like this:
<fieldType name="text_auto" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ShingleFilterFactory"
minShingleSize="2"
maxShingleSize="4"
outputUnigrams="true"
outputUnigramsIfNoShingles="true"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>

What worked for me was tweaking the code in the Velocity file head.vm. I changed it to 'terms.prefix': function() { return $("#q").val().toLowerCase();},
which solved my issue, as I am using the terms component for suggestions.

I tried the same schema in the Solr Admin Analysis view, where you can provide index and query values to see how the tokens are matched.
With your schema on my local Solr instance, it seems to work fine, i.e., Game and game are considered equal and matched.
I would urge you to post the request query and/or the Suggester configuration (if you are using it).
