Solr difficulty getting fuzzy search with multiple terms

Suppose someone's name is Alessia Keeling. I'm having difficulty getting the following queries to work:
q=Alessia Keeling returns a result
q=Alessia returns a result
q=Alessia Keel returns a result
However,
q=Alessia Keeli and q=Alessia Keelin return no results.
I've tried quite a few things in my schema.xml file, but I don't have much METHOD to my MADNESS.
<fieldType name="text" class="solr.TextField" omitNorms="false">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ReversedWildcardFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="20" side="front"/>
</analyzer>
</fieldType>
Solr Admin Analyzer shows that it will match both "Alessia" and various forms of "Keeling", but Sunspot is still returning no results.
Edit 1
Here is console testing
(byebug) Sunspot.commit
(byebug) Sunspot.index
(byebug) User.search {|q| q.fulltext "Alessia Keeling" }.hits
[#<Sunspot::Search::Hit:User 4>]
(byebug) User.search {|q| q.fulltext "Alessia Keelin" }.hits
[]
Edit 2
I was finally able to get somewhere. I looked in some of my log files and noticed that the call my app was making to Solr used this query string:
http://localhost:8981/solr/select?fq=type%3AUser&q=Eli+Donnelly+I&fl=%2A+score&qf=email+first_name_text+last_name_text+username_text+name_text+description_text&defType=dismax&start=0&rows=30&debugQuery=true
This printed out some useful information, the most useful being "parsedQuery", which showed me that another field was conflicting. I have another field that handles emails, and in this latter case, where my query string was "Eli Donnelly I", the single-letter token "I" was breaking the query because of the email field. Adding a length filter fixed it.
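For anyone reading a logged request like the one above, decoding its parameters can make it easier to see which fields the dismax query touches. This is only a small sketch using Python's standard library; the variable names are made up for illustration, and the two-character threshold is an arbitrary stand-in for whatever a length filter would be configured with.

```python
from urllib.parse import urlsplit, parse_qs

# The request URL as it appeared in the application log.
logged_url = (
    "http://localhost:8981/solr/select?fq=type%3AUser&q=Eli+Donnelly+I"
    "&fl=%2A+score&qf=email+first_name_text+last_name_text+username_text"
    "+name_text+description_text&defType=dismax&start=0&rows=30&debugQuery=true"
)

# parse_qs returns a list per key; every key occurs once here, so keep the first value.
params = {key: values[0] for key, values in parse_qs(urlsplit(logged_url).query).items()}

print(params["q"])           # the decoded user query
print(params["qf"].split())  # every field the dismax parser searches

# Tokens shorter than 2 characters (arbitrary threshold for this sketch).
short_tokens = [tok for tok in params["q"].split() if len(tok) < 2]
print(short_tokens)
```

The qf list shows that the email field is searched alongside the name fields, so every token of the query, including the lone "I", is also run against it.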

I've tried this from the Solr side with an example core, and the match is returned when the EdgeNGram filter is applied at index time. You might want to check the server-side logs to see that you're actually reindexing, at least.
The field definition is as follows:
<fieldType name="text" class="solr.TextField" omitNorms="false">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ReversedWildcardFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="20" side="front"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
data.json:
[{"id": 1, "text": "Alessia Keeling"}, {"id": 2, "text": "Alessia Fubar"}]
Populate it with data:
curl http://localhost:8983/solr/collection1/update\?commit\=true --data-binary @data.json -H 'Content-type:application/json'
Searching:
GET http://localhost:8983/solr/collection1/select\?q\=alessia%20keelin\&q.op\=AND
[..]
<result name="response" numFound="1" start="0"><doc><int name="id">1</int><str name="text">Alessia Keeling</str><long name="_version_">1473248002863792128</long></doc> </result>
.. which returns the promised document, while keeping the non-matching document out of the result.
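To see why moving the edge n-grams to the index side helps, here is a rough Python simulation of the two chains. It is not Solr code: the regex tokenizer only approximates StandardTokenizer plus LowerCaseFilter, the ReversedWildcardFilterFactory step is left out, and all function names are invented for this sketch.

```python
import re

def tokenize(text):
    # Rough stand-in for StandardTokenizer + LowerCaseFilter.
    return re.findall(r"\w+", text.lower())

def edge_ngrams(token, min_gram=2, max_gram=20):
    # Front edge n-grams: every prefix from min_gram up to max_gram characters.
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]

def index_terms(text):
    # Index-time chain: tokenize, then expand each token into its prefixes.
    terms = set()
    for token in tokenize(text):
        terms.update(edge_ngrams(token))
    return terms

def matches(query, doc_text):
    # Query-time chain: tokenize only; every query token must be indexed (q.op=AND).
    indexed = index_terms(doc_text)
    return all(token in indexed for token in tokenize(query))

docs = {1: "Alessia Keeling", 2: "Alessia Fubar"}
hits = [doc_id for doc_id, text in docs.items() if matches("alessia keelin", text)]
print(hits)  # [1]
```

Because "keelin" is one of the prefixes stored for "keeling", the plain query token finds it directly, while document 2 has no such term.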

As mentioned in the comment, you need to move the EdgeNGramFilterFactory from the query analyzer to the index analyzer.

For me, this has apparently worked. Edit the file schema.xml as below:
<?xml version="1.0" encoding="UTF-8"?>
<schema name="sunspot" version="1.0">
... (other stuff)
<solrQueryParser defaultOperator="AND|OR"/>
... (other stuff)
</schema>
Before, I had defaultOperator set to AND only; after I changed it (the attribute takes either AND or OR), searches became more flexible.
Also, I'd suggest taking a look at this page.

Related

Solr not returning the exact element

Using Solr 7.7.3
I have an element with label:"alpha-ravi",
and when I search Solr for label:"alpha" it returns the element with the label "alpha-ravi".
According to the Solr documentation, it should not return this element.
Can anyone explain this behavior?

If you want to retrieve exact results only (i.e., return docs with "alpha-ravi" only if the user types exactly "alpha-ravi" in the search), then I would suggest going with the keyword tokenizer (solr.KeywordTokenizerFactory). This tokenizer treats the entire "alpha-ravi" as a single token and thus will not return partial matches for "alpha" or "ravi".
For example, in your schema.xml file you could add something like this (configure the filter chains as per your needs):
<fieldType name="single_token_string" class="solr.TextField" sortMissingLast="true">
<analyzer type="index">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.ASCIIFoldingFilterFactory"/>
<filter class="solr.TrimFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.ASCIIFoldingFilterFactory"/>
<filter class="solr.TrimFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
</fieldType>
And then you can use this fieldType in the same schema.xml (referencing the field type we just defined):
<field name="myField" type="single_token_string" indexed="true" stored="true" />
By default, Solr uses the StandardTokenizer and thus, splits "alpha-ravi" on that hyphen into multiple tokens (thus, matching "alpha" and "ravi").
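The difference between the two tokenizers can be sketched in a few lines of Python. This is only an approximation for illustration: the regex split stands in for StandardTokenizer's word-break rules, and the function names are hypothetical.

```python
import re

def standard_like_tokens(value):
    # Approximation of StandardTokenizer + LowerCaseFilter: split on non-alphanumerics.
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", value)]

def keyword_tokens(value):
    # Approximation of KeywordTokenizer + TrimFilter + LowerCaseFilter: one token per value.
    return [value.strip().lower()]

label = "alpha-ravi"
print(standard_like_tokens(label))  # ['alpha', 'ravi']
print(keyword_tokens(label))        # ['alpha-ravi']

# A search for the single term "alpha" only finds the value under the split chain.
print("alpha" in standard_like_tokens(label))  # True
print("alpha" in keyword_tokens(label))        # False
```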
Also, as an alternative, you could run a phrase query, which requires the tokens to appear next to each other and in order. Possibly something like http://localhost:8983/solr/...fq=label:"alpha-ravi"
Hope that helps. All the best!

No response with query string containing whitespace in SOLR autocomplete

I am trying to use the Solr autocomplete feature. Basically, once a user has typed 3 characters, I want to show a response with every further character typed. The Solr version is 6.5.1. Below is the configuration I am using.
<fieldType name="searchFieldType" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="50" />
</analyzer>
<analyzer type="query">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
I have a sample index which is having field values as below.
"ta", "taj", "tajbacd", "tajabcd", "taj cbad","taj abcd", "taj bcad","taj abcd cbad", "taj abcd abcd","taj abcd bacd", "abcd taj","abcd ta", "random string"
When I search for "taj", I get the expected results. But if I search for "taj " or "taj ab", Solr does not return any results. Can you help me here? I tried the Analysis screen, which shows that the n-gram is found.

So, I read your question too fast...my bad.
Can you show us the requests you are using to verify this? Both the one that works and the one that doesn't.
By the way, one thing you can already fix: if you only send queries of 3 characters or more, you can change minGramSize="1" to minGramSize="3".
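To get a feel for what raising minGramSize changes, here is a small Python approximation of the index chain from the question (KeywordTokenizer, lowercase, front edge n-grams). It is a sketch rather than Solr's implementation, and the function name is made up.

```python
def edge_ngrams(value, min_gram, max_gram=50):
    # The whole field value is one token (KeywordTokenizer), lowercased,
    # then expanded into prefixes between min_gram and max_gram characters.
    value = value.lower()
    return [value[:n] for n in range(min_gram, min(len(value), max_gram) + 1)]

grams = edge_ngrams("taj abcd", min_gram=3)
print(grams)  # ['taj', 'taj ', 'taj a', 'taj ab', 'taj abc', 'taj abcd']
print("taj ab" in grams)  # True

# In this sketch a value shorter than min_gram yields no terms at all.
print(edge_ngrams("ta", min_gram=3))  # []
```

Note the last line: in this simulation, a value such as "ta" produces no n-grams once the minimum is 3. Whether that matters for your data is worth checking before changing the setting.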

Well you can just easily use wildcard/partial match in this case
q={!complexphrase inOrder=true}YourField:"taj ab*"
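Since the query contains braces, quotes, a space and an asterisk, it needs URL encoding when sent over HTTP. Below is a minimal Python sketch of building such a request URL; the host, port and core name are hypothetical placeholders, and YourField is the placeholder field name from the query above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint; replace with your own host and core.
base_url = "http://localhost:8983/solr/collection1/select"

# The query exactly as written above, including the trailing wildcard.
query = '{!complexphrase inOrder=true}YourField:"taj ab*"'

# urlencode percent-encodes the special characters and turns spaces into "+".
url = base_url + "?" + urlencode({"q": query})
print(url)

# Round trip: decoding the URL gives back the original query string.
decoded = parse_qs(urlsplit(url).query)["q"][0]
print(decoded == query)  # True
```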

Solr 6.6.0 Case Insensitive Query Not Working

Case-insensitive queries are not working for me in Solr 6.6.0.
I have tried every option/answer I could find on the internet.
I tried
<tokenizer class="solr.LowerCaseTokenizerFactory"/>
but it's not working.
I tried
<filter class="solr.LowerCaseFilterFactory"/>
but it's not working.
I have tried many different ways, but none of them work.
I.e., I want the same results when searching with title_s:iPhone and title_s:iphone.
I am not sure what is causing the problem.

If case-insensitive search were not working in a Solr release, you would see much more noise than just one Stack Overflow question.
Let's use this question to illustrate the approach everyone should follow for basic Solr usage:
1) Refer to the documentation. Solr has good, free online documentation,
specifically describing how to configure schema.xml and its various aspects [1].
From there you can learn that it is quite simple to configure a field to be case insensitive:
<field name="title" type="text_case_insensitive" indexed="true" stored="true"/>
<fieldType name="text_case_insensitive" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
N.B. if you had a previous configuration in the schema for the title field, you need to re-index
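The reason this works is that the same lowercasing happens on both sides, so differently cased inputs end up as the same term. A tiny Python sketch of the idea follows; the regex tokenizer is only an approximation of StandardTokenizer, and the sample title is made up.

```python
import re

def analyze(text):
    # Same chain for index and query: tokenize, then lowercase every token.
    return [tok.lower() for tok in re.findall(r"\w+", text)]

# Hypothetical title value, analyzed once at index time.
indexed_terms = set(analyze("Apple iPhone 7"))

for user_query in ("iPhone", "iphone", "IPHONE"):
    # Each query goes through the same analysis before the term lookup.
    found = all(term in indexed_terms for term in analyze(user_query))
    print(user_query, found)  # True for all three spellings
```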
[1] https://lucene.apache.org/solr/guide/6_6/field-type-definitions-and-properties.html

I tried many different ways, but none worked.
Then I implemented it as below, and it works fine.
Let me know whether the method below is correct or not, but it works fine for me :)
I removed the code below from the schema,
<fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true"/>
and added the code below in its place:
<fieldType name="string" class="solr.TextField">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>

Solr substring search yields all indexed results

To do a substring search, I have added a new fieldType, "Text", with an NGram filter.
It works fine, but it has the following downside.
Example:
name = ['Apple','Samy','And','a']
When I search for name:a, all of the above items are returned. Even when the search changes to "App", all of the above items are returned. How can I fix this issue?
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="100" />
</analyzer>
</fieldType>

As you can see in the analysis, both the indexed value and the query value get passed through the EdgeNGramFilter, meaning that a query will match any indexed value that shares a leading n-gram with it. Add a simpler analysis chain for querying the field, and you should be good to go.
The example from the Wiki should be usable by just copying and pasting it:
<fieldType name="text_general_edge_ngram" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.LowerCaseTokenizerFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15" side="front"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.LowerCaseTokenizerFactory"/>
</analyzer>
</fieldType>
My initial guess was that since you don't provide separate index and query definitions, Solr uses the same chain for both. Your analysis output confirms that suspicion. Try adding an analyzer with type="query" to have a specific chain for querying the field (you do not want EdgeNGram in both places).
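The effect of n-gramming on both sides versus the index side only can be sketched in Python. This is a simulation under assumptions, not Solr itself: it lowercases like the Wiki example, treats the query terms as OR-ed together, and uses invented function names.

```python
import re

NAMES = ["Apple", "Samy", "And", "a"]

def tokenize(text):
    # Rough stand-in for LowerCaseTokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())

def edge_ngrams(token, min_gram=1, max_gram=15):
    # Front edge n-grams: all prefixes between min_gram and max_gram characters.
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]

def index_terms(text):
    return {gram for tok in tokenize(text) for gram in edge_ngrams(tok)}

def query_terms(text, ngram_at_query_time):
    tokens = tokenize(text)
    if ngram_at_query_time:
        # Same chain as indexing: the query is also expanded into prefixes.
        return {gram for tok in tokens for gram in edge_ngrams(tok)}
    return set(tokens)

def search(query, ngram_at_query_time):
    # A name matches if it shares at least one term with the query (OR semantics).
    wanted = query_terms(query, ngram_at_query_time)
    return [name for name in NAMES if wanted & index_terms(name)]

print(search("App", ngram_at_query_time=True))   # ['Apple', 'And', 'a']
print(search("App", ngram_at_query_time=False))  # ['Apple']
```

With n-grams on the query side, "App" is expanded to "a", "ap" and "app", and the single-letter term "a" is enough to drag in every name that starts with it.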

partial word search in solr example: sarvesh , i want search like rves

Example: Beautiful
Search term: auti...
I would like to search with only part of a word, not the whole word.
For example, when I search for auti (only the middle letters, not the whole word), I am not getting results. For the moment I am using the Search API with Apache Solr (and perhaps Views).
Any suggestions, please?
I am using this field type:
<fieldType name="string_ci" class="solr.TextField" sortMissingLast="true" omitNorms="true">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="10"/>
</analyzer>
</fieldType>

You can use a wildcard query.
In your example above, you should prepend and append an asterisk to your search term, so if someone searches for auti, the query you send to the server will be *auti*
This should pull all results containing words that have auti within them.
http://www.solrtutorial.com/solr-query-syntax.html

Now, since you want to search for substrings inside words, you can add side="back" to your definition, and that should help you achieve your goal.
So your fieldType definition will look like this:
<fieldType name="string_ci" class="solr.TextField" sortMissingLast="true" omitNorms="true">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="10" side="front" />
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="10" side="back" />
</analyzer>
</fieldType>
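To illustrate what chaining a front filter and a back filter produces, here is a rough Python simulation. It assumes the second filter is applied to every token emitted by the first one; the function names are invented, and this is a sketch of the idea rather than Solr's actual code.

```python
def front_grams(token, min_gram=1, max_gram=10):
    # side="front": prefixes of the token.
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]

def back_grams(token, min_gram=1, max_gram=10):
    # side="back": suffixes of the token.
    return [token[-n:] for n in range(min_gram, min(len(token), max_gram) + 1)]

def chained_terms(word):
    # Lowercase, take every prefix, then every suffix of each prefix.
    word = word.lower()
    return {suffix for prefix in front_grams(word) for suffix in back_grams(prefix)}

print("auti" in chained_terms("Beautiful"))  # True
print("rves" in chained_terms("sarvesh"))    # True
print("xyz" in chained_terms("Beautiful"))   # False
```

Every suffix of every prefix is an inner substring of the word, which is why "auti" and "rves" end up among the indexed terms. The price is a much larger number of terms per word.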
