Azure search term matching issue - azure-cognitive-search

I'm using the English Lucene analyzer (en.lucene). Searching for the term 'rocking chairs' correctly returns results such as
rocking chair
but does not return results such as
rocker
What can I do to tune the index so that it returns these results?

As @Jdresc mentioned, synonym maps are a great option. However, I also recommend taking a look at the Analyze API to see how this analyzer (and others, such as the Microsoft English analyzer) tokenizes this word, to check whether a different analyzer may be more appropriate for you. https://learn.microsoft.com/en-us/rest/api/searchservice/test-analyzer

You can use a synonym map (https://learn.microsoft.com/en-us/azure/search/search-synonyms) to map rocking to rocker, so that results containing rocker are also returned.
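As a rough sketch of what such a synonym map could look like, the snippet below only builds the JSON body that would be sent to the service when creating the map. The map name furniture-synonyms and the helper synonym_map_payload are made up for illustration; the rule uses the Solr synonym format described in the linked documentation. The field being searched would also need to reference the map through its synonymMaps property.

```python
import json

def synonym_map_payload(name, rules):
    """Build the request body for a synonym map (hypothetical helper).

    Each rule is one line in Solr synonym format, e.g. "rocking, rocker"
    declares the two terms equivalent at query time.
    """
    return {
        "name": name,        # illustrative name, not from the original question
        "format": "solr",    # the format documented for Azure synonym maps
        "synonyms": "\n".join(rules),
    }

payload = synonym_map_payload("furniture-synonyms", ["rocking, rocker"])
print(json.dumps(payload, indent=2))
```

After creating the map and attaching it to the field, it is worth re-running the query to confirm that rocker results now appear.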

Related

Azure search: Wildcard queries do not work with Japanese/Chinese characters

I used icu_tokenizer in a custom analyzer to create a search index for Japanese words, because it works better for Asian languages than the default Azure Search tokenizer. The index was created successfully.
Now, when I query for a string such as 赤城, I see multiple search results (131 in total) from the index. But when I use a wildcard search with the same word, such as 赤城* (adding * at the end of the word) or /赤城.*/ (a regex query), I see 0 search results. The weird part is that * seems to work with a single Japanese character: 赤* gives me the same number of search results as 赤. But as soon as I use more than one Japanese character, wildcard queries with * stop working and return 0 search results. I am testing all of these queries in Search explorer in the Azure portal using queryType=full (Lucene query syntax).
In my application, search terms are normally used for prefix search, so we append * to the end of the search string to fetch results. It looks like these Lucene wildcard queries just do not work with Japanese characters. Any idea how I can make these prefix queries (using a trailing * wildcard) work when the search strings are in Japanese?
Any quick help will be much appreciated!!
I tested with my installation now and I can confirm that wildcards only work with Japanese content when you use a Japanese analyzer.
In my example I set up one index using a property Body that does not have a specific analyzer defined. Then I set up another index where Body uses the ja.microsoft language analyzer. The content in both indexes is identical. I then tried to search for 自動車 (automobile) with a trailing wildcard.
自動車* returns multiple hits from my index using the Japanese analyzer. No hits are returned from the index without a specific analyzer defined.
Sorry for the late reply.
Have you tried using one of the Japanese language analyzers? For example, ja.microsoft
Also, if you want to use prefix search, you can try experimenting with the suggester feature which is designed to be efficient for this scenario.
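To make the setup from the answers above concrete, here is a minimal sketch that builds a field definition using the ja.microsoft analyzer and a request body for a prefix query with full Lucene syntax. The helper names text_field and prefix_query are hypothetical; the field name Body and the search term come from the answer above, and nothing is actually sent to a service.

```python
def text_field(name, analyzer):
    """Field definition for a searchable string field (hypothetical helper)."""
    return {
        "name": name,
        "type": "Edm.String",
        "searchable": True,
        "analyzer": analyzer,  # e.g. "ja.microsoft" instead of the default analyzer
    }

def prefix_query(term):
    """Search request body for a trailing-wildcard query using full Lucene syntax."""
    return {"search": term + "*", "queryType": "full"}

body_field = text_field("Body", "ja.microsoft")
request_body = prefix_query("自動車")
print(body_field)
print(request_body)
```

Note that changing the analyzer of an existing field generally means rebuilding the index, so this is easiest to try on a test index first.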

Using Solr 4.2, how do I use/enable fuzzy phrase searching?

So right now I'm just using the admin interface to run search queries. I know that a tilde ~ suffix turns a word into a fuzzy search term.
However, what about a phrase? I tried "some words"~ but it doesn't seem to return results when it should. Any idea why? Do I need a special fieldtype or special filters?
Right now, everything is pretty vanilla but I did import a lot of data. (About 12 million rows). I know that there are things in there that should be getting returned with a good fuzzy match that are not.
Any help is appreciated.
Also, if it makes a difference, I would like to use the Levenshtein algorithm.
ComplexPhraseQueryParser can be used to handle wildcard and fuzzy phrase queries.
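A minimal sketch of how such a query might be assembled, assuming a Solr version that ships the complexphrase query parser (to my knowledge this is later than the 4.2 mentioned in the question). The field name text and the helper fuzzy_phrase_query are assumptions for illustration. Each word inside the phrase gets its own ~ suffix, rather than putting ~ after the closing quote.

```python
def fuzzy_phrase_query(field, words, in_order=True):
    """Build a complexphrase query where every word in the phrase is fuzzy.

    Hypothetical helper: the field name and word list are supplied by the caller.
    """
    fuzzy_terms = " ".join(word + "~" for word in words)
    order = "true" if in_order else "false"
    return '{!complexphrase inOrder=%s}%s:"%s"' % (order, field, fuzzy_terms)

query = fuzzy_phrase_query("text", ["some", "words"])
print(query)
```

The resulting string can be pasted into the q box of the admin interface to check whether the fuzzy phrase matches as expected.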

Terms Prevalence in Solr searches

Is there a way to specify a set of terms that are more important when performing a search?
For example, in the following question:
"This morning my printer ran out of paper"
Terms such as "printer" or "paper" are far more important than the rest, and I don't know if there is a way to list these terms to indicate that, in the global knowledge, they should carry more weight than the other words.
For specific documents you can use QueryElevationComponent, which uses a special XML file where you list query terms together with the doc ids you want returned for them.
Not exactly what you need, I know.
And regarding your comment about users not caring what's underneath, you control the final query. Or, in the worst case, you can modify it after you receive it at Solr server side.
Similar: Lucene term boosting with sunspot-rails
When you build the query, you can define which values matter and how much weight each one carries in the search.
This can be done in several ways:
Setting the boost
A boost is set by appending ^ and a number to a term, for example printer^2.
Using the plus operator
If you prefix a term with the + operator, that term becomes required, so only documents that match it for that field are returned.
For a better understanding of Solr, it is best to get familiar with Lucene query syntax. Refer to this link to get more info.
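The boosting idea can be sketched in a few lines. This is only an illustration, assuming you keep your own set of important terms; the helper boost_terms and the boost value 3 are made up, and the sentence is the one from the question.

```python
def boost_terms(question, important, boost=2.0):
    """Append a Lucene boost (^N) to every word found in the important set."""
    boosted = []
    for word in question.split():
        if word.lower() in important:
            boosted.append(f"{word}^{boost}")
        else:
            boosted.append(word)
    return " ".join(boosted)

query = boost_terms(
    "This morning my printer ran out of paper",
    important={"printer", "paper"},
    boost=3,
)
print(query)
```

Because the rewriting happens before the query reaches Solr, users never see the boosts; they only type the plain sentence.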

With stemming enabled, searching for the root word gives no hits

I have indexed a site with Solr. It works very well if stemming is not enabled. Using stemming, however, Solr does not return any hits when searching for the root of a word. I use Swedish stemming.
For example, searching for support gives hits if not using stemming. Using stemming, searching for support gives no hits. Though, searching for supporten returns hits that match support.
By debugging the query, I can see that it stems the word support to suppor (which is incorrect by the way, but that should not matter). However, with the word stemmed to suppor, I want it to search for matches with the original query word as well.
I'd appreciate any help on this!
Afaik, there is no way to keep the original word when stemming...
I assume that you are using solr.SnowballPorterFilterFactory. The Snowball algorithm is too aggressive.
You should try a Hunspell stemmer or maybe solr.SwedishLightStemFilterFactory.
A workaround is to rewrite your query as "support support*" or "support support~". * is wildcard matching and ~ is fuzzy matching in Lucene syntax. I know you didn't mention needing wildcard or fuzzy search, but I found that in these cases stemming is not applied to the wildcard or fuzzy term at query time, so "support" is preserved. Stemming still applies to the first word, so results for both forms are returned, if any. Plus, fuzzy search makes the query more tolerant of typos, so it's an added benefit.
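The rewrite described in that workaround is easy to apply before sending the query. A minimal sketch, where with_unstemmed_variant is a hypothetical helper name:

```python
def with_unstemmed_variant(term, fuzzy=False):
    """Repeat the term with a wildcard (*) or fuzzy (~) suffix.

    The plain term is still stemmed at query time; the suffixed copy is
    intended to bypass stemming, as described in the workaround.
    """
    suffix = "~" if fuzzy else "*"
    return f"{term} {term}{suffix}"

print(with_unstemmed_variant("support"))
print(with_unstemmed_variant("support", fuzzy=True))
```

This only handles a single term; a multi-word query would need each word rewritten separately.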

Solr Index appears to be valid - but returns no results

Solr newbie here.
I have created a Solr index and written a whole bunch of docs into it. I can see from the Solr admin page that the docs exist and the schema is fine as well.
But when I perform a search using a test keyword I do not get any results back.
On entering *:* into the query (in the Solr admin page) I get all the results.
However, when I enter any other query (e.g. a term or phrase) I get no results.
I have verified that the field being queried is Indexed and contains the values I am searching for.
So I am confused what I am doing wrong.
Probably you don't have a <defaultSearchField> correctly set up. See this question.
Another possibility: your field is of type string instead of text. String fields, in contrast to text fields, are not analyzed, but stored and indexed verbatim.
I had the same issue with a new setup of Solr 8. The accepted answer is not valid anymore, because the <defaultSearchField> configuration has since been deprecated.
As I found no answer to why Solr does not return results from any fields despite being indexed, I consulted the query documentation. What I found is the DisMax query parser:
The DisMax query parser is designed to process simple phrases (without complex syntax) entered by users and to search for individual terms across several fields using different weighting (boosts) based on the significance of each field. Additional options enable users to influence the score based on rules specific to each use case (independent of user input).
In contrast, the documentation for the default Lucene parser only describes searching a single field. So I gave DisMax a try and it worked very well!
Query example:
http://localhost:8983/solr/techproducts/select?defType=dismax&q=video
You can also specify which fields to search exactly to prevent unwanted side effects. Multiple fields are separated by spaces which translate to + in URLs:
http://localhost:8983/solr/techproducts/select?defType=dismax&q=video&qf=features+text
Last but not least, give the fields a weight:
http://localhost:8983/solr/techproducts/select?defType=dismax&q=video&qf=features^20.0+text^0.3
If you are using pysolr like I do, you can add those parameters to your search request like this:
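import pysolr
# Core URL taken from the example requests above
solr = pysolr.Solr('http://localhost:8983/solr/techproducts')
results = solr.search('search term', **{
    'defType': 'dismax',     # use the DisMax query parser
    'qf': 'features text',   # fields to search, separated by spaces
})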
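If you prefer to build the request URL yourself, the weighted qf parameter from the examples above can be assembled with the standard library. This is just a sketch: dismax_url is a hypothetical helper, and the core URL, fields, and weights are the ones used in the example requests. Note that urlencode escapes ^ as %5E and encodes the space between fields as +.

```python
from urllib.parse import urlencode

def dismax_url(base, query, field_weights):
    """Build a DisMax select URL with per-field boosts (hypothetical helper)."""
    qf = " ".join(f"{field}^{weight}" for field, weight in field_weights.items())
    params = {"defType": "dismax", "q": query, "qf": qf}
    return f"{base}/select?{urlencode(params)}"

url = dismax_url(
    "http://localhost:8983/solr/techproducts",
    "video",
    {"features": 20.0, "text": 0.3},
)
print(url)
```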
In my case the problem was the format of the query. It seems that my setup, by default, was looking for an exact match against the entire value of the field. So, to get results when searching for sit, I had to query *sit*, i.e. use wildcards to get the expected result.
With Solr 4, I had to solve this as per Mauricio's answer by setting type="text_en" on the field.
With Solr 6, use text_general.
