How To Select Analyzer (or language) in Azure Search Suggester - azure-cognitive-search

I have some data in an Azure suggester with language specific characters í,ó,ú, etc. Unless I search with those characters I don't get any results back. This would get solved if I was able to add an analyzer to the suggester (like lucene indexes have)
"suggesters": [
{
"name": "suggester",
"searchMode": "analyzingInfixMatching",
"sourceFields": [
"Name"
]
}

Suggesters are not supported on fields that use custom analyzers. In some scenarios it makes sense to create another field analyzed with the standard analyzer (or a language analyzer) and use it for suggestions only.
We realize this is a limitation, please vote for this feature to help us prioritize.

Related

Solr syntax for phrase query

I have a field with definition:
"replace-field": {
"name":"search_words",
"type":"lowercase",
"stored":true,
"indexed": true,
"multiValued": true
}
that contains sentences as array (thus multiValued: true):
"id":500
"search_words":["How much oil should you pour into the engine",
"How important is engine oil?]
How should I create a query thatwould return that document (with id = 500) when user inputs phrase "engine oil"?
With single term queries I can user *engine* and it would find that document becasue engine is in the middle of the sentence but I can't find a way to be able to seearch for phrases in sentences. Is it even possible using solr?
Solr supports phrase search, and is what it was actually designed for. Wildcard searches are not really the way you should use Solr by default - the field type should tell Solr how to process the text in the field to make you get hits when querying it in a regular way.
In this case the standard text_en would probably work fine, or a field definition with a Standard Tokenizer and a lowercasefilter (and possibly a WordDelimiterGraphFilter to get rid of special characters).
The query would then be search_words:"engine oil".

Azure Search synonyms not reflecting in results

The synonyms don't seem to function in Azure Search
I updated my synonyms map with the following payload
{
"name" : "synonymmap1",
"format" : "solr",
"synonyms" :
"Bob, Bobby,Bobby\n
Bill, William, Billy\n
Harold, Harry\n
Elizabeth, Beth\n
Michael,Mike\n
Robert, Rob\n"
}
Then when I examined the synonymMap, I see this
{
"#odata.context":
"https://athenasearchdev.search.windows.net/$metadata#synonymmaps",
"value": [
{
"#odata.etag": "\"0x8D4E7F3C1A9404D\"",
"name": "synonymmap1",
"format": "solr",
"synonyms": "Bob, Bobby,Bobby\n\r\n Bill, William, Billy\n\r\n Harold, Harry\n\r\n Elizabeth, Beth,Liza, Elize\n\r\n Michael,Mike\n\r\n Robert, Rob\n\r\n"
}
]
}
However, the synonyms don't seem to function. e.g results for a search on Mike and Michael are not identical?
I understand this is a preview feature, but wanted help on the following
a) once defined as synonyms, should we not expect exact same results and search scores across all synonym variations
b) Can these synonyms apply at a column level (e. first name alone and not address)- or is it always across the document
c) if we have a large set of synonyms (over 1000)- does it lead to performance impact?
I am Nate from Azure Search. To answer the questions first :
a) Yes, you should. If "Bill" and "Williams" were defined as synonyms. Searching on either should yield the same result.
b) It's always at the column level. You use the field/column property called 'synonymMaps' to specify which synonym maps to use. Please see "Setting the synonym map in the index definition" in https://azure.microsoft.com/en-us/blog/azure-search-synonyms-public-preview/ for more information.
c) Do you mean over 1000 synonyms for a word? or 1000 synonym rule in the synonym map? The former definitely impacts performance because the search query will expand to 1000 of terms. In fact, you can't define more than 50 synonyms in a rule. The latter, 1000s of rules in a synonym map shouldn't impact performance unless the rules are constantly updated.
Regarding your comments that synonyms don't function, based on your questions, I was wondering if the synonyms feature was enabled in the index definition. Could you check that and if it doesn't function, feel free to drop me an email at nateko#microsoft.com.
The extraneous new line characters you see in the retrieved synonym map may have been inserted by the http client you were using at the time of uploading. Some http clients, fiddler and postman for example, insert new line character at the line ending automatically so you don't have to do it yourself.
Thanks,
Nate

How can I tune the Retrieve and Rank ranker with a dictionary/model of domain specific phrases?

We are trying to group phrases together in order to improve results.
For instance, if the user asks a question like "When do I have to change the filter of my air conditioning?" with a domain specific phrase such as “air conditioning”, R&R returns some answers containing the term “air” and no “conditioning” or it returns answers containing other terms like air bag or air filter.
This can be accomplish using a raw Solr instance and set the phrase between quotes. So, the Solr query would look like the following:
...
"debug": {
"rawquerystring": "When do I have to change the filter of my \"air conditioning\" ?",
"querystring": "When do I have to change the filter of my \"air conditioning\" ?",
"parsedquery": "text:when text:do text:i text:have text:to text:change text:the text:filter text:of text:my PhraseQuery(text:\"air conditioning\") text:?",
"parsedquery_toString": "text:when text:do text:i text:have text:to text:change text:the text:filter text:of text:my text:\"air conditioning\" text:?",
...
However, the R&R guide states:
The syntax is different from standard Solr syntax as follows:
You can search for a single term, or a phrase. You do not need to
surround the phrase with double quotation marks as with Solr, but you
can include phrases in the query and they are accounted for by the
ranker models.
We could not find more details regarding the above statement.
But, as we understand, the ranker is supposed to identify phrases. If that is the case, we were wondering if there is a way where we can set a dictionary of phrases in order to tune the ranker?
Or, could we set our own model of legal phrases? What are the options to accomplish this goal?
Thanks
Currently RnR doesn't support strict phrase querying, though there are features that will take term ordering and adjacent terms into consideration. We are working on a new version of service, in which users would be able to use full regular solr query syntax (including specifying phrases) for document retrieving.

Solr Query with LIKE Clause

I'm working with Solr and I'd like to know if it is possible to have a LIKE clause in the query. For example, I want to know all organizations with "New York" in the title. In SQL, this would be written like Name LIKE 'New York%'.
My question - how do you write a LIKE query in Solr?
I'm using the SolrNet library, if that makes a difference.
You just search for "New York", but first you need to properly configure your field's analyzer. For example you might want to start with a field type like text_general as defined in the default Solr schema. This field type will tokenize on whitespace and other common word separators, then apply a filter of stopwords, then lowercase the terms in order to make searches case-insensitive.
More information about analyzers in the Solr wiki.
If you're using solr 3.1 or newer, have a look at the Extended DisMax Query Parser, which supports wildcard queries. You can enable it using <str name="defType">edismax</str> in the request handler configuration.
Then you can use a query like title:New York* with the same behaviour as a query with like clause. The main difference between my answer and the accepted one is that you can even search for fragment of words using wildcards. For example New Yorkers would match in this case.
Unfortunately you could have problems with case-sensitive queries even if you're using a LowerCaseFilterFactory. Have a look here to know more. Most of those problems will be fixed with the solr 3.6 release since the SOLR-2438 issue has been solved.

Return stemmed word in Solr

We have stemming in our Solr search and we need to retrieve the word/phrase after stemming. That is if I search for "oranges", through stemming a search for "orange" is carried out. If I turn on debugQuery I would be able to see this, however we'd like to access it through the result if possible. Basically, we need this, because we pass the searched word as a parameter to a 3rd party application which highlights the word in an online PDF reader. Currently, if a user searches for "oranges" and a document contains "orange", then the PDF wouldn't highlight anything since it tries to highlight "oranges" not "orange".
Thanks all in advance,
Krt_Malta
I've no experience with Solr but if you need it just for presentation to users you could stem their queries using the same stemmer Solr uses yourself. This would probably be faster since it would avoid a trip to Solr's index. For English this would presumably be http://tartarus.org/~martin/PorterStemmer/ - or you could check Solr's implementation.
However, a word of caution, most stemming algorithms do not guarantee that stemmed words will be actual words. Check here http://snowball.tartarus.org/algorithms/english/stemmer.html for examples.
You could use the implicit analysis request handler to get the stemmed word.
For your example, if you are using the text_en field and the Snowball Stemmer, the URL
<YOUR SOLR HOST>/solr/<YOUR COLLECTION>/analysis/field?analysis.query=oranges&analysis.fieldtype=text_en&verbose_output=1
would give you a json response, including the following:
"org.apache.lucene.analysis.snowball.SnowballFilter",
[
{
"text": "orang",
...

Resources