Multi word synonyms in azure search not working - azure-cognitive-search

Currently, when a user searches for "united kingdom", they also get results about 'united arab emirates'. I am looking to define a synonym map that would show results only about 'United Kingdom'.
I have tried defining the following synonym maps:
'united kingdom, uk\n'
'united kingdom, uk => tag001\n' (tried the Elasticsearch way)
'united kingdom => uk\n'
In all cases I still get matches with 'united arab emirates'.
I'd appreciate any help on how to structure the synonym map properly.

Unfortunately, multi-word synonyms are not supported; this is a known but undocumented limitation. A known workaround is to create a custom analyzer with a SynonymTokenFilter, which produces the required synonym tokens in the index. The drawback is that you must rebuild the index to update the synonyms.
Synonyms that you update at runtime, without rebuilding the index, only support single-term synonyms.
For a detailed description of the workaround, see my recent question "Is there a way to make multi term synonyms work in Azure Cognitive Search".
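A minimal sketch of that workaround, written as a Python dict that would be sent as the JSON body when creating the index: the custom analyzer's token filter chain ends with a SynonymTokenFilter, so a multi-word rule like "united kingdom, uk" is applied by the analyzer rather than by a synonym map. The index, field, analyzer and filter names are hypothetical, and the property names are assumed to follow the REST index schema, so check them against the current API reference. Because the rules live in the index definition, changing them still means rebuilding the index.

```python
import json

# Hypothetical names used only for illustration.
SYNONYM_FILTER = "country_synonyms"
ANALYZER = "country_analyzer"


def build_index_definition(rules):
    """Return an index definition (as a dict) that applies `rules` through a
    custom analyzer whose last token filter is a SynonymTokenFilter."""
    return {
        "name": "countries-index",
        "fields": [
            {"name": "id", "type": "Edm.String", "key": True},
            {
                "name": "content",
                "type": "Edm.String",
                "searchable": True,
                # "analyzer" applies the custom analyzer at indexing and query time.
                "analyzer": ANALYZER,
            },
        ],
        "analyzers": [
            {
                "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
                "name": ANALYZER,
                "tokenizer": "standard_v2",
                # Lowercase first so the rules can be written in lowercase.
                "tokenFilters": ["lowercase", SYNONYM_FILTER],
            }
        ],
        "tokenFilters": [
            {
                "@odata.type": "#Microsoft.Azure.Search.SynonymTokenFilter",
                "name": SYNONYM_FILTER,
                "synonyms": rules,
                "ignoreCase": True,
                "expand": True,
            }
        ],
    }


if __name__ == "__main__":
    definition = build_index_definition(["united kingdom, uk"])
    # This JSON would be the request body when creating the index.
    print(json.dumps(definition, indent=2))
```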

Related

Azure Search synonyms not reflecting in results

The synonyms don't seem to function in Azure Search
I updated my synonym map with the following payload:
{
"name" : "synonymmap1",
"format" : "solr",
"synonyms" :
"Bob, Bobby,Bobby\n
Bill, William, Billy\n
Harold, Harry\n
Elizabeth, Beth\n
Michael,Mike\n
Robert, Rob\n"
}
Then when I examined the synonymMap, I see this
{
"@odata.context": "https://athenasearchdev.search.windows.net/$metadata#synonymmaps",
"value": [
{
"@odata.etag": "\"0x8D4E7F3C1A9404D\"",
"name": "synonymmap1",
"format": "solr",
"synonyms": "Bob, Bobby,Bobby\n\r\n Bill, William, Billy\n\r\n Harold, Harry\n\r\n Elizabeth, Beth,Liza, Elize\n\r\n Michael,Mike\n\r\n Robert, Rob\n\r\n"
}
]
}
However, the synonyms don't seem to work: for example, the results for searches on "Mike" and "Michael" are not identical.
I understand this is a preview feature, but I'd like help with the following:
a) Once terms are defined as synonyms, shouldn't we expect exactly the same results and search scores across all synonym variations?
b) Can these synonyms apply at the column level (e.g. first name alone and not address), or do they always apply across the whole document?
c) If we have a large set of synonyms (over 1,000), does that impact performance?
I am Nate from Azure Search. To answer your questions first:
a) Yes, you should. If "Bill" and "William" are defined as synonyms, searching on either should yield the same results.
b) It's always at the column level. You use the field/column property called 'synonymMaps' to specify which synonym maps to use. Please see "Setting the synonym map in the index definition" in https://azure.microsoft.com/en-us/blog/azure-search-synonyms-public-preview/ for more information.
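To illustrate point b), here is a sketch of field definitions, as Python dicts mirroring the JSON index schema, where the synonym map applies to a first-name field only. The field names and the helper are hypothetical; "synonymmap1" is the map from the question.

```python
import json


def searchable_field(name, synonym_map=None):
    """Build a searchable Edm.String field, optionally attaching a synonym map
    through the field-level 'synonymMaps' property."""
    field = {"name": name, "type": "Edm.String", "searchable": True}
    if synonym_map is not None:
        # 'synonymMaps' is a list property; here it holds a single map name.
        field["synonymMaps"] = [synonym_map]
    return field


fields = [
    {"name": "id", "type": "Edm.String", "key": True},
    # Synonyms apply to the first-name column...
    searchable_field("firstName", synonym_map="synonymmap1"),
    # ...but not to the address column.
    searchable_field("address"),
]
print(json.dumps(fields, indent=2))
```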
c) Do you mean over 1,000 synonyms for a single word, or 1,000 synonym rules in the synonym map? The former definitely impacts performance, because the search query would expand to 1,000 terms; in fact, you can't define more than 50 synonyms in a rule. The latter, thousands of rules in a synonym map, shouldn't impact performance unless the rules are constantly updated.
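To catch rules that would run into the 50-synonym limit before uploading, a small pre-check can count the terms in each Solr-format rule. This is a hypothetical helper, not the service's own validation; it assumes one rule per line and counts every term on both sides of "=>".

```python
MAX_SYNONYMS_PER_RULE = 50  # limit stated in the answer above


def terms_in_rule(rule):
    """Split a Solr-format rule ('a, b, c' or 'a, b => c') into its terms."""
    terms = []
    for part in rule.split("=>"):
        terms.extend(t.strip() for t in part.split(",") if t.strip())
    return terms


def oversized_rules(synonyms_text, limit=MAX_SYNONYMS_PER_RULE):
    """Return (line_number, term_count) for every rule with more than `limit` terms."""
    problems = []
    for number, line in enumerate(synonyms_text.splitlines(), start=1):
        count = len(terms_in_rule(line))
        if count > limit:
            problems.append((number, count))
    return problems


rules = "Bill, William, Billy\n" + ", ".join(f"term{i}" for i in range(60))
print(oversized_rules(rules))  # [(2, 60)]
```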
Regarding your comment that synonyms don't work: based on your questions, I wonder whether the synonyms feature is enabled in the index definition. Could you check that? If it still doesn't work, feel free to drop me an email at nateko#microsoft.com.
The extraneous newline characters you see in the retrieved synonym map may have been inserted by the HTTP client you used when uploading. Some HTTP clients, Fiddler and Postman for example, automatically insert a newline character at each line ending, so you don't have to add one yourself.
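One way to sidestep client-inserted line endings is to build the request body in code rather than typing a multi-line string. The sketch below keeps each rule as a separate string, joins the rules with "\n" and lets json.dumps escape the newlines (a raw line break inside a JSON string literal is not valid JSON anyway). The rules are the ones from the question with the duplicated "Bobby" dropped; the helper name is hypothetical.

```python
import json

# One Solr-format rule per list entry; joining them in code means the only
# line separator in the payload is "\n".
RULES = [
    "Bob, Bobby",
    "Bill, William, Billy",
    "Harold, Harry",
    "Elizabeth, Beth",
    "Michael, Mike",
    "Robert, Rob",
]


def synonym_map_body(name, rules):
    """Return the JSON body for creating or updating a Solr-format synonym map."""
    return json.dumps({"name": name, "format": "solr", "synonyms": "\n".join(rules)})


body = synonym_map_body("synonymmap1", RULES)
print(body)
```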
Thanks,
Nate

Azure search doesn't distinguish between singular and plural

I'm using Azure Search, and in my indexed database table I have a row with the text 'Government Grants'.
When I search for 'Grant' it returns no results; if I search for 'Grants' it returns results.
I have the same issue with 'Sales' and 'Sale'
How can I configure azure search so that it matches singular and plural words?
Please check that the corresponding field in your search index is set to be searchable, and that a natural language analyzer (such as "en.lucene" or "en.microsoft") is selected as the analyzer for that field. The default analyzer, "standard", doesn't handle plural forms or any other word inflections, because it doesn't do any linguistic processing.
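A sketch of that configuration change, assuming the index's fields are held as Python dicts mirroring the JSON schema (the field names and helper are hypothetical). As far as I know, the analyzer of an existing field cannot be changed in place, so expect to apply this when recreating the index and then reindex the data.

```python
import copy
import json


def with_language_analyzer(fields, analyzer="en.lucene"):
    """Return a copy of `fields` in which every Edm.String field marked
    searchable uses `analyzer` instead of the default "standard" analyzer."""
    updated = copy.deepcopy(fields)
    for field in updated:
        if field.get("type") == "Edm.String" and field.get("searchable", False):
            field["analyzer"] = analyzer
    return updated


fields = [
    {"name": "id", "type": "Edm.String", "key": True},
    {"name": "description", "type": "Edm.String", "searchable": True},
]
print(json.dumps(with_language_analyzer(fields), indent=2))
```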
HTH,
Eugene

How to exclude results for certain words like "West Virginia" when searching for "Virginia" in a US state list?

I've got Solr happily running, indexing a list of department names that contain US states. It is working well; however, searching for "Virginia" also turns up results containing "West Virginia", and while that is certainly helpful for some business requirements, it is not for ours.
Is there a special way of saying that a query for X must not contain Y (I don't mind crafting a special query for the case of "Virginia"), or can I only do this post-query by iterating over the results and excluding results with "West Virginia"?
Use a minus sign (hyphen) combined with the phrases/terms you want to exclude. If you use the dismax query parser, then you don't even need to specify field names.
Examples:
using dismax:
q=virginia -"west virginia"
using standard query parser:
q=field_name:(virginia -"west virginia")
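The two queries above can also be built programmatically, for example with Python's urllib. The Solr URL, core name and helper are hypothetical, and the dismax variant assumes the request handler already has its query fields configured (qf/df), which is why no field name is needed there.

```python
from urllib.parse import urlencode

# Hypothetical Solr core URL; adjust host, port and core name.
SOLR_SELECT = "http://localhost:8983/solr/departments/select"


def exclusion_query(term, excluded_phrase, field=None):
    """Build query parameters matching `term` but not the phrase `excluded_phrase`.

    Without `field`, use dismax (fields come from the handler configuration);
    with `field`, use the standard parser and group both clauses under it.
    """
    clause = f'{term} -"{excluded_phrase}"'
    if field is None:
        return {"defType": "dismax", "q": clause}
    return {"q": f"{field}:({clause})"}


print(SOLR_SELECT + "?" + urlencode(exclusion_query("virginia", "west virginia")))
print(SOLR_SELECT + "?" + urlencode(exclusion_query("virginia", "west virginia", field="field_name")))
```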
Refer to the Solr Query Syntax wiki page and its further links for more examples.
You could make a state field of type string and just search on state:"virginia" (lowercase the string before indexing and searching).
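A toy Python model of why this works (not Solr itself): a tokenized text field matches any value that contains the word, while a string field only matches when the whole, lowercased value equals the query.

```python
def tokenized_match(query, value):
    """Rough stand-in for a tokenized text field: every query word appears in the value."""
    return set(query.lower().split()) <= set(value.lower().split())


def string_field_match(query, value):
    """Rough stand-in for a string field: the whole lowercased value must equal the query."""
    return query.lower() == value.lower()


states = ["Virginia", "West Virginia"]
print([s for s in states if tokenized_match("virginia", s)])     # ['Virginia', 'West Virginia']
print([s for s in states if string_field_match("virginia", s)])  # ['Virginia']
```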

GAE Full Text Search API phrase matching

I can only find exact phrase matching for queries in the experimental Search API for Google App Engine. For example, the query 'best prices hotel' will only match that exact phrase. It will not match texts such as 'best hotel prices' or 'best price hotels'. It's of course a much more difficult task to match text in a general way, but I thought the Search API would at least be able to handle some of that.
Another example is the query 'new cars' which will not match the text 'new and used cars'.
You should be able to use the '~' operator to rewrite queries to include plurals.
E.g., ~hotel or ~"best prices hotel".
Documentation about this operator should be added in the next app engine SDK release.
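If queries come from user input, the '~' operator could be added by a small rewriting step before the query reaches the Search API. The helper below is hypothetical and assumes simple queries made only of words and double-quoted phrases; it uses shlex to keep quoted phrases together, so input with an unbalanced quote or an apostrophe (e.g. "men's") raises a ValueError.

```python
import shlex


def with_stemming(query):
    """Prefix every bare word and every quoted phrase in `query` with '~'."""
    parts = shlex.split(query)  # keeps "quoted phrases" together and strips the quotes
    rewritten = []
    for part in parts:
        if " " in part:
            # A multi-word part came from a quoted phrase; re-quote it.
            rewritten.append(f'~"{part}"')
        else:
            rewritten.append(f"~{part}")
    return " ".join(rewritten)


print(with_stemming("best prices hotel"))    # ~best ~prices ~hotel
print(with_stemming('"best prices hotel"'))  # ~"best prices hotel"
```

The rewritten string would then be passed as the query string, e.g. to search.Index(name=...).search(...) in the App Engine runtime.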

How can I find a city and country based on a user search?

I am trying to search a SQL Server 2008 table (containing about 7 million records) for cities and countries based on text typed by the user. The search string that I get from the user can be anything, like:
"Hotels in San Francisco, US", "New York, NY", "Paris sddgdfgxx" or "Toronto Canada". Terms are not always separated by commas, are not in a specific order, and there might be useless data.
This is what I tried:
Method 1: FTS with contains:
ex: select * from cityNames where contains(cityname,'word1 and word2') -- with AND
select * from cityNames where contains(cityname,'word1 or word2') -- with OR
This didn't work very well, because a term like 'sddgdfgxx' returns nothing when combined with AND. Using OR works for one-word cities like 'Paris', but not for 'San Diego' or 'San Francisco'.
Method 2: this is actually a reverse search; the idea is to check whether the user input string contains any of the cities or countries from my table. This way I'll know for sure that 'Aix en Provence' or 'New York' was searched for.
ex: select * from cityCountryNames where 'Ontario, Canada, Toronto' like cityCountryNames
Notes: I wasn't able to get results for two-word cities, and the query was slow.
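For reference, the idea behind Method 2 can be sketched in memory in Python. This illustrates the matching logic only, with a hypothetical list of place names; it still scans every known name for each query, so at 7 million rows it would be slow for the same reason the SQL version was.

```python
import re


def find_places(user_input, known_places):
    """Return known place names that occur in `user_input` as whole words.

    Longer names are tried first so that "New York" wins over "York".
    """
    text = user_input.lower()
    found = []
    for place in sorted(known_places, key=len, reverse=True):
        pattern = r"\b" + re.escape(place.lower()) + r"\b"
        match = re.search(pattern, text)
        if match:
            found.append(place)
            # Blank out the matched span so a shorter overlapping name can't match it again.
            text = text[: match.start()] + " " * (match.end() - match.start()) + text[match.end():]
    return found


places = ["Paris", "San Francisco", "San Diego", "New York", "York", "Toronto", "Canada", "US"]
print(find_places("Hotels in San Francisco, US", places))  # ['San Francisco', 'US']
```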
Any help is appreciated.
I would strongly recommend using a 3rd-party API like the Google Geocoding API to take such input and parse it into a location with discrete parts (street address, city, state, country, etc.) Then you could use those discrete parts to search your database if necessary.
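A sketch of that approach with the Google Geocoding API web service, using only Python's standard library. The endpoint and the address_components/types fields come from the public API; the helper names are hypothetical, a real API key is required, and lookup() makes a network call.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def geocode_url(user_input, api_key):
    """Build a Geocoding API request URL for free-form user input."""
    return GEOCODE_URL + "?" + urlencode({"address": user_input, "key": api_key})


def city_and_country(response):
    """Extract the city (locality) and country from a parsed Geocoding response."""
    if not response.get("results"):
        return None, None
    city = country = None
    for component in response["results"][0]["address_components"]:
        if "locality" in component["types"]:
            city = component["long_name"]
        elif "country" in component["types"]:
            country = component["long_name"]
    return city, country


def lookup(user_input, api_key):
    """Call the API (needs network access and a valid key) and parse the top result."""
    with urlopen(geocode_url(user_input, api_key)) as resp:
        return city_and_country(json.load(resp))
```

The returned city and country could then be used as exact values when querying the table, instead of searching it with the raw input.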
Map services like Google and Bing have solved this problem way better than you or I ever would, so why not leverage all the work they've done?
SQL isn't designed for the kinds of queries you are performing, certainly not at scale.
My recommendation would be as follows:
Index all your places (cities + countries) into a Solr index. Solr is a FOSS search server built on Lucene and can easily query a 7-million-record index in milliseconds or less.
Query Solr with the user-typed string and, voilà, the top-ranked result should be the best match.
So even if the user typed "Paris sddgdfgxx", Paris should be your first hit. If you want to get really sophisticated, use an n-gram approach (word n-grams, known in Lucene as shingles).
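To illustrate the shingle idea, the toy Python code below splits the user's text into word n-grams and looks each one up in a set of known place names. In Solr itself this would be configured in the field's analyzer chain (for example with solr.ShingleFilterFactory) rather than written by hand; the function names and the tiny place list are hypothetical.

```python
import re


def shingles(text, max_size=3):
    """Return all word n-grams (shingles) of 1..max_size words from `text`."""
    words = re.findall(r"\w+", text.lower())
    grams = []
    for size in range(1, max_size + 1):
        for start in range(len(words) - size + 1):
            grams.append(" ".join(words[start:start + size]))
    return grams


def match_places(text, place_index, max_size=3):
    """Look each shingle up in a set of lowercased place names, longest shingles first."""
    hits = []
    for gram in sorted(set(shingles(text, max_size)), key=lambda g: (-len(g.split()), g)):
        if gram in place_index:
            hits.append(gram)
    return hits


place_index = {"paris", "san francisco", "san diego", "new york", "toronto", "canada"}
print(match_places("Hotels in San Francisco, US", place_index))  # ['san francisco']
print(match_places("Paris sddgdfgxx", place_index))              # ['paris']
```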
Since Solr offers a RESTful (HTTP) API, it should integrate easily into whatever platform you are on.

Resources