Azure Search: boost results that contain a word - azure-cognitive-search

I have an airports database in Azure Search. When searching, I would like to boost the airports that contain the word "international" in the airport name.
Given two results that have the same score, I would like to boost the one that has the word "international" in the airport name, using only Azure Search (i.e. if possible, without any code that manipulates the results after they are returned).
I tried Term Boosting, but it returns a list of airports that have "international" in them, which is not what I want.
I looked at Scoring Functions, but none of them seems to suit my needs.
In essence, I do not want to "match" results that contain the word "international";
I want to "boost" results that contain the word "international" after the user keys in the query text.

If you want results containing a term to score higher, but you don't want to require matching documents to contain the term, you can use OR as well as AND. For example, if the user typed "Dallas", your query could look like this:
Dallas OR (Dallas AND airportName:international)
If you further want to control the impact that the term international has on the score, you can use term boosting.
You might find this article on how Azure Search processes queries to be helpful.
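As a rough sketch of the OR/AND pattern above, the query string could be assembled from the user's text like this. The helper name build_boosted_query and its defaults are hypothetical, and the sketch assumes the request is sent with queryType=full, since fielded search and term boosting belong to the full Lucene syntax. It does not escape special characters in the user's input.

```python
def build_boosted_query(user_text, field="airportName", term="international", boost=None):
    """Match on user_text alone, but score documents higher when
    `field` also contains `term`. Hypothetical helper, not part of any SDK."""
    base = f"({user_text})"  # parenthesize so multi-word input stays grouped
    clause = f"{field}:{term}"
    if boost is not None:
        clause += f"^{boost:g}"  # term boosting, e.g. airportName:international^2
    return f"{base} OR ({base} AND {clause})"


query = build_boosted_query("Dallas", boost=2)
print(query)  # (Dallas) OR ((Dallas) AND airportName:international^2)
```

Documents that only match "Dallas" are still returned through the left-hand side of the OR; the right-hand side only adds score for names that also contain the boosted term.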

Related

Solr search relevancy

I use Solr and I have trouble with result scores. For example,
I have these docs with one field (for example "content"):
content = car
content = cars
content = carable awesome
content = awful for carable
And I make a search query with these params:
{
"mm":"1",
"q":"car",
"tie":"0.1",
"defType":"dismax",
"fl":"*, score"
}
I expect to see results like this:
car: 5 score
cars: 4.8 score
carable awesome: 3
awful for carable: 3
The word without the "s" should rank higher, but I get strange results. How can I boost an exact match (like "car")?
This happens because the field type you're using for the field has a stemming filter (or an ngramfilter) attached (which makes cars and car generate hits against each other). You can't boost "exact hits" inside such a field, since for Lucene they are the same value. What's stored in the index is the same for both car and cars - the latter is processed down to car as well.
To implement this and get exact hits higher, you add a second field without that filter present that only tokenizes (splits) your content on whitespace and lowercases the token. That way you have a field where cars and car are stored as different tokens, and tokens won't contribute to the score if they're not being matched.
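The effect of the two field types can be illustrated with a toy model. The stemmed_tokens function below is a deliberately naive stand-in for a stemming analyzer (it only strips a trailing "s"), not how Solr's filters actually work, and it does not model n-gram matching at all. It only shows why "car" and "cars" are indistinguishable in one field and distinct in the other.

```python
def stemmed_tokens(text):
    # Toy stand-in for a stemming analyzer: lowercase, split on whitespace,
    # and strip a trailing "s". Real stemmers are far more elaborate.
    return [t[:-1] if t.endswith("s") and len(t) > 1 else t
            for t in text.lower().split()]


def exact_tokens(text):
    # Whitespace tokenizer plus lowercasing, with no stemming.
    return text.lower().split()


docs = ["car", "cars", "carable awesome", "awful for carable"]
query = "car"

stemmed_hits = [d for d in docs if query in stemmed_tokens(d)]
exact_hits = [d for d in docs if query in exact_tokens(d)]

print(stemmed_hits)  # ['car', 'cars']
print(exact_hits)    # ['car']
```

In the stemmed field both documents hold the same token, so nothing separates them; the unstemmed field matches only the literal "car", which is what makes an extra boost possible.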
You can use qf in Solr to tell Solr which fields you want to search against, and you can give a boost at the same time - so in your case you'd have qf=exact_field^10 text_field where hits in exact_field would be valued ten times higher than hits in the regular field (the exact boost values will depend on your use case and how you want the query profile to behave).
You can also use the different boost arguments (bq and boost) to apply boosts outside of your regular query (i.e. add a query to bq that replicates your original query), but the previous suggestion will probably work just fine.
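A minimal sketch of the request parameters for the qf suggestion, reusing the values from the question. The field names exact_field and text_field come from the answer above and would have to exist in your schema; the boost of 10 is just the example value.

```python
from urllib.parse import urlencode, parse_qs

# dismax parameters from the question, plus qf with a boost on the exact field.
params = {
    "defType": "dismax",
    "q": "car",
    "qf": "exact_field^10 text_field",  # hits in exact_field weigh ten times more
    "mm": "1",
    "tie": "0.1",
    "fl": "*,score",
}

query_string = urlencode(params)  # percent-encodes the space and the caret
decoded = parse_qs(query_string)  # round trip to check nothing was lost

print(query_string)
print(decoded["qf"])  # ['exact_field^10 text_field']
```

The query string would be appended to the select handler URL of your core or collection.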

Synonym Maps in Azure Search, synonym phrases

I'm trying to use synonym maps in Azure Search and I'm running into a problem. I want to have several words and phrases map to a single search query.
In other words, when I search for any of:
product 123, product0123, product 0123
I want the search to return results for the query phrase:
product123.
After reading the tutorial it all seemed pretty straightforward.
I'm using .Net Azure.Search SDK 5.0, so I've done the following:
var synonymMap = new SynonymMap
{
    Name = "test-map",
    Format = SynonymMapFormat.Solr,
    Synonyms = "product 123, product0123, product 0123=>product123\n"
};
_searchClient.SynonymMaps.CreateOrUpdate(synonymMap);
and I use the map on one of the search fields:
index.Fields.First(x => x.Name == "Title").SynonymMaps = new[] {"test-map"};
So far so good. Now if I search for product0123, I get results for product123, as I would expect. But if I search for the phrase product 123 or product 0123, I get a bunch of irrelevant results. It's almost as if synonym maps do not work with multi-word items.
So I guess my question is: am I using synonym maps incorrectly, or do these maps only work with single-word synonyms?
Are the phrases product 123 and product 0123 in double quotes? Phrases must be in double quotes ("product 123"). Double quotes are the operators for phrase search, and in the case of synonyms they ensure that the terms in the phrase are analyzed and matched against the rules in the synonym map as a phrase. Without them, the query parser splits the unquoted phrase into individual terms and tries synonym matching on each term separately. The query becomes product OR 123 in that case.
This documentation explains how queries are parsed (stage 1) and analyzed (stage 2). Synonyms are applied in the second stage.
To answer your second question in the comment: unfortunately, double quotes are required to match multi-word synonyms. However, as an application developer, you have full control over what gets passed to the search service. For example, given a query product 123 from the user, you can rewrite the query under the hood to improve precision and recall before it gets passed to the search service. Phrase or proximity searches can be used to improve precision, and fuzzy or wildcard searches (such as prefix searches) can be used to improve recall. You would rewrite the query product 123 to something like "product 123"~10 product 123, and synonyms will apply to the phrased part of the query.
Nate
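The rewrite described above could be done in the application before the query is sent. This is only a sketch: rewrite_query is a hypothetical helper, the slop of 10 is the example value from the answer, and it assumes the query is submitted with the full Lucene syntax, which proximity search requires.

```python
def rewrite_query(user_text, slop=10):
    """Turn free text into a proximity phrase plus the original terms.
    Hypothetical helper; quotes typed by the user are dropped first."""
    terms = user_text.replace('"', " ").split()
    if len(terms) < 2:
        return " ".join(terms)  # a single term needs no phrase part
    plain = " ".join(terms)
    return f'"{plain}"~{slop} {plain}'


print(rewrite_query("product 123"))  # "product 123"~10 product 123
```

The quoted part is what the synonym map can match as a phrase, while the unquoted terms keep the broader matches.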

Index structure for azure search

I'm putting together a query to index medicines. A user should be able to enter their search term into a single search box. Their search term might be either a brand name for a drug, a generic name (the underlying compound on which all brands are based) or an indication and they should be returned a list of medicines that correspond to their search. I'd like to have a category facet for the type - either indication, brand or generic.
To have a category facet, my understanding is that I'd have to send my data through as one row per search term where that search term might be a brand, indication or a generic, rather than one row per brand with columns for generic list and indication. Is this correct or is there another way to get at what I'm wanting to do?
I hope I understand your ask here. From the screenshot you provided, I would assume what you would want to do is make the field "MedicineInformationType" a Facetable field in your Azure Search index and make the field "SearchTerm", "Product", "GenericList", and "ActionList" all Searchable fields in your Azure Search index (although I am not sure why you would want the "SearchTerm" field if the term in this field is already in one of the other fields).
If you structure your index this way, you can do a search for say "phosphate" and facet over the "MedicineInformationType" field to get a count of the results that are generic or brands.
For example (as a REST call):
search=phosphate&facet=MedicineInformationType
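To make the facet idea concrete, here is a small sketch that builds the same query string and then mimics locally what the facet counts would represent. The rows are made up for illustration (they are not from the screenshot), and the local counting only imitates the idea of a facet; it is not how the service computes it.

```python
from collections import Counter
from urllib.parse import urlencode

# Query string for the REST call shown above.
query_string = urlencode({"search": "phosphate", "facet": "MedicineInformationType"})
print(query_string)  # search=phosphate&facet=MedicineInformationType

# Hypothetical rows, one per search term, as described in the question.
rows = [
    {"SearchTerm": "codeine phosphate", "MedicineInformationType": "generic"},
    {"SearchTerm": "sodium phosphate", "MedicineInformationType": "generic"},
    {"SearchTerm": "ExampleBrand Phosphate", "MedicineInformationType": "brand"},
    {"SearchTerm": "headache", "MedicineInformationType": "indication"},
]

# Keep rows whose search term contains the word, then count them per type.
matches = [r for r in rows if "phosphate" in r["SearchTerm"].lower().split()]
facet_counts = Counter(r["MedicineInformationType"] for r in matches)
print(dict(facet_counts))  # {'generic': 2, 'brand': 1}
```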

How do I create a Solr query that returns results even if one field in my query has no matches?

Suppose I want to create a recommendation system to suggest people you should connect with based off of certain attributes that I know about you and attributes I have about other people that are stored in a Solr index. Is it possible to query the index with a list of attributes (along with boosts for each attribute) and have Solr return scored results even if some of my fields return no matches? The way that I understand that Solr works is that if one of your fields doesn't contain a match in any documents found in your index, you get zero results for the entire query (even if other fields in the query matched) - is that right? What I would hope is that I could query the index and get a list of results back in order of a score given based on how many (and which) fields matched to something, even if some fields have no matches, for example:
Say that there are 2 people documents stored in the index as follows (figuratively):
Person 1:
Industry: Manufacturing
City: Oakland
Person 2:
Industry: Manufacturing
City: San Jose
And say that I perform a pseudo-Solr query that basically says "Search for everyone whose industry is equal to manufacturing and whose city is equal to Oakland". What I would like is to receive both results back in the result set, even though one of the "Persons" does not reside in Oakland. I just want that person to come back as a result with a lower score than Person1. Is this possible? What might a solr query look like to handle this? Assume that I have many more than 2 attributes for each person (so saying that I can use "And" and "Or" in my solr query isn't really feasible.. or is it?) Thanks in advance for your helpful input! (PS I'm using Solr 3.6)
You mention using the AND operator, which is likely your problem.
The default behavior of Lucene, and Solr, query syntax is exactly what you are asking for. A query like:
industry:manufacturing city:oakland
Will match either, with scoring preference given to those that match both. See the Lucene query syntax documentation.
You can also use the bq (boost query) parameter of the dismax and edismax query parsers, which does not affect matching and only affects the scores:
http://localhost:8983/solr/persons/select?defType=edismax&q=industry:manufacturing&bq=City:Oakland^2
Play with the boosting factor at the end to get the right balance between the matching score and the boosting score.
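With many attributes, the optional-clause query can be generated rather than written by hand. A minimal sketch, assuming the default operator is OR (so every clause is optional) and that the field names match your schema; build_optional_query and the boost values are hypothetical.

```python
def build_optional_query(attributes):
    """Build space-separated optional clauses such as field:value^boost.
    `attributes` maps a field name to a (value, boost) pair."""
    clauses = []
    for field, (value, boost) in attributes.items():
        if " " in value:
            value = f'"{value}"'  # quote multi-word values as a phrase
        clauses.append(f"{field}:{value}^{boost:g}")
    return " ".join(clauses)


query = build_optional_query({
    "industry": ("manufacturing", 1),
    "city": ("oakland", 2),
})
print(query)  # industry:manufacturing^1 city:oakland^2
```

A person who matches only the industry clause is still returned, just with a lower score than one who matches both.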

How can I sort appengine search index results by relevance?

I'm working on a project that uses Google App Engine's text search API to allow users to search for documents that include a words field. I'm sorting using a MatchScorer, which according to the documentation "assigns a score based on term frequency in a document".
When a user enters a query like "business promo", I convert this into a query string that looks like words:business OR words:promo. I would have expected that this would return documents that contain both the words "business" and "promo" before documents that only contain one of the words (since the documentation says it assigns a score based on term frequency in the document). However, I frequently see results that contain only one of the words before documents that contain both.
I've also tried querying using the RescoringMatchScorer, but see the same problem using this scorer.
I've thought about doing separate queries - ones that AND the search terms and ones that OR the search terms - but this would require many queries if the user enters more than two search terms. For example, if I searched for "advanced business solutions", I'd need queries like this to cover all the bases:
words:advanced AND words:business AND words:solutions
words:advanced AND words:business
words:advanced AND words:solutions
words:business AND words:solutions
words:advanced OR words:business OR words:solutions
Does anyone have any hints on how to perform searches that return more relevant results (i.e. more search term matches) before less relevant results?
Perhaps it depends on how you interpret the phrase "term frequency". I think you're interpreting it to mean "how many of my search terms appear in the document". But it could also mean "how many times (any of) the search terms appears in each document", and indeed -- at least according to some simple experiments I've done -- the latter seems to be the actual behavior.
For example, a document that contains the word "business" 20 times and never mentions the word "promo" would be scored higher than a document that contains "business" and "promo" only once each. Does that jibe with the behavior you're seeing?
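The two readings of "term frequency" can be compared on the example above. This is only an illustration of the two interpretations, not the scoring formula the Search API actually uses.

```python
def total_occurrences(doc, terms):
    # Reading 2: how many times any of the search terms appears in the document.
    words = doc.lower().split()
    return sum(words.count(t) for t in terms)


def distinct_terms_matched(doc, terms):
    # Reading 1: how many of the search terms appear in the document at all.
    words = set(doc.lower().split())
    return sum(1 for t in terms if t in words)


terms = ["business", "promo"]
doc_a = " ".join(["business"] * 20)  # "business" 20 times, no "promo"
doc_b = "business promo"             # each term once

print(total_occurrences(doc_a, terms), total_occurrences(doc_b, terms))            # 20 2
print(distinct_terms_matched(doc_a, terms), distinct_terms_matched(doc_b, terms))  # 1 2
```

Under the occurrence count, doc_a outranks doc_b, which matches the ordering described in the question; under the distinct-term count the order flips.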
