Azure search contains word not working as expected - azure-cognitive-search

I am new to Azure Search. I am trying to use "contains" logic in my search query. I looked it up and found out that I need to add something like the following to my search query.
&queryType=full&search=/.*_search.*/
where _search is the string I want to search for. The "contains" logic works fine for a single word. For example, when I search for sweep, I get well sweep-cmu in the results.
But when I search for well sweep-cmu, I get zero results. Why? And how can I improve my query so that I get results for both partial and full strings?

If you want an exact match for the search query, surround the query with double quotes.
e.g. "well sweep-cmu"
This will return all documents which contain the exact phrase.
Since you've just started to play with Azure Search you might find this article particularly interesting. It explains how the full text search works in Azure Search.
https://learn.microsoft.com/en-us/azure/search/search-lucene-query-architecture
In order to get results for partial terms, you should use wildcard expressions in your search queries. The above article explains this in detail.
PS: Some wildcard queries can be very expensive and hence slow.
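A likely reason the full string returns nothing: a regular expression in the full Lucene syntax is matched against individual indexed terms, and (assuming the field uses the default standard analyzer) "well sweep-cmu" is indexed as the separate terms well, sweep and cmu, so no single term contains the whole string. The sketch below is one way to handle both cases: it combines a quoted phrase with one regex clause per word. The function names are made up for illustration, and lowercasing the words is an assumption that fits an analyzer which lowercases terms at indexing time.

```python
import re
from urllib.parse import urlencode


def build_contains_query(user_input: str) -> str:
    """Build a full-Lucene search expression for a 'contains' style search."""
    # Keep only alphanumeric runs so no regex or Lucene escaping is needed.
    words = [w.lower() for w in re.findall(r"[A-Za-z0-9]+", user_input)]
    if not words:
        return "*"  # match everything when there is nothing to search for
    phrase = '"' + " ".join(words) + '"'
    # One regex clause per word; each clause is matched against single terms.
    regex_terms = " AND ".join(f"/.*{w}.*/" for w in words)
    return f"{phrase} OR ({regex_terms})"


def build_query_string(user_input: str) -> str:
    """URL-encode the parameters that go after the '?' of the search request."""
    return urlencode({"queryType": "full", "search": build_contains_query(user_input)})


print(build_contains_query("well sweep-cmu"))
print(build_query_string("sweep"))
```

Keep in mind that every /.*word.*/ clause is a leading-wildcard match, so the performance caveat above applies to each of them.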

Related

How do you create Solr Queries with wildcard searches and scoring, fuzzy search, distance searching and other features

I am trying to build a search over my domain with Solr, and I am having trouble producing a keyword search that fulfils our requirements. My issue:
When my users search, the requirement is that the search must return results with partial token matches. For example:
Consider the text field: "CA-1234-ABCD California project"
The following keyword searches (what the user puts in the search field) should match this field:
"California"
"Cali"
"CA-1234-ABCD"
"ABCD"
"ABCD-1234"
etc.
With a text_en field (as configured in the example schema), the tokenization, stemming and grammar processing will allow non-wildcard searches to work for partial words/tokens in many cases, but Solr still seems limited to exact token match in many situations. For example, the following query does not match:
name:cali
The only way I have found to get the user experience that is required is to use a wildcard search:
name:*cali*
The problem with this is that tf scoring (and, it seems, other functionality like fuzzy searches) doesn't work with a wildcard search.
The question is, is there a way to get partial token matching (for all tokens not just those that have common stems/etc.) while retaining tf scoring and other advanced query functionality?
My best workaround at the moment is a query that includes both wildcard and non-wildcard clauses, such as:
name:cali OR name:*cali*
but I don't know if that is a good strategy here. Does SOLR provide a way?
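For reference, the workaround above can be generated from user input roughly like this. It is only a sketch of the string building, not a recommendation of the strategy: the helper names are hypothetical, the keyword is assumed to be a single token without whitespace, and the escaping covers the special characters of the standard Lucene query parser.

```python
import re

# Characters (and the && / || operators) that the Lucene query parser treats as special.
_LUCENE_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/]|&&|\|\|)')


def escape_term(term: str) -> str:
    """Backslash-escape Lucene query syntax characters in a user-supplied term."""
    return _LUCENE_SPECIAL.sub(r"\\\1", term)


def build_partial_query(field: str, keyword: str) -> str:
    """Combine an exact-token clause with a wildcard clause for the same keyword."""
    term = escape_term(keyword)
    return f"{field}:{term} OR {field}:*{term}*"


print(build_partial_query("name", "cali"))
print(build_partial_query("name", "CA-1234-ABCD"))
```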

Querytype=Full and searching for stop words returns no results

When using Azure Cognitive Search, we use the full query syntax. When searching for something like "the document", we create a query like this (this is a simplified example):
(Title:the OR Contents:the) AND (Title:document OR Contents:document)
(we need to split up the query for unrelated reasons)
The problem is that "the" could be a stop word in the language we are searching in (we search in several languages), causing the entire query to fail. We would like to be able to ignore stop words when generating queries like this, or have the search engine simply return true for the clauses that contain only a stop word.
I figure the latter is not possible (or is it?). Might there be a way to query the stop words for specific language analyzers so we can exclude the stop words ourselves? Or is there a way to alter our query so that it handles stop words better?
If you want to strip stop words from your search query, the only thing I can think of is calling the analyzer with the search query and checking the returned tokens.
In this example you would call the en.microsoft analyzer with the search query "the document".
The tokens returned only contain "document", so you know "the" is considered a stop word by the analyzer. But when searching multiple languages you might need to call multiple analyzers and strip stop words for all those languages.
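A rough sketch of that approach, assuming the Analyze Text REST endpoint (POST /indexes/{index}/analyze) and a response whose "tokens" entries carry startOffset values. The service URL, index name, API key and the api-version value are placeholders you would need to supply, and the helper names are made up for this example. Words are kept when at least one returned token starts inside them, so the original spelling of each surviving word is preserved.

```python
import json
import re
import urllib.request


def analyze_text(service_url, index_name, api_key, text, analyzer,
                 api_version="2020-06-30"):
    """Call the Analyze Text endpoint and return the list of token objects."""
    url = f"{service_url}/indexes/{index_name}/analyze?api-version={api_version}"
    body = json.dumps({"text": text, "analyzer": analyzer}).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["tokens"]


def strip_stop_words(text, tokens):
    """Keep only the words of `text` for which the analyzer emitted a token."""
    kept = []
    for match in re.finditer(r"\S+", text):
        if any(match.start() <= t["startOffset"] < match.end() for t in tokens):
            kept.append(match.group())
    return kept


def build_field_query(words, fields=("Title", "Contents")):
    """Build one (field OR field) group per word and AND the groups together."""
    groups = ["(" + " OR ".join(f"{f}:{w}" for f in fields) + ")" for w in words]
    return " AND ".join(groups)


# Example with a hand-written response instead of a live call:
sample_tokens = [{"token": "document", "startOffset": 4, "endOffset": 12, "position": 1}]
print(build_field_query(strip_stop_words("the document", sample_tokens)))
```

For several languages you would repeat the analyze_text call once per analyzer and decide whether a word has to survive all of them or only the one used by the field you are querying.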

SOLR: how to find result when search term is concatenation of two terms

I have SOLR running and I can get results for a partial word search,
e.g. if the user searches for "micro" I CAN find "microsoft",
but if I search for "microsoftx" I CANNOT get back "microsoft".
What kind of rules do I have to set up in my schema file?
PS. I have little to zero knowledge of SOLR, I literally installed yesterday.
You should be able to solve this by adding an EdgeNGramFilterFactory to the query side of your analysis chain. It generates a token for each prefix of the original word, so a search for microsoftx becomes a search for micro, micros, microso, microsof, microsoft and microsoftx, depending on how you configure it (minGramSize).
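To make the effect of minGramSize concrete, here is a small Python illustration of what an edge n-gram filter produces. It only mimics the idea and is not Solr code; the function and parameter names are chosen for this example.

```python
def edge_ngrams(token: str, min_gram_size: int, max_gram_size: int) -> list[str]:
    """Return the prefixes of `token` with lengths from min_gram_size to max_gram_size."""
    longest = min(len(token), max_gram_size)
    return [token[:size] for size in range(min_gram_size, longest + 1)]


print(edge_ngrams("microsoftx", min_gram_size=5, max_gram_size=15))
# ['micro', 'micros', 'microso', 'microsof', 'microsoft', 'microsoftx']
```

Because one of the generated query tokens is microsoft, the query can match a document that was indexed with the plain token microsoft.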

Does GAE Search API do spell checks

I'm talking about this API:
https://cloud.google.com/appengine/docs/java/search/
Does it allow spell checks? For example: if I create an index of documents, and in those documents I have words like "iphone", "android", etc. If I search for "iphoen" instead can it still return the correct results?
No, it cannot. It is just an index: what you put in, you get back.
You need to implement your own logic for spelling errors. If a user searches for "iphoen", you either return all results for "iphoen" and suggest the "iphone" query instead, or, if you are very confident that the search term was misspelled, search for "iphone" right away and ask the user whether "iphoen" should be used instead. This is how Google search works. This is, obviously, not a trivial task.
No, it will not do this. It does direct text matching. Taken from the link you provided:
The simplest query, sometimes called a "global search" is a string that contains only field values. This search uses a string that searches for documents that contain the words "rose" and "water":
index.search("rose water");
Based on this, it's implied reasonably well that it will not do fuzzy matches for you. However, you could write an extension class that takes a string and tests variants against the Search API. You could then return any successful queries and report the fuzzy match. In this way, your class would take "ipohne" and eventually try "iphone" and return a successful query.
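A minimal sketch of that idea, written in Python for brevity rather than against the Java Search API. The search_fn argument is a stand-in for whatever call runs a query against your index and returns a list of hits, and only adjacent-letter swaps are tried as variants; a real implementation would need a broader set of candidates and some ranking.

```python
def transpositions(word: str) -> list[str]:
    """Return every variant of `word` obtained by swapping two adjacent letters."""
    return [
        word[:i] + word[i + 1] + word[i] + word[i + 2:]
        for i in range(len(word) - 1)
    ]


def search_with_variants(query: str, search_fn):
    """Try the query as typed, then each variant; return (query_used, results)."""
    results = search_fn(query)
    if results:
        return query, results
    for variant in transpositions(query):
        results = search_fn(variant)
        if results:
            return variant, results  # report which variant produced the match
    return query, []


# Usage with an in-memory stand-in for the index:
fake_index = {"iphone": ["doc-1"], "android": ["doc-2"]}
print(search_with_variants("ipohne", lambda q: fake_index.get(q, [])))
```

Each variant costs one extra query, so longer search terms get expensive quickly.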

Solr/Lucene - partial fuzzy match

How do you set up partial (substring) fuzzy match in Solr 4.2.1?
For example, if you have a list of US cities indexed, I would like a search term "Alber" to match "Alburquerque".
I have tried using the NGramFilterFactory on the <fieldType> and rebuilt the index but queries do not return results as expected - they still work as if I had just done the standard text_general defaults. Exact matches work, and explicit fuzzy searches would work given sufficient similarity (for example "Alberquerque~" with one misspelling would work.)
I did go to the analyzer tool in the Solr admin and saw that my ngrams were indeed being generated.
Is there something I'm missing on the query side?
Or should I take a different approach altogether?
And can this work with dismax? (Multiple fields indexed like this with different weights)
Thanks!
