Search query for empty\null string field in Azure search - azure-cognitive-search

Hi, I have a string field which can be empty\null for certain documents. I would like to know how to search for those documents.
I want to do this through the search query rather than an OData filter, because I would like to handle the following cases:
1. searching for empty\null in field A
2. searching for a substring in field A (e.g. A:test)
3. searching for documents where A is either empty\null or contains test
OData filter suggestions will not help with #2, so #1 should also be done through the search query.
Any help is appreciated.

You can actually combine OData filters with search queries using the search.ismatch or search.ismatchscoring functions. These functions let you embed full-text search queries inside a filter, which would allow you to address all your scenarios:
1. $filter=A eq null
2. $filter=A ne null and search.ismatchscoring('A:test', null, 'full', null) or, equivalently, $filter=A ne null&search=A:test&queryType=full
3. $filter=A eq null or search.ismatchscoring('A:test', null, 'full', null) (this can only be achieved with filters and search.ismatch/search.ismatchscoring because of the "or" operator)
The filter A ne null in case 2 above is actually redundant, since nulls won't match any full-text search query, but in case 3, where you want to match nulls, filters are the way to do it.
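As a rough illustration, the three cases can be assembled as request parameters in a few lines of Python. This is only a sketch: the helper name null_or_match_params is hypothetical, the field name A and the term test come from the question, and case 1 uses A eq null because it looks for documents where the field is null. Nothing here calls the service; it only builds the parameter values.

```python
from urllib.parse import urlencode

FIELD = "A"  # field name taken from the question


def null_or_match_params(case: int, term: str = "test") -> dict:
    """Return query parameters for one of the three cases in the question."""
    fielded = f"{FIELD}:{term}"
    ismatch = f"search.ismatchscoring('{fielded}', null, 'full', null)"
    if case == 1:
        # Only documents where the field is null.
        return {"search": "*", "$filter": f"{FIELD} eq null"}
    if case == 2:
        # Only documents whose field matches the term; nulls never match.
        return {"search": fielded, "queryType": "full"}
    if case == 3:
        # Null OR matching the term: the "or" forces everything into the filter.
        return {"search": "*", "$filter": f"{FIELD} eq null or {ismatch}"}
    raise ValueError("case must be 1, 2 or 3")


if __name__ == "__main__":
    for case in (1, 2, 3):
        # urlencode percent-escapes the values for use in a GET request.
        print(urlencode(null_or_match_params(case)))
```

Case 2 drops the redundant A ne null filter, in line with the note above.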

Related

Match all documents excluding some terms using full Lucene syntax

Our service's default search web page uses the * full Lucene query to match all documents. This is before the user has provided any search terms. There is some data (test data, in our case) that we want to exclude from the search result.
Is it possible to match all documents but exclude a subset of all documents?
For example, suppose we have an "owners" field and we want to exclude documents with the "testA" and "testB" owner. The following query does not seem to work with the match all approach:
Query: search=* -owners:testA -owners:testB&queryType=full&$orderby=created desc
Error: "Failed to parse query string. See https://aka.ms/azure-search-full-query for supported syntax."
When searching for anything but *, this approach works fine. For example:
Query: search=foo -owners:testA -owners:testB&queryType=full&$orderby=created desc
Result: (many documents matched)
I have considered using a filter for this, such as $filter=filterableOwners/all(p: p ne 'testa' and p ne 'testb'), but this has the following drawbacks:
the index must be rebuilt with a filterable field
analyzers can't be used, so case-insensitivity must be implemented by lowercasing both the values and the filter expression
Ideally this could be done using only the search query parameter with a Lucene query text.
I found a workaround for the issue. If you have a field in your documents that always has a value, you can use a .* regex to match all values in the field and therefore match all documents.
For example, suppose the packageId field has a value for all documents.
Incorrect (as posted in the original question):
Query: search=* -owners:testA -owners:testB&queryType=full&$orderby=created desc
Correct:
Query: search=packageId:/.*/ -owners:testA -owners:testB&queryType=full&$orderby=created desc
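If the query text is generated in code, the workaround could be wrapped in a small helper. The sketch below is an assumption-laden illustration, not part of any SDK: match_all_except is a hypothetical name, and it relies on the caller passing a field (packageId in this example) that really has a value in every document.

```python
def match_all_except(always_present_field: str, excluded: dict) -> str:
    """Build a full Lucene query matching every document except the excluded values.

    always_present_field must have a value in every document, otherwise
    documents without it are silently dropped from the results.
    """
    # The /.*/ regex on an always-populated field stands in for "match all".
    clauses = [f"{always_present_field}:/.*/"]
    for field, values in excluded.items():
        clauses.extend(f"-{field}:{value}" for value in values)
    return " ".join(clauses)


if __name__ == "__main__":
    print(match_all_except("packageId", {"owners": ["testA", "testB"]}))
```

The resulting string is meant to be sent as the search parameter together with queryType=full, as in the query above.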

Azure Search using Lucene Query Syntax Returns Incorrect Results

I am using the Microsoft.Azure.Search .NET SDK v5.0.1. I am attempting to perform a search against my Azure Search index as follows: Documents.SearchAsync("fieldname:val* AND timeStamp:2018-05-03T13\:23\:59Z"). The results are incorrect. There are exactly 2 documents in my index with that timestamp. There are 121 documents in my index where the fieldname starts with val. When I run the above query using the SDK, it always returns 121 documents. Is there some special way to query timestamp that I am missing?
There are a few points to make here:
In your index definition, I believe you have timeStamp set to be a string. Otherwise you wouldn't have been able to make this search query, as DateTime fields are not searchable. Firstly, I'd advise against treating timeStamp as a string, because searchable fields go through analysis (tokenization being one step; see the reference on query parsing). In your case, the timestamp query (say 2018-05-03) will be tokenized into smaller constituents (2018, 05, 03), and documents containing any of those terms will be returned. That is why you observe the results you see.
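The effect of tokenization can be imitated in a few lines of Python. This is a deliberately crude stand-in for the analyzer (it just splits on non-alphanumeric characters and lowercases), so the real service may produce somewhat different tokens; both function names are hypothetical.

```python
import re


def rough_tokenize(text: str) -> list:
    """Split on non-alphanumeric characters and lowercase the pieces.

    Only a loose imitation of what a standard analyzer does.
    """
    return [token.lower() for token in re.split(r"[^0-9A-Za-z]+", text) if token]


def any_term_matches(query: str, document_value: str) -> bool:
    """True if the query and the document value share at least one token."""
    return bool(set(rough_tokenize(query)) & set(rough_tokenize(document_value)))


if __name__ == "__main__":
    print(rough_tokenize("2018-05-03"))
    # A document from a different day still shares the "2018" token.
    print(any_term_matches("2018-05-03", "2018-11-20"))
```

Under this simplified model, any document whose timestamp shares a year, month, or day fragment with the query counts as a match, which is the behaviour described above.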
Your scenario seems to be a classic case of filtering results based on a criterion, followed by a search over the filtered documents. To accomplish this, you need to do the following:
1. Use a filter on the timestamp, so that it doesn't go through analysis.
2. On the filtered results, apply your search query.
Reference
I strongly recommend however that if possible, you should make your timeStamp column a datetime for more reasonable semantics.
As an example, here's how you'd go about achieving a filter + search combo:
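var parameters = new SearchParameters
{
    // Fielded search and wildcards are Lucene syntax, which needs the full query type.
    QueryType = QueryType.Full,
    // Filters are not analyzed, so the whole value is compared as-is.
    // The field must be marked filterable; the value is the one from the question.
    Filter = "timeStamp eq '2018-05-03T13:23:59Z'"
};
var results = await Documents.SearchAsync("fieldname:val*", parameters);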

Can I do any for loop in Lucene query syntax in Azure search?

In Azure Search, to search for a word or text in multiple fields at the same time, my syntax currently looks like this:
&queryType=full&search=((name:john) || (firstname:john) || (lastname:john) || (middlename:john))
I am wondering whether any syntax like a for loop/foreach exists, so that I don't have to repeat the search string several times.
Imaginary syntax:
&queryType=full&search=(name || firstname || lastname || middlename):john
What you're describing is not possible. However, in Azure Search your search query is executed against all searchable fields, unless you set the searchFields parameter.
If you want to search over all searchable fields, your query could simply look like this:
GET https://[service name].search.windows.net/indexes/[index name]/docs?search=john
If the fields in your example are not the only searchable fields and you want your search query to be scoped to them only, use the searchFields parameter:
GET https://[service name].search.windows.net/indexes/[index name]/docs?search=john&searchFields=name,firstname,lastname,middlename
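If the query is built in application code, the "loop" can simply live on the client side. The following Python sketch shows both options: scoping with searchFields, and generating the repeated fielded expression from the question. The function names are hypothetical, and the field names are the ones used in the question.

```python
from urllib.parse import urlencode

FIELDS = ["name", "firstname", "lastname", "middlename"]


def scoped_search_params(term: str, fields: list) -> dict:
    """Parameters for a search restricted to the given fields via searchFields."""
    return {"search": term, "searchFields": ",".join(fields)}


def fielded_or_query(term: str, fields: list) -> str:
    """Generate the repeated (field:term) || ... expression for queryType=full."""
    return " || ".join(f"({field}:{term})" for field in fields)


if __name__ == "__main__":
    print(urlencode(scoped_search_params("john", FIELDS)))
    print(fielded_or_query("john", FIELDS))
```

The searchFields variant is usually the simpler of the two, since the query text stays a plain term and does not need queryType=full.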
Let me know if that helps

solarium add filter query on multiple fields with OR operator

I'm new to Solr search. Can anyone help me add an OR condition across multiple fields? I found solutions for plain Solr, but I can't work out how to implement them in Solarium.
I have these fields:
1. name
2. username
3. email
I want to express this as a Solarium filter query, like the one I already use for a single field:
$query->createFilterQuery('name')->setQuery('name:*' . $keyword . '*');
I want the result to look like this:
fq=(name:*abc* OR username:*abc* OR email:*abc*)
Please help me combine multiple fields with the OR operator, so that a document is returned if any one of the three fields matches.
Disjunction Max aka DisMax might point you in the right direction. SitePoint has put together a simple tutorial here: https://www.sitepoint.com/using-solarium-solr-search-implementation/
An example with the parameters named in your question:
$dismax = $query->getDisMax();
$dismax->setQueryFields('name username email');
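Note that DisMax changes how the main query is evaluated rather than producing the fq string shown in the question. If the filter query itself is what is wanted, the OR expression can be assembled as a plain string and handed to setQuery() on a single filter query. The sketch below shows the string building in Python purely as a language-neutral illustration; escape_term and or_wildcard_filter are hypothetical helpers, and the escaping covers the special characters of the Lucene query syntax as an assumption about what the keyword may contain.

```python
import re

# Characters (and the && / || operators) with special meaning in Lucene query syntax.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/]|&&|\|\|)')


def escape_term(term: str) -> str:
    """Backslash-escape query syntax characters in user input."""
    return _SPECIAL.sub(r"\\\1", term)


def or_wildcard_filter(fields: list, keyword: str) -> str:
    """Build (f1:*kw* OR f2:*kw* ...) for use as one filter query."""
    escaped = escape_term(keyword)
    return "(" + " OR ".join(f"{field}:*{escaped}*" for field in fields) + ")"


if __name__ == "__main__":
    print(or_wildcard_filter(["name", "username", "email"], "abc"))
```

Keywords containing whitespace are not handled here and would need to be split or rejected before building the filter.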

How can I do a prefix search in ElasticSearch in addition to a generic query string?

I have a very basic index of "users" with a single type "user" that has several fields to it. I don't have anything defined on the index besides that.
What I need to do is provide autocomplete results that prioritize prefix matches (for usernames) but also contain other matches from the users bio and website and substring matches of other fields.
How does one accomplish this with the query DSL?
There are different ways to achieve what you want. I'd say it depends on how you want to make prefix matches. You can use a Prefix Query, or make EdgeNGrams out of the user field and search on them without the need for a prefix query. The first option is a little slower at query time, while the second increases your index size, since you'd index more terms (the ngrams).
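To make the index-size trade-off concrete, here is a small Python illustration of what edge n-grams of a term look like. It only mimics the idea; the parameter names min_gram and max_gram mirror the usual filter settings, and the default values are arbitrary assumptions.

```python
def edge_ngrams(term: str, min_gram: int = 1, max_gram: int = 10) -> list:
    """Return the prefixes of term with lengths from min_gram up to max_gram."""
    upper = min(max_gram, len(term))
    return [term[:length] for length in range(min_gram, upper + 1)]


if __name__ == "__main__":
    # One indexed term becomes several, which is where the extra index size comes from.
    print(edge_ngrams("john"))
```

Because every prefix is stored as its own term, a plain term lookup for "jo" finds "john" without a prefix query.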
If you go with the prefix query, you need to combine several queries. You can do that using the bool query. You just need to decide which queries must match, which ones must not match, and which ones should match (if they are optional). You can also give each query a boost, for example to express that prefix matches are more important.
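A minimal sketch of such a combination, written as a Python dictionary that serializes to the query DSL, might look as follows. The field names username, bio and website are assumptions based on the question, the boost value is arbitrary, and autocomplete_query is a hypothetical helper.

```python
import json


def autocomplete_query(text: str) -> dict:
    """Bool query: boosted prefix match on username plus analyzed matches elsewhere."""
    return {
        "query": {
            "bool": {
                "should": [
                    # The prefix query is not analyzed, so lowercase the input here
                    # (assuming the indexed terms were lowercased by the analyzer).
                    {"prefix": {"username": {"value": text.lower(), "boost": 3.0}}},
                    # Analyzed full-text match on the other fields.
                    {"multi_match": {"query": text, "fields": ["bio", "website"]}},
                ],
                # At least one of the optional clauses has to match.
                "minimum_should_match": 1,
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(autocomplete_query("Jo"), indent=2))
```

Documents matching the prefix clause score higher because of the boost, while documents matching only the other fields are still returned.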
On the other hand, if you decide to index EdgeNGrams you can use a single query string and search on different fields giving a different weight to them, like this:
{
  "query" : {
    "query_string" : {
      "fields" : ["user.ngrams^3", "field1^2", "field2"],
      "query" : "query"
    }
  }
}
You also need to take into account that the query string lets you search for multiple terms (a boolean query is generated from them) and use the Lucene query syntax. Also, the query string is analyzed while the prefix query is not. It all depends on what you need and whether those features are useful for your use case.
Let me know if you need more information.

Resources