Cloiudant using $nin There is no index available for this selector - cloudant

I created a JSON index in cloudant on _id like so:
{
"index": {
"fields": [ "_id"]
},
"ddoc": "mydesigndoc",
"type": "json",
"name": "myindex"
}
First off, unless I specified the index name, somehow cloudant could not differentiate between the index I created and the default text based index for _id (if that is truly the case, then this is a bug I believe)
I ran the following query against the _find endpoint of my db:
{
"selector": {
"_id": {
"$nin":["v1","v2"]
}
},
"fields":["_id", "field1", "field2"],
"use_index": "mydesigndoc/myindex"
}
The result was this error:
{"error":"no_usable_index","reason":"There is no index available for this selector."}
if I change "$nin":["v1","v2"] to "$eq":"v1" then it works fine, but that is not the query I am after.
So in order to get what I want, I had to this to my selector "_id": {"$gt":null}, which now looks like:
{
"selector": {
"_id": {
"$nin":["v1","v2"],
"$gt":null
}
},
"fields":["_id", "field1", "field2"],
"use_index": "mydesigndoc/myindex"
}
Why is this behavior? This seems to be only happening if I use the _id field in the selector.
What are the ramifications of adding "_id": {"$gt":null} to my selector? Is this going to scan the entire table rather than use the index?
I would appreciate any help, thank you

Cloudant Query can use Cloudant's pre-existing primary index for selection and range querying without you having to create your own index in the _id field.
Unfortunately, the index doesn't really help when using the $nin operator - Cloudant would have to scan the entire database to check for documents which are not in your list - the index doesn't really get it any further forward.
By changing the operator to $eq you are playing to the strengths of the index which can be used to locate the record you need quickly and efficiently.
In short, the query you are attempting is inefficient. If your query was more complex e.g. the equivalent of WHERE colour='red' AND _id NOT IN ['a','b'] then a Cloudant index on colour could be used to reduce the data set to a reasonable level before doing the $nin operation on the remaining data.

Related

Azure Search highlighting doesn't work for wildcards with scoring profiles

Azure Search supports highlighting with full text search which facilitates clients to locate the matched term in a returned document. I have provided a simple index schema below to illustrate the issue.
{
"name": "simple-index",
"fields": [
{
"name": "key",
"type": "Edm.String"
},
{
"name": "simplefield",
"type": "Edm.String"
}
],
"scoringProfiles": [
{
"name": "boostedprofile",
"functionAggregation": null,
"text": {
"weights": {
"simplefield": 5,
}
},
"functions": []
}
],
"corsOptions": null,
"suggesters": [],
"analyzers": [],
"tokenizers": [],
"tokenFilters": [],
"charFilters": []
}
For a normal search query like below, it works as expected and gives back the expected result.
search=foobar&highlight=simplefield
On extending the above query to use a wildcard query, things are again as expected with the response containing highlights on the terms matching the prefix. So far so good.
search=foo*&highlight=simplefield&querytype=full
After this when I apply a scoring profile on top of the previous query, the results are unexpected and no highlights are returned.
search=foo*&highlight=simplefield&querytype=full&scoringprofile=boostedprofile
How do I make highlights work for the wildcard queries when using a scoring profiles?
At the time of answering, this is a known limitation in Azure Search where highlighting doesn't work for wildcard queries when used with scoring profiles. Internally Azure Search uses a concept of highlighter which is responsible for the highlighting flow as a separate process that happens after search.
In the case of wildcard query, it involves looking up all terms in the index that match the provided prefix term and then use them to compose the highlighted text. Scoring profiles affect the way terms are looked up in index for highlighting. Due to that the result doesn't include any highlights.
As this is a specific limitation in wildcard queries, one workaround is to pre-process the index to avoid issuing wildcard/prefix queries. Please take a look at custom analysis (https://learn.microsoft.com/en-us/rest/api/searchservice/custom-analyzers-in-azure-search) You can, for example, use edgeNgram tokenfilter and store prefixes of words in the index and issue a regular term query with the prefix (with out the '*' operator)
I hope this is useful. Please vote on the feedback item to help us prioritize our development efforts to support other modes of highlighting that will support the above use-case. https://feedback.azure.com/forums/263029-azure-search/suggestions/32661961-implement-other-highlighters

How to filter an array in Azure Search

I have following Data in my Index,
{
"name" : "The 100",
"lists" : [
"2c8540ee-85df-4f1a-b35f-00124e1d3c4a;Bellamy",
"2c8540ee-85df-4f1a-b35f-00155c40f11c;Pike",
"2c8540ee-85df-4f1a-b35f-00155c02e581;Clark"
]
}
I have to get all the documents where the lists has Pike in it.
Though a full search query works with Any I could't get the contains work.
$filter=lists/any(t: t eq '2c8540ee-85df-4f1a-b35f-00155c40f11c;Pike')
However i am not sure how to search only with Pike.
$filter=lists/any(t: t eq 'Pike')
I guess the eq looks for a full text search, is there any way with the given data structure I should make this query work.
Currently the field lists has no searchable property only the filterable property.
The eq operator looks for exact, case-sensitive matches. That's why it doesn't match 'Pike'. You need to structure your index such that terms like 'Pike' can be easily found. You can accomplish this in one of two ways:
Separate the GUIDs from the names when you index documents. So instead of indexing "2c8540ee-85df-4f1a-b35f-00155c40f11c;Pike" as a single string, you could index them as separate strings in the same array, or perhaps in two different collection fields (one for GUIDs and one for names) if you need to correlate them by position.
If the field is searchable, you can use the new search.ismatch function in your filter. Assuming the field is using the standard analyzer, full-text search will word-break on the semicolons, so you should be able to search just for "Pike" and get a match. The syntax would look like this: $filter=search.ismatch('Pike', 'lists') (If looking for "Pike" is all your filter does, you can just use the search and searchFields parameters to the Search API instead of $filter.) If the "lists" field is not already searchable, you will need to either add a new field and re-index the "lists" values, or re-create your index from scratch with the new field definition.
Update
There is a new approach to solve this type of problem that's available in API versions 2019-05-06 and above. You can now use complex types to represent structured data, including in collections. For the original example, you could structure the data like this:
{
"name" : "The 100",
"lists" : [
{ "id": "2c8540ee-85df-4f1a-b35f-00124e1d3c4a", "name": "Bellamy" },
{ "id": "2c8540ee-85df-4f1a-b35f-00155c40f11c", "name": "Pike" },
{ "id": "2c8540ee-85df-4f1a-b35f-00155c02e581", "name": "Clark" }
]
}
And then directly query for the name sub-field like this:
$filter=lists/any(l: l/name eq 'Pike')
The documentation for complex types is here.

Sorting Array type in Elasticsearch

I'm locked in a trap with Elastic, trying to sort hits by the size of a sub-property (array).
I applied the following body query :
'{
"query": {
"match_all": {}
},
"sort": {
"_script": {
"type": "number",
"script": "doc[\"myarray\"].values.size()",
"order": "desc"
}
}
}'
However as Elastic Array type isn't in the mapping (support out of the box) i have an error telling me that my array isn't the mapping (normal...)
Any idea ?
Thanks !
The best and recommended way of doing this is to index in the same document an additional field that should include the field size as a number, since this is known at indexing time. And then just sort on that field.
The difficulty of the apparently simple task you want to achieve is that the array in Elasticsearch is considered a flat data structure and everything is just "combined". If you also are using an analyzer on this field that potentially will split the field into terms, do you count the number of unique terms or the values that you indexed initially?
For example, let's say myarray is like ["abc 123", "abc", "123", "abc abc"]. Do you count the comma separated values (4 values in total) or the unique terms (abc and 123 so only 2 values in total)?
The correct and most efficient way of doing this is by indexing the length itself in the documents:
{ "myarray":["abc 123", "abc", "123", "abc abc"], "myarray_length":4 }
if you want the array's size then you have use dynamic scripting offered from elastic search.
https://www.elastic.co/guide/en/elasticsearch/reference/2.3/modules-scripting.html (choose your version).
If you use AWS for hosting ES please do read this
https://kirankoduru.github.io/elasticsearch/moving-from-aws-elasticsearch-service.html
And andrei-stefan is correct you can make use of multi field as type when mapping

Cloudant Search: match a whole phrase using a full text index

I want to be able to match a whole phrase using a full text index, but I can't seem to work out how to do it. The Lucene Query Parser syntax states that:
A Phrase is a group of words surrounded by double quotes such as "hello dolly".
But when I specify the following selector, it returns all records with either "sign" or "design" in the name but I would expect it to return only those with "sign design".
POST https://foo.cloudant.com/remote/_find
{"selector":{"$text":"\"SIGN DESIGN\""}}
My index is defined as follows:
db.index({
name: 'subbies_text',
type: 'text',
index: {},
})
Alternatively, is it possible to do a substring match on a field in json index?
You are using the index API to create the index, correct?
Would you please try creating this design document?
{ "_id": '_design/library',
"indexes": {
"subbies_text": {
"analyzer": {
"name":'standard'
},
"index": "function(doc) { index('XXX', doc.YYY); }"
}
}
}
(However, change the "XXX" and "YYY" to your field name.
If you know how many maximum words to allow, you can make a searchable index with a map-reduce view. I think it is not ideal, but just for posterity:
You can emit() every consecutive pair of words that you see. So, for example, given the phrase "The quick brown fox" then you can emit ["the","quick"], ["quick","brown"], ["brown", "fox"]. I think this can be nice and simple, but it's really only appropriate for small amounts of data. The index will likely grow too large.
If you want to use cloudant search, you should create a search index first just like JasonSmith said. Then you can use this search index to do the specific queries.
Suppose you have a document which has a "name:SIGNDESIN" field.
1.If you want to query a whole phrase ,you can query like this:
curl https://<username:password>#<username>.cloudant.com/db/_design/<design_doc>/_search/<searchname>?q=name:SIGNDESIN | jq .
2.If you want to query a substring phrase, you can query like this:
curl https://<username:password>#<username>.cloudant.com/db/_design/<design_doc>/_search/<searchname>?q=name:SI* | jq .

Cloudant search documents that appear after certain id

There is a cloudant database that stores some documents.
There is also mobile app that takes those documents by using search indexes.
Question is:
Is it possible to make query "get me all indexes that appear after this one"?
For example:
I start app, and get from database documents with id 'aaa','aab' and 'aac'.
I want to store last id - 'aac' - in memory of my app.
Then, when I start the app, I want to get from database documents that appeared after 'aac'.
I think the main problem will be, that _ids are assigned as random strings, but I want to be sure.
when searching the index, try including the selector field in JSON object of the request body:
{
"selector": {
"_id": {
"$gt": "the_previous_id"
}
},
"sort": [
{
"_id": "asc"
}
]
}
in addition, from https://docs.cloudant.com/document.html:
"The _id field is either created by you, or generated automatically as a UUID by Cloudant."
therefore, it is possible to provide your own _ids when creating a document if the Cloudant generated _ids are not working for you.
condition operators:
https://docs.cloudant.com/cloudant_query.html#condition-operators

Resources