Cloudant search documents that appear after certain id - cloudant

There is a Cloudant database that stores some documents.
There is also a mobile app that retrieves those documents using search indexes.
The question is:
Is it possible to make a query like "get me all documents that appeared after this one"?
For example:
I start the app and get the documents with ids 'aaa', 'aab' and 'aac' from the database.
I want to store the last id, 'aac', in my app's memory.
Then, the next time I start the app, I want to get the documents that appeared after 'aac'.
I think the main problem will be that _ids are assigned as random strings, but I want to be sure.

When searching the index, try including the selector field in the JSON object of the request body:
{
"selector": {
"_id": {
"$gt": "the_previous_id"
}
},
"sort": [
{
"_id": "asc"
}
]
}
In addition, from https://docs.cloudant.com/document.html:
"The _id field is either created by you, or generated automatically as a UUID by Cloudant."
Therefore, you can provide your own _ids when creating documents if the Cloudant-generated _ids do not work for you.
Condition operators:
https://docs.cloudant.com/cloudant_query.html#condition-operators
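As a rough sketch of how the app could build that request body, the Python below assembles the selector shown above and adds a limit. The helper names build_after_id_query and make_sortable_id are hypothetical, and the timestamp-prefixed id scheme is only an assumption about how you might assign your own _ids. Auto-generated UUIDs are random, so "_id greater than the last one seen" only means "created later" if the ids you assign sort in creation order.

```python
import json


def build_after_id_query(last_id, limit=50):
    """Build a Cloudant Query body for documents whose _id sorts after last_id."""
    return {
        "selector": {"_id": {"$gt": last_id}},
        "sort": [{"_id": "asc"}],
        "limit": limit,
    }


def make_sortable_id(timestamp_ms, suffix):
    """Hypothetical id scheme: zero-padded millisecond timestamp plus a suffix.

    Zero-padding keeps string order identical to numeric order.
    """
    return f"{timestamp_ms:013d}-{suffix}"


if __name__ == "__main__":
    # This JSON would be sent as the body of a Cloudant Query request.
    print(json.dumps(build_after_id_query("aac"), indent=2))
```

The app would persist the largest _id it has received and pass it as last_id on the next start.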

Related

How can I store and search through large documents with MongoDB?

Well. Here's the DB schema/architecture problem.
Currently our project uses MongoDB. We have one DB with one collection. Overall there are almost 4 billion documents in that collection (the number is constant). Each document has a unique ID, and a lot of different information is related to that ID (that's why MongoDB was chosen: the data varies widely, so schemaless is perfect).
{
"_id": ObjectID("5c619e81aeeb3aa0163acf02"),
"our_id": 1552322211,
"field_1": "Here is some information",
"field_a": 133,
"field_c": 561232,
"field_b": {
"field_0": 1,
"field_z": [45, 11, 36]
}
}
The purpose of that collection is to store a lot of data that is easy to update (some data is updated every day, some once a month) and to search over different fields to retrieve the ID. We also store the "history" of each field (and we need to be able to search over the history as well). So once updates over time were turned on, we ran into MongoDB's 16 MB maximum document size.
We've tried several workarounds (like splitting the document), but all of them require either a $group or a $lookup stage in the aggregation (grouping by ID, see the example below), and neither can use indexes, which makes searching over several fields EXTREMELY slow.
{
"_id": ObjectID("5c619e81aeeb3aa0163acd12"),
"our_id": 1552322211,
"field_1": "Here is some information",
"field_a": 133
}
{
"_id": ObjectID("5c619e81aeeb3aa0163acd11"),
"our_id": 1552322211,
"field_c": 561232,
"field_b": {
"field_0": 1,
"field_z": [45, 11, 36]
}
}
We also can't use a $match stage before those, because the search can include logical operators (like field_1 = 'a' && field_c != 320, where field_1 comes from one document and field_c from another, so the search must run after the documents are grouped/joined), and the logical expression can be VERY complex.
So are there any tricky workarounds? If not, what other DBs can you suggest moving to?
Kind regards.
Okay, so after some time spent testing different approaches, I finally ended up using Elasticsearch, because there is no way to perform the requested searches in MongoDB in an adequate amount of time.
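The difficulty described in the question can be reproduced in plain Python, with no database involved. This is only an illustrative sketch: the fragments are invented sample data and the helper names are hypothetical. It shows why a condition such as field_1 = 'a' && field_c != 320 cannot be evaluated per fragment when the two fields live in different documents, so the fragments must be merged by our_id first.

```python
from collections import defaultdict

# Invented sample data: each logical record is split into two fragments.
fragments = [
    {"our_id": 1552322211, "field_1": "a", "field_a": 133},
    {"our_id": 1552322211, "field_c": 561232},
    {"our_id": 7, "field_1": "a"},
    {"our_id": 7, "field_c": 320},
]


def merge_by_our_id(docs):
    """Combine all fragments that share an our_id into one dict per id."""
    merged = defaultdict(dict)
    for doc in docs:
        merged[doc["our_id"]].update(doc)
    return dict(merged)


def matching_ids(docs):
    """Evaluate field_1 == 'a' and field_c != 320 on the merged records."""
    return sorted(
        our_id
        for our_id, doc in merge_by_our_id(docs).items()
        if doc.get("field_1") == "a" and doc.get("field_c") != 320
    )


if __name__ == "__main__":
    print(matching_ids(fragments))
```

Filtering the fragments one by one would wrongly keep id 7, because its first fragment has field_1 == 'a' and no field_c at all. Only the merged view can reject it.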

How to search one certain field’s value with not unique results on elasticsearch

I need some advice.
I have indexed documents in Elasticsearch, and now I want to retrieve every document that shares a given value in one field. For example, if I search for reportId=12345, several documents are related to it. How can I use a single API call to get all of them, given that the field value is not unique?
POST myindex/type/_search
{
"query": {
"match": {
"reportId": "12345"
}
}
}

How to filter an array in Azure Search

I have the following data in my index:
{
"name" : "The 100",
"lists" : [
"2c8540ee-85df-4f1a-b35f-00124e1d3c4a;Bellamy",
"2c8540ee-85df-4f1a-b35f-00155c40f11c;Pike",
"2c8540ee-85df-4f1a-b35f-00155c02e581;Clark"
]
}
I have to get all the documents where lists contains Pike.
A query using any with the full string works, but I couldn't get a contains-style match to work.
$filter=lists/any(t: t eq '2c8540ee-85df-4f1a-b35f-00155c40f11c;Pike')
However, I am not sure how to search with only Pike.
$filter=lists/any(t: t eq 'Pike')
I guess eq looks for a full-string match. Is there any way to make this query work with the given data structure?
Currently the lists field is only filterable, not searchable.
The eq operator looks for exact, case-sensitive matches. That's why it doesn't match 'Pike'. You need to structure your index such that terms like 'Pike' can be easily found. You can accomplish this in one of two ways:
Separate the GUIDs from the names when you index documents. So instead of indexing "2c8540ee-85df-4f1a-b35f-00155c40f11c;Pike" as a single string, you could index them as separate strings in the same array, or perhaps in two different collection fields (one for GUIDs and one for names) if you need to correlate them by position.
If the field is searchable, you can use the new search.ismatch function in your filter. Assuming the field is using the standard analyzer, full-text search will word-break on the semicolons, so you should be able to search just for "Pike" and get a match. The syntax would look like this: $filter=search.ismatch('Pike', 'lists') (If looking for "Pike" is all your filter does, you can just use the search and searchFields parameters to the Search API instead of $filter.) If the "lists" field is not already searchable, you will need to either add a new field and re-index the "lists" values, or re-create your index from scratch with the new field definition.
Update
There is a new approach to solve this type of problem that's available in API versions 2019-05-06 and above. You can now use complex types to represent structured data, including in collections. For the original example, you could structure the data like this:
{
"name" : "The 100",
"lists" : [
{ "id": "2c8540ee-85df-4f1a-b35f-00124e1d3c4a", "name": "Bellamy" },
{ "id": "2c8540ee-85df-4f1a-b35f-00155c40f11c", "name": "Pike" },
{ "id": "2c8540ee-85df-4f1a-b35f-00155c02e581", "name": "Clark" }
]
}
And then directly query for the name sub-field like this:
$filter=lists/any(l: l/name eq 'Pike')
The documentation for complex types is here.
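As a small sketch of the restructuring described above, the Python below converts the original "guid;name" strings into objects with id and name sub-fields and builds the matching $filter expression. The function names are hypothetical, and the code only prepares data and a filter string; it does not call the Search API.

```python
def split_list_entry(entry):
    """Split a legacy "guid;name" string into an {id, name} object."""
    guid, _, name = entry.partition(";")
    return {"id": guid, "name": name}


def to_complex_document(doc):
    """Return a copy of doc whose lists entries are {id, name} objects."""
    converted = dict(doc)
    converted["lists"] = [split_list_entry(e) for e in doc["lists"]]
    return converted


def name_filter(name):
    """Build the $filter expression; single quotes are doubled, as OData requires."""
    escaped = name.replace("'", "''")
    return f"lists/any(l: l/name eq '{escaped}')"


if __name__ == "__main__":
    original = {
        "name": "The 100",
        "lists": [
            "2c8540ee-85df-4f1a-b35f-00124e1d3c4a;Bellamy",
            "2c8540ee-85df-4f1a-b35f-00155c40f11c;Pike",
            "2c8540ee-85df-4f1a-b35f-00155c02e581;Clark",
        ],
    }
    print(to_complex_document(original))
    print(name_filter("Pike"))
```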

Cloudant using $nin: There is no index available for this selector

I created a JSON index in Cloudant on _id like so:
{
"index": {
"fields": [ "_id"]
},
"ddoc": "mydesigndoc",
"type": "json",
"name": "myindex"
}
First off, unless I specified the index name, Cloudant somehow could not differentiate between the index I created and the default text-based index for _id (if that is truly the case, I believe this is a bug).
I ran the following query against the _find endpoint of my db:
{
"selector": {
"_id": {
"$nin":["v1","v2"]
}
},
"fields":["_id", "field1", "field2"],
"use_index": "mydesigndoc/myindex"
}
The result was this error:
{"error":"no_usable_index","reason":"There is no index available for this selector."}
If I change "$nin":["v1","v2"] to "$eq":"v1" then it works fine, but that is not the query I am after.
So in order to get what I want, I had to add "_id": {"$gt":null} to my selector, which now looks like:
{
"selector": {
"_id": {
"$nin":["v1","v2"],
"$gt":null
}
},
"fields":["_id", "field1", "field2"],
"use_index": "mydesigndoc/myindex"
}
Why does it behave this way? It seems to happen only when I use the _id field in the selector.
What are the ramifications of adding "_id": {"$gt":null} to my selector? Will this scan the entire table rather than use the index?
I would appreciate any help, thank you.
Cloudant Query can use Cloudant's pre-existing primary index for selection and range querying without you having to create your own index in the _id field.
Unfortunately, the index doesn't really help when using the $nin operator - Cloudant would have to scan the entire database to check for documents which are not in your list - the index doesn't really get it any further forward.
By changing the operator to $eq you are playing to the strengths of the index which can be used to locate the record you need quickly and efficiently.
In short, the query you are attempting is inefficient. If your query was more complex e.g. the equivalent of WHERE colour='red' AND _id NOT IN ['a','b'] then a Cloudant index on colour could be used to reduce the data set to a reasonable level before doing the $nin operation on the remaining data.
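To make the last point concrete, here is a minimal sketch of how such a combined selector could be assembled in Python. The function name is hypothetical, and the sketch assumes that an index on the colour field already exists; it only builds the request body and does not talk to Cloudant.

```python
import json


def build_filtered_nin_query(field, value, excluded_ids, fields=None):
    """Combine an equality clause (which an index can serve) with $nin on _id."""
    body = {
        "selector": {
            field: {"$eq": value},
            "_id": {"$nin": list(excluded_ids)},
        }
    }
    if fields:
        body["fields"] = list(fields)
    return body


if __name__ == "__main__":
    # Equivalent of: WHERE colour='red' AND _id NOT IN ['a','b']
    query = build_filtered_nin_query("colour", "red", ["a", "b"], fields=["_id", "colour"])
    print(json.dumps(query, indent=2))
```

The equality clause narrows the candidate set first, so the $nin check only has to run over the documents that remain.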

Update a new field to existing document

Is it possible to add a new field to an existing document?
For example:
There is a document with several fields, e.g.
ID=99999
Field1:text
Field2:text
This document is already in the index. Now I want to add a new field to it WITHOUT supplying the old data:
ID=99999
Field3:text
Currently, the old document is deleted and a new document with the same ID is created. So if I now search for ID 99999, the result will be:
ID=99999
Field3:text
I read this on the Solr Wiki:
How can I update a specific field of an existing document?
I want to update a specific field in a document, is that possible? I only need to index one field for a specific document. Do I have to index all the documents for this?
No, just the one document. Let's say you have a CMS and you edit one document. You will need to re-index this document only, by using the add Solr statement for the whole document (not one field only).
In Lucene, to update a document the operation is really a delete followed by an add. You will need to add the complete document, as there are no "update only a field" semantics in Lucene.
So is there any solution for this? Will this feature be implemented in a future version (I currently use 3.6.0)? As a workaround, I thought about writing a script or application that collects the existing fields, adds the new field, and updates the whole document. But I think performance will suffer. Do you have any other ideas?
Best regards
I have two answers for you (both more or less bad):
To update a field within a document in Solr, you have to reindex the whole document (to update Field3 within document ID:99999, you have to reindex that document with values for all fields).
Solr 4 implemented a feature like that, but with a condition: all fields have to be stored, not just indexed. What happens is that Solr uses the stored values and reindexes the document in the background. If you are interested, there is a nice article about it: http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/ This solution has an obvious flaw: the size of the index when you store all fields.
I hope this helps with your problem. If you have more questions, please ask.
It is possible to do this in Solr 4. For example, consider the following document:
{
"id": "book123",
"name" : "Solr Rocks"
}
To add an author field to the document, set the field's value to a JSON object with a "set" attribute that holds the new value:
$ curl http://localhost:8983/solr/update -H 'Content-type:application/json' -d '
[
{"id" : "book123",
"author" : {"set":"The Community"}
}
]'
Your new document
$ curl 'http://localhost:8983/solr/get?id=book123'
will be
{
"doc" : {
"id" : "book123",
"name" : "Solr Rocks",
"author": "The Community"
}
}
set will add or replace the author field. Along with set, you also have the options to increment (inc) and to add (add).
From Solr 4 onwards you can update a single field in Solr without resending the entire document. Various modifiers are supported:
set – set or replace a particular value, or remove the value if null is specified as the new value
add – adds an additional value to a list
remove – removes a value (or a list of values) from a list
removeregex – removes the values in a list that match the given Java regular expression
inc – increments a numeric value by a specific amount (use a negative value to decrement)
Example document:
{
"id": "1",
"name" : "Solr",
"views" : "2"
}
Now update it with
$ curl http://localhost:8983/solr/demo/update -H 'Content-type:application/json' -d '
[
{"id" : "1",
"author" : {"set":"Neal Stephenson"},
"views" : {"inc":3}
}
]'
which will result in
{
"id": "1",
"name" : "Solr",
"views" : "5",
"author" : "Neal Stephenson"
}
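To illustrate what the set, add and inc modifiers do to a document, the Python below imitates them locally on a plain dict. This is only a sketch under stated assumptions: apply_atomic_update is a hypothetical helper, not part of Solr, and Solr applies the real modifiers server-side. It assumes views is stored as a number, since inc needs a numeric value.

```python
import copy
import json


def apply_atomic_update(doc, update):
    """Locally mimic the set/add/inc modifiers on a dict (illustration only)."""
    result = copy.deepcopy(doc)
    for field, change in update.items():
        if not isinstance(change, dict):
            result[field] = change  # plain value, such as the id
            continue
        for modifier, value in change.items():
            if modifier == "set":
                if value is None:
                    result.pop(field, None)  # null removes the field
                else:
                    result[field] = value
            elif modifier == "add":
                current = result.get(field, [])
                if not isinstance(current, list):
                    current = [current]
                extra = value if isinstance(value, list) else [value]
                result[field] = current + extra
            elif modifier == "inc":
                result[field] = result.get(field, 0) + value
            else:
                raise ValueError(f"unsupported modifier: {modifier}")
    return result


if __name__ == "__main__":
    doc = {"id": "1", "name": "Solr", "views": 2}
    update = {"id": "1", "author": {"set": "Neal Stephenson"}, "views": {"inc": 3}}
    print(json.dumps([update]))  # body that would be sent to the update handler
    print(apply_atomic_update(doc, update))
```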

Resources