I have a Cloudant DB where each document looks like:
{
  "_id": "2015-11-20_attr_00",
  "key": "attr",
  "value": "00",
  "employeeCount": 12,
  "timestamp": "2015-11-20T18:16:05.366Z",
  "epocTimestampMillis": 1448043365366,
  "docType": "attrCounts"
}
For a given attribute there is an employee count, and as you can see I have a record for the same attribute every day. I am trying to create a view or index that will give me the latest record for each attribute. Meaning, if I inserted a record on 2015-10-30 and another on 2015-11-10, the one returned should be the employee count for the record with timestamp 2015-11-10.
I have tried a view, but I am getting all the entries for each attribute, not just the latest. I did not look at indexes because I thought they are not pre-calculated. I will be querying this from the client side, so having it pre-calculated (like views are) is important.
Any guidance would be most appreciated. Thank you.
I created a test database you can see here. Just make sure that when you insert your JSON documents into Cloudant (or CouchDB), your timestamps are not strings but JavaScript Date objects:
https://examples.cloudant.com/latestdocs/_all_docs?include_docs=true
I built a search index like this (name the design doc "summary" and the search index "latest"):
function (doc) {
  // Index only the per-division employee-count documents
  if (doc.docType == "totalEmployeeCounts" && doc.key == "div") {
    index("division", doc.value, {"store": true});
    index("timestamp", doc.timestamp, {"store": true});
  }
}
Then here's a query that will return only the latest record for each division. Note that the limit value applies to each group, so with limit=1, if there are 4 groups you will get 4 documents, not 1.
https://examples.cloudant.com/latestdocs/_design/summary/_search/latest?q=*:*&limit=1&group_field=division&include_docs=true&sort_field=-timestamp
Indexing the timestamp as a string is not recommended.
Reference:
https://cloudant.com/blog/defensive-coding-in-mapindex-functions/#.VvRVxtIrJaT
I had the same problem. I converted the timestamp value to milliseconds (a number) and then indexed that value.
var millis = Date.parse(doc.timestamp); // epoch milliseconds as a number
index("millis", millis, {"store": false});
You can use the same query as Raj suggested, but with the 'millis' field instead of the timestamp.
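Putting the two answers together, a minimal sketch of the full index function would look like this (the docType/key values and the design-doc layout are the same as in Raj's answer; adjust them to match your documents):
function (doc) {
  if (doc.docType == "totalEmployeeCounts" && doc.key == "div") {
    // Parse the ISO timestamp string into epoch milliseconds (a number)
    var millis = Date.parse(doc.timestamp);
    index("division", doc.value, {"store": true});
    index("millis", millis, {"store": false});
  }
}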
Related
I need your help and guidance about the best way to update a field in one table every time an item is added to or deleted from another table.
In my main collection I have a field called site, which is populated with one of these values => a, b or c.
Because I was not able to do this with GraphQL, the quick and easy fix was to have this subTotal collection that stores the subtotals for these 3 values. Now I need to update this collection with lifecycle hooks or with a cron job, and I don't know how to do it with either of these 2 methods.
I am looking for something like this:
Add a new item with the value "a" in the site field, then do a +1 to the field "a" in my new collection.
The same applies when deleting an item with the value "c" in the site field: then -1 the value of the field "c" in the subTotal collection.
POST Old.site.a ==> New.a +1 | DEL Old.site.b ==> New.b -1
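For illustration, here is a rough sketch of what I imagine the lifecycle hook could look like. I'm assuming Strapi v4, that my main collection is called web, and that subTotal is a single entry (id 1) with numeric fields a, b and c, so the names here are guesses, not working code I have:
// src/api/web/content-types/web/lifecycles.js
async function bump(site, delta) {
  const uid = 'api::subtotal.subtotal';
  // Read the current counter for this site value, then write it back +/- 1
  const current = await strapi.db.query(uid).findOne({ where: { id: 1 } });
  await strapi.db.query(uid).update({
    where: { id: 1 },
    data: { [site]: (current[site] || 0) + delta },
  });
}

module.exports = {
  async afterCreate(event) {
    await bump(event.result.site, +1); // POST Old.site.a ==> New.a +1
  },
  async afterDelete(event) {
    await bump(event.result.site, -1); // DEL Old.site.b ==> New.b -1
  },
};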
This new collection stores the subtotals of each category I have. I did this just because I was not able to retrieve the subtotals with a GraphQL query. Please see here ==> GraphQL Subcategory Count Aggregated Query
What I was looking for was a query that would retrieve the subtotal for each value of the site field from the webs collection, in a format like this:
{
  "data": {
    "webs": {
      "meta": {
        "pagination": {
          "a": { "total": 498 },
          "b": { "total": 3198 },
          "c": { "total": 998 }
        }
      }
    }
  }
}
I know it would be a big stress for Strapi to update these fields every time a POST or DELETE request is made, so maybe a cron job that runs every 5 minutes or so would suffice and be a better idea.
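If the cron route is better, I imagine something like this (again a sketch assuming Strapi v4, with cron enabled in config/server.js; it simply recounts the three subtotals every 5 minutes instead of tracking increments):
// config/cron-tasks.js
module.exports = {
  '*/5 * * * *': async ({ strapi }) => {
    const data = {};
    // Recount each site value from scratch -- assumed uids for my collections
    for (const site of ['a', 'b', 'c']) {
      data[site] = await strapi.db.query('api::web.web').count({ where: { site } });
    }
    await strapi.db.query('api::subtotal.subtotal').update({ where: { id: 1 }, data });
  },
};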
Could you please help me?
I would owe you a lot!
I have a DynamoDB table called URLArray that contains a list of URLs (myURL) and a unique video number (myNum).
I use AWS Amplify to query my table like so for example:
URLData = await API.graphql(graphqlOperation(getUrlArray, { id: "173883db-9ff1-4...."}));
Also, myNum is a GSI, so I can also query a row using it, for example:
URLData = await API.graphql(graphqlOperation(getURLinfofromMyNum, { myNum: 5 }));
My question is: I would like to simply query this table to find the maximum value of myNum. So in this picture it'd return myNum = 12. How do I query my table to get this?
DynamoDB does not have an equivalent of the SQL expression SELECT MAX(myNum), so you cannot do what you are asking with your table as-is.
A few suggestions:
Record the highest value of myNum as you insert items into the table. For example, you could create an item with PK = "METADATA" and an attribute named maxMyNum. The maxMyNum attribute could then be updated conditionally whenever you insert a value that is higher than what is stored in DDB.
You could build a secondary index with myNum as the sort key in a single partition. This would allow you to execute a query operation with ScanIndexForward set to false (descending order) and pick the first returned entry (the max value); see the sketch after these suggestions.
If you are generating an auto incrementing value in your application code, consider checking out the documentation regarding atomic counters.
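Here is a sketch of the first two suggestions using the AWS SDK for JavaScript (v2). The GSI name 'byMyNum', its constant partition key attribute 'gsiPk', and the METADATA item layout are all assumptions you would adapt to your schema:
const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient();

// Suggestion 1: conditionally record the running maximum on every insert.
async function recordMax(myNum) {
  try {
    await docClient.update({
      TableName: 'URLArray',
      Key: { id: 'METADATA' },
      UpdateExpression: 'SET maxMyNum = :n',
      ConditionExpression: 'attribute_not_exists(maxMyNum) OR maxMyNum < :n',
      ExpressionAttributeValues: { ':n': myNum },
    }).promise();
  } catch (err) {
    // A failed condition just means a higher value is already stored
    if (err.code !== 'ConditionalCheckFailedException') throw err;
  }
}

// Suggestion 2: query a GSI sorted by myNum, descending, and take the first row.
async function currentMax() {
  const res = await docClient.query({
    TableName: 'URLArray',
    IndexName: 'byMyNum',
    KeyConditionExpression: 'gsiPk = :pk',
    ExpressionAttributeValues: { ':pk': 'URL' },
    ScanIndexForward: false, // descending by myNum
    Limit: 1,                // only the top (max) item
  }).promise();
  return res.Items.length ? res.Items[0].myNum : undefined;
}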
I have an item like this
{
  "date": "2019-10-05",
  "id": "2",
  "serviceId": "1",
  "time": {
    "endTime": "1300",
    "startTime": "1330"
  }
}
Right now the way I design this is like so:
primary key --> id
Global secondary index --> primary key : serviceId
--> sort key : date
With the way I designed it as of now,
* I can query the id
* I can query serviceId and range of date
I'd like to be able to query such that I can retrieve all items where
* serviceId = 1 AND
* date = "yyyy-mm-dd" AND
* time = {
"endTime": "1300",
"startTime": "1330"
}
I'd still like to be able to query based on the 2 previous conditions (query by id, and query by serviceId and a range of dates).
Is there a way to do this? One way I was thinking of is to create a new field and use it as an index, e.g. combine all the data so that
combinedField: "1_yyyy-mm-dd_1300_1330"
then make that the primary key for a global secondary index, and just query it like that.
I'm just not sure whether this is the way to do it, or if there's a better or best-practice way to do this?
Thank you
You could use either a FilterExpression or a composite sort key.
FilterExpression
Here you retrieve the items from the GSI you described by specifying 'serviceId' and 'date' in the key condition, and then filter on time.startTime and time.endTime with the 'FilterExpression'. The sample Python code using boto3 would be as follows:
from boto3.dynamodb.conditions import Key, Attr

response = table.query(
    IndexName='serviceId-date-index',  # name of the GSI described above (assumed)
    KeyConditionExpression=Key('serviceId').eq('1') & Key('date').eq('2019-10-05'),
    FilterExpression=Attr('time.endTime').eq('1300') & Attr('time.startTime').eq('1330')
)
The drawback with this method is that all items matching the key condition are read first, and only then are the results filtered. So you are charged for everything the key condition matches.
E.g. if 1000 items have 'serviceId' 1 and 'date' '2019-10-05' but only 10 items have 'time.startTime' as '1330', you will still be charged for reading the 1000 items even though only 10 items are returned after the FilterExpression is applied.
Composite Sort Key
I believe this is the method you mentioned in the question. Here you will need to create an attribute of the form
'yyyy-mm-dd_startTime_endTime'
and use this as the sort key in your GSI. Now your items will look like this:
{ "date": "2019-10-05",
"id": "2",
"serviceId": "1",
"time": {
"endTime": "1300",
"startTime": "1330"
}
"date_time":"2019-10-05_1330_1300"
}
Your GSI will have 'serviceId' as partition key and 'date_time' as sort key. Now you will be able to query date range as:
response = table.query(
    IndexName='serviceId-date_time-index',  # GSI with 'date_time' as sort key (assumed name)
    # the '_9999' suffix keeps every entry on 2019-10-05 inside the range
    KeyConditionExpression=Key('serviceId').eq('1') & Key('date_time').between('2019-07-05', '2019-10-05_9999')
)
For the query where the date and the start and end times are all specified, you can query as:
response = table.query(
    IndexName='serviceId-date_time-index',  # same GSI as above
    KeyConditionExpression=Key('serviceId').eq('1') & Key('date_time').eq('2019-10-05_1330_1300')
)
This approach won't work if you need a range of dates and the start and end times together, i.e. you won't be able to query for items in a particular date range containing a specific start and end time. In that case you would have to use a FilterExpression.
Yes, the solution you suggested (add a new field which is the combination of the fields, and define a GSI on it) is the standard way to achieve this. You need to make sure that the character you use for concatenation is unique, i.e. it cannot appear in any of the individual fields you combine.
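For example, a small sketch of building such a combined field in JavaScript, with a guard on the delimiter (here '_', matching the other answer; the helper name is made up):
function buildDateTimeKey(date, startTime, endTime) {
  const parts = [date, startTime, endTime];
  // '_' is the delimiter, so it must not occur inside any part
  if (parts.some(p => p.includes('_'))) {
    throw new Error("delimiter '_' must not appear in any key part");
  }
  return parts.join('_'); // e.g. '2019-10-05_1330_1300'
}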
I am using Parse Server, which runs on MongoDB.
Let's say I have collections User and Comment, and a join table of users and comments.
User can like a comment, which creates a new record in a join table.
Specifically in Parse Server, a join table can be defined using a 'relation' field in the collection.
Now when I want to retrieve all comments, I also need to know whether each of them is liked by the current user. How can I do this without doing additional queries?
You might say I could create an array field likers in the Comment table and use $elemMatch, but that doesn't seem like a good idea, because potentially there can be thousands of likes on a comment.
My idea, but I hope there could be a better solution:
I could create an array field someLikers, a relation (join table) field allLikers, and a number field likesCount in the Comment table. Then I'd put the first 100 likers in both someLikers and allLikers, and any additional likers only in allLikers. I would always increment likesCount.
Then when querying a list of comments, I would use $elemMatch to tell me whether the current user is inside someLikers. When I get the comments back, I would check whether some of them have likesCount > 100 AND $elemMatch returned null. If so, I would have to run another query on the join table, looking for those comments and checking (querying by) whether they are liked by the current user.
Is there a better option?
Thanks!
I'd advise against directly accessing MongoDB unless you absolutely have to; after all, the way collections and relations are built is an implementation detail of Parse and could in theory change in the future, breaking your code.
Even though you want to avoid multiple queries, I suggest doing just that (depending on your platform you might even be able to run the two Parse queries in parallel):
The first one is a query on Comment for getting all comments you want to display; assuming you have some kind of Post for which comments are written, the query would find all comments referencing the current post.
The second query again is on Comment, but this time
constrained to the comments retrieved in the first query, e.g.: containedIn("objectId", arrayOfCommentIDs)
and constrained to the comments having the current user in their likers relation, e.g.: equalTo("likers", currentUser)
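A sketch of both queries with the Parse JavaScript SDK; the post field and the likers relation are the names used above, and Parse.User.current() stands in for your current user (adapt them to your schema):
const commentQuery = new Parse.Query('Comment');
commentQuery.equalTo('post', currentPost);
const comments = await commentQuery.find();

// Of those comments, find the ones liked by the current user
const likedQuery = new Parse.Query('Comment');
likedQuery.containedIn('objectId', comments.map(c => c.id));
likedQuery.equalTo('likers', Parse.User.current());
const likedByMe = await likedQuery.find();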
Well, a join collection is not really a NoSQL way of thinking ;-)
I don't know Parse Server, so what follows is based on pure MongoDB.
What I would do is, in the Comment document, use an array of ObjectIds, one for each user who likes the comment.
Sample document layout:
{
  "_id" : ObjectId(""),
  "name" : "Comment X",
  "liked" : [
    ObjectId(""),
    ....
  ]
}
Then use an aggregation to get the data. I assume you have the _id of the comment and you know the _id of the user.
The following aggregation returns the comment with a like count and a boolean which indicates the user liked the comment.
db.Comment.aggregate([
  {
    $match: {
      _id: ObjectId("your commentId")
    }
  },
  {
    $project: {
      _id: 1,
      name: 1,
      // total number of likes = size of the liked array
      number_of_likes: { $size: "$liked" },
      // true if the user's ObjectId appears in the liked array
      user_liked: {
        $gt: [
          {
            $size: {
              $filter: {
                input: "$liked",
                as: "like",
                cond: { $eq: ["$$like", ObjectId("your userId")] }
              }
            }
          },
          0
        ]
      }
    }
  }
]);
This returns:
{
  "_id" : ObjectId(""),
  "name" : "Comment X",
  "number_of_likes" : NumberInt(7),
  "user_liked" : true
}
Hope this is what you're after.
This is one user's notes. I want to query and get only the notes of this user with "activeFlag": 1. My query object code is:
findAccountObj = {
  _id: objectID(req.body.accountId),
  ownerId: req.body.userId,
  bookId: req.body.bookId,
  "notes.activeFlag": 1
};
But this query returns all the notes, including the ones with "activeFlag": 0.
How do I fix this?
If you are on v2.2, use the $elemMatch projection operator. v3.2 and above support aggregation with $filter to return a subset of an array.
Here is an example: Retrieve only the queried element in an object array in MongoDB collection
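A sketch of both options, reusing the query fields from the question ('accounts' is an assumed collection name):
// MongoDB 2.2+: $elemMatch projection returns only the FIRST matching note
db.accounts.find(
  { _id: objectID(req.body.accountId), ownerId: req.body.userId, bookId: req.body.bookId },
  { notes: { $elemMatch: { activeFlag: 1 } } }
);

// MongoDB 3.2+: $filter in an aggregation returns ALL notes with activeFlag 1
db.accounts.aggregate([
  { $match: { _id: objectID(req.body.accountId), ownerId: req.body.userId, bookId: req.body.bookId } },
  {
    $project: {
      notes: {
        $filter: {
          input: "$notes",
          as: "note",
          cond: { $eq: ["$$note.activeFlag", 1] }
        }
      }
    }
  }
]);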