MongoDB aggregate sorting returns different results for documents with the same sort value

While paginating with a MongoDB aggregation, I found a problem that only happens when sorting. I'll describe it in a little more detail, using a simple example to keep it short.
Each document has a randomly generated numeric count used for sorting, and a unique numeric id so that we can tell the results apart visually.
The aggregation pipeline looks like this:
db.getCollection('test').aggregate([
  { $sort: { count: -1 } },
  { $skip: 0 },
  { $limit: 2 }
])
With limit: 2, the returned data is:
/* 1 */
{
"_id" : ObjectId("6027005ffba493078dca3580"),
"count" : 9,
"id" : 38
}
/* 2 */
{
"_id" : ObjectId("6027005ffba493078dca3565"),
"count" : 9,
"id" : 11
}
With limit: 3, the returned data is:
/* 1 */
{
"_id" : ObjectId("6027005ffba493078dca3587"),
"count" : 9,
"id" : 45
}
/* 2 */
{
"_id" : ObjectId("6027005ffba493078dca3580"),
"count" : 9,
"id" : 38
}
/* 3 */
{
"_id" : ObjectId("6027005ffba493078dca3565"),
"count" : 9,
"id" : 11
}
With limit 2, the first element has id = 38.
With limit 3, the first element has id = 45.
For documents with the same sort value, the response is always different for different limit/skip values, which prevents me from paginating properly.
If I add a second sort key, for example:
$sort: {
  count: -1,
  _id: 1
}
the problem is resolved.
What is the reason for this, and is there another solution?

This is expected behavior. If the document with id:45 was inserted right after the limit:2 aggregation was executed, the limit:3 aggregation will show it in first place, because documents are loaded for sorting in their natural insertion order, and since _id values increase monotonically, that order is already sorted by _id.
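More generally, MongoDB does not guarantee any particular order among documents that have the same value for the sort field, so you cannot rely on them coming back in _id order when count is equal. Adding a field with unique values, such as _id, to the $sort stage (as you did) is what makes the order deterministic across different $skip/$limit values.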
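To see why a unique tie-breaker keeps pages stable, here is a small in-memory sketch in plain JavaScript (no MongoDB driver) that mimics { $sort: { count: -1, _id: 1 } } followed by $skip/$limit, the second sort key from the question. The first three documents are the ones shown above; the fourth (id 99) is hypothetical. compareCountDescIdAsc and page are illustrative helpers, not MongoDB APIs. Because _id is unique, each document has exactly one position in the sorted order, so a larger limit only appends documents and never reorders the earlier ones.

```javascript
// In-memory stand-in for the `test` collection. The first three documents are
// from the question (all count 9); the last one is hypothetical.
const docs = [
  { _id: "6027005ffba493078dca3587", count: 9, id: 45 },
  { _id: "6027005ffba493078dca3565", count: 9, id: 11 },
  { _id: "6027005ffba493078dca3580", count: 9, id: 38 },
  { _id: "6027005ffba493078dca3590", count: 7, id: 99 } // hypothetical
];

// Mirrors { $sort: { count: -1, _id: 1 } }: count descending, then _id
// ascending as a unique tie-breaker. ObjectId hex strings have a fixed
// length, so comparing them as strings matches their byte order.
function compareCountDescIdAsc(a, b) {
  if (a.count !== b.count) return b.count - a.count;
  if (a._id < b._id) return -1;
  if (a._id > b._id) return 1;
  return 0;
}

// Emulates $sort + $skip + $limit on a copy of the input.
function page(collection, skip, limit) {
  return [...collection].sort(compareCountDescIdAsc).slice(skip, skip + limit);
}

const ids = (rows) => rows.map((d) => d.id);

console.log(ids(page(docs, 0, 2))); // [ 11, 38 ]
console.log(ids(page(docs, 0, 3))); // [ 11, 38, 45 ]
console.log(ids(page(docs, 2, 2))); // [ 45, 99 ]
```

Without the _id term, the comparator would return 0 for all three count-9 documents, and any order among them would be acceptable, which is exactly the freedom the server has.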

Related

Search a fixed amount of documents over a period of time in MongoDB

We have a database with a lot of documents, which gets bigger as time goes on. Right now, query time isn't a problem since the data is only ~1 year old or so. But the bigger this gets, the longer queries will take if we query everything.
Our idea was to take every nth document: the more documents there are, the more data you leave out, but you still get a good picture of the data over time. However, this is hard to do in Mongo and doesn't seem to help at all, since it still traverses all documents.
Is there a way to set a fixed query time, no matter how many documents, or at least reduce it? It doesn't matter if we lose data overall, as long as we get documents from every time range.
I don't know exactly what your data looks like, but here is an example of what I mean. Let's assume this is the data stored in your database:
/* 1 */
{
"_id" : ObjectId("59e272e74d8a2fe38b86187d"),
"name" : "data1",
"date" : ISODate("2017-11-07T00:00:00.000Z"),
"number" : 15
}
/* 2 */
{
"_id" : ObjectId("59e272e74d8a2fe38b86187f"),
"name" : "data2",
"date" : ISODate("2017-11-06T00:00:00.000Z"),
"number" : 19
}
/* 3 */
{
"_id" : ObjectId("59e272e74d8a2fe38b861881"),
"name" : "data3",
"date" : ISODate("2017-10-06T00:00:00.000Z"),
"number" : 20
}
/* 4 */
{
"_id" : ObjectId("59e272e74d8a2fe38b861883"),
"name" : "data4",
"date" : ISODate("2017-10-05T00:00:00.000Z"),
"number" : 65
}
I understand that you want to compare some values across months or even years, so you could do the following:
db.getCollection('test').aggregate([
  {
    // query on indexed fields
    $match: {
      date: {
        $gte: ISODate("2017-10-05T00:00:00.000Z"),
        $lte: ISODate("2017-11-07T00:00:00.000Z")
      }
    }
  },
  {
    // retrieve the month from each document
    $project: {
      _id: 1,
      name: 1,
      date: 1,
      number: 1,
      month: { $month: "$date" }
    }
  },
  {
    // group them by month and perform some accumulator operations
    $group: {
      _id: "$month",
      name: { $addToSet: "$name" },
      dateFrom: { $min: "$date" },
      dateTo: { $max: "$date" },
      number: { $sum: "$number" }
    }
  }
])
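To make the three stages concrete, here is a plain-JavaScript sketch (no MongoDB connection) that mimics $match, $project and $group on the four sample documents; groupByMonth is a hypothetical helper. It uses getUTCMonth() because $month evaluates dates in UTC by default. Keep in mind that MongoDB does not guarantee the order of $group output or of $addToSet arrays, whereas this sketch simply keeps insertion order.

```javascript
// The four sample documents from above, with dates as JavaScript Date objects.
const samples = [
  { name: "data1", date: new Date("2017-11-07T00:00:00.000Z"), number: 15 },
  { name: "data2", date: new Date("2017-11-06T00:00:00.000Z"), number: 19 },
  { name: "data3", date: new Date("2017-10-06T00:00:00.000Z"), number: 20 },
  { name: "data4", date: new Date("2017-10-05T00:00:00.000Z"), number: 65 }
];

// Emulates $match (date range) -> $project (month) -> $group by month.
function groupByMonth(docs, from, to) {
  const groups = new Map();
  for (const doc of docs) {
    if (doc.date < from || doc.date > to) continue; // $match with $gte / $lte
    const month = doc.date.getUTCMonth() + 1;       // $month: 1-12, UTC
    let g = groups.get(month);
    if (!g) {
      g = { _id: month, name: new Set(), dateFrom: doc.date, dateTo: doc.date, number: 0 };
      groups.set(month, g);
    }
    g.name.add(doc.name);                             // $addToSet
    if (doc.date < g.dateFrom) g.dateFrom = doc.date; // $min
    if (doc.date > g.dateTo) g.dateTo = doc.date;     // $max
    g.number += doc.number;                           // $sum
  }
  // Convert each Set back to an array, like the $addToSet result.
  return [...groups.values()].map((g) => ({ ...g, name: [...g.name] }));
}

const monthly = groupByMonth(
  samples,
  new Date("2017-10-05T00:00:00.000Z"),
  new Date("2017-11-07T00:00:00.000Z")
);
for (const g of monthly) {
  console.log(g._id, g.name, g.dateFrom.toISOString(), g.dateTo.toISOString(), g.number);
}
```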
I would suggest saving the pre-aggregated data. This way, instead of searching through, for example, 30 documents per month, you'd only need to search 1 per month. You'd only have to aggregate the complete data once; after that, with the pre-aggregated results stored, you'd only have to run the pre-aggregation on new incoming data.
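The pre-aggregation idea can be sketched in plain JavaScript: keep one summary per month and fold only newly arrived documents into it. This is an in-memory illustration under assumptions: foldIntoSummaries is a hypothetical helper, the summaries live in a Map rather than a MongoDB collection, and months are keyed by year and month (the $month-only _id in the pipeline above would merge, say, October 2017 with October 2018). In MongoDB itself you would store each summary as a document and update it with the $inc, $min and $max update operators.

```javascript
// Folds a batch of new documents into existing per-month summaries and
// returns a new Map; the input Map and its summary objects are not modified.
// Keys are "YYYY-MM" in UTC so the same month of different years stays apart.
function foldIntoSummaries(summaries, batch) {
  const out = new Map(summaries);
  for (const doc of batch) {
    const key = doc.date.toISOString().slice(0, 7); // e.g. "2017-11"
    const prev = out.get(key);
    out.set(key, prev
      ? {
          count: prev.count + 1,                                         // like $inc
          number: prev.number + doc.number,                              // like $inc
          dateFrom: doc.date < prev.dateFrom ? doc.date : prev.dateFrom, // like $min
          dateTo: doc.date > prev.dateTo ? doc.date : prev.dateTo        // like $max
        }
      : { count: 1, number: doc.number, dateFrom: doc.date, dateTo: doc.date });
  }
  return out;
}

// Initial run over the October sample documents...
const existing = foldIntoSummaries(new Map(), [
  { date: new Date("2017-10-05T00:00:00.000Z"), number: 65 },
  { date: new Date("2017-10-06T00:00:00.000Z"), number: 20 }
]);
// ...then only the newly arrived November documents are processed.
const updated = foldIntoSummaries(existing, [
  { date: new Date("2017-11-06T00:00:00.000Z"), number: 19 },
  { date: new Date("2017-11-07T00:00:00.000Z"), number: 15 }
]);

for (const [month, s] of updated) console.log(month, s.count, s.number);
// 2017-10 2 85
// 2017-11 2 34
```

Folding the data in two batches gives the same summaries as folding it all at once, which is what lets you skip re-reading old documents.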
Is that maybe something you are looking for?
Also, if the fields you query on are indexed, that helps as well; otherwise MongoDB has to scan every document in the collection.

Fetch specific array elements from an array element within another array field in MongoDB

My document structure is as below.
{
"_id" : {
"timestamp" : ISODate("2016-08-27T06:00:00.000+05:30"),
"category" : "marketing"
},
"leveldata" : [
{
"level" : "manager",
"volume" : [
"45",
"145",
"2145"
]
},
{
"level" : "engineer",
"volume" : [
"2145"
]
}
]
}
The embedded array field "leveldata.volume" can have around 60 elements.
In this case, "leveldata" is an array of documents,
and "volume" is another array field inside each "leveldata" element.
We have a requirement to fetch specific elements from the "volume" array field,
for example the elements at positions 1-5 of the "volume" array.
We also use the positional operator to fetch the specific "leveldata" element based on the "leveldata.level" field.
I tried using the $slice operator, but it seems to work only on plain arrays, not on arrays inside array fields, which is the case in my scenario.
We could do it in the application layer, but that would mean loading the entire array element from MongoDB into memory and then picking out the desired elements. We want to avoid that and fetch them directly from MongoDB.
The query below is what I used to try to fetch the required elements.
db.getCollection('mycollection').find(
{
"_id" : {
"timestamp" : ISODate("2016-08-26T18:00:00.000-06:30"),
"category" : "sales"
}
,
"leveldata.level":"manager"
},
{
"leveldata.$.volume": { $slice: [ 1, 5 ] }
}
)
Could you please suggest how to address this issue?
Thanks,
mongouser
Yes, you can use $slice to get that data, for example:
db.getCollection('mycollection').find({"leveldata.level":"manager"} , { "leveldata.volume" : { $slice : [3 , 1] } } )
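As an alternative that does not depend on how the find() projection treats nested arrays, the same selection can be expressed in an aggregation with $filter, $map and the three-argument form of the $slice expression ([array, position, n]), which requires MongoDB 3.2 or later. Below, slicePipeline is only a sketch that is not run against a server here, and sliceVolume is a hypothetical in-memory helper that reproduces what its $project stage computes on the sample document from the question, taking positions 1-5 (skip 1, up to 5 elements).

```javascript
// The document from the question (only the fields we need).
const levelDoc = {
  leveldata: [
    { level: "manager", volume: ["45", "145", "2145"] },
    { level: "engineer", volume: ["2145"] }
  ]
};

// Sketch of an aggregation that keeps only the "manager" entry and slices its
// volume array. Shown as a plain object for reference; not executed here.
const slicePipeline = [
  { $match: { "leveldata.level": "manager" } },
  {
    $project: {
      leveldata: {
        $map: {
          input: { $filter: { input: "$leveldata", cond: { $eq: ["$$this.level", "manager"] } } },
          in: { level: "$$this.level", volume: { $slice: ["$$this.volume", 1, 5] } }
        }
      }
    }
  }
];

// In-memory equivalent of the $project stage above: keep entries with the
// given level, then take up to `limit` elements starting at 0-based `skip`.
function sliceVolume(doc, level, skip, limit) {
  return doc.leveldata
    .filter((entry) => entry.level === level)
    .map((entry) => ({ level: entry.level, volume: entry.volume.slice(skip, skip + limit) }));
}

console.log(JSON.stringify(sliceVolume(levelDoc, "manager", 1, 5)));
// [{"level":"manager","volume":["145","2145"]}]
```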

MongoDB - Index for object update in nested Array

Assume we have the following collection, which I have a question about:
{
"_id" : 1,
"user_id" : 12345,
"items" : [
{
"item_id" : 1,
"value" : 21,
"status" : "active"
},
{
"item_id" : 2,
"value" : 22,
"status" : "active"
},
{
"item_id" : 3,
"value" : 23,
"status" : "active"
},
...
{
"item_id" : 1000,
"value" : 1001,
"status" : "active"
},
]
}
The collection has a lot of documents (one per user in the system, about 100K documents). Each document holds around 1000 subdocuments in the "items" array.
The list of operations that will be used:
Read the whole document once the user logs in to the system (rare operation).
Update a single subdocument in the nested "items" array, setting "value" and "status", on almost every user click (frequent operation):
db.items.update({_id : 1 , "items.item_id" : 1000} , {$set: {"items.$.value": 1000}})
Insert a new document with 1000 subdocuments in the nested array into the collection. This happens on new user registration (very rare operation).
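For reference, the update above uses the positional $ operator, which applies the $set to the first element of items that matched the "items.item_id" condition in the query. The following plain-JavaScript sketch mimics that behavior on an in-memory document; setFirstMatching is a hypothetical helper, the array is shortened to three items, and the "inactive" status is a made-up value. It illustrates the update semantics only and says nothing about which index MongoDB would use.

```javascript
// One user document with a shortened "items" array (the real one has ~1000).
const userDoc = {
  _id: 1,
  user_id: 12345,
  items: [
    { item_id: 1, value: 21, status: "active" },
    { item_id: 2, value: 22, status: "active" },
    { item_id: 3, value: 23, status: "active" }
  ]
};

// Mimics update({ _id: id, "items.item_id": itemId }, { $set: { "items.$.<field>": ... } }):
// only the first array element whose item_id matches is modified.
// Returns whether a matching element was found.
function setFirstMatching(doc, id, itemId, changes) {
  if (doc._id !== id) return false;                      // _id did not match
  const idx = doc.items.findIndex((item) => item.item_id === itemId);
  if (idx === -1) return false;                          // no element matched
  Object.assign(doc.items[idx], changes);                // like $set on items.$.value / items.$.status
  return true;
}

const updatedOk = setFirstMatching(userDoc, 1, 2, { value: 1000, status: "inactive" });
console.log(updatedOk, userDoc.items[1]);
```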
The question is: Do I need to create a compound index like
db.items.createIndex( { "_id": 1, "items.item_id": 1 } )
to help MongoDB update a particular subdocument inside the array, or does MongoDB search the whole document regardless of the compound index? Or could someone propose a different schema for this scenario?

findBy query not returning correct page info

I have a Person collection with the following structure:
{
"_id" : ObjectId("54ddd6795218e7964fa9086c"),
"_class" : "uk.gov.gsi.hmpo.belt.domain.person.Person",
"imagesMatch" : true,
"matchResult" : {
"_id" : null,
"score" : 1234,
"matchStatus" : "matched",
"confirmedMatchStatus" : "notChecked"
},
"earlierImage" : DBRef("image", ObjectId("54ddd6795218e7964fa9086b")),
"laterImage" : DBRef("image", ObjectId("54ddd67a5218e7964fa908a9")),
"tag" : DBRef("tag", ObjectId("54ddd6795218e7964fa90842"))
}
Notice that the "tag" is a DBRef.
I've got a Spring Data finder that looks like the following:
Page<Person> findByMatchResultNotNullAndTagId(@Param("tagId") String tagId, Pageable page);
When this code is executed the find query looks like the following:
{ matchResult: { $ne: null }, tag: { $ref: "tag", $id: ObjectId('54ddd6795218e7964fa90842') } } sort: {} projection: {} skip: 0 limit: 1
That part is fine: I get a collection of 1 person back (limit=1). However, the page details are not correct. I have 31 persons in the collection, so I should have 31 pages. What I get is the following:
"page" : {
"size" : 1,
"totalElements" : 0,
"totalPages" : 0,
"number" : 0
}
The count query looks like the following:
{ count: "person", query: { matchResult: { $ne: null }, tag.id: "54ddd6795218e7964fa90842" } }
That tag.id doesn't look correct to me compared with the equivalent find query above.
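The page block is computed from the count query, not from the find query, which is why a count query that matches nothing yields totalElements 0 and totalPages 0 even though the find returns a document. A sketch of the arithmetic, assuming the usual ceil(total / size) rule; pageInfo is a hypothetical helper, not Spring Data's actual code.

```javascript
// Hypothetical helper (not Spring Data's code): page metadata derived from the
// count query's result. Assumes size > 0.
function pageInfo(totalElements, size, number) {
  return {
    size,
    totalElements,
    totalPages: Math.ceil(totalElements / size), // pages of `size` needed for all elements
    number
  };
}

// Count query matches all 31 persons: 31 pages of size 1.
console.log(pageInfo(31, 1, 0));
// Count query matches nothing: 0 elements and 0 pages, as in the question.
console.log(pageInfo(0, 1, 0));
```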
I've found that if I add a new method to org.springframework.data.mongodb.core.MongoOperations:
public interface MongoOperations {
public long count(Query query, Class<?> entityClass, String collectionName);
}
and then rework AbstractMongoQuery.execute(Query query) to use that method instead of the similar method without the entityClass parameter, I get the correct paging results.
Question: Am I doing something wrong or is this a bug in Spring Data Mongo?
Edit
Taking inspiration from Christoph, I've added the following test code on GitHub: https://github.com/tedp/Spring-Data-Test
The information contained in the returned Page depends on the query executed. Assuming a total of 31 elements in your collection, only a few of them, or even just one, might match the given criteria by referencing the tag with id 54ddd6795218e7964fa90842. Therefore you only get the total number of elements that match the query, not the total number of elements in your collection.
This bug was actually fixed in DATAMONGO-1120, as pointed out by Christoph. I needed to override the Spring Data version to use 1.6.2.RELEASE until the next iteration of Spring Boot, where presumably Spring Data will be upgraded to at least 1.6.2.RELEASE.

Mongo DB: Sorting by the number of matches

I have an array of objects, and I want to query a MongoDB collection for documents that have elements matching any of the objects in my array.
For example:
var objects = ["52d58496e0dca1c710d9bfdd", "52d58da5e0dca1c710d9bfde", "52d91cd69188818e3964917b"];
db.scook.recipes.find({ products: { $in: objects } })
However, I want to know if I can sort the results by the number of matches in MongoDB.
For example, at the top would be the recipe that matches all three elements: ["52d58496e0dca1c710d9bfdd", "52d58da5e0dca1c710d9bfde", "52d91cd69188818e3964917b"].
Second would be a recipe that matches two, e.g. ["52d58496e0dca1c710d9bfdd", "52d58da5e0dca1c710d9bfde"], and third one that matches only one, e.g. ["52d58496e0dca1c710d9bfdd"].
It would also be great to get the number of matching items for each recipe.
By using the aggregation framework, I think you should be able to get what you need with the following MongoDB query. If you're using Mongoose, you'll have to convert it to a Mongoose query. I'm not certain it will work exactly as is, so you may need to adjust it a little. Note that $or takes a list of query expressions rather than a list of values, so the query below uses $in instead: the $in query operator in $match, and the $in aggregation expression (MongoDB 3.4+) inside $project, wrapped in $cond to produce 1 or 0 for each product. On older versions, you would need map-reduce or to do the counting in your application.
db.recipes.aggregate([
  // look for documents that contain at least one of the products
  { $match: { products: { $in: objects } } },
  // break apart documents by the products array
  { $unwind: "$products" },
  // productMatch is 1 if this product is in objects, otherwise 0
  { $project: {
      desiredField1: 1,
      desiredField2: 1,
      products: 1,
      productMatch: { $cond: [ { $in: ["$products", objects] }, 1, 0 ] }
  }},
  // group the unwound documents back together by _id
  { $group: {
      _id: "$_id",
      desiredField1: { $first: "$desiredField1" },
      desiredField2: { $first: "$desiredField2" },
      products: { $push: "$products" },
      // count the matched products
      numMatches: { $sum: "$productMatch" },
      // increment by 1 for each product
      numProducts: { $sum: 1 }
  }},
  // sort by numMatches in descending order
  { $sort: { numMatches: -1 } }
])
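A possibly simpler variant avoids the $unwind/$group round trip by counting the overlap directly, e.g. adding numMatches: { $size: { $setIntersection: ["$products", objects] } } in a $project stage and then sorting on it. Note that $setIntersection treats its inputs as sets, so a product listed twice in a recipe counts once. The sketch below reproduces that ranking in plain JavaScript on hypothetical recipes (r1 to r4 and the all-f product id are made up); rankByMatches is an illustrative helper, not a MongoDB API.

```javascript
// The product ids from the question.
const wanted = ["52d58496e0dca1c710d9bfdd", "52d58da5e0dca1c710d9bfde", "52d91cd69188818e3964917b"];

// Hypothetical recipes for illustration.
const recipes = [
  { _id: "r1", products: ["52d58496e0dca1c710d9bfdd"] },
  { _id: "r2", products: ["52d58496e0dca1c710d9bfdd", "52d58da5e0dca1c710d9bfde", "52d91cd69188818e3964917b"] },
  { _id: "r3", products: ["52d58496e0dca1c710d9bfdd", "52d58da5e0dca1c710d9bfde", "ffffffffffffffffffffffff"] },
  { _id: "r4", products: ["ffffffffffffffffffffffff"] }
];

// Same idea as { $size: { $setIntersection: ["$products", wanted] } }:
// count the distinct products of each recipe that appear in `ids`.
function rankByMatches(docs, ids) {
  const wantedSet = new Set(ids);
  return docs
    .map((d) => ({
      _id: d._id,
      numMatches: new Set(d.products.filter((p) => wantedSet.has(p))).size,
      numProducts: d.products.length
    }))
    .filter((d) => d.numMatches > 0)              // like $match with $in
    .sort((a, b) => b.numMatches - a.numMatches); // like $sort: { numMatches: -1 }
}

console.log(rankByMatches(recipes, wanted).map((r) => `${r._id}:${r.numMatches}`).join(" "));
// r2:3 r3:2 r1:1
```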
