Mongodb Mapreduce join array - arrays

I have a big collection of songs and want to get the most played songs per week, in an array. For example:
{
"_id" : {
"title" : "demons savaites hitas",
"name" : "imagine dragons"
},
"value" : {
"weeks" : [
{
"played" : 56,
"week" : 9,
"year" : 2014
}
]
}
}
It sometimes becomes:
{
"_id" : {
"title" : "",
"name" : "top 15"
},
"value" : {
"played" : 1,
"week" : 8,
"year" : 2014
}
}
The collection I get the data from is named songs, and new documents are added all the time as songs get added. Artist names and song titles are not unique, and every document in the collection looks like this:
{
"_id" : ObjectId("530536e3d4ca1a783342f1c8"),
"week" : 8,
"artistname" : "City Shakerz",
"songtitle" : "Love Somebody (Summer 2012 Mix Edit)",
"year" : 2014,
"date" : ISODate("2014-02-19T22:57:39.926Z")
}
I now want a mapReduce that adds the new week to the array. At the moment it overwrites the array instead.
I also noticed that, after changing the value to an array, the new mapReduce no longer counts all the plays.
The new mapReduce with weeks, which is not working:
map = function () {
if (this.week == 9 && this.year == 2014) emit({title:this.songtitle.toLowerCase(), name:this.artistname.toLowerCase()}, {played:1, week:this.week, year:this.year});
}
reduce = function(k, values) {
var result = {};
result.weeks = new Array();
var object = {played:0, week: 0, year: 0};
values.forEach(function(value) {
object.played += value.played;
object.week = value.week;
object.year = value.year;
});
result.weeks.push(object);
return result;
}
db.songs.mapReduce(map,reduce,{out: {reduce:"played2"}})
This is the old one I'm using, which creates a new document in the collection per week and song:
map = function () {
if (this.week == 10 && this.year == 2014) emit({title:this.songtitle.toLowerCase(), name:this.artistname.toLowerCase(), week:this.week, year:this.year}, {count:1});
}
reduce = function(k, values) {
var result = {count: 0,};
values.forEach(function(value) {
result.count += value.count;
});
return result;
}
db.songs.mapReduce(map,reduce,{out: {merge:"played"}})
I currently get the information for the top list from played2 like this:
db.played2.find({'_id.week': 9,'_id.year': 2014}).sort({"value.count": -1}).limit(50)
The line above may contain typos, because I use MongoClient for PHP and had to convert it to JavaScript syntax for you.
What am I doing wrong?

I found out that I could run the mapReduce as in the code snippet above, then query this week's results and the previous week's results, and use a simple nested for loop with an if to update each of this week's entries with its place in the previous week.
I wrote the script in Python and run it, together with my mapReduce, as a cron job. For example:
import datetime
import sys
from pymongo import MongoClient

# Week number comes from the command line, otherwise use last week
if len(sys.argv) > 1 and sys.argv[1] is not None:
    week = int(sys.argv[1])
else:
    week = (datetime.date.today().isocalendar()[1]) - 1
year = datetime.date.today().year
previous_week = week - 1
client = MongoClient()
db = client.db
played = db.played
print "Updating it for week: " + str(week)
previous = played.find({"_id.week": previous_week, "_id.year": year}).sort("value.count", -1).limit(50)
thisweek = played.find({"_id.week": week, "_id.year": year}).sort("value.count", -1).limit(50)
thisplace = 1
for f in thisweek:
    previous.rewind()  # Reset the previous-week cursor
    place = 1
    if previous.count() > 0:
        checker = bool(1)
        for s in previous:
            if s["_id"]["name"] == f["_id"]["name"] and s["_id"]["title"] == f["_id"]["title"]:
                result = played.update({"_id.week": f["_id"]["week"], "_id.year": f["_id"]["year"], "_id.title": f["_id"]["title"], "_id.name": f["_id"]["name"]}, {"$set": {"place.previous_week":place, "place.this_week":thisplace}})
                checker = bool(0)
                print result
            place = place + 1
        # Song was not in last week's list
        if checker is True:
            result = played.update({"_id.week": f["_id"]["week"], "_id.year": f["_id"]["year"], "_id.title": f["_id"]["title"], "_id.name": f["_id"]["name"]}, {"$set": {"place.previous_week":0, "place.this_week":thisplace}})
            print result
    else:
        result = played.update({"_id.week": f["_id"]["week"], "_id.year": f["_id"]["year"], "_id.title": f["_id"]["title"], "_id.name": f["_id"]["name"]}, {"$set": {"place.previous_week":0, "place.this_week":thisplace}})
        print result
    thisplace = thisplace + 1
print "done."
This seems to work very well. Hopefully MongoDB will add support for updating just a field from mapReduce, so that information can be added to a document without overwriting it.

I'm taking a stab at the structure of your collection based on your input fields, but I don't think mapReduce is the tool you want. Your apparent desired output can be achieved using aggregate :
db.collection.aggregate([
// Match a specific week and year if you want - remove if you want all
{ "$match": { "year": inputYear, "week": inputWeek } },
// Group to get the total number of times played
{ "$group": {
"_id": {
"title": { "$toLower": "$songtitle" },
"name": { "$toLower": "$artistname" },
"week": "$week",
"year": "$year"
},
played: { "$sum": 1 }
}},
// Sort the results by the most played in the range
{ "$sort": { "_id.year": -1, "_id.week": -1, "played": -1 } },
// Optionally limit to the top 15 results
{ "$limit": 15 }
])
That basically is what you appear to be trying to do. So this sums up the "number of appearances" as the number of times played. Then we take the additional steps of sorting the results and, optionally (if you can live with looking at one week at a time), limiting the results to a set number. Those last two steps you won't get with mapReduce.
If you are ultimately looking for the "top ten" for each week, as a single query result, then you can look at this for a discussion (and methods to achieve) what we call the "topN" results problem.
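As a rough illustration of the "topN per week" idea, the selection can also be done on the client once the per-week counts exist. This is only a sketch in plain JavaScript, assuming input documents shaped like the output of the $group stage above (`_id.title`, `_id.name`, `_id.week`, `_id.year`, `played`). The helper name `topNPerWeek` and the sample data are made up for this example.

```javascript
// Hypothetical helper: keeps the n most played songs for every (year, week).
// `counts` is assumed to look like the output of the $group stage.
function topNPerWeek(counts, n) {
  var byWeek = {};
  // Bucket the documents by "year-week"
  counts.forEach(function (doc) {
    var key = doc._id.year + "-" + doc._id.week;
    (byWeek[key] = byWeek[key] || []).push(doc);
  });
  // Sort each bucket by plays (descending) and keep the first n entries
  Object.keys(byWeek).forEach(function (key) {
    byWeek[key].sort(function (a, b) { return b.played - a.played; });
    byWeek[key] = byWeek[key].slice(0, n);
  });
  return byWeek;
}

// Made-up sample data
var sampleCounts = [
  { _id: { title: "demons", name: "imagine dragons", week: 9, year: 2014 }, played: 56 },
  { _id: { title: "love somebody", name: "city shakerz", week: 9, year: 2014 }, played: 12 },
  { _id: { title: "demons", name: "imagine dragons", week: 8, year: 2014 }, played: 40 }
];

var topPerWeek = topNPerWeek(sampleCounts, 1);
console.log(topPerWeek["2014-9"][0]._id.title); // demons
```

For large collections this only makes sense after the server has already reduced the data to per-week counts.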

Related

update single item within array where multiple conditions apply to any array item

I have a basic mongoose model with an attribute instruments, which is an array. It consists of multiple items, each of which has the following attributes: name, counter. The document itself has an autogenerated _id of type ObjectID.
Model
var ParticipationSchema = new Schema({
instruments: [{
name: String,
counter: Number
}],
// etc.
},
{
timestamps: true
});
I'd like now to change exactly 1 item within the instruments array, only if it matches the following requirements:
The document id has to equal 58f3c77d789330486ccadf40
The instruments-item's name which should be changed has to equal 'instrument-1'
The instrument-item's counter has to be lower than 3
Query
let someId = '58f3c77d789330486ccadf40';
let instrumentName = 'instrument-1'
let counter = 3;
Participation.update(
{
$and: [
{ _id: mongoose.Types.ObjectId(someId) },
{ 'instruments.name': instrumentName },
{ 'instruments.counter': { $lt: counter } }
]},
{
$set: {
'instruments.$.currentcounter' : counter
}
},
(err, raw) => {
// ERROR HANDLING
}
);
Let's assume I have 3 entries within the instruments-attribute:
"instruments": [
{
"name": "instrument-1",
"counter": 2
},
{
"name": "instrument-1",
"counter": 2
},
{
"name": "instrument-1",
"counter": 2
}
]
Desired behaviour: change the first element's counter attribute to 3 when the update code runs the first time, and take no action when it runs again.
Actual behaviour:
it changes the 1st element's counter attribute to 3 when running it the 1st time
it changes the 2nd element's counter attribute to 3 when running it the 2nd time
it changes the 3rd element's counter attribute to 3 when running it the 3rd time
Although the conditions are combined with $and, they are not evaluated element-wise: each condition can be satisfied by a different array element. To resolve the issue, $elemMatch can be used:
Participation.update({
_id: mongoose.Types.ObjectId(someId),
'instruments': {
$elemMatch: {
'name': instrumentName,
'counter': { $lt: counter } }}
},
// etc.
more info:
API reference on $elemMatch
Thanks to @VEERAM for pointing out that this kind of behaviour is also documented on the MongoDB homepage.
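To see why the two query forms differ, here is a rough plain-JavaScript model of the matching rules. It is not MongoDB code, and `matchesDotted` and `matchesElemMatch` are hypothetical names used only for illustration. Separate dotted conditions are satisfied as soon as some element matches each condition on its own, whereas $elemMatch requires one single element to satisfy all of them.

```javascript
// Models { 'instruments.name': name, 'instruments.counter': { $lt: limit } }
function matchesDotted(doc, name, limit) {
  var anyName = doc.instruments.some(function (i) { return i.name === name; });
  var anyLow = doc.instruments.some(function (i) { return i.counter < limit; });
  return anyName && anyLow; // may be satisfied by two different elements
}

// Models { instruments: { $elemMatch: { name: name, counter: { $lt: limit } } } }
function matchesElemMatch(doc, name, limit) {
  return doc.instruments.some(function (i) {
    return i.name === name && i.counter < limit; // same element must match both
  });
}

// Made-up document: the named instrument is already at 3, another one is lower
var participation = {
  instruments: [
    { name: "instrument-1", counter: 3 },
    { name: "instrument-2", counter: 2 }
  ]
};

console.log(matchesDotted(participation, "instrument-1", 3));    // true
console.log(matchesElemMatch(participation, "instrument-1", 3)); // false
```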

Count how many and which index of a array

I have an array of objects:
result = [
{ _id: 53d0dfe3c42047c81386df9d, video_id: '1' },
{ _id: 53d0dfe3c42047c81386df9e, video_id: '1' },
{ _id: 53d0dfe3c42047c81386df9f, video_id: '1' },
{ _id: 53d0dfe3c42047c81386dfa0, video_id: '2' },
{ _id: 53d0dfe3c42047c81386dfa1, video_id: '2' },
{ _id: 53d0dfe3c42047c81386dfa2, video_id: '1' },
{ _id: 53d0dfe3c42047c81386dfa3, video_id: '2' },
{ _id: 53d0dfe3c42047c81386dfa4, video_id: '1' }
]
I need to create another array, which takes video_id as the index, and contains how many times this video_id appears in the first array:
list = [
{'1' : 5},
{'2' : 4}
]
Currently, I use this code:
while (i < result.length)
{
if(list[result[i].video_id] === undefined) {
list[result[i].video_id] = 0;
}
list[result[i].video_id] = list[result[i].video_id] + 1;
i = i + 1;
}
It works, but I wonder if there is any faster and cleaner way to do so? (the real result array has over 10k elements, and I doubt >10k conditional statements are optimal...).
I am using node.js, result is from a mongoose (mongoDB) query, and I didn't see any way to get this done by mongoose itself:
var now = new Date();
//M_logs is a mongoose model
query = M_logs.where('time').gt(new Date(now.getFullYear(), 0, 1).getTime() / 1000).lt(now.getTime() / 1000).select('video_id');
(PS: I wonder if this isn't more a Code Review question, please tell me if I am off-topic so I can migrate the question).
EDIT:
To answer to Juan Carlos Farah:
S_logs = new mongoose.Schema({
user_ip : String,
user_id : String,
user_agent : String,
canal_id : String,
theme_id : String,
video_id : String,
osef : String,
time : Number,
action: String,
is_newuser : String,
operator : String,
template : String,
catalogue : String,
referer : String,
from : String,
osef1 : String
});
M_logs = mongoose.model('logs', S_logs);
You can do this using the aggregation framework. The idea is to do something as follows:
Match the documents you are looking for. Based on your current query, I understand it would be documents where time is between new Date(now.getFullYear(), 0, 1).getTime() / 1000 and now.getTime() / 1000.
Group the matched documents by video_id and keep track of their count.
Optionally sort by _id, which would be equivalent to the original video_id.
The following is in mongo shell syntax:
var now = new Date();
db.M_logs.aggregate([
{
"$match" : {
"time" : {
"$gt" : new Date(now.getFullYear(), 0, 1).getTime() / 1000,
"$lt" : now.getTime() / 1000
}
}
},
{
"$group" : {
"_id" : "$video_id",
"count" : { "$sum" : 1 }
}
},
{
"$sort" : { "_id" : 1 }
}
]);
If this works for you, you can easily implement it in Mongoose or Node.js driver syntax. Note that the aggregation framework returns a cursor, which you can iterate through to populate your array.
EDIT:
Using the Node.js driver, you can access the results from the aggregation query in the callback function. Something as follows:
...
, function(err, result) {
console.dir(result);
db.close();
}
Note that the Mongoose syntax for aggregation queries is slightly different.
Example:
Model.aggregate([ <QUERY> ]).exec( <CALLBACK> );
For more information, consult the documentation here.
I would suggest that you use aggregation framework to count number of documents. It will be significantly faster than iterating all your documents and counting them.
Using mongoose you can do it like this:
var now = new Date();
var startTime = new Date(now.getFullYear(), 0, 1).getTime() / 1000;
var endTime = now.getTime() / 1000;
M_logs.aggregate([
// filter the documents you're looking for
{"$match" : { "time" : {"$gt": startTime, "$lt": endTime}}},
// group by to get the count for each video_id
{"$group" : {"_id" : "$video_id", "count" : {"$sum" : 1}}},
// make the output more explanatory; this part is optional
{"$project" : { "video_id" : "$_id", "count" : "$count", _id : 0}}
]).exec(function(err, docs){
if (err) console.error(err);
console.log(docs);
});
The output of the docs will be:
[ { count: 4, video_id: '2' }, { count: 5, video_id: '1' } ]
use
var list = {};
result.forEach(function (el) {
list[el.video_id] = (list[el.video_id] || 0) + 1;
});
The resulting list will look something like this:
var list = {
'1': 5,
'2': 4
};
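If the array-of-objects shape from the question is really needed, the counts object can be converted afterwards. A small sketch follows. The names `countByVideoId` and `toPairs` are invented here, and the input is made-up sample data rather than the real query result.

```javascript
// Count occurrences of each video_id in a single pass
function countByVideoId(rows) {
  var counts = {};
  rows.forEach(function (el) {
    counts[el.video_id] = (counts[el.video_id] || 0) + 1;
  });
  return counts;
}

// Turn { '1': 2, '2': 1 } into [ { '1': 2 }, { '2': 1 } ]
function toPairs(counts) {
  return Object.keys(counts).map(function (id) {
    var pair = {};
    pair[id] = counts[id];
    return pair;
  });
}

// Made-up sample rows
var sampleRows = [{ video_id: '1' }, { video_id: '2' }, { video_id: '1' }];
var pairs = toPairs(countByVideoId(sampleRows));
console.log(pairs); // [ { '1': 2 }, { '2': 1 } ]
```

The plain object is usually the more convenient shape for lookups. The array form is mainly useful when the result has to be sorted or serialized in a fixed order.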

Getting subdocument element's count per index inside an array and updating the subdocument key - subdocument in array(IN MONGODB)

How to get subdocument element's count inside an array and how to update the subdocument's key in MongoDB
For eg, following is the whole doc stored in mongodb:
{
"CompanyCode" : "SNBN",
"EventCode" : "ET00008352",
"EventName" : "Sunburn Presents Avicii India Tour",
"TktDetail" : [
{
"Type" : "Category I",
"Qty" : {
"10-Dec" : {
"value" : 58
},
"11-Dec" : {
"value" : 83
},
"12-Dec" : {
"value" : 100
}
}
},
{
"Type" : "Category II",
"Qty" : {
"10-Dec" : {
"value" : 4
},
"11-Dec" : {
"value" : 7
},
"12-Dec" : {
"value" : 8
}
}
},
{
"Type" : "PRICE LEVEL 1",
"Qty" : {
"11-Dec" : {
"value" : 2
}
}
},
{
"Type" : "CatIV",
"Qty" : {
"18-Dec" : {
"value" : 20
}
}
}
],
"TransDate" : [
"10-Dec-2013",
"11-Dec-2013",
"12-Dec-2013",
],
"VenueCode" : "SNBN",
"VenueName" : "Sunburn",
"_id" : ObjectId("52452db273b92012c41ad612")
}
Here TktDetail is an array whose items each contain a Qty subdocument with multiple elements. How can I get the element count inside Qty per index?
For example, the 0th index of the TktDetail array contains one Qty subdocument with an element count of 3, whereas the 3rd index has an element count of 1 in its Qty subdocument.
Also, if I want to update a subdocument key, for example change the date in Qty from "10-Dec" to "10-Dec-2013", how is that possible?
Thanks in advance, looking for a reply ASAP..
So the first thing here is that you actually asked two questions, being "how do I get a count of the items under Qty?" and "how can I change the names?". Now while normally unrelated I'm going to treat them as the same thing.
What you need to do is change your schema and in doing so I'm going to allow you to get the count of items and I'm going to encourage you to change those field names as well. Specifically you need a schema like this:
"TktDetail" : [
{
"Type" : "Category I",
"Qty" : [
{ "date": ISODate("2013-12-10T00:00:00.000Z") , "value" : 58 },
{ "date": ISODate("2013-12-11T00:00:00.000Z"), "value" : 83 },
{ "date": ISODate("2013-12-12T00:00:00.000Z"), "value" : 100 },
]
},
All the gory details are in my answer here to a similar question. But the problem basically is that when you use sub-documents in the way you have you are ruining your chances of doing any meaningful query operations on this, as to get at each element you must specify the full path to get there.
That answer has more detail, but the case is you really want an array. The trade-off, it's a little harder to update, especially considering you have nested arrays, but it's a lot easier to add and much easier to query.
Also, and related, change your dates to be dates and not strings. Strings are no good for comparisons inside MongoDB. With them set as proper BSON dates (noting I clipped them to the start of day) you can compare, query ranges and do useful things. Your application code will be happy too, as the driver will return a real date object rather than something you have to manipulate "both ways".
So once you have read through, understood and implemented this, on to counting:
db.collection.aggregate([
// Unwind the TktDetail array to de-normalize
{"$unwind": "$TktDetail"},
// Also unwind the nested Qty array
{"$unwind": "$TktDetail.Qty" },
// Get some group information and count the entries
{"$group": {
"_id": {
"_id": "$_id",
"EventCode": "$EventCode",
"Type": "$TktDetail.Type"
},
"Qty": {"$sum": 1 }
}},
// Project nicely
{"$project": {
"_id": 0,
"EventCode": "$_id.EventCode",
"Type": "$_id.Type",
"Qty": 1
}},
// Let's even sort it
{"$sort": { "EventCode": 1, "Qty": -1 }}
])
So that allowed us to get a count of the items in Qty for each EventCode by Type, with the Qty ordered highest to lowest.
And that is not possible on your current schema without loading and traversing each document in code.
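For completeness, that in-code traversal on the existing schema could look roughly like this. It is a sketch in plain JavaScript that assumes the document has already been loaded by the application. `countQtyKeys` is a hypothetical helper, and the sample document is a shortened copy of the one in the question.

```javascript
// Hypothetical helper: number of keys under Qty for every TktDetail entry
function countQtyKeys(doc) {
  return doc.TktDetail.map(function (detail, index) {
    return {
      index: index,
      type: detail.Type,
      count: Object.keys(detail.Qty || {}).length
    };
  });
}

// Shortened sample document
var eventDoc = {
  EventCode: "ET00008352",
  TktDetail: [
    { Type: "Category I", Qty: { "10-Dec": { value: 58 }, "11-Dec": { value: 83 }, "12-Dec": { value: 100 } } },
    { Type: "CatIV", Qty: { "18-Dec": { value: 20 } } }
  ]
};

var qtyCounts = countQtyKeys(eventDoc);
console.log(qtyCounts[0].count, qtyCounts[1].count); // 3 1
```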
So there is the case. Now if you want to ignore this and just go about changing the sub-document key names, then you'll need to remove the key and its underlying document and replace it with the new key name, using update:
db.collection.update(
{ EventCode: "ET00008352"},
{ $unset:{ "TktDetail.0.Qty.10-Dec": "" }}
)
db.collection.update(
{ EventCode: "ET00008352"},
{ $set:{ "TktDetail.0.Qty.10-Dec-2013": { value: 58 } }}
)
And you'll need to do that for every item that you have.
So you either work out that schema conversion or otherwise have a lot of work anyway in order to change the keys. For myself, I'd do it properly, and only do it once so I didn't run into the next problem later.
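The schema conversion itself can be sketched in application code. The following assumes that every key has the `DD-Mon` form and belongs to the year 2013, as the TransDate values in the question suggest. `convertQty` and the month table are made up for illustration, and the converted array would still have to be written back with an update.

```javascript
// Assumed month abbreviations, mapped to zero-based month numbers
var MONTHS = { Jan: 0, Feb: 1, Mar: 2, Apr: 3, May: 4, Jun: 5,
               Jul: 6, Aug: 7, Sep: 8, Oct: 9, Nov: 10, Dec: 11 };

// Hypothetical helper: { "10-Dec": { value: 58 } } -> [ { date: Date, value: 58 } ]
function convertQty(qty, year) {
  return Object.keys(qty).map(function (key) {
    var parts = key.split("-"); // ["10", "Dec"]
    var day = parseInt(parts[0], 10);
    return {
      date: new Date(Date.UTC(year, MONTHS[parts[1]], day)), // clipped to start of day (UTC)
      value: qty[key].value
    };
  });
}

var convertedQty = convertQty({ "10-Dec": { value: 58 }, "11-Dec": { value: 83 } }, 2013);
console.log(convertedQty[0].date.toISOString()); // 2013-12-10T00:00:00.000Z
console.log(convertedQty.length);                // 2
```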

How can i find object from an array in mongoose

I have a schema like following : -
var P = {
s : [{
data : [],
sec : mongoose.Schema.Types.ObjectId
}]
};
Now I want to retrieve only the matching section object, not the entire document. That is, if I pass a sec value, I want only the s.data value of the object with that sec.
example : -
{ s : [
{
data : [],
sec : '52b9830cadaa9d273200000d'
},{
data : [],
sec : '52b9830cadaa9d2732000005'
}
]
}
The result should look like this:
{
data : [],
sec : '52b9830cadaa9d2732000005'
}
I do not want the entire document. Is it possible? If yes, then please help me.
You can use the $elemMatch projection operator to limit an array field like s to a single, matched element, but you can't remove the s level of your document using find.
db.test.find({}, {_id: 0, s: {$elemMatch: {sec: '52b9830cadaa9d2732000005'}}})
outputs:
{
"s": [
{
"data": [ ],
"sec": "52b9830cadaa9d2732000005"
}
]
}
You can always get the value of some field by using find(). For example in your case:
db.collectionName.find({},{"s.data":1})
So the first bracket is to apply any condition or query, and in the second bracket you have to define the field as 1(to fetch only those fields value).
Please check http://docs.mongodb.org/manual/reference/method/db.collection.find for more information.
Let me know if it solves your problem.
I'm not a Mongo or database person, but here is a solution in pure JavaScript, which Node.js (as you mentioned) can execute.
Schema
var P = { s : [
{
data : [],
sec : '52b9830cadaa9d273200000d'
},{
data : [],
sec : '52b9830cadaa9d2732000005'
}
]
};
Search Method Code
var search = function (search_sec){
for (var i=0; i<(P.s.length);i++){
var pointer = P.s[i].sec;
var dataRow = P.s[i];
if((pointer) === search_sec ){
console.log(dataRow);
}
}
};
Here is How you can call - search('search_id');
For example input :
search('52b9830cadaa9d2732000005');
Output:
[object Object] {
data: [],
sec: "52b9830cadaa9d2732000005"
}
Working Demo here - http://jsbin.com/UcobuVOf/1/watch?js,console

mongodb - adding the value in a field to the value in an embedded array

I have a document in MongoDB as below.
{
"CorePrice" : 1,
"_id" : 166,
"partno" : 76,
"parttype" : "qpnm",
"shipping" :
[
{
"shippingMethod1" : "ground",
"cost1" : "10"
},
{
"shippingMethod2" : "air",
"cost2" : "11"
},
{
"shippingMethod3" : "USPS",
"cost3" : "3"
},
{
"shippingMethod4" : "USPS",
"cost4" : 45
}
]
}
My goal is to add CorePrice (1) to cost4 (45) and retrieve the computed value as a new column "dpv". I tried using the below query. However I receive an error exception: $add only supports numeric or date types, not Array. I'm not sure why. Any kind of help will be greatly appreciated.
db.Parts.aggregate([
{
$project: {
partno: 1,
parttype: 1,
dpv: {$add: ["$CorePrice","$shipping.cost1"]}
}
},
{
$match: {"_id":{$lt:5}}
}
]);
When you refer to the field shipping.cost1 and shipping is an array, MongoDB does not know which entry of the shipping-array you are referring to. In your case there is only one entry in the array with a field cost1, but this can't be guaranteed. That's why you get an error.
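When you are able to change your database schema, I would recommend you to turn shipping into an object with a field for each shipping-type. This would allow you to address these better. When this is impossible or would break some other use-case, you could try to access the array entry by its position. Note that the dotted numeric form (shipping.0.cost1) is understood by queries and updates but not by aggregation expressions; inside a pipeline, $arrayElemAt (MongoDB 3.2 or later) serves that purpose.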
Another thing you could try is to use the $sum-operator to create the sum of all shipping.cost1 fields. When there is only one element in the array with a field cost1, the result will be its value.
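What the pipeline has to do can be modelled in plain JavaScript. This is only an illustration of the logic (pick the one array entry that carries the wanted cost field, then add it to CorePrice), not MongoDB code. `computeDpv` is a hypothetical name, and the Number() conversion is an assumption made because some costs in the sample document are stored as strings.

```javascript
// Hypothetical helper: CorePrice plus the cost stored under `costField`
function computeDpv(part, costField) {
  // Find the first shipping entry that actually has the requested field
  var entry = part.shipping.filter(function (s) {
    return Object.prototype.hasOwnProperty.call(s, costField);
  })[0];
  if (!entry) return null; // no such cost in this document
  return part.CorePrice + Number(entry[costField]);
}

// Shortened copy of the document from the question
var partDoc = {
  CorePrice: 1,
  shipping: [
    { shippingMethod1: "ground", cost1: "10" },
    { shippingMethod4: "USPS", cost4: 45 }
  ]
};

console.log(computeDpv(partDoc, "cost4")); // 46
console.log(computeDpv(partDoc, "cost1")); // 11
console.log(computeDpv(partDoc, "cost9")); // null
```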
I was able to achieve this by splitting the work into two steps, unwinding the array first, as below.
var pipeline1 = [
{
"$unwind": "$shipping"
},
{
$project:{
partno:1,
parttype:1,
dpv:{
$add:["$CorePrice","$shipping.cost4"]
}
}
},
{
$match:{"_id":5}
}
];
R = db.tb.aggregate( pipeline1 );
