MongoDB - Query with operations against array parameter elements

I’m new to MongoDB and I’m trying to get my head around whether I can perform this query conveniently and with decent performance using MongoDB. I’d like to pass numeric and array parameters to a query and use them to perform element by element operations on array values in each document in the collection. Is this possible?
For example, a collection contains documents like the following:
{
"name" : "item1",
"m" : 5.2,
"v" : 1.1,
"data1" : [ 0, 0, 0.3, 0.7, 0.95, 0.9, 0.75, 0.4, 0.1, 0 ],
"data2" : [ 0, -1, 0, 1, 1, 0 ]
}
And I have another "search" document that might look something like this:
{
"x" : 8,
"K" : 1,
"dataA" : [ 0, 0, 0, 0, 0, 0, 0, 0.5, 1, 0.5],
"dataB" : [ 0, -2, 0, 1, 1, 0 ]
}
I want to run a query, or map-reduce, using the above search document against the collection above that returns a collection containing:
{
"name",
"y" = fn(m, v, x, K) = Kvx^(1/m) (not the real formula but just an example)
"dataF" = Max(i=0..9) {data1[i] * dataA[i] }
"dataS" = Sum(j=0..5) {data2[j] * dataB[j] }
}
where y>0
So for the above example, the result returned would be
{
"name" : "item1",
"y" : 1 * 1.1 * 8^(1/5.2) = 1.641
"dataF" : Max(..., 0.4*0.5, 0.1*1, 0 * 0.5 ) = 0.2
"dataS" : 0*0 + (-1)*(-2) + 0*0 + 1*1 + 1*1 + 0*0 = 4
}
Is this going to be possible/convenient using MongoDB?
Note: In my application there will be more standard criteria included in the search using standard MongoDB operations so I was hoping to include the above processing in the query and avoid doing it on the client.

Here's a map/reduce version:
db.data.save({
"name" : "item1",
"m" : 5.2,
"v" : 1.1,
"data1" : [ 0, 0, 0.3, 0.7, 0.95, 0.9, 0.75, 0.4, 0.1, 0 ],
"data2" : [ 0, -1, 0, 1, 1, 0 ]
});
db.data.mapReduce( function() {
var searchdoc = {
"x" : 8,
"K" : 1,
"dataA" : [ 0, 0, 0, 0, 0, 0, 0, 0.5, 1, 0.5],
"dataB" : [ 0, -2, 0, 1, 1, 0 ]
};
var result = {name: this.name};
result.y = searchdoc.K * this.v * Math.pow(searchdoc.x, 1 / this.m);
if(result.y > 0) {
result.dataF = 0;
for(var i=0;i<this.data1.length;i++) {
var f = this.data1[i] * searchdoc.dataA[i];
if(f > result.dataF) {
result.dataF = f;
}
}
result.dataS = 0;
for(var i=0;i<this.data2.length;i++) {
var s = this.data2[i] * searchdoc.dataB[i];
result.dataS += s;
}
emit(this.name, result);
}
}, function(key, values){ return values[0]; }, {out: {inline: 1}});
result:
{
"results" : [
{
"_id" : "item1",
"value" : {
"name" : "item1",
"y" : 1.640830939540542,
"dataF" : 0.2,
"dataS" : 4
}
}
],
"timeMillis" : 0,
"counts" : {
"input" : 1,
"emit" : 1,
"reduce" : 0,
"output" : 1
},
"ok" : 1,
}
This is the shell version:
db.data.save({
"name" : "item1",
"m" : 5.2,
"v" : 1.1,
"data1" : [ 0, 0, 0.3, 0.7, 0.95, 0.9, 0.75, 0.4, 0.1, 0 ],
"data2" : [ 0, -1, 0, 1, 1, 0 ]
});
var searchdoc = {
"x" : 8,
"K" : 1,
"dataA" : [ 0, 0, 0, 0, 0, 0, 0, 0.5, 1, 0.5],
"dataB" : [ 0, -2, 0, 1, 1, 0 ]
};
var search = function(searchdoc) {
db.data.find().forEach(function(obj) {
var result = {name:obj.name};
result.y = searchdoc.K * obj.v * Math.pow(searchdoc.x, 1 / obj.m);
if( result.y > 0 ) {
result.dataF = 0;
for(var i=0;i<obj.data1.length;i++) {
var f = obj.data1[i] * searchdoc.dataA[i];
if(f > result.dataF) {
result.dataF = f;
}
}
result.dataS = 0;
for(var i=0;i<obj.data2.length;i++) {
var s = obj.data2[i] * searchdoc.dataB[i];
result.dataS += s;
}
db.results.save(result);
}
});
}
search(searchdoc);
db.results.find();
{ "_id" : ObjectId("4f08ffe4264d23670eeaaadf"), "name" : "item1", "y" : 1.640830939540542, "dataF" : 0.2, "dataS" : 4 }
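The per-document arithmetic is the same in both versions, so it can be pulled out into a plain function and checked outside the database. The following is only a sketch: scoreDocument is a hypothetical name, not a MongoDB API, and it assumes the search arrays line up index by index with the document arrays. It differs from the code above in one respect: the maximum starts from -Infinity rather than 0, so a document whose products are all negative still reports its true maximum.

```javascript
// Hypothetical helper (not part of MongoDB): scores one document against a search document.
function scoreDocument(doc, search) {
  const y = search.K * doc.v * Math.pow(search.x, 1 / doc.m);
  if (!(y > 0)) return null; // mirrors the "where y > 0" filter

  // Element-by-element maximum of data1[i] * dataA[i].
  let dataF = -Infinity;
  const nF = Math.min(doc.data1.length, search.dataA.length);
  for (let i = 0; i < nF; i++) {
    dataF = Math.max(dataF, doc.data1[i] * search.dataA[i]);
  }

  // Element-by-element sum of data2[j] * dataB[j].
  let dataS = 0;
  const nS = Math.min(doc.data2.length, search.dataB.length);
  for (let j = 0; j < nS; j++) {
    dataS += doc.data2[j] * search.dataB[j];
  }

  return { name: doc.name, y: y, dataF: dataF, dataS: dataS };
}

const item1 = {
  name: "item1", m: 5.2, v: 1.1,
  data1: [0, 0, 0.3, 0.7, 0.95, 0.9, 0.75, 0.4, 0.1, 0],
  data2: [0, -1, 0, 1, 1, 0]
};
const searchdoc = {
  x: 8, K: 1,
  dataA: [0, 0, 0, 0, 0, 0, 0, 0.5, 1, 0.5],
  dataB: [0, -2, 0, 1, 1, 0]
};
const scored = scoreDocument(item1, searchdoc);
console.log(scored);
```

The same function body could be called from the map function or from the forEach callback, which keeps the formula in one place.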

Sorting and ranking matching elements in a nested array list in MongoDB

Greetings fellow fans of MongoDB!
I've got here a data structure with board game data where achieved scores (after every round) are tracked as nested arrays associated with the player's name. Note that with each board game there's a different set of players:
{
"BoardGames" : {
"Game1" : {
"players" : {
"Anne" : [97, 165, 101, 67],
"Pete" : [86, 115, 134, 149],
"Scott" : [66, 89, 103, 74],
"Jane" : [113, 144, 125, 99],
"Katie" : [127, 108, 98, 151]
}
},
"Game2" : {
"players" : {
"Margot" : [1, 0, 0, 0],
"Pete" : [0, 0, 1, 1],
"Michael" : [0, 0, 0, 0],
"Jane" : [0, 0, 1, 0]
}
},
"Game3" : {
"players" : {
"Chris" : [6, 2, 4, 0, 5, 7],
"Pete" : [4, 5, 2, 5, 3, 1, 4],
"Julia" : [3, 7, 4, 0],
"Tom" : [3, 2, 4, 8, 2, 6, 7]
}
}
}
}
Game1: Players earn as many victory points per round as they can
Game2: Winning a round earns 1, losing a round earns 0
Game3: Players may leave after every round, hence some players have played more rounds than others, so these arrays are different in their length
So, here are my questions:
Which player got the most points in each game? Who got the fewest?
Who is the winner of the first round, the second round, and so on?
Who ranks 1st, 2nd and 3rd across all played games?
I've written quite a few queries with MongoDB, but I have no idea how to write one against a nested array that is attached to a flexible, unpredictable parent key. Also, maybe this is not the best way to structure the data, so if you have a better idea, I'd be happy to learn!
Cheers!
P.S.: The insertMany statement for the JSON data above:
db.boardGames.insertMany([
{
"_id" : 1,
"Game1" : {
"players" : {
"Anne" : [97, 165, 101, 67],
"Pete" : [86, 115, 134, 149],
"Scott" : [66, 89, 103, 74],
"Jane" : [113, 144, 125, 99],
"Katie" : [127, 108, 98, 151]
}
},
"Game2" : {
"players" : {
"Margot" : [1, 0, 0, 0],
"Pete" : [0, 0, 1, 1],
"Michael" : [0, 0, 0, 0],
"Jane" : [0, 0, 1, 0]
}
},
"Game3" : {
"players" : {
"Chris" : [6, 2, 4, 0, 5, 7],
"Pete" : [4, 5, 2, 5, 3, 1, 4],
"Julia" : [3, 7, 4, 0],
"Tom" : [3, 2, 4, 8, 2, 6, 7]
}
}
}]);
The schema you have is not ideal. If it were restructured along these lines: https://mongoplayground.net/p/o8m205t9UKG then you could query it as follows.
Find the winner of each game:
db.collection.aggregate(
[
{
$set: {
Players: {
$map: {
input: "$Players",
as: "x",
in: {
Name: "$$x.Name",
TotalScore: { $sum: "$$x.Scores" }
}
}
}
}
},
{
$unwind: "$Players"
},
{
$sort: { "Players.TotalScore": -1 }
},
{
$group: {
_id: "$Name",
Winner: { $first: "$Players.Name" }
}
}
])
Find the top 3 ranking players across all games:
db.collection.aggregate(
[
{
$set: {
Players: {
$map: {
input: "$Players",
as: "x",
in: {
Name: "$$x.Name",
TotalScore: { $sum: "$$x.Scores" }
}
}
}
}
},
{
$unwind: "$Players"
},
{
$group: {
_id: "$Players.Name",
TotalScore: { $sum: "$Players.TotalScore" }
}
},
{
$sort: { TotalScore: -1 }
},
{
$limit: 3
},
{
$group: {
_id: null,
TopRanks: { $push: "$_id" }
}
},
{
$project: {
_id: 0,
TopRanks: 1
}
}
])
Find the winner of each round across all games:
db.collection.aggregate(
[
{
$set: {
"Players": {
$map: {
input: "$Players",
as: "p",
in: {
Scores: {
$map: {
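input: { $range: [0, { $size: "$$p.Scores" }] },
as: "i",
in: {
Player: "$$p.Name",
// Use the array position as the round number; $indexOfArray returns the first match, which mislabels repeated scores
Round: { $add: ["$$i", 1] },
Score: { $arrayElemAt: ["$$p.Scores", "$$i"] }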
}
}
}
}
}
}
}
},
{
$unwind: "$Players"
},
{
$unwind: "$Players.Scores"
},
{
$replaceRoot: { newRoot: "$Players.Scores" }
},
{
$sort: {
Round: 1,
Score: -1
}
},
{
$group: {
_id: "$Round",
Winner: { $first: "$Player" }
}
},
{
$project: {
_id: 0,
Round: "$_id",
Winner: 1
}
},
{
$sort: {
Round: 1
}
}
])
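To move from the original nested-object layout to an array-based one, the dynamic game and player keys have to become values. The following is only a sketch in plain JavaScript: reshapeGames is a hypothetical helper, and the target field names (Name, Players, Scores) are inferred from the pipelines above rather than taken from the playground link.

```javascript
// Hypothetical helper: turns { Game1: { players: { Anne: [...] } }, ... }
// into [ { Name: "Game1", Players: [ { Name: "Anne", Scores: [...] } ] }, ... ].
function reshapeGames(doc) {
  return Object.entries(doc)
    .filter(([key]) => key !== "_id") // skip the document id
    .map(([gameName, game]) => ({
      Name: gameName,
      Players: Object.entries(game.players).map(([playerName, scores]) => ({
        Name: playerName,
        Scores: scores
      }))
    }));
}

const original = {
  _id: 1,
  Game2: {
    players: {
      Margot: [1, 0, 0, 0],
      Pete: [0, 0, 1, 1],
      Michael: [0, 0, 0, 0],
      Jane: [0, 0, 1, 0]
    }
  },
  Game3: {
    players: {
      Chris: [6, 2, 4, 0, 5, 7],
      Pete: [4, 5, 2, 5, 3, 1, 4],
      Julia: [3, 7, 4, 0],
      Tom: [3, 2, 4, 8, 2, 6, 7]
    }
  }
};

const games = reshapeGames(original);
console.log(JSON.stringify(games, null, 2));
```

Each element of the resulting array could then be stored as its own document, so that player names and scores are ordinary field values that $map, $unwind and $group can reach.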

remove key of matched number in [String:[NSNumber]]

How can I remove a key from a dictionary whose values are arrays? I have a response from the server like the one below:
[
"business_proof":[
0,
0,
0,
0,
0,
0,
0,
0,
-1, // business_proof contains -1, so I want to remove this key, and likewise any other key whose array contains -1
0,
0
],
"reference_proof":[
1,
2,
1
],
"vehicle_proof":[
1,
1,
2
],
"previous_loan_track":[
2,
2,
0,
0,
2,
2
],
"banking_proof":[
1,
1
],
"income_proof":[
0,
0,
2,
0,
2,
1,
2,
0,
0
],
"signature_proof":[
2,
2,
1,
2,
2,
2
],
"employment_proof":[
2,
1,
2,
2,
2,
2,
2
],
"guarantor_proof":[
1,
2,
2
],
"pdc_proof":[
1,
0
],
"address_proof":[
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
3
],
"age_proof":[
2,
2,
2,
2,
2,
2,
1,
2
],
"contact_proof":[
0,
2,
2
],
"photo_id_proof":[
2,
2,
2,
2,
2,
2,
2,
2,
2,
2
]
]
Second response:
[
"signature_proof":[
"pan_card",
"driving_licence",
"accepted_documents",
"passport",
"cancelled_cheque",
"bank_report"
],
"guarantor_proof":[
"accepted_documents",
"co_applicant",
"guarantor"
],
"previous_loan_track":[
"housing_loan",
"vehicle_loan",
"over_draft_limit",
"accepted_documents",
"business_loan",
"personal_loan"
],
"address_proof":[
"bank_statement",
"voter_id",
"rental_agreement",
"eb_bill",
"registration_document",
"hr_letter",
"driving_licence",
"property_tax_receipt",
"telephone_bill",
"cc_statement",
"gas_bill",
"aadhaar_card",
"passport",
"ration_card",
"accepted_documents"
],
"vehicle_proof":[
"vehi_insurance",
"vehi_rc",
"accepted_documents"
],
"business_proof":[
"business_commencement_certificate",
"ssi_msme_certificate",
"business_transactions",
"mou",
"aoa",
"gst_no",
"tan_no",
"business_agreements",
"accepted_documents",
"shop_and_establishment_certificate",
"incorporation_certificate"
],
"banking_proof":[
"bank_statement",
"accepted_documents"
],
"income_proof":[
"form_16",
"profit_loss_statement",
"rental_income_proof",
"payslip",
"income_in_cash_proof",
"accepted_documents",
"brokerage_income_proof",
"it_returns",
"audited_balance_sheet"
],
"reference_proof":[
"ref2",
"accepted_documents",
"ref1"
],
"employment_proof":[
"employee_id_card",
"accepted_documents",
"payslip",
"relieving_letter",
"comp_app_letter",
"hr_letter",
"epf_no_uan_no"
],
"age_proof":[
"employee_id_card",
"ration_card",
"pan_card",
"passport",
"voter_id",
"school_certificate",
"accepted_documents",
"aadhaar_card"
],
"contact_proof":[
"accepted_documents",
"landline_bill",
"mobile_bill"
],
"photo_id_proof":[
"employee_id_card",
"nrega_card",
"ration_card",
"bank_passbook",
"pan_card",
"passport",
"voter_id",
"driving_licence",
"accepted_documents",
"aadhaar_card"
],
"pdc_proof":[
"cheque_book",
"accepted_documents"
]
]
Both responses are dictionaries of arrays, and both always have the same keys. I know this structure is far from ideal.
Here only business_proof contains -1, so I want to remove that key from both dictionaries.
In general, I need to remove a key and its values whenever that key's array contains -1.
I am trying it like this, but it gives a compiler error:
finalValueArray.removeAll(where: { $0.contains(-1) })
As far as I understand, you can use filter:
var filteredItems = object.filter { !$0.value.contains(-1)}
You can get all elements which contain -1
var minusOneItems = object.filter { $0.value.contains(-1)}
and then
for negativeItem in minusOneItems {
object.removeValue(forKey: negativeItem.key)
}
It depends what you need.
You can simply use a combination of forEach and contains on the dictionary like so,
var dictionary = ["business_proof": [0, 0, 1, -1, 2, -1], "reference_proof": [1, 2, 1], "vehicle_proof": [-1, 0, 0, 2]]
dictionary.forEach { (key,value) in
dictionary[key] = value.contains(-1) ? nil : value
}
print(dictionary) //["reference_proof": [1, 2, 1]]
Or you can simply apply filter on dictionary like,
dictionary = dictionary.filter({ !$0.value.contains(-1) })
print(dictionary) //["reference_proof": [1, 2, 1]]
You can filter your dictionary to remove the entries that contain -1 in their value.
let filteredArrayOnDict = dataDict.filter { !$0.value.contains(-1) }
In Swift 4 and later, filter on a dictionary returns a dictionary directly. If you end up with an array of key-value tuples instead, you can create a dictionary from it this way:
let filteredDictionary = filteredArrayOnDict.reduce(into: [:]) { $0[$1.0] = $1.1}
Now filteredDictionary holds only the entries that don't have -1 in their value.
You can use while loop to remove all key-value pairs which contain -1 from both dictionaries.
var dict1 = ["business_proof":[0,0,0,0,0,0,0,0,-1,0,0],"reference_proof":[1,2,1],"vehicle_proof":[1,1,2],"previous_loan_track":[2,2,0,0,2,2],"banking_proof":[1,1],"income_proof":[0,0,2,0,2,1,2,0,0],"signature_proof":[2,2,1,2,2,2],"employment_proof":[2,1,2,2,2,2,2],"guarantor_proof":[1,2,2],"pdc_proof":[1,0],"address_proof":[2,2,2,2,2,2,2,2,2,2,2,2,2,2,3],"age_proof":[2,2,2,2,2,2,1,2],"contact_proof":[0,2,2],"photo_id_proof":[2,2,2,2,2,2,2,2,2,2]]
var dict2 = ["signature_proof":["pan_card","driving_licence","accepted_documents","passport","cancelled_cheque","bank_report"],"guarantor_proof":["accepted_documents","co_applicant","guarantor"],"previous_loan_track":["housing_loan","vehicle_loan","over_draft_limit","accepted_documents","business_loan","personal_loan"],"address_proof":["bank_statement","voter_id","rental_agreement","eb_bill","registration_document","hr_letter","driving_licence","property_tax_receipt","telephone_bill","cc_statement","gas_bill","aadhaar_card","passport","ration_card","accepted_documents"],"vehicle_proof":["vehi_insurance","vehi_rc","accepted_documents"],"business_proof":["business_commencement_certificate","ssi_msme_certificate","business_transactions","mou","aoa","gst_no","tan_no","business_agreements","accepted_documents","shop_and_establishment_certificate","incorporation_certificate"],"banking_proof":["bank_statement","accepted_documents"],"income_proof":["form_16","profit_loss_statement","rental_income_proof","payslip","income_in_cash_proof","accepted_documents","brokerage_income_proof","it_returns","audited_balance_sheet"],"reference_proof":["ref2","accepted_documents","ref1"],"employment_proof":["employee_id_card","accepted_documents","payslip","relieving_letter","comp_app_letter","hr_letter","epf_no_uan_no"],"age_proof":["employee_id_card","ration_card","pan_card","passport","voter_id","school_certificate","accepted_documents","aadhaar_card"],"contact_proof":["accepted_documents","landline_bill","mobile_bill"],"photo_id_proof":["employee_id_card","nrega_card","ration_card","bank_passbook","pan_card","passport","voter_id","driving_licence","accepted_documents","aadhaar_card"],"pdc_proof":["cheque_book","accepted_documents"]]
while let invalid = dict1.first(where: { $0.value.contains(-1) }) {
dict1.removeValue(forKey: invalid.key)
dict2.removeValue(forKey: invalid.key)
}
print(dict1)//["reference_proof":[1,2,1],"vehicle_proof":[1,1,2],"previous_loan_track":[2,2,0,0,2,2],"banking_proof":[1,1],"income_proof":[0,0,2,0,2,1,2,0,0],"signature_proof":[2,2,1,2,2,2],"employment_proof":[2,1,2,2,2,2,2],"guarantor_proof":[1,2,2],"pdc_proof":[1,0],"address_proof":[2,2,2,2,2,2,2,2,2,2,2,2,2,2,3],"age_proof":[2,2,2,2,2,2,1,2],"contact_proof":[0,2,2],"photo_id_proof":[2,2,2,2,2,2,2,2,2,2]]
print(dict2)//["signature_proof":["pan_card","driving_licence","accepted_documents","passport","cancelled_cheque","bank_report"],"guarantor_proof":["accepted_documents","co_applicant","guarantor"],"previous_loan_track":["housing_loan","vehicle_loan","over_draft_limit","accepted_documents","business_loan","personal_loan"],"address_proof":["bank_statement","voter_id","rental_agreement","eb_bill","registration_document","hr_letter","driving_licence","property_tax_receipt","telephone_bill","cc_statement","gas_bill","aadhaar_card","passport","ration_card","accepted_documents"],"vehicle_proof":["vehi_insurance","vehi_rc","accepted_documents"],"banking_proof":["bank_statement","accepted_documents"],"income_proof":["form_16","profit_loss_statement","rental_income_proof","payslip","income_in_cash_proof","accepted_documents","brokerage_income_proof","it_returns","audited_balance_sheet"],"reference_proof":["ref2","accepted_documents","ref1"],"employment_proof":["employee_id_card","accepted_documents","payslip","relieving_letter","comp_app_letter","hr_letter","epf_no_uan_no"],"age_proof":["employee_id_card","ration_card","pan_card","passport","voter_id","school_certificate","accepted_documents","aadhaar_card"],"contact_proof":["accepted_documents","landline_bill","mobile_bill"],"photo_id_proof":["employee_id_card","nrega_card","ration_card","bank_passbook","pan_card","passport","voter_id","driving_licence","accepted_documents","aadhaar_card"],"pdc_proof":["cheque_book","accepted_documents"]]
Create a struct and initialize the struct objects with both dictionary values. Then store the struct objects in an array. Now you can filter the array by its values
struct Proof {
var title: String
var arr: [Int]
var documents: [String]
}
var proofs = [Proof]()
dict1.forEach {
if let docs = dict2[$0.key] {
proofs.append(Proof(title: $0.key, arr: $0.value, documents: docs))
}
}
print(proofs)
let validProofs = proofs.filter { !$0.arr.contains(-1) }
print(validProofs)
Enumerate the index dictionary, look for occurrences of -1 and filter the indices. Then reverse the found indices and remove the items in both arrays. The code considers that both dictionaries are value types
var indexDict = ["business_proof":[0,0,0,0,0,0,0,0,-1,0,0] ...
var valueDict = ["signature_proof":["pan_card","driving_licence","accepted_documents","passport","cancelled_cheque","bank_report"] ...
for (key, value) in indexDict {
let foundIndices = value.indices.filter({value[$0] == -1})
for index in foundIndices.reversed() {
indexDict[key]!.remove(at: index)
valueDict[key]!.remove(at: index)
}
}

How to sum multidimensional arrays across documents with MongoDB

Say I have a lot of the following documents:
{
_id: "abc",
values: {
0: { 0: 999999, 1: 999999, …, 59: 1000000 },
1: { 0: 2000000, 1: 2000000, …, 59: 1000000 },
…,
58: { 0: 1600000, 1: 1200000, …, 59: 1100000 },
59: { 0: 1300000, 1: 1400000, …, 59: 1500000 }
}
}
{
_id: "def",
values: {
0: { 0: 999999, 1: 999999, …, 59: 1000000 },
1: { 0: 2000000, 1: 2000000, …, 59: 1000000 },
…,
58: { 0: 1600000, 1: 1200000, …, 59: 1100000 },
59: { 0: 1300000, 1: 1400000, …, 59: 1500000 }
}
}
which is basically a multidimensional array of 60x60 items.
Can aggregation (or any other MongoDB construct) be used to easily sum two or more such matrices? That is, values[x][y] of abc and def are added together, and the same is done for every other element.
Ideally the output would be a similar multidimensional array.
This answer seems to suggest this is possible with 1 dimensional array but I am not sure for multidimensional.
EDIT:
This is an example with real data in a format which is slightly different:
db.col.find({}, { _id: 0, hit: 1 })
{ "hit" : [ [ 570, 0, 630, 630, 636, 735, 672, 615, 648, 648, 618, 0 ],
[ 492, 0, 471, 471, 570, 564, 0, 590, 513, 432, 471, 477 ],
[ 387, 0, 0, 0, 0, 0, 0, 456, 0, 480, 351, 415 ],
[ 432, 528, 0, 0, 495, 509, 0, 579, 0, 552, 0, 594 ],
[ 558, 603, 594, 624, 672, 0, 0, 705, 783, 0, 756, 816 ],
[ 0, 858, 951, 1027, 0, 0, 1058, 1131, 0, 0, 1260, 1260 ],
[ 1269, 0, 1287, 0, 1326, 0, 1386, 1386, 1470, 0, 0, 0 ],
[ 1623, 0, 1695, 1764, 1671, 1671, 0, 1824, 1872, 0, 0, 0 ],
[ 1950, 1894, 2034, 2034, 0, 0, 1941, 0, 2070, 1911, 2049, 2055 ],
[ 2052, 2052, 0, 0, 0, 2085, 2007, 2073, 0, 0, 0, 1941 ],
[ 1878, 1896, 0, 1875, 0, 0, 1677, 0, 1722, 0, 1545, 0 ],
[ 0, 0, 1317, 1469, 1501, 1634, 1494, 0, 0, 1290, 0, 0 ],
[ 0, 1485, 1375, 1491, 1530, 1407, 0, 0, 0, 1611, 0, 0 ],
[ 1652, 1800, 1686, 1643, 1923, 0, 0, 0, 1737, 1604, 1797, 0 ],
[ 1842, 1806, 0, 1830, 1896, 1947, 0, 1710, 1734, 1725, 0, 0 ],
[ 0, 0, 1932, 0, 1908, 1878, 1941, 1931, 2007, 2013, 1995, 1995 ],
[ 0, 2025, 2004, 1927, 0, 0, 1939, 1835, 1962, 1863, 0, 1815 ],
[ 0, 0, 1839, 1755, 1821, 1821, 1751, 1656, 0, 0, 1467, 0 ],
[ 0, 1632, 1546, 1449, 0, 1551, 1449, 0, 0, 1554, 0, 1491 ],
[ 1463, 1411, 0, 1491, 0, 0, 1551, 1467, 0, 0, 0, 1464 ],
[ 0, 0, 1311, 0, 0, 1471, 0, 0, 1581, 0, 1368, 1368 ],
[ 1296, 0, 0, 0, 1176, 1381, 0, 1170, 1194, 1194, 1193, 1137 ],
[ 0, 1244, 1221, 1039, 0, 1041, 930, 921, 1033, 813, 0, 0 ],
[ 0, 0, 0, 1010, 0, 0, 918, 783, 0, 609, 693, 645 ] ] }
And this is the appropriate query (thanks to Veeram in the comments for fixing my code):
db.col.aggregate([
{ $project: { _id: 0, hit: 1 } },
{ $unwind: { path: "$hit", includeArrayIndex: "x" } },
{ $unwind: { path: "$hit", includeArrayIndex: "y" } },
{ $group: { _id: { x: "$x", y: "$y" }, hit: { $sum: "$hit" } } },
{ $sort: { "_id.x": 1, "_id.y": 1 } },
{ $group: { _id: "$_id.x", hit: { $push: "$hit" } } },
{ $sort: { "_id": 1 } },
{ $group: { _id: null, hit: { $push: "$hit" } } }
])
You need two operators to deal with dynamic property names: $objectToArray and $arrayToObject. To sum the values from all documents, you can represent each x,y pair as a single document (using $unwind) and then use several $group stages to collapse everything into a single result document. Note that the keys produced by $objectToArray are strings, so a sort on them orders "10" before "2" rather than numerically. To keep rows and columns in a stable order, you can apply $sort twice:
db.col.aggregate([
{
$project: {
values: {
$map: {
input: { $objectToArray: "$values" },
as: "obj",
in: { k: "$$obj.k", v: { $objectToArray: "$$obj.v" } }
}
}
}
},
{
$unwind: "$values"
},
{
$unwind: "$values.v"
},
{
$project: {
x: "$values.k",
y: "$values.v.k",
value: "$values.v.v"
}
},
{
$group: {
_id: { x: "$x", y: "$y" },
value: { $sum: "$value" }
}
},
{
$sort: {
"_id.y": 1
}
},
{
$group: {
_id: "$_id.x",
v: { $push: { k: "$_id.y", v: "$value" } }
}
},
{
$sort: {
"_id": 1
}
},
{
$group: {
_id: null,
values: { $push: { k: "$_id", v: "$v" } }
}
},
{
$project: {
values: {
$arrayToObject: {
$map: {
input: "$values",
as: "obj",
in: {
k: "$$obj.k",
v: { $arrayToObject: "$$obj.v" }
}
}
}
}
}
}
])
For your sample data it outputs:
{
"_id" : null,
"values" : {
"0" : {
"0" : 1999998,
"1" : 1999998,
"59" : 2000000
},
"1" : {
"0" : 4000000,
"1" : 4000000,
"59" : 2000000
},
"58" : {
"0" : 3200000,
"1" : 2400000,
"59" : 2200000
},
"59" : {
"0" : 2600000,
"1" : 2800000,
"59" : 3000000
}
}
}
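As a cross-check on what the pipeline computes, the same element-wise sum can be written in a few lines of plain JavaScript. This is only a sketch under the assumption that every document has the object-of-objects layout from the question; sumValues is a hypothetical name, and the two small documents are cut-down stand-ins for the 60x60 data.

```javascript
// Hypothetical helper: adds values[x][y] across all documents.
function sumValues(docs) {
  const total = {};
  for (const doc of docs) {
    for (const [x, row] of Object.entries(doc.values)) {
      if (!total[x]) total[x] = {};
      for (const [y, value] of Object.entries(row)) {
        // Missing cells are treated as 0, like $sum over absent groups.
        total[x][y] = (total[x][y] || 0) + value;
      }
    }
  }
  return total;
}

const sampleDocs = [
  { _id: "abc", values: { 0: { 0: 999999, 1: 999999 }, 1: { 0: 2000000, 1: 2000000 } } },
  { _id: "def", values: { 0: { 0: 999999, 1: 999999 }, 1: { 0: 2000000, 1: 2000000 } } }
];

const summed = sumValues(sampleDocs);
console.log(JSON.stringify(summed));
```

Running the aggregation on a handful of documents and comparing against a client-side sum like this is a cheap way to confirm that the $group keys pair up the intended cells.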

MongoError: exception: Cannot apply the positional operator without a corresponding query field containing an array

I am trying to update a nested array in the following document:
{
"_id" : ObjectId("56b079d937fc13691b25c354"),
"clks" : 4,
"compl" : 0,
"crtd_on" : ISODate("2016-02-02T09:41:45.047Z"),
"default" : [],
"end_dt" : ISODate("2016-02-23T18:30:00.000Z"),
"fails" : 0,
"groups" : [
{
"N" : 100,
"alert_msg" : 0,
"clks" : 4,
"cmps" : 0,
"color" : 0,
"fls" : 0,
"grp_id" : 5453,
"grp_nm" : "Last Quota",
"oq" : 0,
"question_quota" : [
{
"q_id" : 96,
"q_key" : "AGE",
"q_cat" : 1,
"fls" : 0,
"cmps" : 0,
"opt_1" : {
"opt_id" : 1,
"startAge" : 18,
"endAge" : 35,
"N" : 100,
"fls" : 0,
"cmps" : 0
},
"opt_2" : {
"opt_id" : 2,
"startAge" : 40,
"endAge" : 60,
"N" : 100,
"fls" : 0,
"cmps" : 0
}
}
],
}
],
"grs_prf" : 0,
"grs_prf_per" : 0,
"id" : 5165,
"last_clk_dt" : ISODate("2016-02-02T11:38:00.078Z"),
"mod_on" : ISODate("2016-02-02T12:28:15.556Z"),
"n" : 100,
"nm" : "2Feb",
"oq" : 0,
"rev" : 0,
"rew" : 0,
"sortNm" : "2feb",
"st" : 1,
"str_dt" : ISODate("2016-02-01T18:30:00.000Z"),
"supp" : 0
}
I am trying to update it with the following query:
db.getCollection('job_stats').update(
{
id: 5165,
'groups.grp_id': 5453,
'groups.question_quota.q_id': 96,
'groups.question_quota.opt_1.opt_id': 1
},
{
'$set': { mod_on: new Date("Tue, 02 Feb 2016 13:12:26 GMT") },
'$inc': { 'groups.$.question_quota.$.opt_1.cmps': 1 }
},
{ upsert: false }
)
But it gives me this error:
Cannot apply the positional operator without a corresponding query field containing an array
I just want to increment the groups.question_quota.opt_1.cmps key in the document above.

MongoDB: Why is my scannedObjects value high even all fields of the query are indexed?

I am indexing three fields on a collection, one of which is an array. I am running a query on these three fields, and the query takes more than a second with 300K documents in the collection. When I call explain on the query, I see that my index is being used, but the number of scannedObjects is very high. I guess this is the reason behind the poor performance.
{
"_id" : ObjectId("54c8f110389a46153866d82e"),
"mmt" : [
"54944cfd90671810ccbf2552",
"54c64029038d8c3aff41ad6d",
"54c64029038d8c3aff41ad73",
"54c8f151038d8c3aff453669",
"54c8f151038d8c3aff45366d"
],
"p" : 8700,
"sui" : "3810d5cf-3032-4a77-9715-a42e010e569c"
/* also some more fields */
}
With this index:
{
"sui" : 1,
"p" : 1,
"mmt" : 1
}
I am trying to run this query:
db.my_coll.find(
{
"mmt" : { "$all" :
[
"54944cfd90671810ccbf2552", "54ac1db0e3f494afd4ded4c8", "54ac1db1e3f494afd4ded66a", "54ac1db1e3f494afd4ded66b", "54c8b671038d8c3aff453649", "54c8f154038d8c3aff45368f", "54c8f154038d8c3aff453694"
]
},
"sui" : { "$ne" : "bde0f517-b942-4823-b2c8-a41900f46641" },
"p": { $gt: 100, $lt: 1000 }
}
).limit(1000).explain()
The result of the explain is:
{
"cursor" : "BtreeCursor sui_1_p_1_mmt_1",
"isMultiKey" : true,
"n" : 16,
"nscannedObjects" : 14356,
"nscanned" : 129223,
"nscannedObjectsAllPlans" : 14356,
"nscannedAllPlans" : 129223,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 1009,
"nChunkSkips" : 0,
"millis" : 1276,
"indexBounds" : {
"sui" : [
[
{
"$minElement" : 1
},
"bde0f517-b942-4823-b2c8-a41900f46641"
],
[
"bde0f517-b942-4823-b2c8-a41900f46641",
{
"$maxElement" : 1
}
]
],
"p" : [
[
-Infinity,
1000
]
],
"mmt" : [
[
"54944cfd90671810ccbf2552",
"54944cfd90671810ccbf2552"
]
]
},
"server" : "shopkrowdMongo:27017",
"filterSet" : false,
"stats" : {
"type" : "LIMIT",
"works" : 129224,
"yields" : 1009,
"unyields" : 1009,
"invalidates" : 0,
"advanced" : 16,
"needTime" : 129207,
"needFetch" : 0,
"isEOF" : 1,
"children" : [
{
"type" : "KEEP_MUTATIONS",
"works" : 129224,
"yields" : 1009,
"unyields" : 1009,
"invalidates" : 0,
"advanced" : 16,
"needTime" : 129207,
"needFetch" : 0,
"isEOF" : 1,
"children" : [
{
"type" : "FETCH",
"works" : 129224,
"yields" : 1009,
"unyields" : 1009,
"invalidates" : 0,
"advanced" : 16,
"needTime" : 129207,
"needFetch" : 0,
"isEOF" : 1,
"alreadyHasObj" : 0,
"forcedFetches" : 0,
"matchTested" : 16,
"children" : [
{
"type" : "IXSCAN",
"works" : 129223,
"yields" : 1009,
"unyields" : 1009,
"invalidates" : 0,
"advanced" : 14356,
"needTime" : 114867,
"needFetch" : 0,
"isEOF" : 1,
"keyPattern" : "{ sui: 1.0, p: 1.0, mmt: 1.0 }",
"isMultiKey" : 1,
"boundsVerbose" : "field #0['sui']: [MinKey, \"bde0f517-b942-4823-b2c8-a41900f46641\"), (\"bde0f517-b942-4823-b2c8-a41900f46641\", MaxKey], field #1['p']: [-inf.0, 1000.0), field #2['mmt']: [\"54944cfd90671810ccbf2552\", \"54944cfd90671810ccbf2552\"]",
"yieldMovedCursor" : 0,
"dupsTested" : 14356,
"dupsDropped" : 0,
"seenInvalidated" : 0,
"matchTested" : 0,
"keysExamined" : 129223,
"children" : []
}
]
}
]
}
]
}
}
The number of items found is 16, but nscannedObjects is 14356. I do not understand why MongoDB scans so many documents even though all the fields of the query are indexed.
Why is MongoDB scanning so many objects?
How can I get the results of this query faster?
The mmt array I am using does not grow or shrink over time, but the number of elements in it varies between 5 and 15. I need to query this field with several combinations of $in, $all and $nin. The number of items in this collection will probably grow beyond 30M. Is there a way to reliably get fast results in this scenario?
UPDATE 1:
I tried removing sui field and the $ne query. The updated explain:
{
"cursor" : "BtreeCursor p_1_mmt_1",
"isMultiKey" : true,
"n" : 17,
"nscannedObjects" : 16338,
"nscanned" : 16963,
"nscannedObjectsAllPlans" : 16338,
"nscannedAllPlans" : 33930,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 265,
"nChunkSkips" : 0,
"millis" : 230,
"indexBounds" : {
"p" : [
[
-Infinity,
1000
]
],
"mmt" : [
[
"54944cfd90671810ccbf2552",
"54944cfd90671810ccbf2552"
]
]
},
"server" : "shopkrowdMongo:27017",
"filterSet" : false,
"stats" : {
"type" : "LIMIT",
"works" : 16966,
"yields" : 265,
"unyields" : 265,
"invalidates" : 0,
"advanced" : 17,
"needTime" : 16947,
"needFetch" : 0,
"isEOF" : 1,
"children" : [
{
"type" : "KEEP_MUTATIONS",
"works" : 16966,
"yields" : 265,
"unyields" : 265,
"invalidates" : 0,
"advanced" : 17,
"needTime" : 16947,
"needFetch" : 0,
"isEOF" : 1,
"children" : [
{
"type" : "FETCH",
"works" : 16965,
"yields" : 265,
"unyields" : 265,
"invalidates" : 0,
"advanced" : 17,
"needTime" : 16947,
"needFetch" : 0,
"isEOF" : 1,
"alreadyHasObj" : 0,
"forcedFetches" : 0,
"matchTested" : 17,
"children" : [
{
"type" : "IXSCAN",
"works" : 16964,
"yields" : 265,
"unyields" : 265,
"invalidates" : 0,
"advanced" : 16338,
"needTime" : 626,
"needFetch" : 0,
"isEOF" : 1,
"keyPattern" : "{ p: 1.0, mmt: 1.0 }",
"isMultiKey" : 1,
"boundsVerbose" : "field #0['p']: [-inf.0, 1000.0), field #1['mmt']: [\"54944cfd90671810ccbf2552\", \"54944cfd90671810ccbf2552\"]",
"yieldMovedCursor" : 0,
"dupsTested" : 16338,
"dupsDropped" : 0,
"seenInvalidated" : 0,
"matchTested" : 0,
"keysExamined" : 16963,
"children" : []
}
]
}
]
}
]
}
}
The query performed better, but scannedObjects is still very high.
I think marcinn was right to single out the $ne as the most likely culprit, but update 1 shows us the $all is also a problem. The query is using the mmt portion of the index to find documents containing one of the values in the array and then must scan the rest of the mmt array to verify that all of the values in the $all array are in the mmt array of a potentially matching document. This means the potentially matching document must be loaded and scanned, so it counts as a scannedObject. To demonstrate this behavior very clearly, consider the following example:
> db.test.drop()
> for (var i = 0; i < 100; i++) db.test.insert({ "x" : [1, 2] })
> for (var i = 0; i < 100; i++) db.test.insert({ "x" : [1, 3] })
> db.test.ensureIndex({ "x" : 1 })
> db.test.find({ "x" : { "$all" : [1, 2] } }).explain(true)
This shows n = 100 and nscanned = nscannedObjects = 200 resulting from using the value 1 as both index bounds, while the logically equivalent query
> db.test.find({ "x" : { "$all" : [2, 1] } }).explain(true)
shows n = nscanned = nscannedObjects = 100 with both index bounds having the value 2.
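The counting argument can be reproduced with a toy model in plain JavaScript. This is not MongoDB's planner, only a sketch of the reasoning above: explainAll is a hypothetical function that scans the index entries for a single chosen value (boundValue) and then fetches every candidate document to verify the remaining $all values.

```javascript
// Toy model only: mimics "scan one index value, then fetch and verify".
function explainAll(docs, allValues, boundValue) {
  let n = 0;
  let nscanned = 0;
  let nscannedObjects = 0;
  for (const doc of docs) {
    // A multikey index has one entry per array element, so the scan on
    // boundValue touches exactly the documents whose array contains it.
    if (!doc.x.includes(boundValue)) continue;
    nscanned++;
    // The index entry only proves boundValue is present; the document has
    // to be loaded to check the other values in the $all array.
    nscannedObjects++;
    if (allValues.every(v => doc.x.includes(v))) n++;
  }
  return { n: n, nscanned: nscanned, nscannedObjects: nscannedObjects };
}

const testDocs = [];
for (let i = 0; i < 100; i++) testDocs.push({ x: [1, 2] });
for (let i = 0; i < 100; i++) testDocs.push({ x: [1, 3] });

const boundOnOne = explainAll(testDocs, [1, 2], 1);
const boundOnTwo = explainAll(testDocs, [1, 2], 2);
console.log(boundOnOne); // { n: 100, nscanned: 200, nscannedObjects: 200 }
console.log(boundOnTwo); // { n: 100, nscanned: 100, nscannedObjects: 100 }
```

In this model the cost depends entirely on how selective the bound value is, which matches the difference between the two explain outputs described above.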
Basically it is because $ne cannot use indexes efficiently. Your index is used only because the query also filters on the mmt field; the candidate documents it finds still have to be read. From the documentation:
Some query operations are not selective. These operations cannot use
indexes effectively or cannot use indexes at all.
The inequality operators $nin and $ne are not very selective, as they
often match a large portion of the index. As a result, in most cases,
a $nin or $ne query with an index may perform no better than a $nin or
$ne query that must scan all documents in a collection
http://docs.mongodb.org/manual/core/query-optimization/
