I have a query to the database (MongoDB) which orders the results by the value field:
all := EValues{}
err := con.Find(bson.M{"name": "somename"}).Sort("-value").All(&all)
The JSON output for this looks like:
"values": [
{
"user_name": "guest7485",
"value": 8911,
"value_date": "2016-03-09T14:40:34.512Z"
},
{
"user_name": "guest7485",
"value": 539,
"value_date": "2016-03-07T14:11:05.217Z"
},
{
"user_name": "guest7485",
"value": 221,
"value_date": "2016-03-07T14:11:08.853Z"
},
{
"user_name": "guest7485",
"value": 77,
"value_date": "2016-03-07T14:11:12.377Z"
}
]
In my json response I need to add parameter "position" which should be basically equal to 1 - first result, 2 - second result and so on, for all results. So my final output should be:
"values": [
{
"position": 1,
"user_name": "guest7485",
"value": 8911,
"value_date": "2016-03-09T14:40:34.512Z"
},
{
"position": 2,
"user_name": "guest7485",
"value": 539,
"value_date": "2016-03-07T14:11:05.217Z"
},
{
"position": 3,
"user_name": "guest7485",
"value": 221,
"value_date": "2016-03-07T14:11:08.853Z"
},
{
"position": 4,
"user_name": "guest7485",
"value": 77,
"value_date": "2016-03-07T14:11:12.377Z"
}
]
I'm wondering how to solve this with mgo and Go in general, and I would be really grateful if someone could show me the most efficient way to do it.
Update:
The definition of EValues is below:
type EValue struct {
ID bson.ObjectId `json:"-" bson:"_id,omitempty"`
Name string `json:"-" bson:"name"`
UserId bson.ObjectId `json:"-" bson:"userId"`
UserName string `json:"user_name" bson:"userName"`
Value int64 `json:"value" bson:"value"`
AddedTime time.Time `json:"value_date" bson:"addedTime"`
}
type EValues []EValue
Add a position field to EValue:
type EValue struct {
... other fields here
Position int `json:"position" bson:"-"`
}
Loop through db results and set the field:
for i := range all {
all[i].Position = i + 1
}
Marshal the result as JSON.
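For example, a minimal sketch of that last step (the wrapper struct is purely illustrative and assumes encoding/json and fmt are imported):
// Sketch: wrap the positioned results under a "values" key and marshal them.
resp := struct {
	Values EValues `json:"values"`
}{Values: all}
b, err := json.Marshal(resp)
if err != nil {
	panic(err) // handle properly in real code
}
fmt.Println(string(b)) // {"values":[{"position":1,"user_name":"guest7485",...},...]}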
With MongoDB 3.2 this can be done using the $unwind operator: you can pass it an object with the array field path and an includeArrayIndex option naming the field that will hold the array index:
pipeline = [
{ "$match": {"name": "somename"} },
{ "$unwind": { "path": "$values", "includeArrayIndex": "position" } },
{
"$project": {
"name": 1,
"newarray.position": "$position",
"newarray.user_name": "$values.user_name",
"newarray.value_date": "$values.value_date",
"newarray.value": "$values.value",
}
},
{
"$group": {
"_id": "$name",
"values": { "$push": "$newarray" }
}
}
]
db.test.aggregate(pipeline);
Output
> db.test.aggregate(pipeline).pretty();
{
"_id" : "somename",
"values" : [
{
"position" : NumberLong(0),
"user_name" : "guest8911",
"value_date" : "2016-03-09T14:40:34.512Z",
"value" : 8911
},
{
"position" : NumberLong(1),
"user_name" : "guest7485",
"value_date" : "2016-03-07T14:11:05.217Z",
"value" : 539
},
{
"position" : NumberLong(2),
"user_name" : "guest7485",
"value_date" : "2016-03-07T14:11:08.853Z",
"value" : 221
},
{
"position" : NumberLong(3),
"user_name" : "guest7485",
"value_date" : "2016-03-07T14:11:12.377Z",
"value" : 77
}
]
}
>
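For reference, mgo can also submit this aggregation pipeline through Collection.Pipe — a minimal sketch (not part of the original answer), assuming con is the *mgo.Collection from the question and the documents carry the embedded values array shown in the test collection populated below. Note that includeArrayIndex is 0-based, as the NumberLong(0) in the output above shows, so add 1 if you need positions starting at 1:
// Sketch only: same pipeline, issued from mgo.
pipeline := []bson.M{
	{"$match": bson.M{"name": "somename"}},
	{"$unwind": bson.M{"path": "$values", "includeArrayIndex": "position"}},
	{"$project": bson.M{
		"name":                1,
		"newarray.position":   "$position",
		"newarray.user_name":  "$values.user_name",
		"newarray.value_date": "$values.value_date",
		"newarray.value":      "$values.value",
	}},
	{"$group": bson.M{
		"_id":    "$name",
		"values": bson.M{"$push": "$newarray"},
	}},
}
var grouped []bson.M
if err := con.Pipe(pipeline).All(&grouped); err != nil {
	panic(err) // handle properly in real code
}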
If this is not supported with the mgo driver or your server version, a not-so-efficient fallback is to use Map-Reduce. The following mongo shell example demonstrates how you can run the operation:
Populate test collection:
db.test.insert({
"name": "somename",
"values": [
{
"user_name": "guest8911",
"value": 8911,
"value_date": "2016-03-09T14:40:34.512Z"
},
{
"user_name": "guest7485",
"value": 539,
"value_date": "2016-03-07T14:11:05.217Z"
},
{
"user_name": "guest7485",
"value": 221,
"value_date": "2016-03-07T14:11:08.853Z"
},
{
"user_name": "guest7485",
"value": 77,
"value_date": "2016-03-07T14:11:12.377Z"
}
]
})
Run the following map-reduce operation:
> mr = db.runCommand({
"mapreduce": "test",
"map": function() {
var arr = []
for(var i=0; i < this.values.length; i++){
var val = this.values[i];
val["position"] = i+1;
arr.push(val);
}
emit(this._id, arr);
},
"reduce" : function() {},
"out": "test_keys"
})
Query resulting collection:
> db[mr.result].find().pretty()
{
"_id" : ObjectId("56e18ab84b9018ec86d2a6bd"),
"value" : [
{
"user_name" : "guest8911",
"value" : 8911,
"value_date" : "2016-03-09T14:40:34.512Z",
"position" : 1
},
{
"user_name" : "guest7485",
"value" : 539,
"value_date" : "2016-03-07T14:11:05.217Z",
"position" : 2
},
{
"user_name" : "guest7485",
"value" : 221,
"value_date" : "2016-03-07T14:11:08.853Z",
"position" : 3
},
{
"user_name" : "guest7485",
"value" : 77,
"value_date" : "2016-03-07T14:11:12.377Z",
"position" : 4
}
]
}
>
Now, given the listing above, you can assemble your query in mgo using MapReduce:
job := mgo.MapReduce{
Map: "function(){var arr=[];for(var i=0;i<this.values.length; i++){var val=this.values[i];val['position']=i+1;arr.push(val);};emit(this._id,arr);}",
Reduce: "function() { }",
}
var result []struct {
	Id    bson.ObjectId "_id" // the map-reduce output is keyed on the document _id
	Value []EValue
}
_, err := collection.Find(nil).MapReduce(job, &result)
if err != nil {
panic(err)
}
for _, item := range result {
fmt.Println(item.Value)
}
For more details, check the documentation: https://godoc.org/labix.org/v1/mgo#MapReduce
Player collection:
{ "_id" : 1, "Name" : "John Aims", "Gender" : "M", "DoB" : ISODate("1990-01-01T00:00:00Z"), "Nationality" : "USA", "Hand" : "R", "YearTurnedPro" : 2010, "Tournament" : [ { "tournamentID" : 1, "TournamentYear" : 2016 }, { "tournamentID" : 2, "TournamentYear" : 2019 }, { "tournamentID" : 3, "TournamentYear" : 2021 } ] }
{ "_id" : 2, "Name" : "George Brown", "Gender" : "M", "DoB" : ISODate("1997-03-04T00:00:00Z"), "Nationality" : "GB", "Hand" : "L", "YearTurnedPro" : 2013, "Tournament" : [ { "tournamentID" : 2, "TournamentYear" : 2016 }, { "tournamentID" : 5, "TournamentYear" : 2019 } ] }
Tournament collection:
{ "_id" : ObjectId("626c18a3d880647a888888ff"), "TournamentID" : 1, "TournamentCode" : "GS1", "Position" : 8, "PrizeMoney" : 125000, "RankingPoints" : 250 }
{ "_id" : ObjectId("626c18c2d880647a888888ff"), "TournamentID" : 2, "TournamentCode" : "GS1", "Position" : 4, "PrizeMoney" : 250000, "RankingPoints" : 500 }
{ "_id" : ObjectId("626c18ddd880647a888888ff"), "TournamentID" : 3, "TournamentCode" : "GS1", "Position" : 1, "PrizeMoney" : 1000000, "RankingPoints" : 2000 }
1st Question:
Hello, I want to get the sum of ranking points of each player.
I have tried:
db.Player.aggregate([
{"$unwind" : "$Tournament"},
{"$lookup":
{"from":"Tournament",
"localField":"Tournament.tournamentID",
"foreignField":"TournamentID",
"as":"Tennis-player"}},
{ "$group": {
"_id": { Name:"$Name" },
"total_qty": { "$sum": "$Tennis-player.PrizeMoney" }
}}
])
But the sum I get for every player is 0.
I can't show it on a playground as it uses more than one collection.
2nd question:
Would it be better to create only 1 collections with all the data?
$unwind
$lookup
$set - Since the $lookup in stage 2 returns Tennis-player as an array guaranteed to contain only one document, use $first to turn the Tennis-player array field into a single embedded document field.
$group
db.Player.aggregate([
{
"$unwind": "$Tournament"
},
{
"$lookup": {
"from": "Tournament",
"localField": "Tournament.tournamentID",
"foreignField": "TournamentID",
"as": "Tennis-player"
}
},
{
$set: {
"Tennis-player": {
"$first": "$Tennis-player"
}
}
},
{
"$group": {
"_id": {
Name: "$Name"
},
"total_qty": {
"$sum": "$Tennis-player.PrizeMoney"
}
}
}
])
Sample Mongo Playground
Alternative:
$lookup - Work $lookup with an Array
$project - Decorate the output documents. Create the total_qty field and use $reduce to sum Tennis-player.PrizeMoney.
db.Player.aggregate([
{
"$lookup": {
"from": "Tournament",
"localField": "Tournament.tournamentID",
"foreignField": "TournamentID",
"as": "Tennis-player"
}
},
{
"$project": {
"_id": {
Name: "$Name"
},
"total_qty": {
"$reduce": {
"input": "$Tennis-player",
"initialValue": 0,
"in": {
$sum: [
"$$value",
"$$this.PrizeMoney"
]
}
}
}
}
}
])
Sample Mongo Playground (Alternative)
Our data provider supplies the data in a weird format. The arrays date and value are corresponding and guaranteed to have the same length. For whatever reason, they even decide to mix up int and string values in date.
[
{
"_id": "A000005933",
"date": [905270400000, 918748800000, 937843200000, 965923200000, 983289600000, 984931200000, 1152806400000, "1171987200000", "1225382400000", "1229616000000", "1286208000000", "1455552000000"],
"value": ["0.25", "0.15", "0", "0.25", "0.15", "0", "0.25", "0.5", "0.3", "0.1", "0.1", "-0.1"],
"version": 1614837436798
},
{
"_id": "A000005934",
"date": [915120000000, 923587200000, 941731200000, 949593600000, 953222400000, 956851200000, 962121600000, 967737600000, 970761600000, 989510400000, 999187200000, 1000742400000, 1005235200000, 1039104000000, 1046966400000, 1054828800000, 1133798400000, 1141747200000, 1150300800000, 1155052800000, 1160496000000, 1165939200000, 1173801600000, 1181664000000, 1215532800000, 1224000000000, 1226419200000, 1228838400000, 1232467200000, 1236700800000, 1239120000000, 1242144000000, 1302624000000, 1310486400000, 1320768000000, 1323792000000, 1341936000000, 1367942400000, 1384272000000, 1402416000000, 1410278400000, 1458057600000],
"value": ["3", "2.5", "3", "3.25", "3.5", "3.78", "4.25", "4.5", "4.78", "4.5", "4.25", "3.75", "3.25", "2.78", "2.5", "2", "2.25", "2.5", "2.75", "3", "3.25", "3.5", "3.75", "4", "4.25", "3.75", "3.25", "2.5", "2", "1.5", "1.25", "1", "1.25", "1.5", "1.25", "1", "0.75", "0.5", "0.25", "0.15", "0.05", "0"],
"version": 1614837436548
},
......
]
Our typical use case is to look up value based on _id and date, so I had to do something like this.
def get_value_from_mongo(id_: str, date: datetime.date) -> float:
    # fetch only the two parallel arrays we need
    result = db.indicators.find_one({"_id": id_}, {"value": 1, "date": 1})
    # dates may be stored as ints or strings, so normalize both sides to strings
    date_list = list(map(str, result["date"]))
    price_list = list(map(str, result["value"]))
    dt = date.strftime("%s000")
    price = float(price_list[date_list.index(dt)])
    return price
This is hopelessly inefficient because the whole array is scanned each time I want to retrieve a single value. Maybe I could do a binary search, but date is not guaranteed to be sorted and I don't want to rely on that behavior.
Are there any MongoDB operators I can use to speed up the query?
A first possibility is to focus on the lookup: create an index on the dates array, which comes at the cost of slower writes.
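For example (a one-line sketch, matching the dummy indicators collection created further below):
db.indicators.createIndex({ dates: 1 }) // multikey index over the array field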
In the execution plan below you can see that the index is used (you should benchmark whether it actually brings much of an improvement):
> db.indicators.explain().find({dates: '1.1'})
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "dummy.indicators",
"indexFilterSet" : false,
"parsedQuery" : {
"dates" : {
"$eq" : "1.1"
}
},
"queryHash" : "4204704C",
"planCacheKey" : "1DBFE945",
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",// <------
"keyPattern" : {
"dates" : 1
},
"indexName" : "dates_1",
"isMultiKey" : true,
"multiKeyPaths" : {
"dates" : [
"dates"
]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"dates" : [
"[\"1.1\", \"1.1\"]"
A second possibility is to focus on retrieving the minimal amount of data possible, on the hint that the bottleneck is not the date lookup but the data transfer.
This does not improve the lookup itself (you still "iterate" the array, just on the db side instead of in application code).
You can do it with the use of:
the positional operator
the projection as the second argument to find (MongoDB >= 4.4)
db.indicators.remove({})
db.indicators.insert([{_id: '0', dates: [1, '1.1', 2], prices: [1,2,3]}])
fetch = date => {
print(date)
res = db.indicators.find(
{
dates: {
$elemMatch: {
$in: [Number(date), String(date)]
}
}
},
{
'prices.$': 1 // <<--------
}
).toArray()
printjson(res)
}
fetch(2) // [ { "_id" : "0", "prices" : [ 3 ] } ]
fetch('1.1') // [ { "_id" : "0", "prices" : [ 2 ] } ]
Obviously you can compose 1 and 2, but I would give it a try with just 2 first, to avoid creating an index.
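Composing the two would then just look like this (sketch, same dummy collection):
db.indicators.createIndex({ dates: 1 }) // 1: the multikey index backs the $elemMatch lookup
fetch('1.1') // 2: the positional projection still keeps the transferred data minimal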
data: [
{
"_id" : ObjectId("5ebda923a52984db48ab45f6"),
"detectorid" : 1371,
"loopdata" : [
{
"starttime" : "9/15/2011 0:00:00",
"volume" : 2,
"speed" : 65,
"occupancy" : 2,
"status" : 2,
"dqflags" : 0
},
{
"starttime" : "9/15/2011 0:00:20",
"volume" : 2,
"speed" : 53,
"occupancy" : 2,
"status" : 2,
"dqflags" : 0
},
{
"starttime" : "9/15/2011 0:00:40",
"volume" : 0,
"speed" : "",
"occupancy" : 0,
"status" : 0,
"dqflags" : 0
}
]
}
]
Hey guys, this is the data I have in my collection. I want to return only the entries where the speed is over 53. I have tried
db.collection.find({"data.speed":{$gt:53}})
and it returned the wrong results (basically it returned everything), and I have no idea what I did wrong. Any hints? Thanks
I made two solutions for you:
If you just want to keep a speeds field and the document _id, this solves the problem:
Query:
db.collection.aggregate([
// $filter will apply the condition to every element of the array
// in this case the array is [65, 53, ""]
{
$project: {
"speeds": {
$filter: {
"input": "$loopdata.speed",
"as": "speed",
"cond": {
$and: [
{
$eq: [
{
$type: "$$speed" // check if the type is numeric
},
"double"
]
},
{
$gt: [
"$$speed", // check if it's greater than 53
53
]
}
]
}
}
}
}
}
])
Result:
[
{
"_id": ObjectId("5ebda923a52984db48ab45f6"),
"speeds": [
65
]
}
]
Now, if you want to keep all the fields and filter just the loopdata array, this solves the problem:
Query 2:
db.collection.aggregate([
{
$addFields: {
"loopdata": {
$filter: {
"input": "$loopdata",
"as": "data",
"cond": {
$and: [
{
$eq: [
{
$type: "$$data.speed"
},
"double"
]
},
{
$gt: [
"$$data.speed",
53
]
}
]
}
}
}
}
}
])
Result:
[
{
"_id": ObjectId("5ebda923a52984db48ab45f6"),
"detectorid": 1371,
"loopdata": [
{
"dqflags": 0,
"occupancy": 2,
"speed": 65,
"starttime": "9/15/2011 0:00:00",
"status": 2,
"volume": 2
}
]
}
]
I have a problem: I need to update a value in a nested array (an array inside an array).
For example, I have a document like this:
{
"_id" : ObjectId("59eccf5ea7f6ff30be74d8ce"),
"name" : "some name",
"description" : "some description",
"users" : [
{
"id" : ObjectId("59d1549f4f5c6f6e0f1d6576"),
"technologies" : [
{"id": ObjectId("59450bc718fda360fdf4a719")},
]
},
{
"id": ObjectId("59d1549e4f5c6f6e0f1d6571"),
"technologies": [
{"id": ObjectId("59450f8318fda360fdf4a78b")},
{"id": ObjectId("59450bc718fda360fdf4a719")},
{"id": ObjectId("59450e3f18fda360fdf4a767")}
]
},
{
"id": ObjectId("59d154a44f5c6f6e0f1d65af"),
"technologies": [
ObjectId("59450f8318fda360fdf4a78b")
]
}
]
}
I need to delete an exact technology from an exact user. I know only:
_id - the global document id
userId - the 'users.id' element
technologyId - the 'users.$.technologies.$.id' id of the technology item that should be deleted
The MongoDB documentation says I can't use two $ positional operators in an update statement, but maybe there is some way to avoid this?
Try the following:
db.yourColl.update(
{
"_id": ObjectId("59eccf5ea7f6ff30be74d8ce"),
"users.id": ObjectId("59d1549e4f5c6f6e0f1d6571")
},
{
"$pull": {
"users.$.technologies": {
"id": ObjectId("59450bc718fda360fdf4a719")
}
}
}
)
The result should be:
{
"_id" : ObjectId("59eccf5ea7f6ff30be74d8ce"),
"name" : "some name",
"description" : "some description",
"users" : [
{
"id" : ObjectId("59d1549f4f5c6f6e0f1d6576"),
"technologies" : [
{
"id" : ObjectId("59450bc718fda360fdf4a719")
}
]
},
{
"id" : ObjectId("59d1549e4f5c6f6e0f1d6571"),
"technologies" : [
{
"id" : ObjectId("59450f8318fda360fdf4a78b")
},
{
"id" : ObjectId("59450e3f18fda360fdf4a767")
}
]
},
{
"id" : ObjectId("59d154a44f5c6f6e0f1d65af"),
"technologies" : [
ObjectId("59450f8318fda360fdf4a78b")
]
}
]
}
Here is an example of a nested document that I have in my collection:
{
"title" : "front-end developer",
"age" : 25,
"name" : "John",
"city" : "London",
"skills" : [
{
"name" : "js",
"project" : "1",
"scores" : [
{
max: 76,
date: date
},
{
max: 56,
date: date
}
]
},
{
"name" : "CSS",
"project" : "5",
"scores" : [
{
max: 86,
date: date
},
{
max: 36,
date: date
},
{
max: 56,
date: date
},
]
}
]
}
Is there a simple way of determining whether other documents have an identical/duplicate structure to the skills array only, e.g. the same keys, values and array indexes? Any help would be greatly appreciated. Thanks!
Here's how you get that:
collection.aggregate({
"$group": {
"_id": "$skills",
"docs": {
"$push": "$$ROOT"
},
"count": {
$sum: 1
}
}
}, {
$match: {
"count": {
$gt: 1
}
}
})
If you are looking for developers with the same skillset, you can use the $all operator:
var john = db.developers.findOne(...);
var devs = db.developers.find({ 'skills.name': { $all: john.skills.map(x => x.name) } });