Elasticsearch / OpenSearch - intersection of common values in arrays

given
a = [1,2,3,4]
and
b = [1,3]
c = [5,6]
d = [1,6]
How can I find the exact number of common values of a with b, c, d in Elasticsearch?
I would expect:
b -> 2
c -> 0
d -> 1

Let's say the following is your Elasticsearch document:
{
  "b": [1,3],
  "c": [5,6],
  "d": [1,6]
}
Your Elasticsearch query will look like the one below. You first need a terms aggregation whose include list is restricted to the values of a, and on top of that a sum_bucket pipeline aggregation that adds up the bucket doc counts:
{
  "size": 0,
  "aggs": {
    "b_count": {
      "terms": {
        "field": "b",
        "size": 10,
        "include": [1,2,3,4]
      }
    },
    "c_count": {
      "terms": {
        "field": "c",
        "size": 10,
        "include": [1,2,3,4]
      }
    },
    "d_count": {
      "terms": {
        "field": "d",
        "size": 10,
        "include": [1,2,3,4]
      }
    },
    "b_sum": {
      "sum_bucket": {
        "buckets_path": "b_count>_count"
      }
    },
    "c_sum": {
      "sum_bucket": {
        "buckets_path": "c_count>_count"
      }
    },
    "d_sum": {
      "sum_bucket": {
        "buckets_path": "d_count>_count"
      }
    }
  }
}
Sample response (the values you want are b_sum, c_sum and d_sum):
"aggregations" : {
"b_count" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 1,
"doc_count" : 1
},
{
"key" : 3,
"doc_count" : 1
}
]
},
"d_count" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 1,
"doc_count" : 1
}
]
},
"c_count" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ ]
},
"b_sum" : {
"value" : 2.0
},
"c_sum" : {
"value" : 0.0
},
"d_sum" : {
"value" : 1.0
}
}
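One thing to keep in mind: the terms aggregations count values across every document matched by the search, so if the index holds more than one document the sums will mix arrays from different documents. A minimal sketch of a per-document version, assuming you know the document's _id, is to restrict the search to that single document (shown here only for b; c and d follow the same pattern):
{
  "size": 0,
  "query": {
    "ids": { "values": ["1"] }
  },
  "aggs": {
    "b_count": {
      "terms": { "field": "b", "size": 10, "include": [1,2,3,4] }
    },
    "b_sum": {
      "sum_bucket": { "buckets_path": "b_count>_count" }
    }
  }
}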

Related

Get the sum after the $unwind and $lookup returns 0

Player collection:
{ "_id" : 1, "Name" : "John Aims", "Gender" : "M", "DoB" : ISODate("1990-01-01T00:00:00Z"), "Nationality" : "USA", "Hand" : "R", "YearTurnedPro" : 2010, "Tournament" : [ { "tournamentID" : 1, "TournamentYear" : 2016 }, { "tournamentID" : 2, "TournamentYear" : 2019 }, { "tournamentID" : 3, "TournamentYear" : 2021 } ] }
{ "_id" : 2, "Name" : "George Brown", "Gender" : "M", "DoB" : ISODate("1997-03-04T00:00:00Z"), "Nationality" : "GB", "Hand" : "L", "YearTurnedPro" : 2013, "Tournament" : [ { "tournamentID" : 2, "TournamentYear" : 2016 }, { "tournamentID" : 5, "TournamentYear" : 2019 } ] }
Tournament collection:
{ "_id" : ObjectId("626c18a3d880647a888888ff"), "TournamentID" : 1, "TournamentCode" : "GS1", "Position" : 8, "PrizeMoney" : 125000, "RankingPoints" : 250 }
{ "_id" : ObjectId("626c18c2d880647a888888ff"), "TournamentID" : 2, "TournamentCode" : "GS1", "Position" : 4, "PrizeMoney" : 250000, "RankingPoints" : 500 }
{ "_id" : ObjectId("626c18ddd880647a888888ff"), "TournamentID" : 3, "TournamentCode" : "GS1", "Position" : 1, "PrizeMoney" : 1000000, "RankingPoints" : 2000 }
1st Question:
Hello, I want to get the sum of ranking points of each player.
I have tried:
db.Player.aggregate([
{"$unwind" : "$Tournament"},
{"$lookup":
{"from":"Tournament",
"localField":"Tournament.tournamentID",
"foreignField":"TournamentID",
"as":"Tennis-player"}},
{ "$group": {
"_id": { Name:"$Name" },
"total_qty": { "$sum": "$Tennis-player.PrizeMoney" }
}}
])
But for every player the sum I get is 0.
I can't show it on the playground as it uses more than one collection.
2nd question:
Would it be better to create only one collection with all the data?
The problem with the original pipeline is that after $lookup, Tennis-player is an array, so "$Tennis-player.PrizeMoney" resolves to an array of values, which the $sum accumulator in $group ignores - hence every total comes out as 0. The fix:
$unwind
$lookup
$set - Since the $lookup from stage 2 returns Tennis-player as an array holding at most one matching document, use $first to turn the Tennis-player array field into a single embedded document.
$group
db.Player.aggregate([
{
"$unwind": "$Tournament"
},
{
"$lookup": {
"from": "Tournament",
"localField": "Tournament.tournamentID",
"foreignField": "TournamentID",
"as": "Tennis-player"
}
},
{
$set: {
"Tennis-player": {
"$first": "$Tennis-player"
}
}
},
{
"$group": {
"_id": {
Name: "$Name"
},
"total_qty": {
"$sum": "$Tennis-player.PrizeMoney"
}
}
}
])
Sample Mongo Playground
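With the sample documents above, this pipeline should produce totals along these lines (note that, as written, it sums PrizeMoney rather than RankingPoints; John Aims played tournaments 1, 2 and 3, while George Brown's tournament 5 has no match in the Tournament collection, so only tournament 2 contributes):
{ "_id" : { "Name" : "John Aims" }, "total_qty" : 1375000 }
{ "_id" : { "Name" : "George Brown" }, "total_qty" : 250000 }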
Alternative:
$lookup - use $lookup directly with the array field (no $unwind needed)
$project - decorate the output documents: create the total_qty field and use $reduce to sum Tennis-player.PrizeMoney.
db.Player.aggregate([
{
"$lookup": {
"from": "Tournament",
"localField": "Tournament.tournamentID",
"foreignField": "TournamentID",
"as": "Tennis-player"
}
},
{
"$project": {
"_id": {
Name: "$Name"
},
"total_qty": {
"$reduce": {
"input": "$Tennis-player",
"initialValue": 0,
"in": {
$sum: [
"$$value",
"$$this.PrizeMoney"
]
}
}
}
}
}
])
Sample Mongo Playground (Alternative)

How to query by a field in an array of sub-documents with greater than condition?

data: [
{
"_id" : ObjectId("5ebda923a52984db48ab45f6"),
"detectorid" : 1371,
"loopdata" : [
{
"starttime" : "9/15/2011 0:00:00",
"volume" : 2,
"speed" : 65,
"occupancy" : 2,
"status" : 2,
"dqflags" : 0
},
{
"starttime" : "9/15/2011 0:00:20",
"volume" : 2,
"speed" : 53,
"occupancy" : 2,
"status" : 2,
"dqflags" : 0
},
{
"starttime" : "9/15/2011 0:00:40",
"volume" : 0,
"speed" : "",
"occupancy" : 0,
"status" : 0,
"dqflags" : 0
}
]
}
]
Hey guys, this is the data I have in my collection. I want to return the entries where the speed is over 53. I have tried
db.collection.find({"data.speed":{$gt:53}})
but it returned the wrong results (basically everything) and I have no idea what I did wrong. Any hints? Thanks.
I made two solutions for you:
If you just want to keep a speeds field and the document _id, this solves the problem:
Query:
db.collection.aggregate([
// $filter will apply the condition to every element of the array
// in this case the array is [65, 53, ""]
{
$project: {
"speeds": {
$filter: {
"input": "$loopdata.speed",
"as": "speed",
"cond": {
$and: [
{
$eq: [
{
$type: "$$speed" // check if the type is numeric
},
"double"
]
},
{
$gt: [
"$$speed", // check if it's greater than 53
53
]
}
]
}
}
}
}
}
])
Result:
[
{
"_id": ObjectId("5ebda923a52984db48ab45f6"),
"speeds": [
65
]
}
]
Now if you want to keep all the fields and filter just the loopdata array, this solves the problem:
Query 2:
db.collection.aggregate([
{
$addFields: {
"loopdata": {
$filter: {
"input": "$loopdata",
"as": "data",
"cond": {
$and: [
{
$eq: [
{
$type: "$$data.speed"
},
"double"
]
},
{
$gt: [
"$$data.speed",
53
]
}
]
}
}
}
}
}
])
Result:
[
{
"_id": ObjectId("5ebda923a52984db48ab45f6"),
"detectorid": 1371,
"loopdata": [
{
"dqflags": 0,
"occupancy": 2,
"speed": 65,
"starttime": "9/15/2011 0:00:00",
"status": 2,
"volume": 2
}
]
}
]
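As a side note on the original attempt: find() matches and returns whole documents, so even when the condition matches, the entire loopdata array comes back untouched, which is why an aggregation with $filter is needed to trim the array. If all you want is to select the documents that contain at least one reading over 53 (assuming the field path loopdata.speed from the documents shown above), a plain find would do:
// Returns every document with at least one loopdata element whose speed > 53,
// but each returned document still carries its full loopdata array.
db.collection.find({ "loopdata.speed": { $gt: 53 } })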

Find specific field in an array of embedded documents

I'm trying to find a specific element of an array in my MongoDB document, but all of my attempts return the whole array. How can I get the right object?
My collection's document looks like:
{
"Achievements": [
{
"AchievementID": 1,
"AchievementEventID": 0
},
{
"AchievementID": 2,
"AchievementEventID": 1
}
],
"Buildings": [
{
"BuildingID": 1,
"BuildingType": "type1"
},
{
"BuildingID": 2,
"BuildingType": "type1"
}
]
}
I tried to get only one element of my Achievements array:
db.data.find({'Achievements.AchievementEventID': 0})
I expected to get only element with AchievementEventID equal to 0:
{
"AchievementID": 1,
"AchievementEventID": 0
}
But I got the whole Achievements array.
How can I get only the specific element?
You can use an aggregation query as in the following example. The sample achive collection has 3 documents:
{
"_id" : 1,
"Achievements" : [
{
"AchievementID" : 1,
"AchievementEventID" : 0
},
{
"AchievementID" : 2,
"AchievementEventID" : 1
}
],
"Buildings" : [
{
"BuildingID" : 1,
"BuildingType" : "type1"
},
{
"BuildingID" : 2,
"BuildingType" : "type1"
}
]
}
{
"_id" : 2,
"Achievements" : [
{
"AchievementID" : 2,
"AchievementEventID" : 2
}
],
"Buildings" : [
{
"BuildingID" : 2,
"BuildingType" : "type2"
}
]
}
{
"_id" : 3,
"Achievements" : [
{
"AchievementID" : 31,
"AchievementEventID" : 1
}
],
"Buildings" : [
{
"BuildingID" : 3,
"BuildingType" : "type3"
}
]
}
The Query:
db.achive.aggregate( [
{ $project: { Buildings: 0} },
{ $unwind: "$Achievements" },
{ $match: { "Achievements.AchievementEventID": { $eq: 1 } } }
])
=>
{ "_id" : 1, "Achievements" : { "AchievementID" : 2, "AchievementEventID" : 1 } }
{ "_id" : 3, "Achievements" : { "AchievementID" : 31, "AchievementEventID" : 1 } }
I added the _id field to identify the documents selected.
Updated Query:
db.achive.aggregate( [
{ $unwind: "$Achievements" },
{ $match: { "Achievements.AchievementEventID": { $eq: 1 } } },
{ $project: { AchievementID: "$Achievements.AchievementID", _id: 0} },
])
=>
{ "AchievementID" : 2 }
{ "AchievementID" : 31 }
You can use $elemMatch for that
db.data.find(
  { 'Achievements.AchievementEventID': 0 },
  { Buildings: 0, Achievements: { $elemMatch: { AchievementEventID: 0 } } }
)
If the above solution doesn't work for you, then try the one below:
db.data.find({"Achievements.AchievementEventID": 0}, {Buildings: 0, 'Achievements.$': 1});
You can use aggregate instead of find, like this:
db.data.aggregate([
  { '$unwind': '$Achievements' },
  { '$match': { 'Achievements.AchievementEventID': 0 } },
  { '$project': {
      'AchievementID': '$Achievements.AchievementID',
      'AchievementEventID': '$Achievements.AchievementEventID'
  } }
]);
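Against the single sample document from the question, that pipeline should return something along these lines (the document's _id is kept as well, since the $project does not exclude it):
{ "AchievementID" : 1, "AchievementEventID" : 0 }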

Upserting a value at an array position using $min

I would like to either insert a new document with a default value as part of an array, or update that part of the array if the document already exists.
What I thought of was:
db.test.update(
{ "a": 5 },
{ $setOnInsert: { "b": [0, 0] }, $min: { "b.0": 5 } },
{ upsert: true }
);
If I do that, then I get:
Cannot update 'b' and 'b.0' at the same time
Another idea was to remove $setOnInsert and just keep $min, since the minimum between nothing and 5 should be 5.
db.test.update(
{ "a": 5 },
{ $min: { "b.0": 5 } },
{ upsert: true }
);
This doesn't raise an error, but now the document I get is:
{ "a" : 5, "b" : { "0" : 5 } }
I need an array with 5 at position 0 however, not an object with a 0 property.
How can I achieve this?
You can use .bulkWrite() for this, and it's actually a prime use case for why it exists. It sends only "one" actual request to the server and gets only one response. It's still two operations, but they are sent together and, since bulk writes are ordered by default, applied one after the other:
db.junk.bulkWrite([
{ "updateOne": {
"filter": { "a": 1 },
"update": { "$setOnInsert": { "b": [ 5,0 ] } },
"upsert": true
}},
{ "updateOne": {
"filter": { "a": 1 },
"update": { "$min": { "b.0": 5 } }
}}
])
Running it for the first time will give you an "upsert"; note the upsertedCount of 1 in the response:
{
"acknowledged" : true,
"deletedCount" : 0,
"insertedCount" : 0,
"matchedCount" : 1,
"upsertedCount" : 1,
"insertedIds" : {
},
"upsertedIds" : {
"0" : ObjectId("5947c412d6eb0b7d6ac37f09")
}
}
And the document of course looks like:
{
"_id" : ObjectId("5947c412d6eb0b7d6ac37f09"),
"a" : 1,
"b" : [
5,
0
]
}
Then run it with a different value for $min, as you likely would in real cases:
db.junk.bulkWrite([
{ "updateOne": {
"filter": { "a": 1 },
"update": { "$setOnInsert": { "b": [ 5,0 ] } },
"upsert": true
}},
{ "updateOne": {
"filter": { "a": 1 },
"update": { "$min": { "b.0": 3 } }
}}
])
And the response:
{
"acknowledged" : true,
"deletedCount" : 0,
"insertedCount" : 0,
"matchedCount" : 2,
"upsertedCount" : 0,
"insertedIds" : {
},
"upsertedIds" : {
}
}
Which "matched" 2 but of course $setOnInsert does not apply, so the result is:
{
"_id" : ObjectId("5947c412d6eb0b7d6ac37f09"),
"a" : 1,
"b" : [
3,
0
]
}
Just like it should be.

Golang: get array index value in JSON response

So I have a query to the database (MongoDB) which orders results by the value field.
all := EValues{}
err := con.Find(bson.M{"name": "somename"}).Sort("-value").All(&all)
The JSON output for this looks like:
"values": [
{
"user_name": "guest7485",
"value": 8911,
"value_date": "2016-03-09T14:40:34.512Z"
},
{
"user_name": "guest7485",
"value": 539,
"value_date": "2016-03-07T14:11:05.217Z"
},
{
"user_name": "guest7485",
"value": 221,
"value_date": "2016-03-07T14:11:08.853Z"
},
{
"user_name": "guest7485",
"value": 77,
"value_date": "2016-03-07T14:11:12.377Z"
}
]
In my JSON response I need to add a "position" parameter which should basically equal 1 for the first result, 2 for the second, and so on for all results. So my final output should be:
"values": [
{
"position": 1,
"user_name": "guest7485",
"value": 8911,
"value_date": "2016-03-09T14:40:34.512Z"
},
{
"position": 2,
"user_name": "guest7485",
"value": 539,
"value_date": "2016-03-07T14:11:05.217Z"
},
{
"position": 3,
"user_name": "guest7485",
"value": 221,
"value_date": "2016-03-07T14:11:08.853Z"
},
{
"position": 4,
"user_name": "guest7485",
"value": 77,
"value_date": "2016-03-07T14:11:12.377Z"
}
]
I'm wondering how to solve this with mgo and Go in general, and I would be really grateful if someone could show me the most efficient way to do it.
Update:
The definition of EValues is below:
type EValue struct {
ID bson.ObjectId `json:"-" bson:"_id,omitempty"`
Name string `json:"-" bson:"name"`
UserId bson.ObjectId `json:"-" bson:"userId"`
UserName string `json:"user_name" bson:"userName"`
Value int64 `json:"value" bson:"value"`
AddedTime time.Time `json:"value_date" bson:"addedTime"`
}
type EValues []EValue
Add a Position field to EValue:
type EValue struct {
... other fields here
Position int `json:"position" bson:"-"`
}
Loop through db results and set the field:
for i := range all {
all[i].Position = i + 1
}
Marshal the result as JSON.
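Putting the three steps together, a minimal sketch (the collection handle, the query filter and the EValue/EValues types come from the question; the function name is just for illustration):
import (
    "encoding/json"

    "gopkg.in/mgo.v2"
    "gopkg.in/mgo.v2/bson"
)

// positionedValues loads the values sorted by value (descending), stamps a
// 1-based Position on each element and returns the JSON-encoded result.
func positionedValues(con *mgo.Collection) ([]byte, error) {
    all := EValues{}
    if err := con.Find(bson.M{"name": "somename"}).Sort("-value").All(&all); err != nil {
        return nil, err
    }
    for i := range all {
        all[i].Position = i + 1 // first result -> 1, second -> 2, ...
    }
    return json.Marshal(map[string]EValues{"values": all})
}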
With MongoDB 3.2 this can be done using the $unwind operator, where you can pass an object with the field path and an includeArrayIndex field naming the output field that will hold the array index:
pipeline = [
{ "$match": {"name": "somename"} },
{ "$unwind": { "path": "$values", "includeArrayIndex": "position" } },
{
"$project": {
"name": 1,
"newarray.position": "$position",
"newarray.user_name": "$values.user_name",
"newarray.value_date": "$values.value_date",
"newarray.value": "$values.value",
}
},
{
"$group": {
"_id": "$name",
"values": { "$push": "$newarray" }
}
}
]
db.test.aggregate(pipeline);
Output
> db.test.aggregate(pipeline).pretty();
{
"_id" : "somename",
"values" : [
{
"position" : NumberLong(0),
"user_name" : "guest8911",
"value_date" : "2016-03-09T14:40:34.512Z",
"value" : 8911
},
{
"position" : NumberLong(1),
"user_name" : "guest7485",
"value_date" : "2016-03-07T14:11:05.217Z",
"value" : 539
},
{
"position" : NumberLong(2),
"user_name" : "guest7485",
"value_date" : "2016-03-07T14:11:08.853Z",
"value" : 221
},
{
"position" : NumberLong(3),
"user_name" : "guest7485",
"value_date" : "2016-03-07T14:11:12.377Z",
"value" : 77
}
]
}
>
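Note that includeArrayIndex produces a zero-based index, which is why position starts at 0 in the output above. If you want the 1-based position from the question, one option (a small tweak of the $project stage above) is to add 1:
{
  "$project": {
    "name": 1,
    "newarray.position": { "$add": [ "$position", 1 ] },
    "newarray.user_name": "$values.user_name",
    "newarray.value_date": "$values.value_date",
    "newarray.value": "$values.value"
  }
}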
If this is not supported by the mgo driver, then a less efficient approach would be to use Map-Reduce. The following mongo shell example demonstrates how to run the operation:
Populate test collection:
db.test.insert({
"name": "somename",
"values": [
{
"user_name": "guest8911",
"value": 8911,
"value_date": "2016-03-09T14:40:34.512Z"
},
{
"user_name": "guest7485",
"value": 539,
"value_date": "2016-03-07T14:11:05.217Z"
},
{
"user_name": "guest7485",
"value": 221,
"value_date": "2016-03-07T14:11:08.853Z"
},
{
"user_name": "guest7485",
"value": 77,
"value_date": "2016-03-07T14:11:12.377Z"
}
]
})
Run the following map-reduce operation:
> mr = db.runCommand({
"mapreduce": "test",
"map": function() {
var arr = []
for(var i=0; i < this.values.length; i++){
var val = this.values[i];
val["position"] = i+1;
arr.push(val);
}
emit(this._id, arr);
},
"reduce" : function() {},
"out": "test_keys"
})
Query the resulting collection:
> db[mr.result].find().pretty()
{
"_id" : ObjectId("56e18ab84b9018ec86d2a6bd"),
"value" : [
{
"user_name" : "guest8911",
"value" : 8911,
"value_date" : "2016-03-09T14:40:34.512Z",
"position" : 1
},
{
"user_name" : "guest7485",
"value" : 539,
"value_date" : "2016-03-07T14:11:05.217Z",
"position" : 2
},
{
"user_name" : "guest7485",
"value" : 221,
"value_date" : "2016-03-07T14:11:08.853Z",
"position" : 3
},
{
"user_name" : "guest7485",
"value" : 77,
"value_date" : "2016-03-07T14:11:12.377Z",
"position" : 4
}
]
}
>
Now, given the listing above, you can assemble the query in mgo using MapReduce:
job := mgo.MapReduce{
Map: "function(){var arr=[];for(var i=0;i<this.values.length; i++){var val=this.values[i];val['position']=i+1;arr.push(val);};emit(this._id,arr);}",
Reduce: "function() { }",
}
var result []struct { Id bson.ObjectId "_id"; Value []EValue }
_, err := collection.Find(nil).MapReduce(job, &result)
if err != nil {
panic(err)
}
for _, item := range result {
fmt.Println(item.Value)
}
For more details, check the documentation: https://godoc.org/labix.org/v1/mgo#MapReduce
