Finding the union or intersection of buckets using Elasticsearch aggregations


I have nested aggregations, and I want to find the union or intersection of the second-level aggregation's buckets based on conditions on my first-level aggregation's bucket results. For example, this is my aggregation:
"aggs": {
"events": {
"terms": {
"field": "event_name"
},
"aggs":{
"devices":{
"terms":{
"field": "device-id"
}
}
}
}
}
And this is the result of my aggregation:
"aggregations": {
"events": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "conversion_checkout",
"doc_count": 214,
"devices": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 6,
"buckets": [
{
"key": "9a11f243d44",
"doc_count": 94
},
{
"key": "ddcb21fd6cb",
"doc_count": 35
}
]
}
},
{
"key": "action_view_product",
"doc_count": 5,
"devices": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "54E4C593",
"doc_count": 4
},
{
"key": "9a11f243d44",
"doc_count": 1
}
]
}
}
]
}
}
Now, if I want to find all the devices that have done both action_view_product and conversion_checkout, how do I do it with aggregations?

I think you want to get all the device-ids having the event_name action_view_product or conversion_checkout, as follows:
{
"aggregations":{
"devices_agg":{
"doc_count":516,
"devices":{
"doc_count_error_upper_bound":0,
"sum_other_doc_count":0,
"buckets":[
{
"key":623232334,
"doc_count":275
},
{
"key":245454512,
"doc_count":169
},
{
"key":345454567,
"doc_count":32
},
{
"key":578787565,
"doc_count":17
},
{
"key":146272715,
"doc_count":23
}
]
}
}
}
}
Here doc_count = 516 is the total number of documents whose event_name is either action_view_product or conversion_checkout, and each "key" in the devices aggregation is a device-id.
If I understand you correctly, the query below will do it:
{
"size": 0,
"aggs": {
"devices_agg": {
"filter": {
"bool": {
"must": [
{
"terms": {
"event_name": [
"action_view_product",
"conversion_checkout"
]
}
}
]
}
},
"aggs": {
"devices": {
"terms": {
"field": "device-id",
"size": 100
}
}
}
}
}
}
Let me know if I got you wrong.
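The terms filter above matches documents with either event, so the device list it returns is the union of the two events' devices rather than the intersection the question asks for. One client-side option, sketched below, is to keep the question's nested events → devices aggregation and compare the device keys per event in JavaScript. This assumes each devices sub-aggregation returns all of its buckets (sum_other_doc_count of 0, e.g. by raising its size); in the sample response conversion_checkout reports sum_other_doc_count: 6, so its device list is incomplete, and the sketch flags such events. The helper name deviceSets is hypothetical.

```javascript
// Hypothetical helper: union and intersection of the nested "devices" buckets
// under selected "events" buckets of a terms aggregation response.
function deviceSets(aggregations, eventNames) {
  const byEvent = new Map();
  const truncated = [];
  for (const bucket of aggregations.events.buckets) {
    byEvent.set(bucket.key, new Set(bucket.devices.buckets.map((b) => b.key)));
    // sum_other_doc_count > 0 means some device buckets were not returned
    if (eventNames.includes(bucket.key) && bucket.devices.sum_other_doc_count > 0) {
      truncated.push(bucket.key);
    }
  }
  const sets = eventNames.map((name) => byEvent.get(name) || new Set());
  const union = new Set(sets.flatMap((s) => [...s]));
  const intersection = [...union].filter((key) => sets.every((s) => s.has(key)));
  return { union: [...union].sort(), intersection: intersection.sort(), truncated };
}

// Abbreviated copy of the aggregation response shown in the question.
const aggregations = {
  events: {
    buckets: [
      {
        key: "conversion_checkout",
        devices: {
          sum_other_doc_count: 6,
          buckets: [{ key: "9a11f243d44", doc_count: 94 }, { key: "ddcb21fd6cb", doc_count: 35 }],
        },
      },
      {
        key: "action_view_product",
        devices: {
          sum_other_doc_count: 0,
          buckets: [{ key: "54E4C593", doc_count: 4 }, { key: "9a11f243d44", doc_count: 1 }],
        },
      },
    ],
  },
};

const result = deviceSets(aggregations, ["action_view_product", "conversion_checkout"]);
console.log(result.intersection); // [ '9a11f243d44' ]
```

With the sample data, 9a11f243d44 is the only device that appears under both events, but because conversion_checkout is truncated, the result should be treated as a lower bound.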

Related

What kind of JOLT spec produces key-value JSON output where the key is a data value and the value is an array?

I'm trying to find a spec array that yields the desired output.
Input:
{
"aggregations": {
"masterId": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "1Q52",
"doc_count": 3,
"serialNumbers": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "3R24Z3",
"count": 1
},
{
"key": "526GA2",
"count": 1
},
{
"key": "873XHE",
"count": 1
}
]
}
}
]
}
}
}
Spec:
Trying to figure this out
Desired Output:
{
"1Q52": ["3R24Z3", "526GA2", "873XHE"]
}
My current spec array is:
[
{
"operation": "shift",
"spec": {
"aggregations": {
"masterId": {
"buckets": {
"*": {
"key": "key",
"serialNumbers": {
"buckets": {
"*": {
"key": "key"
}
}
}
}
}
}
}
}
}
]
And my current output is:
{
"key" : [ "1Q52", "3R24Z3", "526GA2", "873XHE" ]
}
What kind of spec array could give me the desired output?

You can go 4 levels up and grab the value to use as the key for the array with "key": "@(4,key)", such as
[
{
"operation": "shift",
"spec": {
"aggregations": {
"masterId": {
"buckets": {
"*": {
"serialNumbers": {
"buckets": {
"*": {
"key": "@(4,key)"
}
}
}
}
}
}
}
}
}
]
where 4 is the number of levels walked up from the matched "key": the serialNumbers bucket, its buckets array, serialNumbers, and finally the masterId bucket whose "key" ("1Q52") becomes the output field name.
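If the reshaping can happen in application code instead of JOLT, the same mapping is a short loop. This is a sketch assuming the input has exactly the shape shown in the question; serialsByMaster is a hypothetical name.

```javascript
// Map each masterId bucket key to the list of its serialNumbers bucket keys,
// mirroring what the shift spec above does.
function serialsByMaster(doc) {
  const out = {};
  for (const master of doc.aggregations.masterId.buckets) {
    out[master.key] = master.serialNumbers.buckets.map((b) => b.key);
  }
  return out;
}

const input = {
  aggregations: {
    masterId: {
      buckets: [
        {
          key: "1Q52",
          doc_count: 3,
          serialNumbers: {
            buckets: [
              { key: "3R24Z3", count: 1 },
              { key: "526GA2", count: 1 },
              { key: "873XHE", count: 1 },
            ],
          },
        },
      ],
    },
  },
};

console.log(JSON.stringify(serialsByMaster(input))); // {"1Q52":["3R24Z3","526GA2","873XHE"]}
```

This version always produces an array, even for a master with a single serial number.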

Multiply a varying number of values inside an array: MongoDB

I have arranged my data so that the documents belonging to the same customer ID are aggregated into a single collection. The data format is as follows:
{
"items": [
{
"stock_code": [
"22617",
"22768",
"20749"
],
"description": [
"DESIGN",
"FAMILY PHOTO FRAME",
"ASSORTED CASES"
],
"quantity": [
18,
12,
84
],
"unit_price": [
4.95,
9.95,
6.35
]
}
],
"_id": 581485,
"customer_id": 17389,
"country": "United Kingdom"
}
I need to multiply each value in the quantity array by the corresponding unit_price and store the total for each document in a new field. I have tried using $reduce and $map to get the output, but both of them result in the error:
Multiply only supports numeric types, and not arrays
Could you please suggest how I should go about accomplishing this?
Code tried:
"$addFields": {"order_total" :
{
"$sum": {
"$map": {
"input": "$items",
"as": "items",
"in": { "$multiply": [
{ "$ifNull": [ "$$items.quantity", 0 ] },
{ "$ifNull": [ "$$items.unit_price", 0 ] }
]}
}
}
}
}
Second:
"order_total" : {
"$reduce" : {
"input" : "$items",
"initialValue" : Decimal128("0.00"),
"in": {
"$sum" : [
"$$value",
{"$multiply" : [ "$$this.quantity", "$$this.unit_price" ] }
]}
}
}
The expected result should add a new field, "total", by multiplying each unit_price entry by the corresponding quantity entry. Both attempts fail because quantity and unit_price are arrays, which $multiply does not accept.

I would decompose the problem into solvable chunks, beginning with the smallest unit: the two arrays quantity and unit_price.
So given just a document with the structure
{
"quantity": [ 18, 12, 84 ],
"unit_price": [ 4.95, 9.95, 6.35 ]
}
we can add another field holding the product of the corresponding elements of both arrays, i.e.
{
"quantity": [ 18, 12, 84 ],
"unit_price": [ 4.95, 9.95, 6.35 ],
"total": [ 89.1, 119.4, 533.4 ]
}
This field can be computed using $range within $map as
{
"total": {
"$map": {
"input": { "$range": [ 0, { "$size": "$quantity" } ] },
"as": "idx",
"in": {
"$let": {
"vars": {
"qty": { "$arrayElemAt": [ "$quantity", "$$idx" ] },
"price": {
"$ifNull": [
{ "$arrayElemAt": [ "$unit_price", "$$idx" ] },
0
]
}
},
"in": { "$multiply": [ "$$qty", "$$price" ] }
}
}
}
}
}
This then becomes the basis for calculating the order_total field, using two pipeline stages for clarity (they could be composed into a single stage with $reduce for brevity):
var totalMapExpression = {
"$map": {
"input": { "$range": [ 0, { "$size": "$$item.quantity" } ] },
"as": "idx",
"in": {
"$let": {
"vars": {
"qty": { "$arrayElemAt": [ "$$item.quantity", "$$idx" ] },
"price": {
"$ifNull": [
{ "$arrayElemAt": [ "$$item.unit_price", "$$idx" ] },
0
]
}
},
"in": { "$multiply": [ "$$qty", "$$price" ] }
}
}
}
};
db.collection.aggregate([
{ "$addFields": {
"items": {
"$map": {
"input": "$items",
"as": "item",
"in": {
"quantity": "$$item.quantity",
"unit_price": "$$item.unit_price",
"stock_code": "$$item.stock_code",
"description": "$$item.description",
"total": { "$sum": totalMapExpression }
}
}
}
} },
{ "$addFields": {
"order_total": { "$sum": "$items.total" }
} }
])
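To sanity-check what the pipeline should produce, the same per-index arithmetic can be mirrored in plain JavaScript. This is a sketch, not MongoDB code; orderTotal is a hypothetical name, and a missing unit_price element is treated as 0 to match the $ifNull above. Both JavaScript numbers and MongoDB doubles are binary floating point, so totals may carry tiny rounding errors.

```javascript
// Multiply quantity[i] by unit_price[i] (missing price -> 0), sum per item,
// then sum the item totals into order_total.
function orderTotal(doc) {
  const items = doc.items.map((item) => {
    const lineTotals = item.quantity.map((qty, idx) => qty * (item.unit_price[idx] ?? 0));
    return { ...item, total: lineTotals.reduce((acc, x) => acc + x, 0) };
  });
  return { ...doc, items, order_total: items.reduce((acc, item) => acc + item.total, 0) };
}

const doc = {
  items: [
    {
      stock_code: ["22617", "22768", "20749"],
      description: ["DESIGN", "FAMILY PHOTO FRAME", "ASSORTED CASES"],
      quantity: [18, 12, 84],
      unit_price: [4.95, 9.95, 6.35],
    },
  ],
  _id: 581485,
  customer_id: 17389,
  country: "United Kingdom",
};

const result = orderTotal(doc);
console.log(result.order_total.toFixed(2)); // 741.90
```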

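Try the query below. Unwinding quantity and unit_price separately would pair every quantity with every unit_price (a cross product), so only quantity is unwound, with includeArrayIndex recording each element's position, and the matching price is then picked with $arrayElemAt:
db.collection.aggregate([
{ $unwind: { path: "$items" } },
{ $unwind: { path: "$items.quantity", includeArrayIndex: "idx" } },
{ $addFields: {
'total': { $multiply: [ "$items.quantity", { $arrayElemAt: [ "$items.unit_price", "$idx" ] } ] }
} }
])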

Elasticsearch aggregation only on specific entries in an array

I'm new to Elasticsearch and can't figure out how to solve the following problem.
The easiest way to explain my problem is to show you an example.
The following array "listing" is part of all my documents in Elasticsearch, but its entries vary, so the "person" with "id" 42 might appear in only 50% of them. What I'm trying to do is get the average "ranking.position.standard" of all persons with id 42 across all my documents in Elasticsearch.
{
"listing": [
{
"person": {
"id": 42
},
"ranking": {
"position": {
"standard": 2
}
}
},
{
"person": {
"id": 55
},
"ranking": {
"position": {
"standard": 7
}
}
}
]
}
Thanks for your help!

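First of all, do you store listing as an object or a nested data type? If it is a plain object, Elasticsearch flattens the array, so each person.id is no longer tied to its own ranking.position.standard and a per-person average can't be computed. Mapping listing as nested keeps each entry together, so try the following example: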
PUT /test
{
"mappings": {
"_default_": {
"properties": {
"listing": {
"type": "nested"
}
}
}
}
}
PUT /test/aa/1
{
"listing": [
{
"person": {
"id": 42
},
"ranking": {
"position": {
"standard": 2
}
}
},
{
"person": {
"id": 55
},
"ranking": {
"position": {
"standard": 7
}
}
}
]
}
PUT /test/aa/2
{
"listing": [
{
"person": {
"id": 42
},
"ranking": {
"position": {
"standard": 5
}
}
},
{
"person": {
"id": 55
},
"ranking": {
"position": {
"standard": 6
}
}
}
]
}
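To see why the mapping matters, here is a rough JavaScript simulation (not Elasticsearch code) of how the two documents above would be aggregated under an object mapping versus a nested mapping. It models a terms bucket per id with an avg over ranking.position.standard; the function names are hypothetical.

```javascript
const docs = [
  { listing: [ { person: { id: 42 }, ranking: { position: { standard: 2 } } },
               { person: { id: 55 }, ranking: { position: { standard: 7 } } } ] },
  { listing: [ { person: { id: 42 }, ranking: { position: { standard: 5 } } },
               { person: { id: 55 }, ranking: { position: { standard: 6 } } } ] },
];

const average = (values) => values.reduce((a, b) => a + b, 0) / values.length;

// Object mapping: a document is flattened into listing.person.id = [42, 55] and
// listing.ranking.position.standard = [2, 7], so it lands in the bucket of every
// id it contains, and the avg covers all of its standard values.
function flattenedAverages(docs) {
  const buckets = new Map();
  for (const doc of docs) {
    const ids = new Set(doc.listing.map((e) => e.person.id));
    const standards = doc.listing.map((e) => e.ranking.position.standard);
    for (const id of ids) buckets.set(id, (buckets.get(id) || []).concat(standards));
  }
  return new Map([...buckets].map(([id, values]) => [id, average(values)]));
}

// Nested mapping: each listing entry is aggregated on its own, so id and
// standard stay paired.
function nestedAverages(docs) {
  const buckets = new Map();
  for (const entry of docs.flatMap((d) => d.listing)) {
    const id = entry.person.id;
    buckets.set(id, (buckets.get(id) || []).concat(entry.ranking.position.standard));
  }
  return new Map([...buckets].map(([id, values]) => [id, average(values)]));
}

console.log(flattenedAverages(docs)); // averages: 42 -> 5, 55 -> 5
console.log(nestedAverages(docs));    // averages: 42 -> 3.5, 55 -> 6.5
```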
GET test/_search
{
"size": 0,
"aggs": {
"nest": {
"nested": {
"path": "listing"
},
"aggs": {
"persons": {
"terms": {
"field": "listing.person.id",
"size": 10
},
"aggs": {
"avg_standard": {
"avg": {
"field": "listing.ranking.position.standard"
}
}
}
}
}
}
}
}
This gave me the following result:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0,
"hits": []
},
"aggregations": {
"nest": {
"doc_count": 4,
"persons": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 42,
"doc_count": 2,
"avg_standard": {
"value": 3.5
}
},
{
"key": 55,
"doc_count": 2,
"avg_standard": {
"value": 6.5
}
}
]
}
}
}
}
This looks correct: for id 42 the average is (2 + 5) / 2 = 3.5, and for id 55 it is (7 + 6) / 2 = 6.5.
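Since the question only needs id 42, one variation is to replace the terms aggregation inside the nested aggregation with a filter aggregation on listing.person.id, so the avg is computed over that person's entries alone. The sketch below only builds the request body in JavaScript and has not been run against a cluster; avgStandardForPerson and the "person" aggregation name are hypothetical. For the two sample documents, the response should report avg_standard 3.5 under aggregations.nest.person.

```javascript
// Build a search body that averages ranking.position.standard for one person.
function avgStandardForPerson(personId) {
  return {
    size: 0,
    aggs: {
      nest: {
        nested: { path: "listing" },
        aggs: {
          person: {
            // Runs on the nested listing entries, keeping only this person's entries
            filter: { term: { "listing.person.id": personId } },
            aggs: {
              avg_standard: { avg: { field: "listing.ranking.position.standard" } },
            },
          },
        },
      },
    },
  };
}

// Body to send with GET test/_search from any HTTP client.
const body = avgStandardForPerson(42);
console.log(JSON.stringify(body, null, 2));
```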

Custom math with aggregation using MongoDB and Angular

I'm currently using aggregation to display registration teams and nets on a stats page. I can do the count at each level, but the calculation of nets is inaccurate. My aggregation is as follows:
module.exports.registrationStats = function(req, res) {
Registration.aggregate([
{
"$group": {
"_id": {
"day": "$day",
"group": "$group",
"division": "$division",
"level": "$level"
},
"count": { "$sum": 1 }
}
},
{
"$group": {
"_id": {
"day": "$_id.day",
"group": "$_id.group",
"division": "$_id.division"
},
"count": { "$sum": "$count" },
"levels": {
"$push": {
"level": "$_id.level",
"teams": "$count",
"nets" : {$ceil : { $divide: [ "$count" , 5 ] } }
}
}
}
},
{
"$group": {
"_id": {
"day": "$_id.day",
"group": "$_id.group"
},
"count": { "$sum": "$count" },
"divisions": {
"$push": {
"division": "$_id.division",
"count": "$count",
"levels": "$levels"
}
}
}
}
]).exec(function(err, regStats){
if(err) {
console.log("Error grouping registrations");
res.status(500).json(err);
} else {
console.log("Found and grouped " + regStats.length + " regStats");
res.json(regStats);
}
});
};
This gives me the following output:
[
{
"_id": {
"day": "Saturday",
"group": "nonpro"
},
"count": 144,
"divisions": [
{
"division": "Men's",
"count": 69,
"levels": [
{
"level": "BB",
"teams": 30,
"nets": 6
},
{
"level": "A",
"teams": 8,
"nets": 2
},
{
"level": "B",
"teams": 19,
"nets": 4
},
{
"level": "AA",
"teams": 12,
"nets": 3
}
]
},
{
"division": "Women's",
"count": 75,
"levels": [
{
"level": "AA",
"teams": 9,
"nets": 2
},
{
"level": "BB",
"teams": 16,
"nets": 4
},
{
"level": "B",
"teams": 18,
"nets": 4
},
{
"level": "A",
"teams": 32,
"nets": 7
}
]
}
]
}
]
The problem is that I cannot just apply a ceil filter, Math.ceil(divisions.count/5) to get divisions.nets or Math.ceil(_id.count/5) to get _id.nets, because the result is wrong in some cases. For example, the Men's division gets Math.ceil(69/5) = 14, but its levels need 6 + 2 + 4 + 3 = 15 nets.
I need to sum divisions.levels.nets into divisions.nets, and then sum the divisions.nets values into _id.nets, so the totals match the per-level calculations.
Any ideas on how to do this?

Something like this: add a $project stage that calculates the nets for each level, then $sum and $push nets in the remaining $group stages.
Registration.aggregate([
{
"$group": {
"_id": {
"day": "$day",
"group": "$group",
"division": "$division",
"level": "$level"
},
"count": { "$sum": 1 }
}
},
{ $project: { count:1, nets: { $ceil : { $divide: [ "$count" , 5 ] } } } },
{
"$group": {
"_id": {
"day": "$_id.day",
"group": "$_id.group",
"division": "$_id.division"
},
"count": { "$sum": "$count" },
"nets": { "$sum": "$nets" },
"levels": {
"$push": {
"level": "$_id.level",
"teams": "$count",
"nets" : "$nets"
}
}
}
},
{
"$group": {
"_id": {
"day": "$_id.day",
"group": "$_id.group"
},
"count": { "$sum": "$count" },
"nets": { "$sum": "$nets" },
"divisions": {
"$push": {
"division": "$_id.division",
"count": "$count",
"nets": "$nets",
"levels": "$levels"
}
}
}
}
])
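The reason for summing per-level nets instead of rounding the totals can be checked with the level counts from the question's sample output. This plain JavaScript sketch assumes 5 teams per net, as in the original $divide; the variable names are hypothetical.

```javascript
const TEAMS_PER_NET = 5;
// Teams per level for each division, taken from the sample output above.
const divisions = {
  "Men's": [30, 8, 19, 12],
  "Women's": [9, 16, 18, 32],
};

const netsFor = (teams) => Math.ceil(teams / TEAMS_PER_NET);
const sum = (xs) => xs.reduce((a, b) => a + b, 0);

// netsFromTotal rounds the division total; netsFromLevels rounds each level first.
const report = Object.entries(divisions).map(([division, levels]) => ({
  division,
  teams: sum(levels),
  netsFromTotal: netsFor(sum(levels)),
  netsFromLevels: sum(levels.map(netsFor)),
}));
console.table(report);

const dayNets = sum(report.map((r) => r.netsFromLevels));
console.log(dayNets, netsFor(144)); // 32 29
```

Rounding up per level can only add nets relative to rounding the total, which is why the pipeline above carries the per-level nets upward with $sum.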

Elasticsearch terms aggregation by strings in an array

How can I write an Elasticsearch terms aggregation that splits the buckets by the entire term rather than by individual tokens? For example, I would like to aggregate by state, but the following returns new, york, jersey and california as individual buckets, not New York, New Jersey and California as expected:
curl -XPOST "http://localhost:9200/my_index/_search" -d'
{
"aggs" : {
"states" : {
"terms" : {
"field" : "states",
"size": 10
}
}
}
}'
My use case is like the one described here:
https://www.elastic.co/guide/en/elasticsearch/guide/current/aggregations-and-analysis.html
with just one difference: the states field is an array in my case.
Example object:
{
"states": ["New York", "New Jersey", "California"]
}
It seems that the proposed solution (mapping the field as not_analyzed) does not work for arrays.
My mapping:
{
"properties": {
"states": {
"type":"object",
"fields": {
"raw": {
"type":"object",
"index":"not_analyzed"
}
}
}
}
}
I have tried replacing "object" with "string", but this does not work either.

I think all you're missing is "states.raw" in your aggregation (note that since no analyzer is specified, the "states" field is analyzed with the standard analyzer, while the sub-field "raw" is "not_analyzed"). Your mapping may need another look as well: when I tried it against ES 2.0 I got some errors, but this worked:
PUT /test_index
{
"mappings": {
"doc": {
"properties": {
"states": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
Then I added a couple of docs:
POST /test_index/doc/_bulk
{"index":{"_id":1}}
{"states":["New York","New Jersey","California"]}
{"index":{"_id":2}}
{"states":["New York","North Carolina","North Dakota"]}
And this query seems to do what you want:
POST /test_index/_search
{
"size": 0,
"aggs" : {
"states" : {
"terms" : {
"field" : "states.raw",
"size": 10
}
}
}
}
returning:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0,
"hits": []
},
"aggregations": {
"states": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "New York",
"doc_count": 2
},
{
"key": "California",
"doc_count": 1
},
{
"key": "New Jersey",
"doc_count": 1
},
{
"key": "North Carolina",
"doc_count": 1
},
{
"key": "North Dakota",
"doc_count": 1
}
]
}
}
}
Here's the code I used to test it:
http://sense.qbox.io/gist/31851c3cfee8c1896eb4b53bc1ddd39ae87b173e
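The difference between aggregating on "states" and on "states.raw" can be illustrated with a rough JavaScript simulation over the two test documents. The tokenizer here is an approximation of the standard analyzer (lowercase, split on non-alphanumeric characters), not the real one, and the helper names are hypothetical. Like a terms aggregation, it counts each term at most once per document.

```javascript
// Approximate analyzer: lowercase and split on anything that isn't a letter or digit.
const tokenize = (text) => text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);

const docs = [
  { states: ["New York", "New Jersey", "California"] },
  { states: ["New York", "North Carolina", "North Dakota"] },
];

function termBuckets(docs, toTerms) {
  const counts = new Map();
  for (const doc of docs) {
    // A Set so each term counts once per document, like doc_count
    const terms = new Set(doc.states.flatMap(toTerms));
    for (const term of terms) counts.set(term, (counts.get(term) || 0) + 1);
  }
  // Highest doc_count first, ties by key, matching the order in the response above
  return [...counts]
    .map(([key, doc_count]) => ({ key, doc_count }))
    .sort((a, b) => b.doc_count - a.doc_count || (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

const format = (buckets) => buckets.map((b) => `${b.key}:${b.doc_count}`).join(" ");
const analyzed = format(termBuckets(docs, tokenize)); // like "field": "states"
const raw = format(termBuckets(docs, (s) => [s]));    // like "field": "states.raw"
console.log(analyzed); // new:2 york:2 california:1 carolina:1 dakota:1 jersey:1 north:1
console.log(raw);      // New York:2 California:1 New Jersey:1 North Carolina:1 North Dakota:1
```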