MongoDB aggregate elements of nested arrays

We've got a MongoDB (v2.4) collection that contains time-series snapshots:
{"foo": "bar",
"timeseries": [{"a": 1, "b": 2},
{"a": 2, "b": 3},
...]}
{"foo": "baz",
"timeseries": [{"a": 0, "b": 1},
{"a": 2, "b": 3},
...]}
I need to group all the documents by the foo key and then, per key, sum the a values of the last entry in each document's timeseries array (timeseries[-1].a, as it were). I want to believe there's some combination of $group, $project, and $unwind that can do what I want without having to resort to mapReduce.

Are you looking for something along the lines of:
> db.collection.aggregate([
{$unwind: "$timeseries"},
{$group: {_id: "$_id", foo: {$last: "$foo"},
last_value: {$last: "$timeseries.a"}}},
{$group: { _id: "$foo", total: { $sum: "$last_value" }}}
])
{ "_id" : "baz", "total" : 2 }
{ "_id" : "bar", "total" : 2 }
The $unwind stage produces one document per item in the time series.
After that, documents are grouped back together by _id, keeping only the $last value.
Finally, a second $group stage groups (and sums values) by the foo field.
As a final note, I don't think this will be time-efficient for very long time series, as MongoDB basically has to iterate over all the items just to reach the last one.
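To make the three stages concrete, here is a minimal in-memory sketch in plain JavaScript that mimics what the pipeline computes. It is not MongoDB code; the _id values 1 and 2 are hypothetical (the question's documents show no _id), and it assumes every document has a non-empty timeseries array, since $unwind drops documents whose array is empty.

```javascript
// In-memory sketch of the pipeline:
// $unwind "$timeseries" -> $group by _id keeping $last "a" -> $group by foo with $sum.
function sumLastValuesByFoo(docs) {
  // Stage 1: $unwind emits one document per timeseries element, in array order.
  const unwound = [];
  for (const doc of docs) {
    for (const entry of doc.timeseries) {
      unwound.push({ _id: doc._id, foo: doc.foo, timeseries: entry });
    }
  }

  // Stage 2: group by _id; later elements overwrite earlier ones, mimicking $last.
  const perDoc = new Map();
  for (const d of unwound) {
    perDoc.set(d._id, { foo: d.foo, last_value: d.timeseries.a });
  }

  // Stage 3: group by foo and sum the last values.
  const totals = {};
  for (const { foo, last_value } of perDoc.values()) {
    totals[foo] = (totals[foo] ?? 0) + last_value;
  }
  return totals;
}

// The two sample documents, truncated to the entries shown in the question.
const docs = [
  { _id: 1, foo: "bar", timeseries: [{ a: 1, b: 2 }, { a: 2, b: 3 }] },
  { _id: 2, foo: "baz", timeseries: [{ a: 0, b: 1 }, { a: 2, b: 3 }] },
];
console.log(JSON.stringify(sumLastValuesByFoo(docs))); // {"bar":2,"baz":2}
```

The sketch also shows where the cost comes from: every element is materialized by the unwind step even though only the last one per document survives the first group.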

Related

Mongo is it possible to increment array element if not present

Hi, I am trying to increment an element inside an array, as below:
{ "_id" : 1,
"myarray": [
{"a": 1},
{"b": 1},
{"c": 1}
]
}
{ "_id" : 2,
"myarray": [
{"a": 4},
{"b": 7},
{"c": 9}
]
}
db.mycollection({"_id": 1}, {"$inc": {"myarray.$.a"}});
db.mycollection({"_id": 2}, {"$inc": {"myarray.$.b"}});
I am aware of the mongo $ operator for identifying an element in an array, as used above.
The problem I am tackling is that in some scenarios my data does not have the array:
{ "_id" : 3
}
{ "_id" : 4
}
If I execute db.mycollection({"_id": 3}, {"$inc": {"myarray.$.a"}}); it does not update the document with _id 3, since myarray does not exist in that doc.
So the immediate fix I made was to initialize myarray with zero values whenever I insert a doc:
{ "_id" : 3,
"myarray": [
{"a": 0},
{"b": 0},
{"c": 0}
]
}
{ "_id" : 4,
"myarray": [
{"a": 0},
{"b": 0},
{"c": 0}
]
}
Is there a way to update directly, without pre-initializing the array elements with zero?
db.mycollection({"_id": 1}, {"$inc": {"myarray.$.a"}}); is wrong in so many ways:
First of all, it is db.mycollection.update (or updateOne, updateMany, etc.).
Next, $inc requires a number specifying how much you want to increment by. If it is 1, you have to specify it explicitly: {"$inc": {"myarray.$.a": 1}}
Finally, the $ operator you are aware of requires a match from the query part (the first argument of the update* functions). Please read the page you are referring to in your question.
To answer the question: no, there is no such way. It's not about $inc; $inc actually creates a missing field, treating its current value as 0. It's $ that doesn't push a new element into the array, because the document won't pass the filter stage where you specify which matching subdocument $ should refer to.
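For reference, a corrected form of the first update would put the array field into the filter so that $ has an element to refer to, e.g. db.mycollection.updateOne({"_id": 1, "myarray.a": {"$exists": true}}, {"$inc": {"myarray.$.a": 1}}). The plain JavaScript sketch below models only the behaviour described in this answer (it is not MongoDB itself, and the function names are hypothetical): a plain $inc creates a missing field, while a positional $inc leaves a document untouched when no array element matches.

```javascript
// Plain $inc: a missing field is treated as 0, so it is created with the increment.
function plainInc(doc, field, amount) {
  doc[field] = (doc[field] ?? 0) + amount;
  return doc;
}

// Positional $inc with a filter like {_id: ..., "myarray.<key>": {$exists: true}}:
// $ refers to the first array element matched by the filter. If nothing matches,
// the document fails the filter and is not modified at all.
function positionalInc(doc, key, amount) {
  if (!Array.isArray(doc.myarray)) return 0; // no array: filter fails
  const idx = doc.myarray.findIndex((el) =>
    Object.prototype.hasOwnProperty.call(el, key)
  );
  if (idx === -1) return 0; // no element with that key: filter fails
  doc.myarray[idx][key] += amount;
  return 1; // number of documents modified
}

const doc1 = { _id: 1, myarray: [{ a: 1 }, { b: 1 }, { c: 1 }] };
const doc3 = { _id: 3 };
console.log(positionalInc(doc1, "a", 1), doc1.myarray[0].a); // 1 2
console.log(positionalInc(doc3, "a", 1), JSON.stringify(doc3)); // 0 {"_id":3}
console.log(JSON.stringify(plainInc({ _id: 3 }, "counter", 1))); // {"_id":3,"counter":1}
```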

Sort order in Firestore arrays

I'm trying to understand arrays in Firebase a bit more. Currently, I'm storing maps in arrays, where one of the fields inside each map is a position that my mobile app uses to sort the array on retrieval and show results in order of position.
The docs on Firebase say:
Arrays are sorted by elements. If elements are equal, the arrays are sorted by length.
For example, [1, 2, 3] < [1, 2, 3, 1] < [2].
And then there's a section describing how maps are sorted as well:
Key ordering is always sorted. For example, if you write {c: "foo", a: "bar", b: "qux"} the map is sorted by key and saved as {a: "bar", b: "qux", c: "foo"}.
Map fields are sorted by key and compared by key-value pairs, first comparing the keys and then the values. If the first key-value pairs are equal, the next key-value pairs are compared, and so on. If two maps start with the same key-value pairs, then map length is considered. For example, the following maps are in ascending order:
{a: "aaa", b: "baz"}
{a: "foo", b: "bar"}
{a: "foo", b: "bar", c: "qux"}
{a: "foo", b: "baz"}
{b: "aaa", c: "baz"}
{c: "aaa"}
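As I read the quoted docs, this rule describes how map values are ordered relative to each other (for example when results are ordered by a field holding maps), not a reordering applied to elements stored inside an array. A minimal sketch of the comparison, assuming all values are plain strings (Firestore also orders other value types, which this ignores), reproduces the documented order:

```javascript
// Compare two strings by code-unit order (sufficient for these ASCII examples).
function compareStrings(x, y) {
  return x < y ? -1 : x > y ? 1 : 0;
}

// Sketch of the quoted map ordering rule for maps with string values only.
function compareMaps(m1, m2) {
  // Maps are compared as if their keys were sorted.
  const k1 = Object.keys(m1).sort();
  const k2 = Object.keys(m2).sort();
  const n = Math.min(k1.length, k2.length);
  for (let i = 0; i < n; i++) {
    // Compare the i-th key first, then its value.
    const byKey = compareStrings(k1[i], k2[i]);
    if (byKey !== 0) return byKey;
    const byValue = compareStrings(m1[k1[i]], m2[k2[i]]);
    if (byValue !== 0) return byValue;
  }
  // Same leading key-value pairs: the shorter map sorts first.
  return k1.length - k2.length;
}

const jumbled = [
  { c: "aaa" }, { a: "aaa", b: "baz" }, { a: "foo", b: "baz" },
  { b: "aaa", c: "baz" }, { a: "foo", b: "bar", c: "qux" }, { a: "foo", b: "bar" },
];
for (const m of [...jumbled].sort(compareMaps)) console.log(JSON.stringify(m));
// {"a":"aaa","b":"baz"}
// {"a":"foo","b":"bar"}
// {"a":"foo","b":"bar","c":"qux"}
// {"a":"foo","b":"baz"}
// {"b":"aaa","c":"baz"}
// {"c":"aaa"}
```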
But then I tried this in Firestore: I jumbled up the order of the maps in the above example, and stored them in an array:
data= [{"c": "aaa"}, {"a": "aaa", "b": "baz"}, {"a": "foo", "b": "baz"}, {"b": "aaa", "c": "baz"}, {"a": "foo", "b": "bar", "c": "qux"}, {"a": "foo", "b": "bar"}]
And upon inserting into a Firestore document, the array did not get sorted! While the keys themselves do get sorted within a single Map, the elements in the array stay in the same order.
So does sorting in arrays even work when elements are Maps? Here's an example of what I'm storing in Firestore:
{
"car_collection": {
"models": {
"data": [
{
"model": "Honda",
"color": "black",
"position": 0
},
{
"model": "Hyundai",
"color": "red",
"position": 1
},
{
"model": "Chevrolet",
"color": "yellow",
"position": 2
}
]
}
}
}
I'm storing an additional field called "position", and the order of the maps stays the same on every retrieval. I'm wondering whether I even need to store this field, or whether the data will stay in the order in which I store it.
I submitted a ticket to Google to improve the documentation for the Array type, and I think the updated text is helpful and accurate, as confirmed by some smoke testing.
https://firebase.google.com/docs/firestore/manage-data/data-types
Copy-pasting the current version here:
An array cannot contain another array value as one of its elements.
Within an array, elements maintain the position assigned to them. When sorting two or more arrays, arrays are ordered based on their element values.
When comparing two arrays, the first elements of each array are compared. If the first elements are equal, then the second elements are compared and so on until a difference is found. If an array runs out of elements to compare but is equal up to that point, then the shorter array is ordered before the longer array.
For example, [1, 2, 3] < [1, 2, 3, 1] < [2]. The array [2] has the greatest first element value. The array [1, 2, 3] has elements equal to the first three elements of [1, 2, 3, 1] but is shorter in length.
So it seems you can safely expect the order of elements to be maintained in Firestore, while keeping in mind how additions and removals affect element positions.
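The comparison rule quoted above is an ordering between whole arrays; it does not reorder the elements inside one stored array. A minimal sketch of that rule, restricted to arrays of numbers, confirms the documented example:

```javascript
// Sketch of the quoted array ordering rule, for arrays of numbers only.
// Returns a negative number if x < y, positive if x > y, 0 if equal.
function compareArrays(x, y) {
  const n = Math.min(x.length, y.length);
  for (let i = 0; i < n; i++) {
    // The first differing element decides the order.
    if (x[i] !== y[i]) return x[i] < y[i] ? -1 : 1;
  }
  // Equal up to the shorter length: the shorter array comes first.
  return x.length - y.length;
}

console.log(compareArrays([1, 2, 3], [1, 2, 3, 1]) < 0); // true
console.log(compareArrays([1, 2, 3, 1], [2]) < 0); // true
```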
You will have to sort your array before posting it to Firestore.
Arrays are not sorted in either the Realtime Database (RTD) or Firestore; objects (maps), however, are sorted by their keys.
Or sort the arrays on the client side.
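If you keep the position field and sort on the client, a minimal sketch could look like the following. It assumes the array has already been read from the document into plain objects shaped like the car_collection example; the Firestore SDK read itself is not shown, and the function name is hypothetical.

```javascript
// Client-side ordering by the "position" field stored in each map.
// Sorting a copy leaves the array exactly as it was retrieved.
function sortByPosition(models) {
  return [...models].sort((x, y) => x.position - y.position);
}

const data = [
  { model: "Chevrolet", color: "yellow", position: 2 },
  { model: "Honda", color: "black", position: 0 },
  { model: "Hyundai", color: "red", position: 1 },
];
console.log(JSON.stringify(sortByPosition(data).map((m) => m.model)));
// ["Honda","Hyundai","Chevrolet"]
```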

Elasticsearch sorting by array column

How can I sort records by a field that holds an array of numbers?
For example:
[1, 32, 26, 16]
[1, 32, 10, 1500]
[1, 32, 1, 16]
[1, 32, 2, 17]
The result that is to be expected:
[1, 32, 1, 16]
[1, 32, 2, 17]
[1, 32, 10, 1500]
[1, 32, 26, 16]
Elasticsearch has a sort mode option: https://www.elastic.co/guide/en/elasticsearch/reference/1.4/search-request-sort.html#_sort_mode_option. But none of its variants is appropriate.
Ruby can sort an array of arrays of numbers: its Array#<=> method, whose description says "Each object in each array is compared", compares arrays element by element.
How to do the same with elasticsearch?
P.S. Sorry for my English
In ElasticSearch, arrays of objects do not work as you would expect:
Arrays of objects do not work as you would expect: you cannot query
each object independently of the other objects in the array. If you
need to be able to do this then you should use the nested datatype
instead of the object datatype.
This is explained in more detail in Nested datatype.
It is not possible to access array elements at sort time by their indices since they are stored in a Lucene index, which allows basically only set operations ("give docs that have array element = x" or "give docs that do not have array element = x").
However, by default the original JSON document inserted into the index is stored on disk and is available for script access in the _source field.
You have two options:
use script based sorting
store value for sorting explicitly as string
Let's discuss these options in a bit more detail.
1. Script based sorting
The first option is more like a hack. Let's assume you have a mapping like this:
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"my_array": {
"type": "integer"
}
}
}
}
}
Then you can achieve intended behavior with a scripted sort:
POST my_index/my_type/_search
{
"sort" : {
"_script" : {
"script" : "String s = ''; for(int i = 0; i < params._source.my_array.length; ++i) {s += params._source.my_array[i] + ','} s",
"type" : "string",
"order" : "asc"
}
}
}
(I tested the code on ElasticSearch 5.4; I believe there should be something equivalent for earlier versions. Please consult the relevant documentation if you need information for earlier versions, such as 1.4.)
The output will be:
"hits": {
"total": 2,
"max_score": null,
"hits": [
{
"_index": "my_index",
"_type": "my_type",
"_id": "2",
"_score": null,
"_source": {
"my_array": [
1,
32,
1,
16
]
},
"sort": [
"1,32,1,16,"
]
},
{
"_index": "my_index",
"_type": "my_type",
"_id": "1",
"_score": null,
"_source": {
"my_array": [
1,
32,
10,
1500
]
},
"sort": [
"1,32,10,1500,"
]
}
] }
Note that this solution will be slow and memory-consuming, since it has to read _source from disk for every document being sorted and load it into memory.
2. Denormalization
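Storing the sort value explicitly as a string is closer to the ElasticSearch approach, which favors denormalization. The idea is to do the concatenation before inserting the document into the index and then use a robust sort on a string field (mapped as keyword in 5.x, since text fields are not sortable by default). In both approaches the values are compared as strings, so multi-digit numbers sort lexicographically (10 before 2) unless each element is zero-padded to a fixed width.
Please select the solution that is more appropriate for your needs.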
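A minimal sketch of computing the denormalized key in the application before indexing follows. The padded key would be stored in an extra field (a hypothetical name such as my_array_sort, mapped as keyword) and sorted on with a regular field sort. The width of 10 digits is an assumption that must cover your largest value, and negative numbers are not handled.

```javascript
// Zero-pads each non-negative integer to a fixed width so that plain string
// order matches element-by-element numeric order.
const WIDTH = 10; // assumption: at least the digit count of the largest value

function paddedKey(values) {
  return values.map((v) => String(v).padStart(WIDTH, "0")).join(",");
}

// Unpadded concatenation, similar to what the scripted sort builds.
function naiveKey(values) {
  return values.join(",");
}

// Sort by comparing keys as strings (code-unit order; these keys are ASCII).
function sortByKey(arrays, keyFn) {
  return [...arrays].sort((x, y) => {
    const kx = keyFn(x);
    const ky = keyFn(y);
    return kx < ky ? -1 : kx > ky ? 1 : 0;
  });
}

const arrays = [
  [1, 32, 26, 16],
  [1, 32, 10, 1500],
  [1, 32, 1, 16],
  [1, 32, 2, 17],
];

console.log(JSON.stringify(sortByKey(arrays, naiveKey).map(naiveKey)));
// ["1,32,1,16","1,32,10,1500","1,32,2,17","1,32,26,16"]  (10 sorts before 2)
console.log(JSON.stringify(sortByKey(arrays, paddedKey).map(naiveKey)));
// ["1,32,1,16","1,32,2,17","1,32,10,1500","1,32,26,16"]
```

With padding, an array that is a prefix of another also sorts first, which matches the element-wise comparison Ruby's Array#<=> performs.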
Hope that helps!

Grouping Facets in Solr

In my Solr index I have two different types of items, A and B, that have distinct fields, foo and bar respectively, but similar values that I need to group together for faceting.
A:
foo: /* could be "abc", "def" or "ghi" */
B:
bar: /* could be "abc", "ghi", or "jkl" */
It's easy enough to get the facet information for each of these fields separately:
http://myServer:<port>/<SolrPath>/q=<query>&facet.field=foo&facet.field=bar
Which gives me:
"facet_counts": {
"facet_fields": {
"foo": ["abc", 10, "def", 20, "ghi", 30],
"bar": ["abc", 3, "ghi", 8, "jkl", 1]
}
}
Is there a way in Solr to specify that I want the fields A.foo and B.bar to be "lumped together" into the same facet? In other words, I need the facet information in the response to look like this:
"facet_counts": {
"facet_fields": {
"foo": ["abc", 13, "def", 20, "ghi", 38, "jkl", 1]
}
}
No. My advice would be to index the values into a single field. Using copyField directives, this would look like this (in schema.xml):
<copyField source="foo" dest="foobar" />
<copyField source="bar" dest="foobar" />
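This would preserve the original foo and bar fields. The destination field foobar must also be declared in the schema (as multiValued if a document can carry both source fields). To get your combined facets, simply facet on the new field: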
?q=*:*
&facet=true
&facet.field=foobar
Edit: it might be possible with facet queries, but only if the list of unique values is small and limited, and you're willing to write a separate facet query for each value. Even then, the results will look different (a count per query instead of an array of field value/count pairs).
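As a stopgap before reindexing with copyField, the two facet lists from the original response could also be merged on the client. This is an alternative I'm adding, not part of the answer above, and the function name is hypothetical. It assumes both lists are complete: if facet.limit cuts values off either list, the merged counts can be incomplete, which the single-field approach avoids.

```javascript
// Merges Solr-style flat facet lists ([value, count, value, count, ...])
// by summing counts per value, keeping first-seen order.
function mergeFacets(...lists) {
  const totals = new Map();
  for (const list of lists) {
    for (let i = 0; i < list.length; i += 2) {
      totals.set(list[i], (totals.get(list[i]) ?? 0) + list[i + 1]);
    }
  }
  // Flatten back into the same [value, count, ...] shape.
  return [...totals].flat();
}

const foo = ["abc", 10, "def", 20, "ghi", 30];
const bar = ["abc", 3, "ghi", 8, "jkl", 1];
console.log(JSON.stringify(mergeFacets(foo, bar)));
// ["abc",13,"def",20,"ghi",38,"jkl",1]
```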

MongoDB - search documents through an integer array

I need to retrieve documents that contain at least one value inside an array. The structure of my document is:
{ "_id": 3,
"username": "111111",
"name": "XPTO 1",
"codes": [ 2, 4, 5 ],
"available": true }
{ "_id": 4,
"username": "22222",
"name": "XPTO 2",
"codes": [ 3, 5 ],
"available": true }
I need to do a find by "codes": if I search for the value 5, I need to retrieve all documents that contain this value in their array.
I've tried to use $elemMatch, but with no success...
db.user.find({codes: {"$elemMatch": {codes: [2,8]}}}, {"codes":1})
How can I do this?
Thanks in advance.
You can check for values inside an array just like you compare the values for some field.
So you can do it like this, without using $elemMatch.
If you want to check whether an array contains a single value, 5:
db.user.find({codes: 5}, {codes:1})
This will return all documents whose codes array contains 5.
If you want to check whether an array contains any value from a given set of values:
db.user.find({codes: {$in: [2, 8]}}, {codes:1})
This will return documents whose array contains either 2 or 8.
If you want to check whether an array contains all the values in a list:
db.user.find({codes: {$all: [2, 5]}}, {codes:1})
This will return all documents whose array contains both 2 and 5.
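To show what each of the three queries matches on the sample documents, here is a small in-memory sketch in plain JavaScript. It mirrors only these cases (arrays of scalar values), not MongoDB's full matching rules, and the helper names are hypothetical.

```javascript
const users = [
  { _id: 3, username: "111111", name: "XPTO 1", codes: [2, 4, 5], available: true },
  { _id: 4, username: "22222", name: "XPTO 2", codes: [3, 5], available: true },
];

// {codes: 5}: matches if any element equals the value.
const containsValue = (arr, v) => arr.includes(v);
// {codes: {$in: [...]}}: matches if any element equals any listed value.
const containsAny = (arr, vs) => vs.some((v) => arr.includes(v));
// {codes: {$all: [...]}}: matches if every listed value is present.
const containsAll = (arr, vs) => vs.every((v) => arr.includes(v));

// Returns the _id of every user whose codes array satisfies the predicate.
const ids = (pred) => users.filter((u) => pred(u.codes)).map((u) => u._id);

console.log(JSON.stringify(ids((c) => containsValue(c, 5)))); // [3,4]
console.log(JSON.stringify(ids((c) => containsAny(c, [2, 8])))); // [3]
console.log(JSON.stringify(ids((c) => containsAll(c, [2, 5])))); // [3]
```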
