Elasticsearch sorting by array column

How can I sort records by a field that holds an array of numbers?
For example:
[1, 32, 26, 16]
[1, 32, 10, 1500]
[1, 32, 1, 16]
[1, 32, 2, 17]
The expected result:
[1, 32, 1, 16]
[1, 32, 2, 17]
[1, 32, 10, 1500]
[1, 32, 26, 16]
Elasticsearch has a sort mode option: https://www.elastic.co/guide/en/elasticsearch/reference/1.4/search-request-sort.html#_sort_mode_option. But none of its variants is appropriate.
Ruby can sort an array of number arrays: it has the method Array#<=>, whose description says "Each object in each array is compared".
How can I do the same with Elasticsearch?
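To make the desired ordering concrete, here is a minimal Ruby sketch using the sample arrays above. Array#sort relies on Array#<=>, so the arrays are compared element by element; the variable names are only for illustration.

```ruby
# Sample arrays from the question.
records = [
  [1, 32, 26, 16],
  [1, 32, 10, 1500],
  [1, 32, 1, 16],
  [1, 32, 2, 17]
]

# Array#sort uses Array#<=>, which walks both arrays in parallel and
# decides on the first position where the elements differ.
sorted = records.sort

sorted.each { |row| p row }
```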
P.S. Sorry for my English

As the Elasticsearch documentation on arrays puts it:
Arrays of objects do not work as you would expect: you cannot query
each object independently of the other objects in the array. If you
need to be able to do this then you should use the nested datatype
instead of the object datatype.
This is explained in more detail in Nested datatype.
It is not possible to access array elements at sort time by their indices since they are stored in a Lucene index, which allows basically only set operations ("give docs that have array element = x" or "give docs that do not have array element = x").
However, by default the initial JSON document inserted into the index is stored on the disk and is available for scripting access in the field _source.
You have two options:
1. use script-based sorting
2. store the value for sorting explicitly as a string
Let's discuss these options in a bit more detail.
1. Script based sorting
The first option is more like a hack. Let's assume you have a mapping like this:
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "my_array": {
          "type": "integer"
        }
      }
    }
  }
}
Then you can achieve intended behavior with a scripted sort:
POST my_index/my_type/_search
{
  "sort" : {
    "_script" : {
      "script" : "String s = ''; for(int i = 0; i < params._source.my_array.length; ++i) {s += params._source.my_array[i] + ','} s",
      "type" : "string",
      "order" : "asc"
    }
  }
}
(I tested this on Elasticsearch 5.4. I believe earlier versions have something equivalent; please consult the relevant documentation if you need it for an earlier version such as 1.4.)
The output will be:
"hits": {
  "total": 2,
  "max_score": null,
  "hits": [
    {
      "_index": "my_index",
      "_type": "my_type",
      "_id": "2",
      "_score": null,
      "_source": {
        "my_array": [1, 32, 1, 16]
      },
      "sort": [
        "1,32,1,16,"
      ]
    },
    {
      "_index": "my_index",
      "_type": "my_type",
      "_id": "1",
      "_score": null,
      "_source": {
        "my_array": [1, 32, 10, 1500]
      },
      "sort": [
        "1,32,10,1500,"
      ]
    }
  ] }
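Note that this solution is slow and memory-hungry, since it has to read _source from disk and load it into memory for every document being sorted. Also keep in mind that the sort key is compared as a string, so "10" sorts before "2"; if the elements can have different numbers of digits, pad each one to a fixed width before concatenating.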
2. Denormalization
Storing the sort value explicitly as a string is closer to the Elasticsearch way, which favors denormalization. The idea is to do the concatenation before inserting the document into the index and then use an ordinary, robust sort on that string field.
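Here is a rough Ruby sketch of how such a sort string could be built on the client before indexing. The helper name sort_key and the width of 10 digits are my own assumptions, and it only handles non-negative integers. It pads every element with leading zeros so that string order matches numeric order, and compares that with plain concatenation.

```ruby
# Hypothetical helper (not part of any Elasticsearch client): builds a string
# whose lexicographic order matches the element-by-element numeric order.
# Assumes non-negative integers with at most `width` digits.
def sort_key(numbers, width = 10)
  numbers.map { |n| n.to_s.rjust(width, '0') }.join(',')
end

records = [
  [1, 32, 26, 16],
  [1, 32, 10, 1500],
  [1, 32, 1, 16],
  [1, 32, 2, 17]
]

# Plain concatenation compares character by character, so "10" lands before "2".
naive = records.sort_by { |row| row.join(',') }

# Zero-padded keys restore the numeric order.
padded = records.sort_by { |row| sort_key(row) }

p naive
p padded
```

The value returned by sort_key would be stored in its own field (for example a not-analyzed string or keyword field) and used in a normal sort clause.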
Please select the solution more appropriate for your needs.
Hope that helps!

Related

ElasticSearch 5.1 Filtering by Comparing Arrays

I have a keyword array field (say f) and I want to filter documents with an exact array (e.g. filter docs with f = [1, 3, 6] exactly, same order and number of terms).
What is the best way of doing this?
Regards
One way to achieve this is to add a script to the query which would also check the number of elements in the array.
The filters would look something like this:
"filters": [
  {
    "script": {
      "script": "doc['f'].values.length == 3"
    }
  },
  {
    "terms": {
      "f": [
        1,
        3,
        6
      ],
      "execution": "and"
    }
  }
]
Hope you get the idea.
I think an even better idea would be to store the array as a string as well (if there are not many changes to the structure of the graph) and match that string directly. This would be much faster too.
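A minimal Ruby sketch of that idea, assuming the values never contain the separator character; the helper name exact_key is made up for this example. The resulting string would be indexed in an extra keyword field and matched with a plain term filter.

```ruby
# Hypothetical helper: serializes the array so that order and length both
# become part of the key. Assumes no value contains a comma.
def exact_key(values)
  values.join(',')
end

wanted = exact_key([1, 3, 6])

puts wanted == exact_key([1, 3, 6])    # same elements, same order
puts wanted == exact_key([1, 6, 3])    # same elements, different order
puts wanted == exact_key([1, 3, 6, 6]) # extra element
```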

How to get and compare the elements of the jsonb array in Postgres?

Postgres 9.6.1
CREATE TABLE "public"."test" (
  "id" int4 NOT NULL,
  "packet" jsonb
)
WITH (OIDS=FALSE)
;
The packet column holds jsonb in one of two shapes:
{"1": {"end": 14876555, "quantity":10}, "2": {"end": 14876555, "quantity":10} }
or
[{"op": 1, "end": 14876555, "quantity": 10}, {"op": 2, "end": 14876555, "quantity": 20}]
All attempts to retrieve an array result in an error:
cannot extract elements from an object
I need to check every "end" element against the condition "end" < 1490000 and find the matching id.
The "op": 1 value or the "1" key varies, so a solution that relies on the full path is not suitable.
If you have no agreed JSON structure, the best solution IMO is something like
select *
from
  public.test,
  regexp_matches(packet::text, '"end":\s*(\d+)', 'g') as e(x)
where
  x[1]::numeric < 1490000;
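To show what the regular expression does independently of the JSON shape, here is a small Ruby sketch that applies the same pattern to both sample documents as text. The helper name end_values is an assumption made for this example; in the database the matching is done by regexp_matches.

```ruby
# Hypothetical helper: pulls every number that follows an "end" key out of
# the JSON text, without caring where in the structure the key sits.
def end_values(json_text)
  json_text.scan(/"end":\s*(\d+)/).flatten.map(&:to_i)
end

object_shape = '{"1": {"end": 14876555, "quantity": 10}, "2": {"end": 14876555, "quantity": 10}}'
array_shape  = '[{"op": 1, "end": 14876555, "quantity": 10}, {"op": 2, "end": 14876555, "quantity": 20}]'

p end_values(object_shape)
p end_values(array_shape)

# The WHERE clause corresponds to a check like this one.
p end_values(array_shape).any? { |value| value < 1_490_000 }
```

Like the SQL version, this treats the document as text, so it would also pick up an "end" key nested somewhere you did not intend.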

How to sum multiple elements of nested arrays on unique keys [duplicate]

I have the following defined table. The index for each element in each row corresponds to the same field.
[[123.0, 23,"id1",34, "abc"],
[234.1,43, "id2", 24,"jsk"],
[423.5,53, "id1",1,"xyz"],
[1.4, 5, "id2",0,"klm"]]
In the above example I need to group the rows by the unique identifier in the 3rd column and sum each of the summable elements, index by index. The result should look like this:
[[546.5,76, "id1",35],
[235.5,48, "id2",24]]
What's the best way to do this?
This is essentially the same as the solution to your previous question.
data = [ [ 123.0, 23, "id1", 34, "abc" ],
         [ 234.1, 43, "id2", 24, "jsk" ],
         [ 423.5, 53, "id1", 1, "xyz" ],
         [ 1.4, 5, "id2", 0, "klm" ] ]
sums = Hash.new {|h,k| h[k] = [0, 0, 0] }
data.each_with_object(sums) do |(val0, val1, id, val2, _), sums|
  sums[id][0] += val0
  sums[id][1] += val1
  sums[id][2] += val2
end
# => { "id1" => [ 546.5, 76, 35 ],
#      "id2" => [ 235.5, 48, 24 ] }
The main difference is that instead of giving the Hash a default value of 0, we're giving it a default proc that initializes missing keys with [0, 0, 0]. (We can't just do Hash.new([0, 0, 0]) because then every value would be a reference to a single Array instance, rather than each value having its own Array.) Then, inside the block, we add each value (val0 et al) to the corresponding elements of sums[id].
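If the shared-reference problem is not obvious, this small sketch shows the difference between the two forms. The variable names shared and per_key are only for illustration.

```ruby
# Default VALUE: every missing key returns the very same Array object,
# and nothing is ever stored in the Hash itself.
shared = Hash.new([0, 0, 0])
shared['id1'][0] += 1
shared['id2'][0] += 1

p shared.empty?      # no keys were stored
p shared['anything'] # the single default Array has absorbed both increments

# Default PROC: each missing key gets its own fresh Array, stored under that key.
per_key = Hash.new { |h, k| h[k] = [0, 0, 0] }
per_key['id1'][0] += 1
per_key['id2'][0] += 1

p per_key
```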
If you wanted an Array of Arrays instead of a Hash with the id at index 2, then at the end, you would have to add something like this:
.map {|id, vals| vals.insert(2, id) }
However, a Hash with the ids as keys makes more sense as a data structure.

MongoDB aggregate elements of nested arrays

We've got a MongoDB (v2.4) collection that contains time-series snapshots:
{"foo": "bar",
"timeseries": [{"a": 1, "b": 2},
{"a": 2, "b": 3},
...]}
{"foo": "baz",
"timeseries": [{"a": 0, "b": 1},
{"a": 2, "b": 3},
...]}
I need to group all the entries by the foo key, and then sum the a values of the last entry in each of the timeseries values of each document (timeseries[-1].a, as it were), per key. I want to believe there's some combination of $group, $project, and $unwind that can do what I want without having to resort to mapReduce.
Are you looking for something along the lines of:
> db.collection.aggregate([
    {$unwind: "$timeseries"},
    {$group: {_id: "$_id", foo: {$last: "$foo"},
              last_value: {$last: "$timeseries.a"}}},
    {$group: { _id: "$foo", total: { $sum: "$last_value" }}}
  ])
{ "_id" : "baz", "total" : 2 }
{ "_id" : "bar", "total" : 2 }
The $unwind stage produces one document per item in the timeseries array.
After that, the documents are grouped back together by _id, keeping only the $last value.
Finally, a second $group stage groups by the foo field and sums the values.
As a final note, I don't think this will be time efficient for very long time series as basically MongoDB will have to iterate over all items just in order to reach the last one.
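To see what each stage contributes, here is a rough in-memory imitation of the three stages in Ruby. It is not MongoDB code; the two documents are made-up stand-ins with only two entries each (the elided entries from the question are left out), and the _id values are assumptions.

```ruby
# Made-up sample documents, shaped like the ones in the question.
docs = [
  { _id: 1, foo: 'bar', timeseries: [{ a: 1, b: 2 }, { a: 2, b: 3 }] },
  { _id: 2, foo: 'baz', timeseries: [{ a: 0, b: 1 }, { a: 2, b: 3 }] }
]

# Stage 1, like $unwind: one row per time-series entry, in array order.
unwound = docs.flat_map do |doc|
  doc[:timeseries].map { |entry| { _id: doc[:_id], foo: doc[:foo], a: entry[:a] } }
end

# Stage 2, like $group by _id with $last: keep the last row of each document.
last_per_doc = unwound.group_by { |row| row[:_id] }.values.map(&:last)

# Stage 3, like $group by foo with $sum: add up the kept values per foo key.
totals = last_per_doc.each_with_object(Hash.new(0)) do |row, acc|
  acc[row[:foo]] += row[:a]
end

p totals
```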

MongoDB - search documents through an integer array

I need to retrieve documents that contain at least one value inside an array. The structure of my document is:
{ "_id": 3,
"username": "111111",
"name": "XPTO 1",
"codes": [ 2, 4, 5 ],
"available": true }
{ "_id": 4,
"username": "22222",
"name": "XPTO 2",
"codes": [ 3, 5 ],
"available": true }
I need to do a find by "codes": if I search for the value 5, I need to retrieve all documents that contain this value inside their array.
I've tried to use $elemMatch, but with no success...
db.user.find({codes: {"$elemMatch": {codes: [2,8]}}}, {"codes":1})
How can I do this?
Thanks in advance.
You can check for values inside an array just like you compare the values for some field.
So you would do it like this, without using $elemMatch.
If you want to check whether the array contains a single value, 5:
db.user.find({codes: 5}, {codes:1})
This returns all documents whose codes array contains 5.
If you want to check whether the array contains any value out of a given set of values:
db.user.find({codes: {$in: [2, 8]}}, {codes:1})
This returns the documents whose array contains either 2 or 8.
If you want to check whether the array contains all the values in a list:
db.user.find({codes: {$all: [2, 5]}}, {codes:1})
This returns all documents whose array contains both 2 and 5.
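If it helps, the three kinds of match behave roughly like these plain Ruby checks on the two sample documents. This is only an in-memory analogy, not MongoDB driver code, and the variable names are made up.

```ruby
# The two sample documents from the question, reduced to the relevant fields.
users = [
  { _id: 3, codes: [2, 4, 5] },
  { _id: 4, codes: [3, 5] }
]

# Like {codes: 5}: the array contains the value.
contains_5 = users.select { |u| u[:codes].include?(5) }

# Like {codes: {$in: [2, 8]}}: the array shares at least one value with the list.
any_of = users.select { |u| (u[:codes] & [2, 8]).any? }

# Like {codes: {$all: [2, 5]}}: every listed value is present in the array.
all_of = users.select { |u| ([2, 5] - u[:codes]).empty? }

p contains_5.map { |u| u[:_id] }
p any_of.map { |u| u[:_id] }
p all_of.map { |u| u[:_id] }
```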
