Elasticsearch query: combine nested array of objects into one array

Using Elasticsearch I am trying to combine a nested array of objects into one array.
This is what my data looks like:
GET invoices/_search
{
"hits": [
{
"_index": "invoices",
"_id": "1234",
"_score": 1.0,
"_source": {
"id": 1234,
"status": "unpaid",
"total": 15.35,
"payments": [
{
"id": 1981,
"amount": 10,
"date": "2022-02-09T13:00:00+01:00"
},
{
"id": 1982,
"amount": 5.35,
"date": "2022-02-09T13:35:00+01:00"
}
]
}
},
# ... More hits
]
}
I want to only get the payments array of each hit combined into one array, so that it returns something like this:
{
"payments": [
{
"id": 1981,
"amount": 10,
"date": "2022-02-09T13:00:00+01:00"
},
{
"id": 1982,
"amount": 5.35,
"date": "2022-02-09T13:35:00+01:00"
},
{
"id": 5658,
"amount": 3,
"date": "2021-12-19T13:00:00+01:00"
}
]
}
I tried to get this result using nested queries but could not figure it out. The query I used is:
# Query I used:
GET invoices/_search
{
"_source": ["payments"],
"query": {
"nested": {
"path": "payments",
"query": {
"bool": {
"must": [
{
"exists": {
"field": "payments.id"
}
}
]
}
}
}
}
}
# Result:
{
"hits": [
{
"_index": "invoices",
"_id": "545960",
"_score": 1.0,
"_source": {
"payments": [
{
"date": "2022-01-22T15:38:15+01:00",
"amount": 374.5,
"id": 320320
},
{
"date": "2022-01-22T15:30:03+01:00",
"amount": 160.5,
"id": 320316
}
]
}
},
{
"_index": "invoices",
"_id": "545961",
"_score": 1.0,
"_source": {
"payments": [
{
"date": "2022-01-22T15:38:15+01:00",
"amount": 12,
"id": 320350
},
{
"date": "2022-01-22T15:30:03+01:00",
"amount": 60.65,
"id": 320379
}
]
}
}
]
}
The result returns only the payments array but divided over multiple hits. How can I combine those arrays?
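One possible approach, sketched here as an assumption rather than a confirmed Elasticsearch feature, is to keep the query as it is and merge the arrays on the client. The helper name combine_payments is hypothetical, and the hits list stands for the array of hits shown in the result above.

```python
from typing import Any


def combine_payments(hits: list[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    """Flatten the payments arrays of all hits into a single list."""
    combined: list[dict[str, Any]] = []
    for hit in hits:
        # Hits without a payments array contribute nothing.
        combined.extend(hit.get("_source", {}).get("payments", []))
    return {"payments": combined}


# The two hits from the result above, trimmed to the relevant fields.
hits = [
    {"_id": "545960", "_source": {"payments": [
        {"date": "2022-01-22T15:38:15+01:00", "amount": 374.5, "id": 320320},
        {"date": "2022-01-22T15:30:03+01:00", "amount": 160.5, "id": 320316},
    ]}},
    {"_id": "545961", "_source": {"payments": [
        {"date": "2022-01-22T15:38:15+01:00", "amount": 12, "id": 320350},
        {"date": "2022-01-22T15:30:03+01:00", "amount": 60.65, "id": 320379},
    ]}},
]

result = combine_payments(hits)
print([payment["id"] for payment in result["payments"]])
```

The payments keep the order in which the hits were returned, so any sorting of the combined array would have to be applied afterwards.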

Related

Reverse a match or match every element in an array where some IDs are not set

Struggling with making this ES query. Basically, I have a nested object, something like:
{
"id": "00000000-0000-0000-0000-000000000000",
"exchangeRate": 0.01,
"payments": [
{
"id": "00000000-0000-0000-0000-000000000000",
"paymentId": "some-id",
"currency": "USD",
"amount": 400.0
},
{
"id": "00000000-0000-0000-0000-000000000000",
"currency": "USD",
"paymentId": "some-id2",
"amount": -200.0
},
{
"id": "00000000-0000-0000-0000-000000000000",
"currency": "USD",
"amount": -200.0
}
]
}
And I want to match on an object where some of the "paymentId" keys are defined, but not all. So the above object would be a match. Whereas something like:
{
"id": "00000000-0000-0000-0000-000000000000",
"exchangeRate": 0.01,
"payments": [
{
"id": "00000000-0000-0000-0000-000000000000",
"paymentId": "some-id",
"currency": "USD",
"amount": 400.0
},
{
"id": "00000000-0000-0000-0000-000000000000",
"currency": "USD",
"paymentId": "some-id2",
"amount": -200.0
},
{
"id": "00000000-0000-0000-0000-000000000000",
"currency": "USD",
"paymentId": "some-id3",
"amount": -200.0
}
]
}
Would not match.
I've made a query that matches if all paymentIds are defined and returns all objects where that is true. This query is:
{
"query": {
"bool": {
"must": {
"nested": {
"path": "payments",
"query": {
"exists": {
"field": "payments.paymentId"
}
}
}
}
}
}
}
The question is: how do I reverse this, so that a document matching this query is not returned as a match? Putting "must_not" simply does the opposite: it returns all records that don't have any paymentIds defined at all. That is something I want to match on, but I also need all the records that have only some of the paymentIds set.
You could compare the payments objects' field value sizes one by one while in the nested context.
Assuming both payments.id and payments.paymentId are of the keyword mapping type, you could say:
GET your-index/_search
{
"query": {
"nested": {
"path": "payments",
"query": {
"script": {
"script": "doc['payments.id'].size() != doc['payments.paymentId'].size()"
}
}
}
}
}
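A different way to express "some but not all" is sketched below. It is not taken from the answer above: it assumes a bool query that requires one nested payment with a paymentId and another nested payment without one. The function names are hypothetical, and some_but_not_all only mirrors the intended matching rule in plain Python so the two example documents can be checked.

```python
def build_partial_payment_id_query(path="payments", field="paymentId"):
    """Build a query body: at least one nested object has the field, at least one lacks it."""
    full_field = f"{path}.{field}"
    has_field = {
        "nested": {"path": path, "query": {"exists": {"field": full_field}}}
    }
    lacks_field = {
        "nested": {
            "path": path,
            # must_not inside the nested query is evaluated per nested object.
            "query": {"bool": {"must_not": {"exists": {"field": full_field}}}},
        }
    }
    return {"query": {"bool": {"must": [has_field, lacks_field]}}}


def some_but_not_all(payments, field="paymentId"):
    """Plain-Python mirror of the rule the query is meant to express."""
    present = [payment.get(field) is not None for payment in payments]
    return any(present) and not all(present)


partial = [{"paymentId": "some-id"}, {"paymentId": "some-id2"}, {}]
complete = [{"paymentId": "some-id"}, {"paymentId": "some-id2"}, {"paymentId": "some-id3"}]
print(some_but_not_all(partial), some_but_not_all(complete))
query_body = build_partial_payment_id_query()
```

Documents with no paymentId at all are excluded by this sketch; if they should match too, the first clause can be dropped.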

ElasticSearch sort by field in NestedObject At First Index Of Array

I am trying to sort by a field inside the first object of an array in the following docs.
Each doc has an array, and I want to retrieve the docs sorted by the city name of the array's first object. For the documents below, I want document 3 first because its city name starts with "L" ('London'), then document 2 with "M" ('moscow'), then document 1 with "N" ('NYC').
The structure is a record that:
has an array
the array contains an object (called 'address')
the object has a field (called 'city')
I want to sort the docs by the first address.city.
GET hello/_mapping
{
"hello": {
"mappings": {
"jack": {
"properties": {
"houses": {
"type": "nested",
"properties": {
"address": {
"properties": {
"city": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
}
}
}
}
These are the documents that I indexed:
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 1,
"hits": [
{
"_index": "hello",
"_type": "jack",
"_id": "2",
"_score": 1,
"_source": {
"houses": [
{
"address": {
"city": "moscow"
}
},
{
"address": {
"city": "belgrade"
}
},
{
"address": {
"city": "Sacramento"
}
}
]
}
},
{
"_index": "hello",
"_type": "jack",
"_id": "1",
"_score": 1,
"_source": {
"houses": [
{
"address": {
"city": "NYC"
}
},
{
"address": {
"city": "PARIS"
}
},
{
"address": {
"city": "TLV"
}
}
]
}
},
{
"_index": "hello",
"_type": "jack",
"_id": "3",
"_score": 1,
"_source": {
"houses": [
{
"address": {
"city": "London"
}
}
]
}
}
]
}
}
Try this. Add a check inside the script if the field could be empty. Note that it could be pretty slow, because Elasticsearch won't have this value indexed. Adding a main address field would be much faster and would be the right way to do it.
{
"sort" : {
"_script" : {
"script" : "params._source.houses[0].address.city",
"type" : "string",
"order" : "asc"
}
}
}
You have to use _source instead of doc[yourfield] because you don't know in which order Elasticsearch stores your array.
EDIT: test whether the field exists:
{
"query": {
"nested": {
"path": "houses",
"query": {
"bool": {
"must": [
{
"exists": {
"field": "houses.address"
}
}
]
}
}
}
},
"sort": {
"_script": {
"script" : "params._source.houses[0].address.city",
"type": "string",
"order": "asc"
}
}
}
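To make the intent of the script sort concrete, here is a rough client-side sketch of the same ordering in Python. The helper names are hypothetical. Lowercasing the key with casefold is an assumption added to get the order the question asks for: with a plain Python string comparison, "NYC" would sort before "moscow" because uppercase letters compare lower.

```python
def first_city(source):
    """Return the city of the first house, or None if it is missing."""
    houses = source.get("houses") or []
    if not houses:
        return None
    return (houses[0].get("address") or {}).get("city")


def sort_by_first_city(hits):
    """Sort hits by the first city, case-insensitively, missing cities last."""
    def key(hit):
        city = first_city(hit["_source"])
        return (city is None, (city or "").casefold())
    return sorted(hits, key=key)


# The three indexed documents, trimmed to the relevant fields.
hits = [
    {"_id": "2", "_source": {"houses": [
        {"address": {"city": "moscow"}},
        {"address": {"city": "belgrade"}},
        {"address": {"city": "Sacramento"}},
    ]}},
    {"_id": "1", "_source": {"houses": [
        {"address": {"city": "NYC"}},
        {"address": {"city": "PARIS"}},
        {"address": {"city": "TLV"}},
    ]}},
    {"_id": "3", "_source": {"houses": [{"address": {"city": "London"}}]}},
]

print([hit["_id"] for hit in sort_by_first_city(hits)])
```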

loopback Custom order by

I am using AngularJS for the frontend, LoopBack for the backend, and Elasticsearch for the database.
I have a model with properties as:
"name": {
"type": "string",
"required": true
},
"mobileNumber": {
"type": "string",
"required": true
},
"email": {
"type": "string"
},
"message": {
"type": "string",
"required": true
},
"quantity": {
"type": "number",
"required": true
},
"price": {
"type": "number",
"required": true
},
"status": {
"type": "string",
"required": true,
"default": "open"
}
},
data as:
{
"_index": "XXXXXX",
"_type": "XXXXX",
"_id": "XXXXXXX",
"_version": 1,
"_score": 1,
"_source": {
"name": "aadil kirana",
"email": "aadil#gmail.com",
"message": "dfgfb dgfggf",
"quantity": 3434,
"price": 5454,
"status": "open",
"createdAt": "2017-12-19T14:53:41.727Z",
"updatedAt": "2017-12-19T14:53:41.727Z"
}
}
Status could be open, processing, close, reject and failure.
All I want is to get the data in the order where I can see all the open status data ordered by createdAt date,
then all the processing status data ordered by createdAt date,
and so on.
I tried using loopback filters as:
filter = {
order: ['status ASC','createdAt DESC'],
};
but this gives me first all the close status data ordered by date, then all the open status data ordered by date, and so on; that is, the statuses are ordered alphabetically.
Please help me to get the desired result.
You can add a new property to your data as statusOrder and define
1 -> open
2 -> close
...
and order by statusOrder instead of status when you are ordering status.
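As a minimal sketch of the statusOrder idea, the Python below attaches a numeric rank to each record and sorts by it. The ranks follow the order given in the question (open, processing, close, reject, failure), which is an assumption on my part since the answer only lists the first two values; the names STATUS_ORDER and with_status_order are hypothetical.

```python
# Assumed ranking, following the order requested in the question.
STATUS_ORDER = {"open": 1, "processing": 2, "close": 3, "reject": 4, "failure": 5}


def with_status_order(record):
    """Return a copy of the record with a numeric statusOrder property."""
    return {**record, "statusOrder": STATUS_ORDER[record["status"]]}


records = [
    {"status": "close", "createdAt": "2017-12-19T14:53:41.727Z"},
    {"status": "open", "createdAt": "2017-12-19T14:53:41.727Z"},
    {"status": "processing", "createdAt": "2017-12-17T14:53:41.727Z"},
    {"status": "open", "createdAt": "2017-12-18T14:53:41.727Z"},
]

# ISO 8601 timestamps in the same format and time zone sort correctly as strings.
ordered = sorted(
    (with_status_order(record) for record in records),
    key=lambda record: (record["statusOrder"], record["createdAt"]),
)
print([(record["status"], record["createdAt"]) for record in ordered])
```

Once statusOrder is stored with each document, the same ordering can be requested from the backend by sorting on statusOrder first and createdAt second.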
All I want is to get the data in the order where I can see all the
open status data ordered by createdAt date, then all the processing
status data ordered by createdAt date and so on....
A workaround could be to let Elasticsearch do the sort with a custom order. In this context, the statuses could be ordered as open, followed by processing, close, reject, and failure. It can be done with a Function Score Query.
Sample input data for bulk insert:
POST custom/sort/_bulk?pretty
{"index" : {"_index" : "custom"}}
{"status": "open", "createdAt": "2017-12-19T14:53:41.727Z"}
{"index" : {"_index" : "custom"}}
{"status": "open", "createdAt": "2017-12-18T14:53:41.727Z"}
{"index" : {"_index" : "custom"}}
{"status": "processing", "createdAt": "2017-12-19T14:53:41.727Z"}
{"index" : {"_index" : "custom"}}
{"status": "processing", "createdAt": "2017-12-17T14:53:41.727Z"}
{"index" : {"_index" : "custom"}}
{"status": "close", "createdAt": "2017-12-19T14:53:41.727Z"}
{"index" : {"_index" : "custom"}}
{"status": "close", "createdAt": "2017-12-19T15:53:41.727Z"}
{"index" : {"_index" : "custom"}}
{"status": "failure", "createdAt": "2017-12-19T10:53:41.727Z"}
{"index" : {"_index" : "custom"}}
{"status": "failure", "createdAt": "2017-12-19T14:59:41.727Z"}
{"index" : {"_index" : "custom"}}
{"status": "reject", "createdAt": "2017-12-19T14:53:40.727Z"}
{"index" : {"_index" : "custom"}}
{"status": "reject", "createdAt": "2017-12-19T14:53:41.727Z"}
Sample response from Elasticsearch (without custom order):
Query:
GET custom/sort/_search?filter_path=took,hits.total,hits.hits._score,hits.hits._source
{
"took": 0,
"hits": {
"total": 10,
"hits": [
{
"_score": 1,
"_source": {
"status": "processing",
"createdAt": "2017-12-19T14:53:41.727Z"
}
},
{
"_score": 1,
"_source": {
"status": "close",
"createdAt": "2017-12-19T14:53:41.727Z"
}
},
{
"_score": 1,
"_source": {
"status": "reject",
"createdAt": "2017-12-19T14:53:40.727Z"
}
},
{
"_score": 1,
"_source": {
"status": "open",
"createdAt": "2017-12-18T14:53:41.727Z"
}
},
{
"_score": 1,
"_source": {
"status": "failure",
"createdAt": "2017-12-19T10:53:41.727Z"
}
},
{
"_score": 1,
"_source": {
"status": "failure",
"createdAt": "2017-12-19T14:59:41.727Z"
}
},
{
"_score": 1,
"_source": {
"status": "reject",
"createdAt": "2017-12-19T14:53:41.727Z"
}
},
{
"_score": 1,
"_source": {
"status": "open",
"createdAt": "2017-12-19T14:53:41.727Z"
}
},
{
"_score": 1,
"_source": {
"status": "processing",
"createdAt": "2017-12-17T14:53:41.727Z"
}
},
{
"_score": 1,
"_source": {
"status": "close",
"createdAt": "2017-12-19T15:53:41.727Z"
}
}
]
}
}
Query to mimic custom ordering:
GET custom/sort/_search?filter_path=took,hits.hits._id,hits.hits._score,hits.hits._source,hits.hits.sort
{
"query": {
"function_score": {
"boost_mode": "replace",
"query": {
"constant_score": {
"filter": {
"terms": {
"status.keyword": [
"open",
"processing",
"close",
"reject",
"failure"
]
}
}
}
},
"functions": [
{
"filter": {
"term": {
"status.keyword": "open"
}
},
"weight": 4
},
{
"filter": {
"term": {
"status.keyword": "processing"
}
},
"weight": 3
},
{
"filter": {
"term": {
"status.keyword": "close"
}
},
"weight": 2
},
{
"filter": {
"term": {
"status.keyword": "reject"
}
},
"weight": 1
},
{
"filter": {
"term": {
"status.keyword": "failure"
}
},
"weight": 0
}
]
}
},
"sort": [
{
"_score": {
"order": "desc"
}
},
{
"createdAt": {
"order": "asc"
}
}
]
}
Output (with custom order):
{
"took": 4,
"hits": {
"hits": [
{
"_id": "grOucmABwtSchlgLKlaV",
"_score": 4,
"_source": {
"status": "open",
"createdAt": "2017-12-18T14:53:41.727Z"
},
"sort": [
4,
1513608821727
]
},
{
"_id": "gbOucmABwtSchlgLKlaV",
"_score": 4,
"_source": {
"status": "open",
"createdAt": "2017-12-19T14:53:41.727Z"
},
"sort": [
4,
1513695221727
]
},
{
"_id": "hLOucmABwtSchlgLKlaV",
"_score": 3,
"_source": {
"status": "processing",
"createdAt": "2017-12-17T14:53:41.727Z"
},
"sort": [
3,
1513522421727
]
},
{
"_id": "g7OucmABwtSchlgLKlaV",
"_score": 3,
"_source": {
"status": "processing",
"createdAt": "2017-12-19T14:53:41.727Z"
},
"sort": [
3,
1513695221727
]
},
{
"_id": "hbOucmABwtSchlgLKlaV",
"_score": 2,
"_source": {
"status": "close",
"createdAt": "2017-12-19T14:53:41.727Z"
},
"sort": [
2,
1513695221727
]
},
{
"_id": "hrOucmABwtSchlgLKlaV",
"_score": 2,
"_source": {
"status": "close",
"createdAt": "2017-12-19T15:53:41.727Z"
},
"sort": [
2,
1513698821727
]
},
{
"_id": "ibOucmABwtSchlgLKlaV",
"_score": 1,
"_source": {
"status": "reject",
"createdAt": "2017-12-19T14:53:40.727Z"
},
"sort": [
1,
1513695220727
]
},
{
"_id": "irOucmABwtSchlgLKlaV",
"_score": 1,
"_source": {
"status": "reject",
"createdAt": "2017-12-19T14:53:41.727Z"
},
"sort": [
1,
1513695221727
]
},
{
"_id": "h7OucmABwtSchlgLKlaV",
"_score": 0,
"_source": {
"status": "failure",
"createdAt": "2017-12-19T10:53:41.727Z"
},
"sort": [
0,
1513680821727
]
},
{
"_id": "iLOucmABwtSchlgLKlaV",
"_score": 0,
"_source": {
"status": "failure",
"createdAt": "2017-12-19T14:59:41.727Z"
},
"sort": [
0,
1513695581727
]
}
]
}
}
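The functions array in the query above follows a simple pattern: one term filter per status, with the weight falling as the status moves down the desired order. A small sketch of generating it from an ordered list is shown below; build_status_functions is a hypothetical helper, and the field name status.keyword is taken from the query above.

```python
def build_status_functions(statuses, field="status.keyword"):
    """Build function_score functions so earlier statuses get higher weights."""
    top_weight = len(statuses) - 1
    return [
        {"filter": {"term": {field: status}}, "weight": top_weight - position}
        for position, status in enumerate(statuses)
    ]


status_order = ["open", "processing", "close", "reject", "failure"]
functions = build_status_functions(status_order)
print([(f["filter"]["term"]["status.keyword"], f["weight"]) for f in functions])
```

With the five statuses from the question this yields the weights 4 down to 0 used in the query, so changing the custom order only requires reordering the list.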

How to sort words by their relevance to a keyword?

I'm currently working on a search using elasticsearch. We have a very large amount of users.
Here is the elasticsearch mapping:
PUT /example_index/_mapping/users
{
"properties": {
"user_autocomplete": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
},
"completion": {
"type": "text",
"analyzer": "user_autocomplete_analyzer",
"search_analyzer": "standard"
}
}
},
"firstName": {
"type": "text"
},
"lastName": {
"type": "text"
}
}
}
Here is the search query.
For example, I get 3 records
GET example_index/users/_search
{
"from": 0,
"size": 3,
"query": {
"query_string": {
"query": "*ro*",
"fields": [
"firstName",
"lastName"
]
}
},
"aggs": {
"user_suggestions": {
"terms": {
"size": 3,
"field": "user_autocomplete.raw"
}
}
}
}
Here is the output of elasticsearch
{
"took": 53,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 13,
"max_score": 1,
"hits": [
{
"_index": "example_index",
"_type": "users",
"_id": "08",
"_score": 1,
"_source": {
"firstName": "Eero",
"lastName": "Saarinen",
"user_autocomplete": "Eero Saarinen"
}
},
{
"_index": "example_index",
"_type": "users",
"_id": "16",
"_score": 1,
"_source": {
"firstName": "Aaron",
"lastName": "Judge",
"user_autocomplete": "Aaron Judge"
}
},
{
"_index": "example_index",
"_type": "users",
"_id": "20",
"_score": 1,
"_source": {
"firstName": "Robert",
"lastName": "Langdon",
"user_autocomplete": "Robert Langdon"
}
}
]
},
"aggregations": {
"user_suggestions": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 10,
"buckets": [
{
"key": "Eero Saarinen",
"doc_count": 1
},
{
"key": "Aaron Judge",
"doc_count": 1
},
{
"key": "Robert Langdon",
"doc_count": 1
}
]
}
}
}
I need the results in the following order:
Robert Langdon
Aaron Judge
Eero Saarinen
I have tried the order method, but it doesn't work. Is there a way to do this?

Aggregating array of values in elasticsearch

I need to aggregate an array as follows
Two document examples:
{
"_index": "log",
"_type": "travels",
"_id": "tnQsGy4lS0K6uT3Hwzzo-g",
"_score": 1,
"_source": {
"state": "saopaulo",
"date": "2014-10-30T17",
"traveler": "patrick",
"registry": "123123",
"cities": {
"saopaulo": 1,
"riodejaneiro": 2,
"total": 2
},
"reasons": [
"Entrega de encomenda"
],
"from": [
"CompraRapida"
]
}
},
{
"_index": "log",
"_type": "travels",
"_id": "tnQsGy4lS0K6uT3Hwzzo-g",
"_score": 1,
"_source": {
"state": "saopaulo",
"date": "2014-10-31T17",
"traveler": "patrick",
"registry": "123123",
"cities": {
"saopaulo": 1,
"curitiba": 1,
"total": 2
},
"reasons": [
"Entrega de encomenda"
],
"from": [
"CompraRapida"
]
}
},
I want to aggregate the cities array, to find out all the cities the traveler has gone to. I want something like this:
{
"traveler":{
"name":"patrick"
},
"cities":{
"saopaulo":2,
"riodejaneiro":2,
"curitiba":1,
"total":3
}
}
Where the total is the length of the cities array minus 1. I tried the terms aggregation and the sum, but couldn't output the desired output.
Changes in the document structure can be made, so if anything like that would help me, I'd be pleased to know.
In the document posted above, "cities" is not a JSON array; it is a JSON object.
If changing the document structure is a possibility, I would change cities in the document to be an array of objects.
Example document:
"cities": [
{
"city": "saopaulo",
"count": 2
},
{
"city": "riodejaneiro",
"count": 1
}
]
You would then need to set cities to be of type nested in the index mapping:
"mappings": {
"<type_name>": {
"properties": {
"cities": {
"type": "nested",
"properties": {
"city": {
"type": "string"
},
"count": {
"type": "integer"
},
"value": {
"type": "long"
}
}
},
"date": {
"type": "date",
"format": "dateOptionalTime"
},
"registry": {
"type": "string"
},
"state": {
"type": "string"
},
"traveler": {
"type": "string"
}
}
}
}
After that, you could use a nested aggregation to get the city count per user.
The query would look something along these lines:
{
"query": {
"match": {
"traveler": "patrick"
}
},
"aggregations": {
"city_travelled": {
"nested": {
"path": "cities"
},
"aggs": {
"citycount": {
"cardinality": {
"field": "cities.city"
}
}
}
}
}
}
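For comparison, the output the question asks for can also be computed on the client from the original document structure. The sketch below is an assumption about what is wanted, not part of the answer above: it sums the per-city counts across a traveler's documents and sets total to the number of distinct cities. The function name aggregate_cities is hypothetical.

```python
from collections import Counter


def aggregate_cities(docs, traveler):
    """Sum visits per city for one traveler; total is the number of distinct cities."""
    counts = Counter()
    for doc in docs:
        source = doc["_source"]
        if source.get("traveler") != traveler:
            continue
        for city, visits in source.get("cities", {}).items():
            # "total" is a summary key in the source documents, not a city.
            if city != "total":
                counts[city] += visits
    cities = dict(counts)
    cities["total"] = len(counts)
    return {"traveler": {"name": traveler}, "cities": cities}


# The two example documents, trimmed to the relevant fields.
docs = [
    {"_source": {"traveler": "patrick",
                 "cities": {"saopaulo": 1, "riodejaneiro": 2, "total": 2}}},
    {"_source": {"traveler": "patrick",
                 "cities": {"saopaulo": 1, "curitiba": 1, "total": 2}}},
]

summary = aggregate_cities(docs, "patrick")
print(summary)
```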
