Elasticsearch: sort by max value in array - arrays

Let's say I have 2 documents:
{
"id": "1234",
"things": [
{
"datetime": "2016-01-01T12:00:00+03:00"
},
{
"datetime": "2016-01-06T12:00:00+03:00"
},
{
"datetime": "2100-01-01T12:00:00+03:00"
}
]
}
and
{
"id": "5678",
"things": [
{
"datetime": "2016-01-03T12:00:00+03:00"
},
{
"datetime": "2100-01-06T12:00:00+03:00"
}
]
}
things.datetime is mapped as { "type": "date", "format": "date_time_no_millis" }.
I want to sort these documents based on the latest things.datetime value that is not in the future.
I.e., sorting simply by the max things.datetime would use the dates 2100-01-01T12:00:00+03:00 and 2100-01-06T12:00:00+03:00. I want the sorting to be based on the values 2016-01-06T12:00:00+03:00 and 2016-01-03T12:00:00+03:00 instead.
How can I achieve this, using ElasticSearch 2.x?
I've tried:
"sort": {
"things.datetime": {
"order": "desc",
"mode": "max"
}
}
But that doesn't seem to sort even by the 2100 dates.
I also tried to use nested_filter like so:
"sort": {
"things.datetime": {
"order": "desc",
"mode": "max",
"nested_filter": {
"range": {
"things.datetime": { "lte": "now" }
}
}
}
}
But it doesn't work as I'd expect.
Also, the "sort" value in the response is a negative number. For example, a document with the dates:
"2015-10-24T05:50:00+03:00",
"2015-10-26T22:05:48+02:00",
"2015-10-24T08:05:43+03:00"
gets a negative sort value:
"sort": [
-9223372036854775808
]

The correct way to achieve this seems to be to specify nested_path alongside nested_filter:
"sort": {
"things.datetime": {
"order": "desc",
"mode": "max",
"nested_path": "things",
"nested_filter": {
"range": {
"things.datetime": { "lte": "now" }
}
}
}
}
When no dates are left after the nested_filter is applied, the sort value becomes -9223372036854775808 (the smallest 64-bit integer), so that with "order": "desc" such documents end up last.
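To make the effect of the filtered sort easier to see, here is a minimal Python sketch of the intended ordering. It only models the behaviour described above and does not talk to Elasticsearch; the function name sort_key, the fixed "now" and the "future-only" document are assumptions made up for illustration.

```python
from datetime import datetime, timezone

# Smallest 64-bit signed integer; the sort value shown in the question.
MISSING_SORT_VALUE = -(2**63)


def sort_key(doc, now):
    """Epoch millis of the latest things.datetime that is <= now."""
    past = [
        dt
        for dt in (datetime.fromisoformat(t["datetime"]) for t in doc["things"])
        if dt <= now
    ]
    if not past:
        return MISSING_SORT_VALUE  # nothing survived the "lte now" filter
    return int(max(past).timestamp() * 1000)


docs = [
    {"id": "5678", "things": [
        {"datetime": "2016-01-03T12:00:00+03:00"},
        {"datetime": "2100-01-06T12:00:00+03:00"},
    ]},
    # Hypothetical document with future dates only (not in the question).
    {"id": "future-only", "things": [
        {"datetime": "2100-01-01T12:00:00+03:00"},
    ]},
    {"id": "1234", "things": [
        {"datetime": "2016-01-01T12:00:00+03:00"},
        {"datetime": "2016-01-06T12:00:00+03:00"},
        {"datetime": "2100-01-01T12:00:00+03:00"},
    ]},
]

# Fixed "now" so the example is reproducible.
now = datetime(2020, 1, 1, tzinfo=timezone.utc)
ranked = sorted(docs, key=lambda d: sort_key(d, now), reverse=True)
print([d["id"] for d in ranked])  # ['1234', '5678', 'future-only']
```

The document with only future dates gets the minimum value and therefore sorts last in descending order, which matches the negative "sort" value seen in the response.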

Related

find overlapping dates within mongoDB array objects

I have a MongoDB document collection with multiple arrays that looks like this :
{
"_id": "1235847",
"LineItems": [
{
"StartDate": ISODate("2017-07-31T00:00:00.000+00:00"),
"EndDate": ISODate("2017-09-19T00:00:00.000+00:00"),
"Amount": {"$numberDecimal": "0.00"}
},
{
"StartDate": ISODate("2022-03-20T00:00:00.000+00:00"),
"EndDate": ISODate("2022-10-21T00:00:00.000+00:00"),
"Amount": {"$numberDecimal": "6.38"}
},
{
"StartDate": ISODate("2022-09-20T00:00:00.000+00:00"),
"EndDate": ISODate("9999-12-31T00:00:00.000+00:00"),
"Amount": {"$numberDecimal": "6.17"}
}
]
}
Is there a simple way to find documents where a StartDate overlaps with an earlier StartDate/EndDate range in the same array? The rules are:
A StartDate cannot be before a previous EndDate within the array.
A StartDate/EndDate cannot fall between a previous StartDate and EndDate within the array.
The query below works, but I don't want to hardcode the array indexes in order to find all such documents:
{
$match: {
$expr: {
$gt: [
'LineItems.3.EndDate',
'LineItems.2.StartDate'
]
}
}
}
Here's one way you could find docs where "StartDate" is earlier than the immediately previous "EndDate".
db.collection.find({
"$expr": {
"$getField": {
"field": "overlapped",
"input": {
"$reduce": {
"input": {"$slice": ["$LineItems", 1, {"$size": "$LineItems"}]},
"initialValue": {
"overlapped": false,
"prevEnd": {"$first": "$LineItems.EndDate"}
},
"in": {
"overlapped": {
"$or": [
"$$value.overlapped",
{"$lt": ["$$this.StartDate", "$$value.prevEnd"]}
]
},
"prevEnd": "$$this.EndDate"
}
}
}
}
}
})
Try it on mongoplayground.net.
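For readers who find the $reduce expression hard to follow, the same logic can be sketched in plain Python. This is only an illustration of what the accumulator does, not a replacement for the query; the function name has_overlap is made up, and unlike the query it simply returns False for an empty array.

```python
from datetime import datetime


def has_overlap(line_items):
    """True if any StartDate is earlier than the previous item's EndDate."""
    if not line_items:
        return False  # assumption: an empty array counts as "no overlap"
    overlapped = False
    prev_end = line_items[0]["EndDate"]  # mirrors "initialValue.prevEnd"
    for item in line_items[1:]:          # mirrors the "$slice" from index 1
        overlapped = overlapped or item["StartDate"] < prev_end
        prev_end = item["EndDate"]
    return overlapped


# The LineItems from the question (Amount omitted).
line_items = [
    {"StartDate": datetime(2017, 7, 31), "EndDate": datetime(2017, 9, 19)},
    {"StartDate": datetime(2022, 3, 20), "EndDate": datetime(2022, 10, 21)},
    {"StartDate": datetime(2022, 9, 20), "EndDate": datetime(9999, 12, 31)},
]
print(has_overlap(line_items))  # True: 2022-09-20 is before 2022-10-21
```

Like the query, this only compares each item with the one immediately before it, so it relies on the array being in chronological order.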

Multikey partial index not used with elemMatch

Consider the following document format which has an array field tasks holding embedded documents
{
"foo": "bar",
"tasks": [
{
"status": "sleep",
"id": "1"
},
{
"status": "active",
"id": "2"
}
]
}
There exists a partial index on key tasks.id
{
"v": 2,
"unique": true,
"key": {
"tasks.id": 1
},
"name": "tasks.id_1",
"partialFilterExpression": {
"tasks.id": {
"$exists": true
}
},
"ns": "zardb.quxcollection"
}
The following $elemMatch query with multiple conditions on the same array element
db.quxcollection.find(
{
"tasks": {
"$elemMatch": {
"id": {
"$eq": "1"
},
"status": {
"$nin": ["active"]
}
}
}
}).explain()
does not seem to use the index
"winningPlan": {
"stage": "COLLSCAN",
"filter": {
"tasks": {
"$elemMatch": {
"$and": [{
"id": {
"$eq": "1"
}
},
{
"status": {
"$not": {
"$eq": "active"
}
}
}
]
}
}
},
"direction": "forward"
}
How can I make the above query use the index? The index does seem to be used via dot notation
db.quxcollection.find({"tasks.id": "1"})
however I need the same array element to match multiple conditions which includes the status field, and the following does not seem to be equivalent to the above $elemMatch based query
db.quxcollection.find({
"tasks.id": "1",
"tasks.status": { "$nin": ["active"] }
})
Partial indexes work by using the indexed path as the key. With $elemMatch alone, the path "tasks.id" does not appear explicitly in the query. If you check with .explain("allPlansExecution"), you will see the index is not even considered by the query planner.
To benefit from the index you can specify the path in the query:
db.quxcollection.find(
{
"tasks.id": "1",
"tasks": {
"$elemMatch": {
"id": {
"$eq": "1"
},
"status": {
"$nin": ["active"]
}
}
}
}).explain()
This duplicates part of the $elemMatch condition: the index is used to fetch all documents that contain a task with the given id, and documents are then filtered by the full $elemMatch condition at the fetch stage. I must admit the query doesn't look nice, so maybe add a comment to the code explaining why the condition is repeated.
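To illustrate why the dot-notation query from the question is not equivalent, and why adding "tasks.id" on top of $elemMatch does not change the result, here is a simplified Python model of the three predicates. The function names are made up, and the matching rules are a rough approximation of MongoDB's semantics for this particular document shape, not an implementation of them.

```python
def elem_match(doc, task_id, excluded):
    """One array element must satisfy both conditions ($elemMatch)."""
    return any(
        t.get("id") == task_id and t.get("status") not in excluded
        for t in doc.get("tasks", [])
    )


def dot_notation(doc, task_id, excluded):
    """Each condition is checked against the array as a whole."""
    tasks = doc.get("tasks", [])
    has_id = any(t.get("id") == task_id for t in tasks)
    # $nin on an array path: no element may have an excluded value.
    none_excluded = all(t.get("status") not in excluded for t in tasks)
    return has_id and none_excluded


def combined(doc, task_id, excluded):
    """The suggested query: "tasks.id" plus the $elemMatch condition."""
    has_id = any(t.get("id") == task_id for t in doc.get("tasks", []))
    return has_id and elem_match(doc, task_id, excluded)


# The document from the question.
doc = {
    "foo": "bar",
    "tasks": [
        {"status": "sleep", "id": "1"},
        {"status": "active", "id": "2"},
    ],
}
print(elem_match(doc, "1", ["active"]))    # True
print(dot_notation(doc, "1", ["active"]))  # False: task "2" is active
print(combined(doc, "1", ["active"]))      # True, same as elem_match
```

The extra "tasks.id" clause is implied by the $elemMatch condition, so it only gives the planner a path it can match against the index.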

How to find maximum value from array inside array of objects in mongodb

{
"_id": {
"$__id": "608028497a90cf06c02b1083"
},
"name": "Player Unknown's Batteground Mobile",
"publisher_detail": "Bluehole Corporation",
"release_date": {
"$date": "2017-03-26T18:30:00.000Z"
},
"version": "1.2.0",
"genre": "action",
"rating": 100,
"achievement": [{
"name": "Ace",
"players": [{
"player_name": "notAplayer",
"score": 60,
"date_of_achievement": {
"$date": "2019-02-14T18:30:00.000Z"
}
},
{
"player_name": "notAplayer2",
"score": 92,
"date_of_achievement": {
"$date": "2020-04-14T18:30:00.000Z"
}
}]
}]
}
I have the above MongoDB schema for a gaming system. I want to write a query to find the maximum score in each game, but I can't figure out how to do it.
$map iterates over achievement.players.score, which is an array of arrays, and takes the maximum of each inner array.
The outer $max then returns the largest of those per-array maxima.
db.collection.aggregate([
{
$addFields: {
maxScore: {
$max: {
$map: {
input: "$achievement.players.score",
in: { $max: "$$this" }
}
}
}
}
}
])
Playground
Not fully clear what you mean by max score in each game (what is a "game") but a simple solution is this:
db.collection.aggregate([
{
$addFields: {
max_score: {
$max: {
$max: "$achievement.players.score"
}
}
}
}
])
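As a cross-check of what the pipelines are meant to compute, here is the same calculation in plain Python over a trimmed copy of the sample document. The function name max_score is made up for this sketch, and returning None for a game without any players is an assumption.

```python
def max_score(game):
    """Highest players.score across all achievements of one game document."""
    scores = [
        player["score"]
        for achievement in game.get("achievement", [])
        for player in achievement.get("players", [])
    ]
    return max(scores, default=None)  # None when there are no players


# Trimmed version of the document from the question.
game = {
    "name": "Player Unknown's Batteground Mobile",
    "achievement": [
        {
            "name": "Ace",
            "players": [
                {"player_name": "notAplayer", "score": 60},
                {"player_name": "notAplayer2", "score": 92},
            ],
        }
    ],
}
print(max_score(game))  # 92
```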

Is it possible to apply a solr document int field value as boost value if a specific field is matched?

Ex.
"docs": [
{
"id": "f37914",
"index_id": "some_index",
"field_1": [
{
"Some value",
"boost": 20.
}
]
},
]
If 'field_1' is matched, then boost by corresponding 'boost' field.
Boost what? The document? The specific field? You can do either.
Anyway, the way to do it is to use Function Queries:
https://lucene.apache.org/solr/guide/6_6/function-queries.html#FunctionQueries-AvailableFunctions
For example, if you want to boost the document (and assuming the score should be 0 when the value doesn't match), you can do something like this:
q=_val_:"if(query($q1), field(boost), 0)"&q1=field_1:"Some Value"
_val_ is just a hook into Solr's function queries, query evaluates to true if q1 matches, field is a simple function that just returns the value of the field itself, and if lets us combine the two.
So what I ended up doing is using Lucene payloads and Solr 6.6's new DelimitedPayloadTokenFilter feature.
First I created a terms field with the following configuration:
{
"add-field-type": {
"name": "terms",
"stored": "true",
"class": "solr.TextField",
"positionIncrementGap": "100",
"indexAnalyzer": {
"tokenizer": {
"class": "solr.KeywordTokenizerFactory"
},
"filters": [
{
"class": "solr.LowerCaseFilterFactory"
},
{
"class": "solr.DelimitedPayloadTokenFilterFactory",
"encoder": "float",
"delimiter": "|"
}
]
},
"queryAnalyzer": {
"tokenizer": {
"class": "solr.KeywordTokenizerFactory"
},
"filters": [
{
"class": "solr.LowerCaseFilterFactory"
},
{
"class": "solr.SynonymGraphFilterFactory",
"ignoreCase": "true",
"expand": "false",
"tokenizerFactory": "solr.KeywordTokenizerFactory",
"synonyms": "synonyms.txt"
}
]
}
},
"add-field" : {
"name":"terms",
"type":"terms",
"stored": "true",
"multiValued": "true"
}
}
I indexed my documents like so:
[
{
"id" : "1",
"terms" : [
"some term|10.0",
"another term|60.0"
]
}
,
{
"id" : "2",
"terms" : [
"some term|11.0",
"another term|21.0"
]
}
]
I used Solr's function query support to query for a match on terms, grab the attached boost payload, and apply it to the relevancy score:
/solr/payloads/select?indent=on&wt=json&q={!payload_score%20f=ai_terms_wtih_synm_3%20v=$payload_term%20func=max}&fl=id,score&payload_term=some+term
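Roughly speaking, the payload approach above boils down to "split each value on the delimiter, match on the term, score with the payload". The Python sketch below models just that idea on the two indexed documents; it is not how Solr implements it, it ignores synonyms and the rest of the analysis chain, and the names parse_token and payload_scores are made up.

```python
def parse_token(raw, delimiter="|"):
    """Split 'some term|10.0' into a lowercased term and a float payload.

    Assumes the delimiter is present in every value.
    """
    term, _, payload = raw.rpartition(delimiter)
    return term.lower(), float(payload)


def payload_scores(docs, query_term):
    """Score each matching doc by the max payload of the matching term."""
    wanted = query_term.lower()
    scores = {}
    for doc in docs:
        payloads = [
            payload
            for term, payload in map(parse_token, doc["terms"])
            if term == wanted
        ]
        if payloads:
            scores[doc["id"]] = max(payloads)  # mirrors func=max
    return scores


# The two documents indexed above.
docs = [
    {"id": "1", "terms": ["some term|10.0", "another term|60.0"]},
    {"id": "2", "terms": ["some term|11.0", "another term|21.0"]},
]
print(payload_scores(docs, "some term"))     # {'1': 10.0, '2': 11.0}
print(payload_scores(docs, "another term"))  # {'1': 60.0, '2': 21.0}
```

So a query for "some term" would rank document 2 first, while "another term" would rank document 1 first.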

Get object hit when searching array in Elasticsearch

I am trying to get an object out of a JSON array that is stored in elasticsearch. The layout is like this:
[
object{}
object{}
object{}
]
What I need is that, when a search hits one of these objects, I get back the specific object that matched. Currently, using the Java API, I am searching with:
QueryBuilder qb = QueryBuilders.boolQuery()
.should(QueryBuilders.matchQuery("text", "pottery").boost(5)
.minimumShouldMatch("1"));
SearchResponse response = client.prepareSearch("stuff")
.setTypes("things")
.setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
.setQuery(qb)
.setPostFilter(filter)//.setHighlighterQuery(qb)
.addField("places.numbers")
.addField("name")
.addField("city")
.setFrom(0).setSize(60).setExplain(true)
.execute()
.actionGet();
But this just returns the whole document that was hit, or, when I tell it to return the field "places.numbers", only the first object in the "places" array, not the one that was matched by the query.
Thank you for any help!
There are a couple of ways to handle this. I would probably do it with a nested type and inner hits, given what you've shown in your question, but it could also probably be done with the parent/child relationship.
Here is an example with nested docs. I set up a simple index like this:
PUT /test_index
{
"mappings": {
"parent_doc": {
"properties": {
"parent_name": {
"type": "string"
},
"nested_docs": {
"type": "nested",
"properties": {
"nested_name": {
"type": "string"
}
}
}
}
}
}
}
Then added a couple of simple documents:
POST /test_index/parent_doc/_bulk
{"index":{"_id":1}}
{"parent_name":"p1","nested_docs":[{"nested_name":"n1"},{"nested_name":"n2"}]}
{"index":{"_id":2}}
{"parent_name":"p2","nested_docs":[{"nested_name":"n3"},{"nested_name":"n4"}]}
And now I can search like this, using "inner_hits":
POST /test_index/_search
{
"query": {
"nested": {
"path": "nested_docs",
"query": {
"match": {
"nested_docs.nested_name": "n3"
}
},
"inner_hits" : {}
}
}
}
which returns:
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 2.098612,
"hits": [
{
"_index": "test_index",
"_type": "parent_doc",
"_id": "2",
"_score": 2.098612,
"_source": {
"parent_name": "p2",
"nested_docs": [
{
"nested_name": "n3"
},
{
"nested_name": "n4"
}
]
},
"inner_hits": {
"nested_docs": {
"hits": {
"total": 1,
"max_score": 2.098612,
"hits": [
{
"_index": "test_index",
"_type": "parent_doc",
"_id": "2",
"_nested": {
"field": "nested_docs",
"offset": 0
},
"_score": 2.098612,
"_source": {
"nested_name": "n3"
}
}
]
}
}
}
}
]
}
}
Here's the code I used to test it:
http://sense.qbox.io/gist/ef7debf436fec2a10097ba2106d5ff30ff8d7c77
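If it helps to picture what "inner_hits" adds to the response, here is a small Python model of the idea using the two documents indexed above. It uses exact equality instead of a real match query and is only a sketch of the concept; the function name search_with_inner_hits and the shape of the returned dicts are made up and much simpler than the real response.

```python
def search_with_inner_hits(docs, path, field, value):
    """Return parents with a matching nested object, plus the matches."""
    results = []
    for doc in docs:
        inner = [
            {"offset": offset, "_source": nested}
            for offset, nested in enumerate(doc.get(path, []))
            if nested.get(field) == value
        ]
        if inner:
            results.append({"_source": doc, "inner_hits": inner})
    return results


# The two documents from the bulk request above.
docs = [
    {"parent_name": "p1",
     "nested_docs": [{"nested_name": "n1"}, {"nested_name": "n2"}]},
    {"parent_name": "p2",
     "nested_docs": [{"nested_name": "n3"}, {"nested_name": "n4"}]},
]
hits = search_with_inner_hits(docs, "nested_docs", "nested_name", "n3")
print(hits[0]["_source"]["parent_name"])  # p2
print(hits[0]["inner_hits"])  # [{'offset': 0, '_source': {'nested_name': 'n3'}}]
```

The parent document still comes back whole, but the inner hits tell you which nested object matched and at which offset in the array.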
