I have documents like these:
Doc1
{
"id": ...,
...
"articles": [
{
"id": "5cdd17c7e24f6e05d487b2c2#142936",
...
},
{
"id": "5cdd17c7e24f6e05d487b2c2#226536",
...
}
...
}
Doc2
{
"id": ...,
...
"articles": [
{
"id": "5cdd17c7e24f6e05d487b2c2#142936",
...
},
{
"id": "5cdd17c7e24f6e05d487b2c2#226536",
...
},
{
"id": "5cdd17c7e24f6e05d487b2c2#142965",
...
}
...
}
Doc3
{
"id": ...,
...
"articles": [
{
"id": "5cdd17c7e24f6e05d487b2c2#142936",
...
}
...
}
I want only the documents whose articles array contains exactly the articles I specify. For example, if my array of article IDs is ['5cdd17c7e24f6e05d487b2c2#142936', '5cdd17c7e24f6e05d487b2c2#226536'], I only want to get Doc1.
Now I have this query:
GET my_index/_search
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "articles",
"query": {
"query_string": {
"default_field": "articles.id",
"query": "5cdd17c7e24f6e05d487b2c2#142936 AND 5cdd17c7e24f6e05d487b2c2#226536"
}
}
}
}
]
}
}
}
But with this, I get Doc1 & Doc2...
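Assuming articles.id is of type keyword, I think this should work for you (I'm not sure it's the most efficient way to write the query). The two must clauses require both IDs to be present, and the must_not clause excludes any document that also contains an article with a different ID: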
GET my_index/_search
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "articles",
"query": {
"term": {
"articles.id": "5cdd17c7e24f6e05d487b2c2#142936"
}
}
}
},
{
"nested": {
"path": "articles",
"query": {
"term": {
"articles.id": "5cdd17c7e24f6e05d487b2c2#226536"
}
}
}
}
],
"must_not": {
"nested": {
"path": "articles",
"query": {
"query_string": {
"default_field": "articles.id",
"query": "NOT 5cdd17c7e24f6e05d487b2c2#142936 AND NOT 5cdd17c7e24f6e05d487b2c2#226536"
}
}
}
}
}
}
}
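If the list of IDs changes per request, the query above can be generated instead of written by hand. Here is a minimal Python sketch, assuming the same mapping as in the answer. The function name build_exact_articles_query is hypothetical, and the code only builds the request body as a dict without sending it anywhere:

```python
def build_exact_articles_query(article_ids):
    """Build the search body from the answer above for any list of article IDs."""
    # One nested term clause per ID: every requested ID must be present.
    must = [
        {"nested": {"path": "articles",
                    "query": {"term": {"articles.id": article_id}}}}
        for article_id in article_ids
    ]
    # Same exclusion string as in the answer: "NOT id1 AND NOT id2 ...".
    exclusion = " AND ".join(f"NOT {article_id}" for article_id in article_ids)
    must_not = {
        "nested": {
            "path": "articles",
            "query": {"query_string": {"default_field": "articles.id",
                                       "query": exclusion}},
        }
    }
    return {"query": {"bool": {"must": must, "must_not": must_not}}}


body = build_exact_articles_query([
    "5cdd17c7e24f6e05d487b2c2#142936",
    "5cdd17c7e24f6e05d487b2c2#226536",
])
```

The resulting dict can be passed as the body of GET my_index/_search with whatever client you already use.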
I'm using a $facet to get an intersection of IDs from two pipelines. Using $group in query_a and query_b in the following pipeline gives the list of IDs.
Pipeline 1:
[
{
"$facet": {
"query_a": [
{
"$match": {
...
}
},
{
"$group": {
"ID": ...
}
}
],
"query_b": [
{
"$match": {
...
}
},
{
"$group": {
"ID": ...
}
}
]
}
},
{
"$project": {
"intersection": {
"$setIntersection": [
"$query_a.ID",
"$query_b.ID"
]
},
"query_a": 1,
"query_b": 1
}
},
{
"$project": {
"_id": 0,
"data": {
"$map": {
"input": "$intersection",
"in": {
"intersection": "$$this",
"query_a": {
"$first": {
"$filter": {
"input": "$query_a",
"as": "item",
"cond": {
"$eq": [
"$$item.ID",
"$$this"
]
}
}
}
},
"query_b": {
"$first": {
"$filter": {
"input": "$query_b",
"as": "item",
"cond": {
"$eq": [
"$$item.ID",
"$$this"
]
}
}
}
}
}
}
}
}
},
{
"$unwind": "$data"
},
{
"$replaceRoot": {
"newRoot": "$data"
}
},
{
"$project": {
"intersection": 1
}
}
]
Example result printed using pymongo:
{"ID": "c80ea2cb-3272-77ae-8f46-d95de600c5bf"}
{"ID": "cdbcc129-548a-9d51-895a-1538200664e6"}
{"ID": "a4ece1ba-42ae-e735-17b0-f619daa506f9"}
...
Changing $group to $project in query_a and query_b, so that the list of IDs also includes non-distinct values, gives an error.
Pipeline 2:
[
{
"$facet": {
"query_a": [
{
"$match": {
...
}
},
{
"$project": {
"ID": ...
}
}
],
"query_b": [
{
"$match": {
...
}
},
{
"$project": {
"ID": ...
}
}
]
}
},
{
"$project": {
"intersection": {
"$setIntersection": [
"$query_a.ID",
"$query_b.ID"
]
},
"query_a": 1,
"query_b": 1
}
},
{
"$project": {
"_id": 0,
"data": {
"$map": {
"input": "$intersection",
"in": {
"intersection": "$$this",
"query_a": {
"$first": {
"$filter": {
"input": "$query_a",
"as": "item",
"cond": {
"$eq": [
"$$item.ID",
"$$this"
]
}
}
}
},
"query_b": {
"$first": {
"$filter": {
"input": "$query_b",
"as": "item",
"cond": {
"$eq": [
"$$item.ID",
"$$this"
]
}
}
}
}
}
}
}
}
},
{
"$unwind": "$data"
},
{
"$replaceRoot": {
"newRoot": "$data"
}
},
{
"$project": {
"intersection": 1
}
}
]
Error:
pymongo.errors.OperationFailure: PlanExecutor error during aggregation :: caused by :: $first's argument must be an array, but is object, full error: {'ok': 0.0, 'errmsg': "PlanExecutor error during aggregation :: caused by :: $first's argument must be an array, but is object"
Running the queries in separate pipelines works using either $group or $project.
Query using $group:
[
{
"$match": {
...
}
},
{
"$group": {
"ID": ...
}
}
]
Example result printed using pymongo:
{"ID": "c80ea2cb-3272-77ae-8f46-d95de600c5bf"}
{"ID": "cdbcc129-548a-9d51-895a-1538200664e6"}
{"ID": "a4ece1ba-42ae-e735-17b0-f619daa506f9"}
...
Query using $project:
[
{
"$match": {
...
}
},
{
"$project": {
"ID": ...
}
}
]
Example result printed using pymongo:
{"ID": "c80ea2cb-3272-77ae-8f46-d95de600c5bf"}
{"ID": "cdbcc129-548a-9d51-895a-1538200664e6"}
{"ID": "a4ece1ba-42ae-e735-17b0-f619daa506f9"}
...
I would appreciate any suggestions!
The problem was that I had to change {"$first": "$data"} to just "$data" when changing from $group to $project.
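To spell out the difference: inside $group, $first is an accumulator that takes a value from the first document of each group, whereas inside $project it is the array operator, which fails on a non-array argument, as the error message reports. A rough sketch of the two stages as pymongo-style dicts follows. The grouping key "$ID" and the helper names are placeholders made up for illustration, since the original stages are elided:

```python
def group_stage():
    # $first here is the $group accumulator, so "$data" may be of any type.
    # The "_id" grouping key is a placeholder, not taken from the question.
    return {"$group": {"_id": "$ID", "ID": {"$first": "$data"}}}


def project_stage():
    # In $project, {"$first": "$data"} would be the array operator and would
    # fail with "argument must be an array" when "$data" is an object, so the
    # field is referenced directly instead.
    return {"$project": {"ID": "$data"}}
```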
I have a document like this:
{
"_index": "listings",
"_type": "listing",
"_id": "234",
"_source": {
"category_id": "43608",
"categories": [
43608,
43596
]
}
}
I want to query for documents whose category_id appears in their own categories array. Something like this:
{
"query": {
"bool": {
"must": [
{
"terms": {
"category_id": "doc.categories"
}
}
]
}
}
}
What am I supposed to do?
Since category_id is a string type, it is better to use a should clause instead of must: simply iterate through the categories array and make a separate term-level query for each element.
{
"query": {
"bool": {
"should": [
{
"term": {
"category_id": "doc.categories[0]"
}
},
{
"term": {
"category_id": "doc.categories[1]"
}
},
...
]
}
}
}
It will return all documents that match any element of the categories array.
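The placeholders doc.categories[0] and doc.categories[1] stand for the actual values, so the clauses have to be built on the client. A small Python sketch of that loop is below. build_category_query is a hypothetical helper, and converting each value to a string is an assumption based on category_id being a string field:

```python
def build_category_query(categories):
    """Build a bool/should query with one term clause per category value."""
    # A document matches if its category_id equals any of the given values.
    should = [
        {"term": {"category_id": str(category)}}
        for category in categories
    ]
    return {"query": {"bool": {"should": should}}}


body = build_category_query([43608, 43596])
```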
You have to use a script to find one field's value among another field's values.
{
"script": {
"script": {
"source": "doc.containsKey('categories') && doc['categories'].values.contains(doc['category_id'].value)",
"lang": "painless"
}
}
}
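For context, a script clause like this is typically placed in the filter part of a bool query. The sketch below wraps the script above, copied verbatim, into a complete search body. The helper name build_script_filter_query is hypothetical, and nothing is sent to a cluster here:

```python
# Painless source copied from the answer above, split only for line length.
SCRIPT_SOURCE = (
    "doc.containsKey('categories') && "
    "doc['categories'].values.contains(doc['category_id'].value)"
)


def build_script_filter_query(source=SCRIPT_SOURCE):
    """Wrap a Painless script in a bool filter so it does not affect scoring."""
    script_clause = {"script": {"script": {"source": source, "lang": "painless"}}}
    return {"query": {"bool": {"filter": [script_clause]}}}


body = build_script_filter_query()
```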
I have an artillery file where one of my requests is defined as so:
{
"post": {
"url": "/apps/stash/foo/search",
"json": {
"size": 100,
"from": 0,
"query": {
"bool": {
"must": {
"nested": {
"path": "text_analytics.entities.person",
"query": {
"bool": {
"must": {
"match": {
"text_analytics.entities.person.text": "Boris Johnson"
}
}
}
}
}
}
}
}
}
}
}
However, when I run this request, the json that gets sent out is this:
{
"json": {
"size": 100,
"from": 0,
"query": {
"bool": {
"must": {
"nested": {
"path": "text_analytics.entities.person",
"query": {
"bool": {
"must": {
"match": {
"text_analytics.entities.person.text": "Boris Johnson",
"text_analytics": {
"entities": {
"person": {
"text": "Boris Johnson"
}
}
}
}
}
}
}
}
}
}
}
}
}
As you can see, it has added a key to the match object called text_analytics where it has automatically nested objects by splitting on the . character.
How can I stop artillery doing this?
Looks like it's actually a bug in artillery.
https://github.com/artilleryio/artillery/issues/723
I have this entry:
"entries": {
"members": {
"person": [
{
"name": "Jane Doe"
}
]
}
}
Now I would like to check if the persons array is empty or has some entries.
I already tried with $exists:
"selector": {
"entries": {
"members": {
"person": {
"name": {
"$exists": true
}
}
}
}
}
}
And with $neq
"selector": {
"entries": {
"members": {
"person": {
"name": {
"$neq": ""
}
}
}
}
}
}
Neither approach works. Any tips?
You may want to try using the $size operator. For example, this selector matches documents whose person array is empty:
"selector": {
"entries": {
"members": {
"person": {
"$size": 0
}
}
}
}
I did it with:
"entries.members.person": {
"$elemMatch": {
"name": {
"$exists": true
}
}
}
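The two answers cover the two cases: $size with 0 for an empty array, and $elemMatch with $exists for an array that has at least one named entry. A minimal Python sketch that builds either selector, using the dotted field path from the second answer, could look like this (person_selector is a hypothetical helper name):

```python
FIELD = "entries.members.person"


def person_selector(empty):
    """Return a selector for documents whose person array is empty or not."""
    if empty:
        # Matches arrays of length zero.
        condition = {"$size": 0}
    else:
        # Matches arrays with at least one element that has a "name" field.
        condition = {"$elemMatch": {"name": {"$exists": True}}}
    return {"selector": {FIELD: condition}}


empty_query = person_selector(empty=True)
non_empty_query = person_selector(empty=False)
```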
I have a type in Elasticsearch with documents of this structure:
{
"name": "Foo Bar",
"myTags": [
{
"id": 3,
"name": "My Tag 1"
},
{
"id": 5,
"name": "My Tag 5"
},
{
"id": 7,
"name": "My Tag 7"
}
]
}
Now, given 3 tags, I would like to get ALL documents sorted by the number of matching tags: first the documents that match all 3 tags, then those that match 2, then 1, and finally none.
How can I do this?
You can do it with function_score:
{
"query": {
"function_score": {
"query": {
"match_all": {}
},
"functions": [
{
"filter": {
"nested": {
"path": "myTags",
"query": {
"term": {
"myTags.name": "My Tag 1"
}
}
}
},
"weight": 1
},
{
"filter": {
"nested": {
"path": "myTags",
"query": {
"term": {
"myTags.name": "My Tag 5"
}
}
}
},
"weight": 1
},
{
"filter": {
"nested": {
"path": "myTags",
"query": {
"term": {
"myTags.name": "My Tag 7"
}
}
}
},
"weight": 1
}
],
"boost_mode": "sum",
"score_mode": "sum"
}
}
}
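Because the functions array has one entry per tag, it is convenient to generate it from the tag list. Below is a minimal Python sketch of that, assuming myTags is mapped as nested and myTags.name is matched exactly, as in the query above. build_tag_ranking_query is a hypothetical helper that only builds the request body:

```python
def build_tag_ranking_query(tag_names):
    """Build the function_score body above with one weighted filter per tag."""
    functions = [
        {
            "filter": {
                "nested": {
                    "path": "myTags",
                    "query": {"term": {"myTags.name": name}},
                }
            },
            # Each matching tag contributes the same weight.
            "weight": 1,
        }
        for name in tag_names
    ]
    return {
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                "functions": functions,
                "boost_mode": "sum",
                "score_mode": "sum",
            }
        }
    }


body = build_tag_ranking_query(["My Tag 1", "My Tag 5", "My Tag 7"])
```

It is worth checking the returned _score values on a few sample documents to confirm that the ordering matches what you expect, especially for documents that match no tag at all.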