minimum_should_match in connection with dis-max-query - solr

I am a newbie in Elasticsearch, as I just recently switched from Solr. I am now trying to convert an edismax query from my solrconfig to work with Elasticsearch.
Basically, I want the terms to be searched in two fields at once, so I ended up with this query:
{
"query": {
"dis_max": {
"queries": [
{
"match": {
"field_1": {
"query": "term1"
}
}
},
{
"match": {
"field_2": {
"query": "term1"
}
}
}
]
}
}
}
This works fine with a single term. With multiple terms, however, not every term should be required to match, and I could not find a way to implement this. That is what the minimum_should_match parameter is for, which the bool query provides.
Let's say I have three terms and I want at least two of them to be found. By trial and error I tried a lot of variants, including something like this:
{
"query": {
"dis_max": {
"queries": [
{
"match": {
"field_1": {
"query": "term1 term2 nonexistingterm",
"operator": "and",
"minimum_should_match": 2
}
}
},
{
"match": {
"field_2": {
"query": "term1 term2 nonexistingterm",
"operator": "and",
"minimum_should_match": 2
}
}
}
]
}
}
}
But minimum_should_match has no effect here, and because of the and operator no results are found. It doesn't work with or either, since a single matching term is enough to return results.
Can someone help me combine dis_max with minimum_should_match?
I could not find any documentation about this, so any hints on where I can find this information in Elasticsearch myself are very much appreciated!

You are using operator: and while also providing a minimum_should_match, which doesn't make much sense. With operator: and, the match query produces a query like:
field1:term1 AND field1:term2 AND field1:nonexistingterm
So all of the terms are required, regardless of what minimum_should_match is set to. To make effective use of minimum_should_match, you need to generate should clauses, i.e. use the "or" operator. This is as simple as removing the operator setting, since "or" is the default, and you should see the behavior you are looking for:
{
"query": {
"dis_max": {
"queries": [
{
"match": {
"field_1": {
"query": "term1 term2 nonexistingterm",
"minimum_should_match": 2
}
}
},
{
"match": {
"field_2": {
"query": "term1 term2 nonexistingterm",
"minimum_should_match": 2
}
}
}
]
}
}
}
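To make the interaction concrete, here is a small Python sketch (Python is an assumption; the answer itself only uses the JSON query DSL). build_dis_max produces the corrected query above, while match_clause_hits and dis_max_hits are hypothetical helpers forming a deliberately simplified model of the matching logic (whitespace tokenization, lowercase, no scoring), not how Lucene evaluates the query internally. One consequence the model shows: each match clause is checked against its own field, so two terms split across field_1 and field_2 do not satisfy minimum_should_match: 2.

```python
import json

def build_dis_max(fields, text, minimum_should_match):
    """Build a dis_max query with one match clause per field.

    The operator is left at its default ("or"), so minimum_should_match applies.
    """
    return {
        "query": {
            "dis_max": {
                "queries": [
                    {"match": {field: {"query": text,
                                       "minimum_should_match": minimum_should_match}}}
                    for field in fields
                ]
            }
        }
    }

def match_clause_hits(field_text, query_text, minimum_should_match):
    """Simplified model of one "or" match clause: lowercase, split on whitespace,
    and require at least minimum_should_match query terms to occur in the field."""
    field_terms = set(field_text.lower().split())
    found = sum(1 for term in query_text.lower().split() if term in field_terms)
    return found >= minimum_should_match

def dis_max_hits(doc, fields, query_text, minimum_should_match):
    """dis_max matches a document as soon as any one of its sub-queries matches."""
    return any(match_clause_hits(doc.get(field, ""), query_text, minimum_should_match)
               for field in fields)

query = build_dis_max(["field_1", "field_2"], "term1 term2 nonexistingterm", 2)
print(json.dumps(query, indent=2))
```

For example, a document whose field_1 contains term1 and term2 matches, whereas a document with term1 only in field_1 and term2 only in field_2 does not.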

Related

Search for exact field in an array of strings in elasticsearch

Elasticsearch version: 7.1.1
Hi, I have tried a lot but could not find any solution.
In my index I have a field that contains an array of strings.
For example, I have two documents containing different values in the locations array.
Document 1:
"doc" : {
"locations" : [
"Cloppenburg",
"Berlin"
]
}
Document 2:
"doc" : {
"locations" : [
"Landkreis Cloppenburg",
"Berlin"
]
}
A user searches for the term Cloppenburg, and I want to return only those documents that contain the term Cloppenburg and not Landkreis Cloppenburg.
The results should contain only Document 1, but my query returns both documents.
This is the query I am using. Can someone please help me with this?
GET /my_index/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"doc.locations": {
"query": "cloppenburg",
"operator": "and"
}
}
}
]
}
}
}
The issue is that you are using a text field with a match query.
A match query analyzes the search terms with the same analyzer that was used at index time, which for text fields is the standard analyzer by default. The standard analyzer splits text into words and lowercases them, so in your case Landkreis Cloppenburg produces the two tokens landkreis and cloppenburg, both at index time and at search time, and the search term cloppenburg therefore matches that document as well.
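The difference can be sketched in Python with a rough, hypothetical model: standard_like_tokens only approximates the standard analyzer (split on word characters, lowercase), and the two match helpers are illustrative, not Elasticsearch APIs. For a text field, all array values end up as tokens of the same field, so the token cloppenburg is present in both documents; a keyword field instead stores each array value verbatim as one exact, case-sensitive term (assuming no normalizer is configured).

```python
import re

def standard_like_tokens(text):
    """Rough stand-in for the standard analyzer: split into word characters, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]

def text_field_matches(values, query):
    """match query with operator "and" on a text field (simplified).

    All array values are analyzed into the same field, so every query token
    only has to appear among the tokens of some value.
    """
    indexed = {token for value in values for token in standard_like_tokens(value)}
    return all(token in indexed for token in standard_like_tokens(query))

def keyword_field_matches(values, query):
    """keyword field: each array value is one exact, case-sensitive term."""
    return query in values

doc1 = ["Cloppenburg", "Berlin"]
doc2 = ["Landkreis Cloppenburg", "Berlin"]
print(text_field_matches(doc1, "cloppenburg"), text_field_matches(doc2, "cloppenburg"))  # True True
print(keyword_field_matches(doc1, "Cloppenburg"), keyword_field_matches(doc2, "Cloppenburg"))  # True False
```

Note that in this model, as with a real keyword field, searching for cloppenburg in lowercase would match neither document.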
Solution: Use the keyword field.
Index def
{
"mappings": {
"properties": {
"location": {
"type": "keyword"
}
}
}
}
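Index both documents and then run the same search query. Note that a keyword field is not analyzed, so the match is exact and case-sensitive: the query must be Cloppenburg, not cloppenburg. (This example maps a top-level location field; with the document structure from the question, the field path would be doc.locations.)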
{
"query": {
"bool": {
"must": [
{
"match": {
"location": {
"query": "Cloppenburg"
}
}
}
]
}
}
}
Result
"hits": [
{
"_index": "location",
"_type": "_doc",
"_id": "2",
"_score": 0.6931471,
"_source": {
"location": "Cloppenburg"
}
}
]

Multiple match_phrase conditions with another bool in a single ElasticSearch query?

I am trying to run an Elasticsearch query that searches a text field ("body") and returns items that match at least one of two multi-word phrases I provide (i.e. "stack overflow" OR "the stackoverflow"). I would also like the query to return only results that occur after a given timestamp, ordered by time.
My current solution is below. I believe the MUST part (gte a timestamp) is working correctly, but the BOOL + SHOULD with two match_phrase clauses is not. I am getting the following error:
Unexpected character ('{' (code 123)): was expecting double-quote to start field name
I think this is because I have two match_phrase clauses in there?
The ES mapping and the details of the ES API I am using are here.
{"query":
{"bool":
{"should":
[{"match_phrase":
{"body":"a+phrase"}
},
{"match_phrase":
{"body":"another+phrase"}
}
]
},
{"bool":
{"must":
[{"range":
{"created_at:
{"gte":"thispage"}
}
}
]}
}
},"size":10000,
"sort":"created_at"
}
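You were missing a closing " after created_at. The error message itself comes from the second {"bool": ...} object, which sits directly inside the "query" object without a field name in front of it, so the parser finds { where it expects a quoted field name. Combining both conditions in a single bool query fixes both problems: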
{
"query": {
"bool": {
"must": [
{
"range": {
"created_at": {
"gte": "1534004694"
}
}
},
{
"bool": {
"should": [
{
"match_phrase": {
"body": "a+phrase"
}
},
{
"match_phrase": {
"body": "another+phrase"
}
}
]
}
}
]
}
},
"size": 10,
"sort": "created_at"
}
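Also, a bool object can have both must and should as properties, so this is also worth trying. Note, however, that as soon as a bool query has a must (or filter) clause, its should clauses become optional and only affect scoring. To still require at least one of the two phrases, set "minimum_should_match": 1:
{
"query": {
"bool": {
"must": {
"range": {
"created_at": {
"gte": "1534004694"
}
}
},
"should": [
{
"match_phrase": {
"body": "a+phrase"
}
},
{
"match_phrase": {
"body": "another+phrase"
}
}
],
"minimum_should_match": 1
}
},
"size": 10,
"sort": "created_at"
}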
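The default described above can be modeled in a few lines of Python. bool_matches is a hypothetical helper, not an Elasticsearch API: it takes one boolean per clause (did that clause match the document?) and ignores filter, must_not and scoring. It shows why a document newer than the timestamp but containing neither phrase would still be returned if minimum_should_match is left out of the must + should form.

```python
def bool_matches(must, should, minimum_should_match=None):
    """Simplified model of how a bool query decides whether a document matches.

    must / should are lists of booleans: whether each clause matched the document.
    Without an explicit minimum_should_match, at least one should clause is
    required only when there are no must clauses.
    """
    if minimum_should_match is None:
        minimum_should_match = 1 if should and not must else 0
    return all(must) and sum(should) >= minimum_should_match

# Document after the timestamp, but containing neither phrase:
print(bool_matches(must=[True], should=[False, False]))                          # True
print(bool_matches(must=[True], should=[False, False], minimum_should_match=1))  # False
```

In the first form of the answer the should clauses sit in a nested bool that has no must clause of its own, which is why that version needs no explicit minimum_should_match.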
On a side note, Postman or any JSON formatter/validator would really help in determining where the error is.
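As a minimal local alternative to an online validator, Python's standard json module reports the line and column where parsing fails. find_json_error is a hypothetical helper, and the broken string below is a shortened version of the original query that keeps only the structural mistake (a second object without a field name).

```python
import json

def find_json_error(text):
    """Return None if text is valid JSON, otherwise where and why parsing failed."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None

# The second "bool" object has no field name in front of it.
broken = '{"query": {"bool": {"should": []}, {"bool": {"must": []}}}}'
# Both conditions combined inside one bool/must array.
fixed = '{"query": {"bool": {"must": [{"bool": {"should": []}}]}}}'

print(find_json_error(broken))
print(find_json_error(fixed))
```

The message for the broken string points at the same spot as the Elasticsearch error: a { where a quoted field name was expected.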

Elastic Search query to find documents where nested field "contains" X objects

This is an example of what my data looks like for an Elastic Search index called video_service_inventory:
{
'video_service': 'netflix',
'movies' : [
{'title': 'Mission Impossible', 'genre': 'action'},
{'title': 'The Hangover', 'genre': 'comedy'},
{'title': 'Zoolander', 'genre': 'comedy'},
{'title': 'The Ring', 'genre': 'horror'}
]
}
I have established in my index that the "movies" field is of type "nested"
I want to write a query that says "get me all video_services that contain both of these movies":
{'title': 'Mission Impossible', 'genre': 'action'}
AND
{'title': 'The Ring', 'genre': 'horror'}
where the title and genre must both match. If one movie exists but not the other, I don't want the query to return that video service.
Ideally, I would like to do this in 1 query. So far, I haven't been able to find a solution.
Anyone have suggestions for writing this search query?
The syntax may vary depending on the Elasticsearch version, but in general you should combine multiple nested queries within a bool must query. For nested queries you need to specify the path to "navigate" to the nested documents, and you need to qualify the properties with the path plus the field name:
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "movies",
"query": {
"bool": {
"must": [
{ "term": { "movies.title": "Mission Impossible" } },
{ "term": { "movies.genre": "action" } }
]
}
}
}
},
{
"nested": {
"path": "movies",
"query": {
"bool": {
"must": [
{ "term": { "movies.title": "The Ring" } },
{ "term": { "movies.genre": "horror" } }
]
}
}
}
}
]
}
}
}
This example assumes that the title and genre fields are not analyzed (keyword) properties. In newer versions of Elasticsearch you may find them as a .keyword sub-field, and you would then use "movies.genre.keyword" to query the non-analyzed version of the data.
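The query can also be generated from a list of (title, genre) pairs. The Python sketch below uses hypothetical helpers: build_query emits one nested clause per pair inside bool/must, and service_matches is an in-memory model of the same logic with exact, case-sensitive comparison. It also illustrates why nested matters: both conditions of a clause must hold for the same movie object, so ("The Ring", "comedy") does not match even though "comedy" occurs on another movie. The field names assume keyword mappings; append .keyword if your mapping uses sub-fields.

```python
import json

def movie_clause(title, genre):
    """One nested clause: a single movies object must match both title and genre."""
    return {
        "nested": {
            "path": "movies",
            "query": {"bool": {"must": [
                {"term": {"movies.title": title}},
                {"term": {"movies.genre": genre}},
            ]}},
        }
    }

def build_query(pairs):
    """Every (title, genre) pair is required: one nested clause per pair."""
    return {"query": {"bool": {"must": [movie_clause(t, g) for t, g in pairs]}}}

def service_matches(doc, pairs):
    """In-memory model: each pair needs one movie matching both fields."""
    return all(
        any(m["title"] == title and m["genre"] == genre for m in doc["movies"])
        for title, genre in pairs
    )

netflix = {
    "video_service": "netflix",
    "movies": [
        {"title": "Mission Impossible", "genre": "action"},
        {"title": "The Hangover", "genre": "comedy"},
        {"title": "Zoolander", "genre": "comedy"},
        {"title": "The Ring", "genre": "horror"},
    ],
}
wanted = [("Mission Impossible", "action"), ("The Ring", "horror")]
print(json.dumps(build_query(wanted), indent=2))
```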
For details on bool queries you can have a look at the documentation on the ES website:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html
For nested queries:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-nested-query.html

Achieve case insensitivity for cloudant queries

My query has the following selector,
{
"selector": {
"_id": {
"$gt": null
},
"series": {
"$regex": "(?i)mario"
}
}
}
Now, if I have a document with series = mario12, the above query returns this document, which shouldn't happen. I only want the query to ignore the case of "mario".
How can I achieve case insensitivity?
I'm not sure I understand the question exactly. If you only want to match the full word "mario" in a case-insensitive manner then you would use a selector like this:
{
"selector": {
"_id": {
"$gt": null
},
"series": {
"$regex": "^(?i)mario$"
}
}
}
This will match "mario", "Mario", "MARIO", etc, but will not match "mario12", "Mario12", "12Mario", etc.
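The effect of the anchors can be checked locally with Python's re module. This is only an approximation and an assumption on my part: Cloudant evaluates $regex on the server with its own regex engine, so edge cases may differ. The sketch treats the selector as an unanchored test (a match anywhere in the string counts), which is why the original "(?i)mario" also accepts "mario12". Python requires global inline flags such as (?i) at the very start of a pattern, so the anchored version is written "(?i)^mario$" here.

```python
import re

UNANCHORED = re.compile(r"(?i)mario")
# Python needs the global (?i) flag at the very start of the pattern.
ANCHORED = re.compile(r"(?i)^mario$")

def regex_matches(pattern, value):
    """Unanchored test: a match anywhere in the string counts."""
    return pattern.search(value) is not None

for value in ["mario", "Mario", "MARIO", "mario12", "12Mario"]:
    print(value, regex_matches(UNANCHORED, value), regex_matches(ANCHORED, value))
```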

Cloudant find Query with $and and $or elements

I'm using the following JSON to find results in a Cloudant database:
{
"selector": {
"$and": [
{
"type": {
"$eq": "sensor"
}
},
{
"v": {
"$eq": 2355
}
},
{
"$or": [
{
"p": "#401000103"
},
{
"p": "#401000114"
}
]
},
{
"t_max": {
"$gte": 1459554894
}
},
{
"t_min": {
"$lte": 1459509591
}
}
]
},
"fields": [
"_id",
"p"
],
"limit": 200
}
If I run this against my Cloudant database, I get the following error:
{
"error": "unknown_error",
"reason": "function_clause",
"ref": 3379914628
}
If I remove one of the $or elements, I get results for the query:
(,{"p":"#401000114"})
I also get a result if I replace #401000114 with #401000114.
But when I use both elements, I get the error above.
Can anybody tell me what the error reason function_clause means?
function_clause is an Erlang error raised when none of a function's clauses matches the arguments it was called with, so it indicates a problem on the server side. You should probably reach out to Cloudant Support and see if they can help you with your issue.
I contacted Cloudant support.
This is their answer:
The issue affects Cloudant generally.
It affects both multi-tenant and dedicated clusters.
They are working on a solution.
As a workaround, if the array to which the $or operator applies has two elements, you can get the correct result by repeating one of the items in the array.
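If you build selectors in code, the workaround can be applied automatically. pad_two_element_or below is a hypothetical Python helper, only relevant while the bug described by support exists: it walks the selector and repeats the first element of any two-element "$or" array. Logically this does not change which documents match, since (A or B or A) is equivalent to (A or B). It returns new dicts and lists, leaving the input selector unchanged.

```python
def pad_two_element_or(node):
    """Return a copy of a Cloudant selector in which every two-element "$or"
    array gets its first element repeated (workaround suggested by support)."""
    if isinstance(node, dict):
        out = {}
        for key, value in node.items():
            value = pad_two_element_or(value)
            if key == "$or" and isinstance(value, list) and len(value) == 2:
                value = value + [value[0]]
            out[key] = value
        return out
    if isinstance(node, list):
        return [pad_two_element_or(item) for item in node]
    return node

selector = {
    "selector": {
        "$and": [
            {"type": {"$eq": "sensor"}},
            {"$or": [{"p": "#401000103"}, {"p": "#401000114"}]},
        ]
    }
}
padded = pad_two_element_or(selector)
print(padded["selector"]["$and"][1]["$or"])
```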
