Elastic: query_string proximity search between two sentences

Good morning,
I need to convert this old Exalead query to Elasticsearch:
"One Sentence" NEAR Word
I tried:
"One Sentence" AND "Word"~16
But Elasticsearch interprets this query the same as:
"One Sentence" AND "Word"
so it also returns documents where "One Sentence" and "Word" are more than 16 words apart.
How can I write this query with query_string?

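You can use the intervals query. From the documentation:
Returns documents based on the order and proximity of matching terms.
The intervals query uses matching rules, constructed from a small set
of definitions. These rules are then applied to terms from a specified
field.
In the query below, max_gaps on the outer all_of rule is the maximum distance between the phrase and the word, and max_gaps on each match rule is the maximum distance between the terms inside that match. The phrase rule uses "ordered": true so that its terms must appear in the given order.
{
"query": {
"intervals": {
"text": {
"all_of": {
"ordered": false,
"max_gaps": 16,
"intervals": [
{
"match": {
"query": "one sentence",
"max_gaps": 0,
"ordered": true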
}
},
{
"match": {
"query": "word",
"max_gaps": 0,
"ordered": false
}
}
]
}
}
}
}
}
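To reuse this pattern for other phrase/word pairs, the request body can be built programmatically. The following is a minimal sketch in plain Python with no Elasticsearch client; near_query and its parameters are hypothetical names introduced here, and it assumes the phrase terms must appear in order.

```python
import json


def near_query(field, phrase, word, max_gaps=16):
    """Build an intervals query body: `phrase` within `max_gaps` positions of `word`.

    Hypothetical helper for illustration; it only assembles a dict.
    """
    return {
        "query": {
            "intervals": {
                field: {
                    "all_of": {
                        # The phrase and the word may appear in either order.
                        "ordered": False,
                        "max_gaps": max_gaps,
                        "intervals": [
                            # Exact phrase: terms adjacent and in order.
                            {"match": {"query": phrase, "max_gaps": 0, "ordered": True}},
                            {"match": {"query": word}},
                        ],
                    }
                }
            }
        }
    }


body = near_query("text", "one sentence", "word")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a _search request against the index.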
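Using query_string
You can approximate this with the phrase_slop parameter, which the documentation describes as:
(Optional, integer) Maximum number of positions allowed between
matching tokens for phrases. Defaults to 0. If 0, exact phrase matches
are required. Transposed terms have a slop of 2.
Slop only applies between the terms of one quoted phrase, which is why "Word"~16 on a single-term phrase behaves like "Word". The query below requires the exact phrase in the first clause and, in the second clause, allows up to 4 positions of slop between sentence and word: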
{
"query": {
"bool": {
"must": [
{
"query_string": {
"default_field": "text",
"query": "\"One sentence\""
}
},
{
"query_string": {
"default_field": "text",
"query": "\"sentence word\"",
"phrase_slop": 4
}
}
]
}
}
}

Related

Elasticsearch match query

I'm searching for some text in a field.
The problem is that when two documents contain all the search tokens, the document with more occurrences of the search tokens scores higher than the shorter document.
My Elasticsearch index contains names of foods, and I want to search for a food in it.
The documents are structured like this:
{"text": "NAME OF FOOD"}
Now I have two documents like
1: {"text": "Apple Syrup Apple Apple Syrup Apple Smoczyk's"}
2: {"text": "Apple Apple"}
If I search using this query
{
"query": {
"match": {
"text": {
"query": "Apple"
}
}
}
}
The first document comes first because it contains more occurrences of Apple.
This is not the result I expect. I would like the second document to score higher, because it contains Apple and is shorter than the first one.
Elasticsearch scoring takes term frequency and field length into account. In general, shorter fields score higher, but a high term frequency can offset that.
You can use the unique token filter so that each token is emitted only once for the text. That way, multiple occurrences of the same token do not affect the score.
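Mapping (lowercase is listed before unique because token filters run in order, and unique should see the lowercased tokens)
{
"mappings": {
"properties": {
"text": {
"type": "text",
"analyzer": "my_analyzer"
}
}
},
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "standard",
"filter": [
"lowercase", "unique"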
]
}
}
}
}
}
Analyze
GET index29/_analyze
{
"text": "Apple Apple",
"analyzer": "my_analyzer"
}
Result
{
"tokens" : [
{
"token" : "apple",
"start_offset" : 0,
"end_offset" : 5,
"type" : "<ALPHANUM>",
"position" : 0
}
]
}
Only a single token is generated, even though Apple appears twice.
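Token filters run in the order they are listed, so for mixed-case input it matters whether lowercase comes before or after unique. The sketch below is a rough simulation in plain Python, not the real analysis chain: splitting on whitespace stands in for the standard tokenizer, and lowercase, unique, and analyze are hypothetical helpers.

```python
def lowercase(tokens):
    """Lowercase every token."""
    return [t.lower() for t in tokens]


def unique(tokens):
    """Keep only the first occurrence of each token, preserving order."""
    seen = set()
    out = []
    for t in tokens:
        if t not in seen:
            seen.add(t)
            out.append(t)
    return out


def analyze(text, filters):
    """Split on whitespace (a simplification), then apply filters in order."""
    tokens = text.split()
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


lower_first = analyze("Apple apple Syrup", [lowercase, unique])
unique_first = analyze("Apple apple Syrup", [unique, lowercase])
print(lower_first)   # ['apple', 'syrup']
print(unique_first)  # ['apple', 'apple', 'syrup']
```

With unique first, Apple and apple are still distinct when duplicates are removed, so both survive and the term is counted twice after lowercasing.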

Search for exact field in an array of strings in elasticsearch

Elasticsearch version: 7.1.1
Hi, I have tried a lot but could not find a solution.
In my index, I have a field that contains an array of strings.
For example, I have two documents with different values in the locations array.
Document 1:
"doc" : {
"locations" : [
"Cloppenburg",
"Berlin"
]
}
Document 2:
"doc" : {
"locations" : [
"Landkreis Cloppenburg",
"Berlin"
]
}
A user searches for the term Cloppenburg,
and I want to return only the documents that contain the exact value Cloppenburg
and not Landkreis Cloppenburg.
The results should contain only Document 1,
but my query returns both documents.
I am using the following query and getting both documents back.
Can someone please help me with this?
GET /my_index/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"doc.locations": {
"query": "cloppenburg",
"operator": "and"
}
}
}
]
}
}
}
The issue is that you are using a text field with a match query.
Match queries are analyzed: the search terms go through the same analyzer that was used at index time, which for text fields is the standard analyzer by default. It splits text into lowercased word tokens, so Landkreis Cloppenburg produces two tokens, landkreis and cloppenburg, and a search for cloppenburg matches that document as well.
Solution: use a keyword field.
Index definition
{
"mappings": {
"properties": {
"location": {
"type": "keyword"
}
}
}
}
Index both documents, then run the same search against the keyword field. Keyword values are matched exactly, including case:
{
"query": {
"bool": {
"must": [
{
"match": {
"location": {
"query": "Cloppenburg"
}
}
}
]
}
}
}
Result
"hits": [
{
"_index": "location",
"_type": "_doc",
"_id": "2",
"_score": 0.6931471,
"_source": {
"location": "Cloppenburg"
}
}
]
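The difference between analyzed and exact matching can be illustrated without a cluster. The sketch below is only a rough simulation: text_tokens is a hypothetical stand-in for the standard analyzer (lowercased word tokens), and match_keyword compares whole values, as a keyword field would.

```python
import re

# The two documents from the question, keyed by a made-up document number.
docs = {
    1: ["Cloppenburg", "Berlin"],
    2: ["Landkreis Cloppenburg", "Berlin"],
}


def text_tokens(values):
    """Rough stand-in for the standard analyzer: lowercased word tokens."""
    return {tok for value in values for tok in re.findall(r"\w+", value.lower())}


def match_text(values, query):
    """Analyzed match with operator 'and': every query token must be indexed."""
    indexed = text_tokens(values)
    return all(tok in indexed for tok in re.findall(r"\w+", query.lower()))


def match_keyword(values, query):
    """Keyword match: the query must equal one whole value, including case."""
    return query in values


text_hits = [doc_id for doc_id, values in docs.items() if match_text(values, "cloppenburg")]
keyword_hits = [doc_id for doc_id, values in docs.items() if match_keyword(values, "Cloppenburg")]
print(text_hits)     # [1, 2]
print(keyword_hits)  # [1]
```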

Filter All using Elasticsearch

Let's say I have a User table with fields like name, address, age, etc. There are more than 1000 records in this table, so I use Elasticsearch to retrieve the data one page at a time, 20 records per page.
Now let's say I want to search for some text, "Alexia", and find out whether any record contains it. The special requirement is that I want to search for this text across all the fields of the table.
Does the search text match the name field, or age, or address, or any other field? If it does, the matching records should be returned. We are not going to pass any specific field in the Elasticsearch query. If more than 20 records match my text, pagination should work.
Any idea how to write such a query, or how to connect to Elasticsearch?
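Yes, you can do that with a query_string query. If you do not specify default_field or fields, it searches all eligible fields by default. The top-level query accepts only one clause, so the query_string and range clauses are combined in a bool query. The range clause can use the current time, an age, or any other property you want to filter on.
{
"size": 20,
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "Alexia"
}
}
],
"filter": [
{
"range": {
"dateField": {
"gte": <currentTime>
}
}
}
]
}
},
"sort": [
{
"dateField": {
"order": "desc"
}
}
]
}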
To get only 20 records, set size to 20. For pagination, you can use a range query on the sort field: because the results are sorted in descending order, the next page contains the records whose dateField is less than the last value in the previous response.
{
"size": 20,
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "Alexia"
}
}
],
"filter": [
{
"range": {
"dateField": {
"lt": 1589570610732
}
}
}
]
}
},
"sort": [
{
"dateField": {
"order": "desc"
}
}
]
}
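The range-based paging described above could be wrapped in a small helper that builds the body for the first page and for each following page. This is a sketch under assumptions: page_body and its parameters are hypothetical names, results are sorted in descending order on dateField, and documents that share exactly the same dateField value at a page boundary could be skipped.

```python
def page_body(text, last_sort_value=None, size=20, sort_field="dateField"):
    """Build a search body; pass the last sort value of the previous page to continue."""
    body = {
        "size": size,
        "query": {"bool": {"must": [{"query_string": {"query": text}}]}},
        "sort": [{sort_field: {"order": "desc"}}],
    }
    if last_sort_value is not None:
        # Descending sort: the next page holds strictly smaller values.
        body["query"]["bool"]["filter"] = [
            {"range": {sort_field: {"lt": last_sort_value}}}
        ]
    return body


first_page = page_body("Alexia")
next_page = page_body("Alexia", last_sort_value=1589570610732)
print(next_page["query"]["bool"]["filter"])
```

Elasticsearch also offers the search_after parameter for this style of paging, which avoids rewriting the query for each page.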
You can do the same with a match query. If you specify _all as the field in a match query, it searches all the fields, on versions where the _all field is available.
{
"size": 20,
"query": {
"bool": {
"must": [
{
"match": {
"_all": "Alexia"
}
}
],
"filter": [
{
"range": {
"dateField": {
"gte": <currentTime>
}
}
}
]
}
},
"sort": [
{
"dateField": {
"order": "desc"
}
}
]
}
When you use Elasticsearch to power a search box, you should avoid query_string, because it throws an error on invalid syntax, whereas other queries return an empty result. See the query_string documentation for details.
_all is deprecated as of ES 6.0, so if you are using version 6.x or later you can use copy_to to copy the values of all fields into a single field and then search on that field. See the copy_to documentation for details.
For pagination, you can use the from and size parameters. size is the number of documents to retrieve, and from is the offset of the first hit to return.
Query:
{
"from": <current-count>,
"size": 20,
"query": {
"bool": {
"must": [
{
"match": {
"_all": "Alexia"
}
}
],
"filter": [
{
"range": {
"dateField": {
"gte": <currentTime>
}
}
}
]
}
},
"sort": [
{
"dateField": {
"order": "desc"
}
}
]
}
Increase from in each iteration by the number of documents already retrieved. For example, set from to 0 in the first iteration and to 20 in the next one: from is a zero-based offset, and the first iteration returned hits 0 to 19.
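The offset arithmetic is easy to get wrong by one, so here is a minimal sketch of it in plain Python. from_offset and paged_body are hypothetical helper names, and pages are numbered from 0.

```python
def from_offset(page, size=20):
    """Zero-based offset of the first hit on a zero-based page number."""
    return page * size


def paged_body(query, page, size=20):
    """Attach from/size to an existing query clause."""
    return {"from": from_offset(page, size), "size": size, "query": query}


offsets = [from_offset(page) for page in range(3)]
print(offsets)  # [0, 20, 40]
print(paged_body({"match_all": {}}, page=1))
```

Note that from + size is limited by the index.max_result_window setting (10,000 by default), so deep paging needs a different approach.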

Elasticsearch score results based partly on Popularity

I'm using Elasticsearch for this project, but a Solr solution might be appropriate too. In the query I'd like to include a should clause that returns results even if none of the other terms match. This will be used for document popularity: I'll periodically calculate reading popularity and add a float field with the numeric value to each doc.
The idea is to return docs based on terms, but when that fails, to return popular docs ranked by popularity. The results should be ordered by term match score, or by the magnitude of the popularity score.
I realize that I could quantize the popularity and treat it like a tag ("hottest", "hotter", "hot"...), but I would like to use a numeric field since the ranking is well defined.
Here is the current form of my data (from fetch by id):
GET /index/docs/ipad
returns a sample object
{
"_index": "index",
"_type": "docs",
"_id": "doc1",
"_version": 1,
"found": true,
"_source": {
"category": ["tablets", "electronics"],
"text": ["buy", "an", "ipad"],
"popularity": 0.95347457,
"id": "doc1"
}
}
Current query format
POST /index/docs/_search
{
"size": 10,
"query": {
"bool": {
"should": [
{"terms": {"text": ["ipad"]}}
],
"must": [
{"terms": {"category": ["electronics"]}}
]
}
}
}
This may seem an odd query format, but these are structured objects, not free-form text.
Can I add popularity to this query so that it returns items ranked by popularity magnitude along with those returned by the should terms? I'd boost the actual terms above the popularity so they'd be favored.
Note that I do not want to boost by popularity; I want to return popular docs if the rest of the query returns nothing.
One approach I can think of is to wrap a match_all filter in a constant_score query with a boost of 0,
and to sort on score followed by popularity.
Example:
{
"size": 10,
"query": {
"bool": {
"should": [
{
"terms": {
"text": [
"ipad"
]
}
},
{
"constant_score": {
"filter": {
"match_all": {}
},
"boost": 0
}
}
],
"must": [
{
"terms": {
"category": [
"electronics"
]
}
}
],
"minimum_should_match": 1
}
},
"sort": [
{
"_score": {
"order": "desc"
}
},
{
"popularity": {
"order": "desc",
"unmapped_type": "double"
}
}
]
}
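The effect of sorting on score first and popularity second can be checked on a few made-up hits. The sketch below uses plain Python and hypothetical data: hits with a score of 0 stand for documents that only matched the zero-boost match_all clause, and both sort keys are assumed to be descending.

```python
# Hypothetical hits: ids, scores, and popularity values are invented for illustration.
hits = [
    {"id": "doc1", "score": 1.0, "popularity": 0.95},
    {"id": "doc2", "score": 0.0, "popularity": 0.40},
    {"id": "doc3", "score": 0.0, "popularity": 0.80},
    {"id": "doc4", "score": 1.0, "popularity": 0.10},
]

# Sort by score descending, then by popularity descending as a tiebreaker.
ranked = sorted(hits, key=lambda h: (-h["score"], -h["popularity"]))
ranked_ids = [h["id"] for h in ranked]
print(ranked_ids)  # ['doc1', 'doc4', 'doc3', 'doc2']
```

Term matches come first, and the documents that matched nothing else follow in order of popularity. Note that Elasticsearch sorts fields other than _score in ascending order unless "order": "desc" is given.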
You want to look into the function score query and a decay function for this.
Here's a gentle intro: https://www.found.no/foundation/function-scoring/

minimum_should_match in connection with dis-max-query

I am a newbie in Elasticsearch, as I just recently switched from Solr. I am now trying to convert an edismax query from my solrconfig to work with Elasticsearch.
Basically I want the terms to be searched in two fields at once, so I ended up with this query:
{
"query": {
"dis_max": {
"queries": [
{
"match": {
"field_1": {
"query": "term1"
}
}
},
{
"match": {
"field_2": {
"query": "term1"
}
}
}
]
}
}
}
This works fine with a single term. With multiple terms, it should not be required that every term is found. However, I could not find a way to implement this. That is what the minimum_should_match parameter, provided by the bool query, should be for.
Let's say I have three terms and I want at least two of them to be found. Via trial and error I checked a lot of variants including something like this:
{
"query": {
"dis_max": {
"queries": [
{
"match": {
"field_1": {
"query": "term1 term2 nonexistingterm",
"operator": "and",
"minimum_should_match": 2
}
}
},
{
"match": {
"field_2": {
"query": "term1 term2 nonexistingterm",
"operator": "and",
"minimum_should_match": 2
}
}
}
]
}
}
}
But minimum_should_match does not work here, and because of the and operator no results are found. It doesn't work with or either, as a single hit is enough to return results.
Can someone help me combine dis_max with minimum_should_match?
I could not find any documentation about this, so any hints on where I can find this information myself would be much appreciated!
You are using operator: and and providing minimum_should_match as well, which doesn't make sense together. With operator: and, it produces a query like:
field_1:term1 AND field_1:term2 AND field_1:nonexistingterm
So all of them are required, regardless of what minimum_should_match is set to. To make effective use of minimum_should_match, you need to generate should clauses, with the "or" operator. This is as simple as removing the operator setting, since "or" is the default, and you should see the behavior you are looking for. That is:
{
"query": {
"dis_max": {
"queries": [
{
"match": {
"field_1": {
"query": "term1 term2 nonexistingterm",
"minimum_should_match": 2
}
}
},
{
"match": {
"field_2": {
"query": "term1 term2 nonexistingterm",
"minimum_should_match": 2
}
}
}
]
}
}
}
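One consequence of this query shape is that minimum_should_match is evaluated separately inside each match clause, so the two matching terms must be found in the same field. The sketch below simulates that in plain Python; match, dis_max_matches, and the sample documents are hypothetical, and whitespace splitting stands in for real analysis.

```python
def match(field_tokens, query, minimum_should_match=1):
    """Match with the default 'or' operator: count how many query terms are present."""
    terms = query.lower().split()
    found = sum(1 for term in terms if term in field_tokens)
    return found >= minimum_should_match


def dis_max_matches(doc, query, minimum_should_match):
    """dis_max matches when at least one of its per-field clauses matches."""
    return any(match(tokens, query, minimum_should_match) for tokens in doc.values())


query = "term1 term2 nonexistingterm"
doc_a = {"field_1": {"term1", "term2"}, "field_2": {"other"}}
doc_b = {"field_1": {"term1"}, "field_2": {"term2"}}

a_matches = dis_max_matches(doc_a, query, 2)
b_matches = dis_max_matches(doc_b, query, 2)
print(a_matches)  # True: field_1 contains two of the three terms
print(b_matches)  # False: no single field contains two terms
```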
