Match all values in a document array - arrays

Is there a way to match all values in a document array? For example, if my search array is ["1","2","3","4","5"] and my documents have fields like
doc1: "arr":["1","3","5"]
doc2: "arr":["1","2","7","9"]
doc3: "arr":["1","8"]
Then only the first document should match, because all the values in its array are present in the search array. I tried using a script filter (to get the length of the array) together with the minimum_should_match parameter, but I can't get it to work. How do I use a variable created by a script as the value of minimum_should_match?

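You can't search an array field directly to check whether it contains a given set of values. The analyzer analyzes the search input and matches its tokens individually, so a document is returned as soon as any one token matches.
To check that a document array contains every one of the specified values, split the searched array into one term clause per value and require all of them. Each clause in must has to match, so only documents containing all the listed values are returned: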
{
"query": {
"filtered": {
"filter": {
"bool": {
"must": [{
"term": {
"number": 1
}
}, {
"term": {
"number": 2
}
}, {
"term": {
"number": 7
}
}, {
"term": {
"number": 9
}
}]
}
}
}
}
}
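As a rough sketch, the clause list above can be generated from the search array rather than written by hand. The Python helpers below only build the request body as a dict; the function names and the companion field arr_len are assumptions for illustration, not part of the original answer. The first helper uses bool/filter because the filtered query was removed in Elasticsearch 5.0. The second sketches the terms_set query (available since 6.1), which may be closer to what the question asks: assuming each document also stores the number of its distinct array values in arr_len, a document matches only when that many of its values appear in the search array.

```python
from typing import Any, Dict, List


def contains_all_query(field: str, values: List[Any]) -> Dict[str, Any]:
    """Build a bool/filter query that requires every value to be present in `field`."""
    return {
        "query": {
            "bool": {
                # One term clause per value; all of them must match.
                "filter": [{"term": {field: value}} for value in values]
            }
        }
    }


def subset_of_query(field: str, values: List[Any], count_field: str) -> Dict[str, Any]:
    """Build a terms_set query.

    Assumption: `count_field` is a numeric field holding the number of distinct
    values in `field`, so a document matches only if all of its values are in `values`.
    """
    return {
        "query": {
            "terms_set": {
                field: {
                    "terms": list(values),
                    "minimum_should_match_field": count_field,
                }
            }
        }
    }


if __name__ == "__main__":
    print(contains_all_query("number", [1, 2, 7, 9]))
    print(subset_of_query("arr", ["1", "2", "3", "4", "5"], "arr_len"))
```

With the three sample documents, the terms_set variant would be expected to return only doc1, provided arr_len was indexed as 3, 4 and 2 respectively.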

Related

Elasticsearch match query

I'm searching for some text in a field.
The problem is that whenever two documents contain all the search tokens, the document with more occurrences of the tokens scores higher than the shorter document.
My Elasticsearch index contains names of foods, and I want to search for a food in it.
The documents are structured like this:
{"text": "NAME OF FOOD"}
Now I have two documents like
1: {"text": "Apple Syrup Apple Apple Syrup Apple Smoczyk's"}
2: {"text": "Apple Apple"}
If I search using this query
{
"query": {
"match": {
"text": {
"query": "Apple"
}
}
}
}
The first document comes first because it contains more occurrences of Apple.
That is not the result I expect. I would like the second document to score higher, because it contains Apple and is shorter than the first one.
Elasticsearch scoring gives weight to both term frequency and field length. In general shorter fields score higher, but a higher term frequency can offset that.
You can use the unique token filter to emit each token only once for the text. This way multiple occurrences of the same token will not affect the scoring.
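Mapping (lowercase is listed before unique so that tokens differing only in case, such as Apple and apple, are collapsed as well)
{
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "lowercase", "unique"
          ]
        }
      }
    }
  }
}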
Analyze
GET index29/_analyze
{
"text": "Apple Apple",
"analyzer": "my_analyzer"
}
Result
{
"tokens" : [
{
"token" : "apple",
"start_offset" : 0,
"end_offset" : 5,
"type" : "<ALPHANUM>",
"position" : 0
}
]
}
Only a single token is generated, even though apple appears twice.
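One detail worth checking is the order of the token filters. The Python sketch below is a simplified simulation, not Elasticsearch itself: a whitespace split stands in for the standard tokenizer, and the function names are made up for illustration. It shows why running lowercase before unique removes duplicates that differ only in case, while the reverse order does not.

```python
from typing import Callable, List

TokenFilter = Callable[[List[str]], List[str]]


def lowercase(tokens: List[str]) -> List[str]:
    """Lowercase every token."""
    return [t.lower() for t in tokens]


def unique(tokens: List[str]) -> List[str]:
    """Keep only the first occurrence of each token, preserving order."""
    seen = set()
    out = []
    for t in tokens:
        if t not in seen:
            seen.add(t)
            out.append(t)
    return out


def analyze(text: str, filters: List[TokenFilter]) -> List[str]:
    """Split on whitespace (a stand-in for the standard tokenizer), then apply filters in order."""
    tokens = text.split()
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


if __name__ == "__main__":
    print(analyze("Apple apple Syrup", [lowercase, unique]))
    print(analyze("Apple apple Syrup", [unique, lowercase]))
```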

Search for exact field in an array of strings in elasticsearch

Elasticsearch version: 7.1.1
Hi, I have tried a lot but could not find a solution.
In my index I have a field that contains an array of strings.
For example, I have two documents with different values in the locations array.
Document 1:
"doc" : {
"locations" : [
"Cloppenburg",
"Berlin"
]
}
Document 2:
"doc" : {
"locations" : [
"Landkreis Cloppenburg",
"Berlin"
]
}
A user searches for the term Cloppenburg,
and I want to return only those documents that contain the exact value Cloppenburg
and not Landkreis Cloppenburg.
The results should contain only Document 1,
but my query returns both documents.
This is the query I am using.
Can someone please help me out with this?
GET /my_index/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"doc.locations": {
"query": "cloppenburg",
"operator": "and"
}
}
}
]
}
}
}
The issue is that you are using a text field with a match query.
Match queries are analyzed: the search terms go through the same analyzer that was used at index time, which for text fields is the standard analyzer by default. It splits text on word boundaries and lowercases it, so in your case Landkreis Cloppenburg produces two tokens, landkreis and cloppenburg, at index time, and a search for cloppenburg matches that document as well.
Solution: use a keyword field, which is indexed as a single unanalyzed value.
Index def
{
"mappings": {
"properties": {
"location": {
"type": "keyword"
}
}
}
}
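Index both of your documents and then run the following search query. Keyword values are not analyzed, so the query text has to match the stored value exactly, including case: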
{
"query": {
"bool": {
"must": [
{
"match": {
"location": {
"query": "Cloppenburg"
}
}
}
]
}
}
}
Result
"hits": [
{
"_index": "location",
"_type": "_doc",
"_id": "2",
"_score": 0.6931471,
"_source": {
"location": "Cloppenburg"
}
}
]
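To make the difference concrete, here is a small Python simulation of the two behaviours. It is only a sketch under simplifying assumptions: a regex word split plus lowercasing stands in for the standard analyzer, and the function names are hypothetical. It is not how Elasticsearch is implemented.

```python
import re
from typing import List


def text_tokens(value: str) -> List[str]:
    """Rough stand-in for the standard analyzer: split into words and lowercase."""
    return re.findall(r"\w+", value.lower())


def matches_text(query: str, values: List[str]) -> bool:
    """Analyzed match with operator 'and': every query token must occur in the field."""
    indexed = {token for value in values for token in text_tokens(value)}
    return all(token in indexed for token in text_tokens(query))


def matches_keyword(query: str, values: List[str]) -> bool:
    """Keyword match: the query must equal one stored value exactly."""
    return query in values


if __name__ == "__main__":
    doc1 = ["Cloppenburg", "Berlin"]
    doc2 = ["Landkreis Cloppenburg", "Berlin"]
    print(matches_text("cloppenburg", doc1), matches_text("cloppenburg", doc2))
    print(matches_keyword("Cloppenburg", doc1), matches_keyword("Cloppenburg", doc2))
```

In this simulation the analyzed match accepts both documents, while the keyword match accepts only the first.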

Filter All using Elasticsearch

Let's say I have User table with fields like name, address, age, etc. There are more than 1000 records in this table, so I used Elasticsearch to retrieve this data one page at a time, 20 records.
Say I want to search for the text "Alexia" and display whether any record contains it. The special requirement is that I want to search for this text across all the fields in the table.
Does the search text match the name field, or age, or address, or any other? If it does, the matching records should be returned. We are not going to pass any specific field to the Elasticsearch query. If more than 20 records match my text, pagination should still work.
Any idea how to write such a query, or how to connect to Elasticsearch for this?
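Yes, you can do that with a query_string query. A query object can hold only one top-level clause, so combine query_string with the range condition inside a bool query:
{
  "size": 20,
  "query": {
    "bool": {
      "must": {
        "query_string": {
          "query": "Alexia"
        }
      },
      "filter": {
        "range": {
          "dateField": {
            "gte": "<currentTime>"
          }
        }
      }
    }
  },
  "sort": [
    {
      "dateField": {
        "order": "asc"
      }
    }
  ]
}
Here <currentTime> is a placeholder: it could be the current time, an age, or any other property you want to run a range query on.
To get only 20 records, pass size as 20. For pagination, use the range query with the dateField value of the last hit from the previous response to fetch the next set of records. The sort has to be ascending for this to work, so that gt on the last value yields the following page:
{
  "size": 20,
  "query": {
    "bool": {
      "must": {
        "query_string": {
          "query": "Alexia"
        }
      },
      "filter": {
        "range": {
          "dateField": {
            "gt": 1589570610732
          }
        }
      }
    }
  },
  "sort": [
    {
      "dateField": {
        "order": "asc"
      }
    }
  ]
}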
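The range-based paging described above could be wrapped in a small helper along these lines. This is a sketch in Python that only builds request bodies and reads a response dict; the function names are hypothetical, and it assumes results are sorted ascending on the range field so that gt on the last value returns the following page.

```python
from typing import Any, Dict, Optional


def page_body(text: str, date_field: str,
              last_value: Optional[int] = None, size: int = 20) -> Dict[str, Any]:
    """Build a search body; pass the last seen sort value to get the next page."""
    bool_query: Dict[str, Any] = {"must": {"query_string": {"query": text}}}
    if last_value is not None:
        # Only return documents strictly after the last one already shown.
        bool_query["filter"] = {"range": {date_field: {"gt": last_value}}}
    return {
        "size": size,
        "query": {"bool": bool_query},
        "sort": [{date_field: {"order": "asc"}}],
    }


def last_sort_value(response: Dict[str, Any], date_field: str) -> Optional[int]:
    """Read the range-field value of the final hit, or None when there are no hits."""
    hits = response["hits"]["hits"]
    return hits[-1]["_source"][date_field] if hits else None
```

Note that documents sharing exactly the same value of the range field as the last hit would be skipped by gt, so this approach fits best when the field is unique or nearly so.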
You can do the same with a match query. If you specify _all as the field in the match query, it searches all the fields. As with query_string, the match and range clauses have to be combined inside a bool query:
{
  "size": 20,
  "query": {
    "bool": {
      "must": {
        "match": {
          "_all": "Alexia"
        }
      },
      "filter": {
        "range": {
          "dateField": {
            "gte": "<currentTime>"
          }
        }
      }
    }
  },
  "sort": [
    {
      "dateField": {
        "order": "desc"
      }
    }
  ]
}
When you use Elasticsearch to back a search box, you should avoid query_string, because it throws an error on invalid syntax, where other queries simply return an empty result. See the query_string documentation for details.
_all is deprecated as of ES 6.0 and was removed in 7.0, so if you are using version 6.x onwards you can use copy_to to copy the values of all fields into a single field and then search on that field. See the copy_to documentation for more.
For pagination you can use the from and size parameters. size is the number of documents you want to retrieve, and from is the offset of the first hit to return.
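A minimal sketch of the copy_to idea, written as Python helpers that only build the mapping and search bodies. The target field name all_fields, the function names, and mapping every source field as text are assumptions made for illustration; a real mapping would use the appropriate type for each field (for example a numeric type for age).

```python
from typing import Any, Dict, List


def copy_to_mapping(fields: List[str], target: str = "all_fields") -> Dict[str, Any]:
    """Build a mapping in which every listed field is copied into one catch-all field."""
    properties: Dict[str, Any] = {
        # Assumption: all source fields are mapped as text for simplicity.
        name: {"type": "text", "copy_to": target} for name in fields
    }
    properties[target] = {"type": "text"}
    return {"mappings": {"properties": properties}}


def search_all_body(text: str, target: str = "all_fields", size: int = 20) -> Dict[str, Any]:
    """Build a match query against the catch-all field."""
    return {"size": size, "query": {"match": {target: text}}}


if __name__ == "__main__":
    print(copy_to_mapping(["name", "address", "age"]))
    print(search_all_body("Alexia"))
```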
Query:
{
  "from": <current-count>,
  "size": 20,
  "query": {
    "bool": {
      "must": {
        "match": {
          "_all": "Alexia"
        }
      },
      "filter": {
        "range": {
          "dateField": {
            "gte": "<currentTime>"
          }
        }
      }
    }
  },
  "sort": [
    {
      "dateField": {
        "order": "desc"
      }
    }
  ]
}
Increase the value of from on each iteration by the number of documents you have already received. For example, set from to 0 for the first iteration. For the next iteration set it to 20: from is a zero-based offset, so after the first 20 hits the next page starts at offset 20.
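The offset arithmetic can be sketched in Python as follows. The function names and the 1-based page numbering are assumptions for illustration; only the from and size parameters themselves come from Elasticsearch.

```python
from typing import Any, Dict


def from_offset(page: int, size: int = 20) -> int:
    """Return the zero-based offset of the first hit on a 1-based page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return (page - 1) * size


def paged_body(field: str, text: str, page: int, size: int = 20) -> Dict[str, Any]:
    """Build a match query body for the requested page."""
    return {
        "from": from_offset(page, size),
        "size": size,
        "query": {"match": {field: text}},
    }


if __name__ == "__main__":
    for page in (1, 2, 3):
        print(page, from_offset(page))
```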

Restheart query for an nested array subdocument

I'm working with MongoDB and RESTHeart.
In my NoSQL database I have a single document with this structure:
{
  "_id": "docID",
  "users": [
    {
      "userID": "12",
      "elements": [
        {
          "elementID": "1492446877599",
          "events": [
            {
              "event1": "one"
            },
            {
              "event2": "two"
            }
          ]
        }
      ]
    },
    {
      "userID": "11",
      "elements": [
        {
          "elementID": "14924",
          "events": [
            {
              "event1": "one"
            },
            {
              "event2": "two"
            }
          ]
        }
      ]
    }
  ]
}
How can I build a URL query to get the user with ID 11?
Using mongo shell it should be something like this one:
db.getCollection('collection').find({},{'users':{'$elemMatch':{'userID':'12'}}}).pretty()
I cannot find anything similar on restheart.
Could someone help me?
Using this
http://myHost:port/documents/docID?filter={%27users%27:{%27$elemMatch%27:{%27userID%27:%2712%27}}}
RESTHeart returns all the users: both userID 11 and 12.
Your request is against a document resource, i.e. the URL is http://myHost:port/documents/docID
The filter query parameter applies to collection requests, i.e. URLs such as http://myHost:port/documents
In any case you need a projection (the keys query parameter) to limit the returned properties.
You should be able to achieve it with the following request (I haven't tried it) using the $elemMatch projection operator:
http://myHost:port/documents?keys={"users":{"$elemMatch":{"userID":"12"}}}
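When sending such a request from code, the JSON in the keys parameter generally needs to be URL-encoded. A small Python sketch using only the standard library is shown below; the function name is hypothetical and the host and port are the placeholders from the question, not a real endpoint.

```python
import json
from urllib.parse import quote


def keys_url(base_url: str, user_id: str) -> str:
    """Build a collection URL whose keys parameter projects one matching users element."""
    projection = {"users": {"$elemMatch": {"userID": user_id}}}
    # Compact JSON, then percent-encode it so it is safe inside a query string.
    encoded = quote(json.dumps(projection, separators=(",", ":")))
    return f"{base_url}?keys={encoded}"


if __name__ == "__main__":
    # Placeholder host and port taken from the question.
    print(keys_url("http://myHost:port/documents", "12"))
```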

ElasticSearch Painless script: How to iterate in an array of Nested Objects

I am trying to create a script using the script_score of the function_score.
I have several documents whose rankings field is type="nested".
The mapping for the field is:
"rankings": {
"type": "nested",
"properties": {
"rank1": {
"type": "long"
},
"rank2": {
"type": "float"
},
"subject": {
"type": "text"
}
}
}
A sample document is:
"rankings": [
{
"rank1": 1051,
"rank2": 78.5,
"subject": "s1"
},
{
"rank1": 45,
"rank2": 34.7,
"subject": "s2"
}]
What I want to achieve is to iterate over the nested objects of rankings. Specifically, I need something like a for loop to find a particular subject and use its rank1 and rank2 to compute something.
So far, I use something like this but it does not seem to work (throwing a Compile error):
"function_score": {
"script_score": {
"script": {
"lang": "painless",
"inline":
"sum = 0;"
"for (item in doc['rankings_cug']) {"
"sum = sum + doc['rankings_cug.rank1'].value;"
"}"
}
}
}
I have also tried the following options:
for loop using : instead of in: for (item:doc['rankings']) with no success.
for loop using in but trying to iterate over a specific element of the object, i.e. the rank1: for (item in doc['rankings.rank1'].values), which actually compiles, but it seems to find a zero-length array of rank1.
I have read that _source element is the one which can return JSON-like objects, but as far as I found out it is not supported in Search queries.
Can you please give me some ideas of how to proceed with that?
Thanks a lot.
You can access _source via params._source. This one will work:
PUT /rankings/result/1?refresh
{
"rankings": [
{
"rank1": 1051,
"rank2": 78.5,
"subject": "s1"
},
{
"rank1": 45,
"rank2": 34.7,
"subject": "s2"
}
]
}
POST rankings/_search
{
"query": {
"match": {
"_id": "1"
}
},
"script_fields": {
"script_score": {
"script": {
"lang": "painless",
"inline": "double sum = 0.0; for (item in params._source.rankings) { sum += item.rank2; } return sum;"
}
}
}
}
DELETE rankings
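For readers who want to check the logic outside Elasticsearch, the loop in the script can be mirrored in plain Python over the same sample document. This is only an illustration of the iteration, not Painless; the function names are hypothetical, and the rank1 + rank2 formula in the second function is an arbitrary placeholder for the "compute something" step in the question.

```python
from typing import Any, Dict, List

SAMPLE_RANKINGS: List[Dict[str, Any]] = [
    {"rank1": 1051, "rank2": 78.5, "subject": "s1"},
    {"rank1": 45, "rank2": 34.7, "subject": "s2"},
]


def sum_rank2(rankings: List[Dict[str, Any]]) -> float:
    """Mirror of the script: add up rank2 over every nested object."""
    total = 0.0
    for item in rankings:
        total += item["rank2"]
    return total


def score_for_subject(rankings: List[Dict[str, Any]], subject: str) -> float:
    """Find one subject and combine its ranks (placeholder formula: rank1 + rank2)."""
    for item in rankings:
        if item["subject"] == subject:
            return item["rank1"] + item["rank2"]
    return 0.0  # subject not present


if __name__ == "__main__":
    print(sum_rank2(SAMPLE_RANKINGS))
    print(score_for_subject(SAMPLE_RANKINGS, "s2"))
```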
Unfortunately, Elasticsearch scripting in general (including Painless) does not support accessing nested documents in this way. Consider a different structure for your mappings, where rankings are stored in multi-valued fields, if you need to iterate across them like this. Ultimately, the nested data will need to be denormalized and put into the parent documents to be able to compute scores in the way described here.
For nested objects in an array, I iterated over the items and it worked.
The following is my sample data in the Elasticsearch index:
{
"_index": "activity_index",
"_type": "log",
"_id": "AVjx0UTvgHp45Y_tQP6z",
"_version": 4,
"found": true,
"_source": {
"updated": "2016-12-11T22:56:13.548641",
"task_log": [
{
"week_end_date": "2016-12-11",
"log_hours": 16,
"week_start_date": "2016-12-05"
},
{
"week_start_date": "2016-03-21",
"log_hours": 0,
"week_end_date": "2016-03-27"
},
{
"week_start_date": "2016-04-24",
"log_hours": 0,
"week_end_date": "2016-04-30"
}
],
"created": "2016-12-11T22:56:13.548635",
"userid": 895,
"misc": {
},
"current": false,
"taskid": 1023829
}
}
Here is the "Painless" script to iterate over nested objects:
{
  "script": {
    "lang": "painless",
    "inline": "boolean contains(def x, def y) { for (item in x) { if (item['week_start_date'] == y) { return true; } } return false; } if (!contains(ctx._source.task_log, params.start_time_param)) { ctx._source.task_log.add(params.week_object); }",
    "params": {
      "start_time_param": "2016-04-24",
      "week_object": {
        "week_start_date": "2016-04-24",
        "week_end_date": "2016-04-30",
        "log_hours": 0
      }
    }
  }
}
I used the above script for an update: /activity_index/log/AVjx0UTvgHp45Y_tQP6z/_update
The script defines a function called contains that takes two arguments, and then calls it.
The old Groovy style, ctx._source.task_log.contains(), will not work since ES 5.x stores nested objects in separate documents. Hope this helps!
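The add-if-absent logic of the script can be checked in plain Python before running it as an update. This is a sketch that mirrors the behaviour on an in-memory list, not Painless, and the name add_week_if_absent is made up for illustration.

```python
from typing import Any, Dict, List


def contains(task_log: List[Dict[str, Any]], week_start: str) -> bool:
    """Return True if any entry already has the given week_start_date."""
    for item in task_log:
        if item.get("week_start_date") == week_start:
            return True
    return False


def add_week_if_absent(task_log: List[Dict[str, Any]], week_object: Dict[str, Any]) -> bool:
    """Append week_object unless its week is already logged; report whether it was added."""
    if not contains(task_log, week_object["week_start_date"]):
        task_log.append(week_object)
        return True
    return False


if __name__ == "__main__":
    log = [{"week_start_date": "2016-04-24", "week_end_date": "2016-04-30", "log_hours": 0}]
    week = {"week_start_date": "2016-05-01", "week_end_date": "2016-05-07", "log_hours": 0}
    print(add_week_if_absent(log, week), len(log))
```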

Resources