Search for exact field in an array of strings in elasticsearch - arrays

Elasticsearch version: 7.1.1
Hi, I try a lot but could not found any solution
in my index, I have a field which is containing strings.
so, for example, I have two documents containing different values in locations array.
Document 1:
"doc" : {
"locations" : [
"Cloppenburg",
"Berlin"
]
}
Document 2:
"doc" : {
"locations" : [
"Landkreis Cloppenburg",
"Berlin"
]
}
a user requests a search for a term Cloppenburg
and I want to return only those documents which contain term Cloppenburg
and not Landkreis Cloppenburg.
the results should contain only Document-1.
but my query is returning both documents.
I am using the following query and getting both documents back.
can someone please help me out in this.
GET /my_index/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"doc.locations": {
"query": "cloppenburg",
"operator": "and"
}
}
}
]
}
}
}

The issue is due to your are using the text field and match query.
Match queries are analyzed and used the same analyzer of search terms which is used at index time, which is a standard analyzer in case of text fields. which breaks text on whitespace on in your case Landkreis Cloppenburg will create two tokens landkreis and cloppenburg both index and search time and even cloppenburg will match the document.
Solution: Use the keyword field.
Index def
{
"mappings": {
"properties": {
"location": {
"type": "keyword"
}
}
}
}
Index your both docs and then use same search query
{
"query": {
"bool": {
"must": [
{
"match": {
"location": {
"query": "Cloppenburg"
}
}
}
]
}
}
}
Result
"hits": [
{
"_index": "location",
"_type": "_doc",
"_id": "2",
"_score": 0.6931471,
"_source": {
"location": "Cloppenburg"
}
}
]

Related

Multikey partial index not used with elemMatch

Consider the following document format which has an array field tasks holding embedded documents
{
"foo": "bar",
"tasks": [
{
"status": "sleep",
"id": "1"
},
{
"status": "active",
"id": "2"
}
]
}
There exists a partial index on key tasks.id
{
"v": 2,
"unique": true,
"key": {
"tasks.id": 1
},
"name": "tasks.id_1",
"partialFilterExpression": {
"tasks.id": {
"$exists": true
}
},
"ns": "zardb.quxcollection"
}
The following $elemMatch query with multiple conditions on the same array element
db.quxcollection.find(
{
"tasks": {
"$elemMatch": {
"id": {
"$eq": "1"
},
"status": {
"$nin": ["active"]
}
}
}
}).explain()
does not seem to use the index
"winningPlan": {
"stage": "COLLSCAN",
"filter": {
"tasks": {
"$elemMatch": {
"$and": [{
"id": {
"$eq": "1"
}
},
{
"status": {
"$not": {
"$eq": "active"
}
}
}
]
}
}
},
"direction": "forward"
}
How can I make the above query use the index? The index does seem to be used via dot notation
db.quxcollection.find({"tasks.id": "1"})
however I need the same array element to match multiple conditions which includes the status field, and the following does not seem to be equivalent to the above $elemMatch based query
db.quxcollection.find({
"tasks.id": "1",
"tasks.status": { "$nin": ["active"] }
})
The way the partial indexes work is it uses the path as a key. With $elemMatch you don't have the path explicitly in the query. If you check it with .explain("allPlansExecution") it is not even considered by the query planner.
To benefit from the index you can specify the path in the query:
db.quxcollection.find(
{
"tasks.id": "1",
"tasks": {
"$elemMatch": {
"id": {
"$eq": "1"
},
"status": {
"$nin": ["active"]
}
}
}
}).explain()
It duplicates part of the elemMatch condition, so the index will be used to get all documents containing tasks of specific id, then it will filter out documents with "active" tasks at fetch stage. I must admit the query doesn't look nice, so may be add some comments to the code with explanations.

Filter All using Elasticsearch

Let's say I have User table with fields like name, address, age, etc. There are more than 1000 records in this table, so I used Elasticsearch to retrieve this data one page at a time, 20 records.
And let's say I just wanted to search for some text "Alexia", so I wanted to display: is there any record contain Alexia? But special thing is that I wanted to search this text via all my fields within the table.
Does search text match the name field or age or address or any? IF it does, it should return values. We are not going to pass any specific field for Elastic query. If it returns more than 20 records matched with my text, the pagination should work.
Any idea of how to do such a query? or any way to connect Elasticsearch?
Yes you can do that by query String
{
"size": 20,
"query": {
"query_string": {
"query": "Alexia"
},
"range": {
"dateField": {
"gte": **currentTime** -------> This could be current time or age or any property that like to do a range query
}
}
},
"sort": [
{
"dateField": {
"order": "desc"
}
}
]
}
For getting only 20 records you can pass the Size as 20 and for Pagination you can use RangeQuery and get the next set of Messages
{
"size": 20,
"query": {
"query_string": {
"query": "Alexia"
},
"range": {
"dateField": {
"gt": 1589570610732. ------------> From previous response
}
}
},
"sort": [
{
"dateField": {
"order": "desc"
}
}
]
}
You can do the same by using match query as well . If in match query you specify _all it will search in all the fields.
{
"size": 20,
"query": {
"match": {
"_all": "Alexia"
},
"range": {
"dateField": {
"gte": **currentTime**
}
}
},
"sort": [
{
"dateField": {
"order": "desc"
}
}
]
}
When you are using ElasticSearch to provide search functionality in search boxes , you should avoid using query_string because it throws error in case of invalid syntax, which other queries return empty result. You can read about this from query_string.
_all is deprecated from ES6.0, so if you are using ES version from 6.x ownwards you can use copy_to to copy all the values of field into single field and then search on that single field. You can refer more from copy_to.
For pagination you can make use of from and size parameter . size parameter tells you how many documents you want to retrieve and from tells from which hit you want to process.
Query :
{
"from" : <current-count>
"size": 20,
"query": {
"match": {
"_all": "Alexia"
},
"range": {
"dateField": {
"gte": **currentTime**
}
}
},
"sort": [
{
"dateField": {
"order": "desc"
}
}
]
}
from field value you can set incremently in each iteration to how much much documents you got. For e.g. first iteration you can set from as 0 . For next iteration you can set it as 21 (since in first iteration you got first 20 hits and in second iteration you want to get documents after first 20 hits). You can refer this.

Achieve case insensitivity for cloudant queries

My query has the following selector,
{
"selector": {
"_id": {
"$gt": null
},
"series": {
"$regex": "(?i)mario"
}
}
}
Now, if I have a document with series = mario12, the above query is returning this document which shouldn't happen. I want my query to only ignore the case of "mario".
How can I achieve case insensitivity?
I'm not sure I understand the question exactly. If you only want to match the full word "mario" in a case-insensitive manner then you would use a selector like this:
{
"selector": {
"_id": {
"$gt": null
},
"series": {
"$regex": "^(?i)mario$"
}
}
}
This will match "mario", "Mario", "MARIO", etc, but will not match "mario12", "Mario12", "12Mario", etc.

"There is no index available for this selector" despite the fact I made one

In my data, I have two fields that I want to use as an index together. They are sensorid (any string) and timestamp (yyyy-mm-dd hh:mm:ss).
So I made an index for these two using the Cloudant index generator. This was created successfully and it appears as a design document.
{
"index": {
"fields": [
{
"name": "sensorid",
"type": "string"
},
{
"name": "timestamp",
"type": "string"
}
]
},
"type": "text"
}
However, when I try to make the following query to find all documents with a timestamp newer than some value, I am told there is no index available for the selector:
{
"selector": {
"timestamp": {
"$gt": "2015-10-13 16:00:00"
}
},
"fields": [
"_id",
"_rev"
],
"sort": [
{
"_id": "asc"
}
]
}
What have I done wrong?
It seems to me like cloudant query only allows sorting on fields that are part of the selector.
Therefore your selector should include the _id field and look like:
"selector":{
"_id":{
"$gt":0
},
"timestamp":{
"$gt":"2015-10-13 16:00:00"
}
}
I hope this works for you!

Elasticsearch score results based partly on Popularity

I'm using Elasticsearch for this project but a Solr solution might be appropriate too. In the query I'd like to include a portion of a should clause that will return results even if none of the other terms can. This will be used for document popularity. I'll periodically calculate reading popularity and add a float field to each doc with a numeric value.
The idea is to return docs based on terms but when that fails, return popular docs ranked by popularity. These should be ordered by term match scores or magnitude of popularity score.
I realize that I could quantize the popularity and treat it like a tag "hottest", "hotter", "hot"... but would like to use numeric field since the ranking is well defined.
Here is the current form of my data (from fetch by id):
GET /index/docs/ipad
returns a sample object
{
"_index": "index",
"_type": "docs",
"_id": "doc1",
"_version": 1,
"found": true,
"_source": {
"category": ["tablets", "electronics"],
"text": ["buy", "an", "ipad"],
"popularity": 0.95347457,
"id": "doc1"
}
}
Current query format
POST /index/docs/_search
{
"size": 10,
"query": {
"bool": {
"should": [
{"terms": {"text": ["ipad"]}}
],
"must": [
{"terms": {"category": ["electronics"]}}
]
}
}
}
This may seem an odd query format but these are structured objects, not free form text.
Can I add popularity to this query so that it returns items ranked by popularity magnitude along with those returned by the should terms? I'd boost the actual terms above the popularity so they'd be favored.
Note I do not want to boost by popularity, I want to return popular if the rest of the query returns nothing.
One approach I can think of is wrapping match_all filter in constant score
and using sort on score followed by popularity
example:
{
"size": 10,
"query": {
"bool": {
"should": [
{
"terms": {
"text": [
"ipad"
]
}
},
{
"constant_score": {
"filter": {
"match_all": {}
},
"boost": 0
}
}
],
"must": [
{
"terms": {
"category": [
"electronics"
]
}
}
],
"minimum_should_match": 1
}
},
"sort": [
{
"_score": {
"order": "desc"
}
},
{
"popularity": {
"unmapped_type": "double"
}
}
]
}
You want to look into the function score query and a decay function for this.
Here's a gentle intro: https://www.found.no/foundation/function-scoring/

Resources