I want to search for an exact string in an array.
My data in ES looks like this:
{ "category": [
"abc test"
],
"es_flag": false,
"bullet_points": [],
"content": "",
"description": false }
I have multiple categories like "abc test", "new abc test", etc.
I am trying the query below, but I am getting results for multiple categories: I was searching for "abc test", yet documents in the "new abc test" category also come back.
{
"from": 0,
"size": 30,
"query": {
"bool" : {
"must": [
{ "match_phrase": { "category": "abc test" } }
]
}
},
"sort": [ { "createdAt": { "order": "desc" } } ]
}
Help will be appreciated.
I'm assuming you are using the default analyzer. In that case, match_phrase against "field": "abc test" will match all documents whose field contains the adjacent tokens abc and test, including:
new abc test
abc test new
foo abc test bar
And it will not match:
abc new test - query tokens are not adjacent
test abc - query tokens are adjacent, but in the wrong order
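If you want to see the tokens the analyzer actually produces for these values, you can inspect them with the _analyze API (a quick sketch, assuming the default standard analyzer):
curl -XGET http://localhost:9200/_analyze -H 'Content-Type: application/json' -d '
{
  "analyzer": "standard",
  "text": "new abc test"
}'
This returns the tokens new, abc and test; since abc and test are adjacent and in order, the match_phrase query above matches.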
What would actually help you is using the keyword analyzer for your field (you either need to build a new index from scratch or update your mappings; see the sketch after the example below for the latter). If you're building from scratch:
curl -XPUT http://localhost:9200/my_index -H 'Content-Type: application/json' -d '
{
"mappings": {
"categories": {
"properties": {
"category": {
"type": "text",
"analyzer": "keyword"
}
}
}
}
}'
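For the mapping-update path, you can instead add a keyword sub-field to the existing field (a sketch, assuming an ES 6.x index with the categories type shown above; existing documents only get the new sub-field once they are reindexed, e.g. with _update_by_query):
curl -XPUT http://localhost:9200/my_index/_mapping/categories -H 'Content-Type: application/json' -d '
{
  "properties": {
    "category": {
      "type": "text",
      "fields": {
        "raw": { "type": "keyword" }
      }
    }
  }
}'
Exact matches then go against category.raw instead of category.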
With the keyword analyzer in place, you then just need a simple query, e.g. like this (either match or term will do):
curl -XGET http://localhost:9200/my_index/_search -H 'Content-Type: application/json' -d '
{
"query": {
"match" : {
"message" : "abc test"
}
}
}'
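Also worth noting: if your index was created with dynamic mapping on ES 5.x or later, the category field most likely already has a category.keyword sub-field, so an exact match may work without touching the mapping at all (a sketch under that assumption):
curl -XGET http://localhost:9200/my_index/_search -H 'Content-Type: application/json' -d '
{
  "query": {
    "term": {
      "category.keyword": "abc test"
    }
  }
}'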
My version of Elasticsearch is 6.0.1. I am using this approach:
GET <your index>/_search
{
"query": {
"bool": {
"must": [{
"query_string": {
"query": "category:abc OR category:test"
}
}]
}
},
"sort":[{"createdAt": {
"order": "desc"
}}]
}
Consider the following document format, which has an array field tasks holding embedded documents:
{
"foo": "bar",
"tasks": [
{
"status": "sleep",
"id": "1"
},
{
"status": "active",
"id": "2"
}
]
}
There exists a partial index on key tasks.id
{
"v": 2,
"unique": true,
"key": {
"tasks.id": 1
},
"name": "tasks.id_1",
"partialFilterExpression": {
"tasks.id": {
"$exists": true
}
},
"ns": "zardb.quxcollection"
}
The following $elemMatch query with multiple conditions on the same array element
db.quxcollection.find(
{
"tasks": {
"$elemMatch": {
"id": {
"$eq": "1"
},
"status": {
"$nin": ["active"]
}
}
}
}).explain()
does not seem to use the index
"winningPlan": {
"stage": "COLLSCAN",
"filter": {
"tasks": {
"$elemMatch": {
"$and": [{
"id": {
"$eq": "1"
}
},
{
"status": {
"$not": {
"$eq": "active"
}
}
}
]
}
}
},
"direction": "forward"
}
How can I make the above query use the index? The index does seem to be used via dot notation
db.quxcollection.find({"tasks.id": "1"})
however, I need the same array element to match multiple conditions, including the status field, and the following does not seem to be equivalent to the $elemMatch query above:
db.quxcollection.find({
"tasks.id": "1",
"tasks.status": { "$nin": ["active"] }
})
The way partial indexes work is that they use the path as a key. With $elemMatch you don't have the path explicitly in the query, so if you check it with .explain("allPlansExecution") the index is not even considered by the query planner.
To benefit from the index you can specify the path in the query:
db.quxcollection.find(
{
"tasks.id": "1",
"tasks": {
"$elemMatch": {
"id": {
"$eq": "1"
},
"status": {
"$nin": ["active"]
}
}
}
}).explain()
It duplicates part of the $elemMatch condition, so the index will be used to find all documents containing a task with the specific id, and documents whose matching tasks are "active" will then be filtered out at the FETCH stage. I must admit the query doesn't look nice, so maybe add some comments to the code with explanations.
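If it helps, here is a quick shell sketch (using the collection and index from the question) to double-check that the rewritten query actually picks the index:
// Inspect the winning plan; with the partial index in place you should see
// an IXSCAN on "tasks.id_1" below the FETCH stage instead of a COLLSCAN.
var plan = db.quxcollection.find({
  "tasks.id": "1",
  "tasks": {
    "$elemMatch": {
      "id": { "$eq": "1" },
      "status": { "$nin": ["active"] }
    }
  }
}).explain().queryPlanner.winningPlan;
printjson(plan);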
I am trying to sort by array size in Elasticsearch 7.1.
I indexed the following data without creating any custom mapping:
{
"myarray": [{
"field": {
"value": "test"
}
}]
}
When I look at the mapping, it is giving me:
{
"properties": {
"myarray": {
"properties": {
"field": {
"properties": {
"value": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
}
Now I want to query the index and sort by the highest number of elements in myarray. I have tried doing:
{
"sort": {
"_script": {
"type": "number",
"order": "desc",
"script": "doc.containsKey('myarray.field.value') ? doc['myarray.field.value'].values.size() : 0"
}
}
}
which gives me an error like "Fielddata is disabled on text fields by default. [...] Alternatively use a keyword field instead." So I tried with:
{
"sort": {
"_script": {
"type": "number",
"order": "desc",
"script": "doc.containsKey('myarray.field.value.keyword') ? doc['myarray.field.value.keyword'].values.size() : 0"
}
}
}
which gives me the error "Illegal list shortcut value [values]." So then I tried (removing the values keyword):
{
"sort": {
"_script": {
"type": "number",
"order": "desc",
"script": "doc.containsKey('myarray.field.value.keyword') ? doc['myarray.field.value.keyword'].size() : 0"
}
}
}
and it works; however, some results are sorted nicely and then suddenly an element that should be at the top appears in the middle.
Is that because it is sorting by the length of the value as a string and not the length of myarray?
This is because the text type does not support sorting; to sort on it you must map the array field with the keyword type.
For more info and syntax, please refer to this: https://www.elastic.co/guide/en/elasticsearch/reference/6.8/search-request-sort.html
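A minimal sketch of the mapping that suggestion amounts to, assuming ES 7.x and a hypothetical index name my_index (the nested value field mapped as keyword, which has doc values and can therefore be used in sort scripts):
PUT my_index
{
  "mappings": {
    "properties": {
      "myarray": {
        "properties": {
          "field": {
            "properties": {
              "value": { "type": "keyword" }
            }
          }
        }
      }
    }
  }
}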
I have the following data structure as a result of aws logs get-query-results:
{
"status": "Complete",
"statistics": {
"recordsMatched": 2.0,
"recordsScanned": 13281.0,
"bytesScanned": 7526096.0
},
"results": [
[
{
"field": "time",
"value": "2019-01-31T21:53:01.136Z"
},
{
"field": "requestId",
"value": "a9c233f7-0b1b-3326-9b0f-eba428e4572c"
},
{
"field": "logLevel",
"value": "INFO"
},
{
"field": "callerId",
"value": "a9b0f9c2-eb42-3986-33f7-8e450b1b72cf"
}
],
[
{
"field": "time",
"value": "2019-01-25T13:13:01.062Z"
},
{
"field": "requestId",
"value": "a4332628-1b9b-a9c2-0feb-0cd4a3f7cb63"
},
{
"field": "logLevel",
"value": "INFO"
},
{
"field": "callerId",
"value": "a9b0f9c2-eb42-3986-33f7-8e450b1b72cf"
}
]
]
}
The AWS CLI supports the JMESPath language for filtering output. I need a query string that filters, among the returned "results", the objects whose "field" is "callerId", retrieves their "value" property, and produces the following output:
[
{
callerId: "a9b0f9c2-eb42-3986-33f7-8e450b1b72cf"
},
{
callerId: "a9b0f9c2-eb42-3986-33f7-8e450b1b72cf"
}
]
The first step I do is flatten the results array with the query string: results[]
This gets rid of the other root properties (status, statistics) and returns one big array with all of the {field: ..., value: ...}-like objects. But after this I can't manage to properly filter for the objects that match field=="callerId". I tried, among others, the following expressions without success:
'results[][?field=="callerId"]'
'results[][*][?field=="callerId"]'
'results[].{ callerId: #[?field=="callerId"].value }'
I'm not an expert in JMESPath; I went through the tutorials on the jmespath.org site but couldn't manage to make it work.
Thanks!
Using jq is a good option because it's a more complete language, but if you want to do it with JMESPath, here is the solution:
results[*][?field=='callerId'].{callerId: value}[]
to get:
[
{
"callerId": "a9b0f9c2-eb42-3986-33f7-8e450b1b72cf"
},
{
"callerId": "a9b0f9c2-eb42-3986-33f7-8e450b1b72cf"
}
]
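For reference, the same expression can be passed straight to the CLI's global --query option instead of post-processing the output (a sketch, with a placeholder query id):
aws logs get-query-results --query-id <query-id> \
  --query "results[*][?field=='callerId'].{callerId: value}[]"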
I'm not able to reproduce this fully since I don't have the same logs in my log stream, but I was able to do it using jq, putting the sample JSON object in a file:
cat sample_output.json | jq '.results[][] | select(.field=="callerId") | .value'
OUTPUT:
"a9b0f9c2-eb42-3986-33f7-8e450b1b72cf"
"a9b0f9c2-eb42-3986-33f7-8e450b1b72cf"
You could pipe the output from the AWS CLI to jq.
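For example, something along these lines (the query id is a placeholder):
aws logs get-query-results --query-id <query-id> \
  | jq '.results[][] | select(.field=="callerId") | .value'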
I was able to get pretty close with a native JMESPath query, using the built-in editor on this site:
http://jmespath.org/examples.html#filtering-and-selecting-nested-data
results[*][?field==`callerId`][]
OUTPUT:
[
{
"field": "callerId",
"value": "a9b0f9c2-eb42-3986-33f7-8e450b1b72cf"
},
{
"field": "callerId",
"value": "a9b0f9c2-eb42-3986-33f7-8e450b1b72cf"
}
]
but I'm not sure how to make callerId the key and take its value from the "value" key.
Example:
"docs": [
{
"id": "f37914",
"index_id": "some_index",
"field_1": [
{
"Some value",
"boost": 20.
}
]
}
]
If 'field_1' is matched, then boost by corresponding 'boost' field.
Boost what? The document? The specific field? You can do either.
Anyway, the way to do it is to use Function Queries:
https://lucene.apache.org/solr/guide/6_6/function-queries.html#FunctionQueries-AvailableFunctions
For example, if you want to boost the document (assuming that if the value doesn't match, the score is 0), you can do something like this:
q=_val_:"if(query($q1), field(boost), 0)"&q1=field_1:"Some Value"
_val_ is just a hook into Solr's function query support: query returns a non-zero score if q1 matches, field is a simple function that just returns the value of the field itself, and if lets us join the two together.
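URL-encoding that by hand gets unwieldy, so here is a sketch of the same request sent as form parameters with curl (the collection name mycollection is just a placeholder):
curl 'http://localhost:8983/solr/mycollection/select' \
  --data-urlencode 'q=_val_:"if(query($q1), field(boost), 0)"' \
  --data-urlencode 'q1=field_1:"Some Value"' \
  --data-urlencode 'fl=id,score'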
So what I ended up doing is using Lucene payloads and Solr 6.6's new DelimitedPayloadTokenFilter feature.
First I created a terms field with the following configuration:
{
"add-field-type": {
"name": "terms",
"stored": "true",
"class": "solr.TextField",
"positionIncrementGap": "100",
"indexAnalyzer": {
"tokenizer": {
"class": "solr.KeywordTokenizerFactory"
},
"filters": [
{
"class": "solr.LowerCaseFilterFactory"
},
{
"class": "solr.DelimitedPayloadTokenFilterFactory",
"encoder": "float",
"delimiter": "|"
}
]
},
"queryAnalyzer": {
"tokenizer": {
"class": "solr.KeywordTokenizerFactory"
},
"filters": [
{
"class": "solr.LowerCaseFilterFactory"
},
{
"class": "solr.SynonymGraphFilterFactory",
"ignoreCase": "true",
"expand": "false",
"tokenizerFactory": "solr.KeywordTokenizerFactory",
"synonyms": "synonyms.txt"
}
]
}
},
"add-field" : {
"name":"terms",
"type":"terms",
"stored": "true",
"multiValued": "true"
}
}
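For reference, this configuration is applied through the collection's Schema API; a sketch, assuming the payloads collection used below and that the JSON above is saved in a file called terms_field.json:
curl -X POST 'http://localhost:8983/solr/payloads/schema' \
  -H 'Content-Type: application/json' \
  --data-binary @terms_field.json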
I indexed my documents like so:
[
{
"id" : "1",
"terms" : [
"some term|10.0",
"another term|60.0"
]
},
{
"id" : "2",
"terms" : [
"some term|11.0",
"another term|21.0"
]
}
]
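Posting them goes through the JSON update handler, roughly like this (again assuming the payloads collection, with the document array saved as docs.json):
curl -X POST 'http://localhost:8983/solr/payloads/update?commit=true' \
  -H 'Content-Type: application/json' \
  --data-binary @docs.json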
I used Solr's function query support to query for a match on terms, grab the attached boost payload, and apply it to the relevance score:
/solr/payloads/select?indent=on&wt=json&q={!payload_score%20f=terms%20v=$payload_term%20func=max}&fl=id,score&payload_term=some+term
I have two kinds of documents in CouchDB with the following JSON structure:
1.
{
"_id": "4a91f3e8-616a-431d-8199-ace00055763d",
"_rev": "2-9105188217acd506251c98cd4566e788",
"Vehicle": {
"type": "STRING",
"name": "Vehicle",
"value": "12345"
},
"Start": {
"type": "DATE",
"name": "Start",
"value": "2014-09-10T11:19:00.000Z"
}
}
2.
{
"_id": "4a91f3e8-616a-431d-8199-ace00055763d",
"_rev": "2-9105188217acd506251c98cd4566e788",
"Equipment": {
"type": "STRING",
"name": "Equipment",
"value": "12345"
},
"Start": {
"type": "DATE",
"name": "Start",
"value": "2014-09-10T11:19:00.000Z"
}
}
I want to make one view which finds all documents where doc.Vehicle.value=12345 OR doc.Equipment.value=12345.
How can I make a view that will return all these kinds of documents?
Thanks in advance.
Just emit both values (yes, map functions may emit multiple different key-value pairs for the same doc) in your view:
function(doc){
if (doc.Equipment) {
emit(doc.Equipment.value, null)
}
if (doc.Vehicle) {
emit(doc.Vehicle.value, null)
}
}
And request them by the same key:
http://localhost:5984/db/_design/ddoc/_view/by_equip_value?key="12345"
See also the Guide to Views for more info about CouchDB views.
With kxepal's version, you cannot tell the type from the results ("12345" can be either Vehicle or Equipment). You can only see it when you use include_docs=true and look inside the doc, or make a second query with the id from the results.
If you want to see the type (or query by type), you need to extend the view:
function(doc) {
  if (doc.Equipment) {
    emit(doc.Equipment.value, doc.Equipment.name);
  }
  if (doc.Vehicle) {
    emit(doc.Vehicle.value, doc.Vehicle.name);
  }
}
Here, the name is the value of each result row.
But you can also select by type in the query, if you emit the name as the first element of the key:
function(doc) {
  if (doc.Equipment) {
    emit([doc.Equipment.name, doc.Equipment.value], null);
  }
  if (doc.Vehicle) {
    emit([doc.Vehicle.name, doc.Vehicle.value], null);
  }
}
Your query for Vehicles:
/viewname?startkey=["Vehicle"]&endkey=["Vehicle",{}]
Equipment:
/viewname?startkey=["Equipment"]&endkey=["Equipment",{}]
Here, the name is the first item of each result row's key array.
Maybe this will help: http://de.slideshare.net/okurow/couchdb-mapreduce-13321353
BTW, a better solution would be:
{
"_id": "4a91f3e8-616a-431d-8199-ace00055763d",
"_rev": "2-9105188217acd506251c98cd4566e788",
"type": "Vehicle",
"value":"12345",
"Start": {
"type": "DATE",
"name": "Start", // ? maybe also obsolete, because already inside "Start" Element
"value": "2014-09-10T11:19:00.000Z"
}
}
{
"_id": "4a91f3e8-616a-431d-8199-ace00055763d",
"_rev": "2-9105188217acd506251c98cd4566e788",
"type": "Equipment",
"value":"12345",
"Start": {
"type": "DATE",
"name": "Start", // ? maybe also obsolete, because already inside "Start" Element
"value": "2014-09-10T11:19:00.000Z"
}
}
In this case you can use only one emit:
emit([doc.type,doc.value],null)
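A minimal map function and query for that restructured form might look like this (the design doc and view names are placeholders):
function(doc) {
  // one emit covers both kinds of documents once they carry
  // a top-level "type" ("Vehicle" or "Equipment") and a "value"
  if (doc.type && doc.value) {
    emit([doc.type, doc.value], null);
  }
}
and then query by the composite key, e.g.:
http://localhost:5984/db/_design/ddoc/_view/by_type_value?key=["Vehicle","12345"]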