Filter on array of objects

In Elasticsearch we have a type that has an array of objects. When querying it from Kibana I am seeing some inconsistencies in the results.
Here is an extract from my mapping,
{
  "myIndex-2017.08.22": {
    "mappings": {
      "typeA": {
        "properties": {
          ...
          "Collection": {
            "properties": {
              ...
              "FileType": {
                "type": "text"
              }
            }
          }
        }
      }
    }
  }
}
Here I can have multiple objects in Collection, i.e. it is indexed as an array. When I query on one FileType, for example FileType: DOCX, I also get back records whose FileType is HTML.
Looking deeper, I found that this happens because some of the records have two Collection elements, one with FileType: DOCX and one with FileType: HTML.
Why does filtering work like this? Is there any other way to filter so that I only get FileType: DOCX and do not get FileType: HTML?
I am running ES 5.3.

Elasticsearch flattens array fields out of the box, so
{
  "files": [
    {
      "name": "name1",
      "fileType": "doc"
    },
    {
      "name": "name2",
      "fileType": "html"
    }
  ]
}
becomes:
{
  "files.name": [ "name1", "name2" ],
  "files.fileType": [ "doc", "html" ]
}
If you want to search for the objects themselves in this array, you have to use the nested datatype in the mapping of the collection:
{
  "myIndex-2017.08.22": {
    "mappings": {
      "typeA": {
        "properties": {
          ...
          "Collection": {
            "type": "nested",
            "properties": {
              ...
              "FileType": {
                "type": "text"
              }
            }
          }
        }
      }
    }
  }
}
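Once the field is mapped as nested, each object is indexed as its own hidden document, and you query it through a nested query so that the conditions must hold within a single object. A minimal sketch, assuming the index and field names from the question:
GET /myIndex-2017.08.22/_search
{
  "query": {
    "nested": {
      "path": "Collection",
      "query": {
        "match": {
          "Collection.FileType": "DOCX"
        }
      }
    }
  }
}
Note that changing the mapping to nested requires reindexing the existing documents.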

Related

MongoDB - How to modify the "key" element in the document

I have the following document structure:
[
  {
    "network_type": "ex",
    "rack": [
      {
        "xxxx": {
          "asn": 111111,
          "nodes": {
            "business": [
              "sk550abcc1eb01.abc.com",
              "sk550abcc1eb10.abc.com",
              "sk550abcc1eb19.abc.com",
              "sk550abcc1eb28.abc.com"
            ]
          },
          "region": "ex-01",
          "zone": "01a"
        }
      }
    ]
  }
]
I need to rename/update the key "xxxx" in the rack array elements to "details".
I tried the command below, but it doesn't seem to work.
db.collection.update({},
{
  $rename: {
    "rack.xxxx": "details"
  }
})
Link: https://mongoplayground.net/p/9dcDP-VKZ55
Please help me.
You can't directly $rename a field that is inside an array.
Instead:
Iterate over the document(s) in the rack array with $map, create the details field from the value of xxxx, and merge it into each document.
Then $unset the path rack.xxxx to remove the xxxx field from the document(s) in the rack array.
db.collection.update({},
[
  {
    // copy the value of "xxxx" into a new "details" field on each rack entry
    $set: {
      rack: {
        $map: {
          input: "$rack",
          in: {
            $mergeObjects: [
              { "details": "$$this.xxxx" },
              "$$this"
            ]
          }
        }
      }
    }
  },
  {
    // drop the old "xxxx" field from every rack entry
    $unset: "rack.xxxx"
  }
])
Sample Mongo Playground
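For reference, running the pipeline on the sample document above should leave each entry of rack looking roughly like this (details added with the old contents of xxxx, and xxxx itself removed):
{
  "details": {
    "asn": 111111,
    "nodes": { "business": [ ... ] },
    "region": "ex-01",
    "zone": "01a"
  }
}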

Elasticsearch: how to do an exact search while ignoring special characters in the keyword?

I have some ID values (numeric and text combinations) in my Elasticsearch index, and users of my program might include special characters in the search keyword.
I want to know whether there is a way to make Elasticsearch do an exact search while also removing special characters from the search keyword.
I already use a custom analyzer to split the search keyword on special characters, and a match query to search the data, but I still get no results.
data
{
  "_index": "testdata",
  "_type": "_doc",
  "_id": "11112222",
  "_source": {
    "testid": "1MK444750"
  }
}
custom analyzer
"analysis" : {
"analyzer" : {
"testidanalyzer" : {
"pattern" : """([^\w\d]+|_)""",
"type" : "pattern"
}
}
}
mapping
{
  "article" : {
    "mappings" : {
      "_doc" : {
        "properties" : {
          "testid" : {
            "type" : "text",
            "analyzer" : "testidanalyzer"
          }
        }
      }
    }
  }
}
Here's my Elasticsearch query:
GET /testdata/_search
{
  "query": {
    "match": {
      // "testid": "1MK_444-750" // no result
      "testid": "1MK444750"
    }
  }
}
The analyzer successfully separated my keyword, but I just can't match anything in the results:
POST /testdata/_analyze
{
  "analyzer": "testidanalyzer",
  "text": "1MK_444-750"
}

{
  "tokens" : [
    {
      "token" : "1mk",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "444",
      "start_offset" : 4,
      "end_offset" : 7,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "750",
      "start_offset" : 8,
      "end_offset" : 11,
      "type" : "word",
      "position" : 2
    }
  ]
}
Please help, thanks in advance!
First off, you should probably model the testid field as keyword rather than text; it's a more appropriate data type.
You want to put in a feature whereby some characters (_, -) are effectively ignored at search time. You can achieve this by giving your field a normalizer, which tells Elasticsearch how to preprocess data for this field prior to indexing or searching. Specifically, you can declare a mapping char filter in your normalizer that replaces these characters with an empty string.
This is how all these changes would fit into your mapping:
PUT /testdata
{
  "settings": {
    "analysis": {
      "char_filter": {
        "mycharfilter": {
          "type": "mapping",
          "mappings": [
            "_ => ",
            "- => "
          ]
        }
      },
      "normalizer": {
        "mynormalizer": {
          "type": "custom",
          "char_filter": [
            "mycharfilter"
          ]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "testid" : {
          "type" : "keyword",
          "normalizer" : "mynormalizer"
        }
      }
    }
  }
}
The following searches would then produce the same results:
GET /testdata/_search
{
  "query": {
    "match": {
      "testid": "1MK444750"
    }
  }
}

GET /testdata/_search
{
  "query": {
    "match": {
      "testid": "1MK_444-750"
    }
  }
}
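If you want to check the normalizer in isolation, the _analyze API also accepts a normalizer parameter; both 1MK444750 and 1MK_444-750 should come back as the single token 1MK444750:
GET /testdata/_analyze
{
  "normalizer": "mynormalizer",
  "text": "1MK_444-750"
}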

Return documents with an array field that contains ALL elements from a user array in Elasticsearch 6.x

All my documents have a field, tags, of type array. I want to search for and return all the documents whose tags field contains all the elements of a user-input array. The number of elements is variable, not a fixed size.
Examples:
tags:["python", "flask", "gunicorn"]
input:["python"]
This would return true because all the elements in input are in tags.
tags:["nginx", "pm2"]
input:["nodejs", "nginx", "pm2", "microservice"]
This would return false because "nodejs" and "microservice" are not in tags.
I looked into the terms query, but I do not think it works for arrays.
I also found this: Elasticsearch array property must contain given array items, but that solution is for old versions of Elasticsearch and the syntax has changed.
I believe you're looking for a terms_set query (reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-terms-set-query.html). With minimum_should_match_script set to params.num_terms (a built-in parameter holding the number of supplied terms), a document only matches if every term in the query is present in its tags field.
PUT tags

POST tags/_doc
{
  "tags": ["python", "flask", "gunicorn"]
}

POST tags/_doc
{
  "tags": ["nginx", "pm2"]
}

GET tags/_search
{
  "query": {
    "terms_set": {
      "tags": {
        "terms": ["nginx", "pm2"],
        "minimum_should_match_script": {
          "source": "params.num_terms"
        }
      }
    }
  }
}
Returned:
"hits" : {
"total" : 1,
"max_score" : 0.5753642,
"hits" : [
{
"_index" : "tags",
"_type" : "_doc",
"_id" : "XZqN_mkB94Kxh8PwtQs_",
"_score" : 0.5753642,
"_source" : {
"tags" : [
"nginx",
"pm2"
]
}
}
]
}
Querying the full list in your example:
GET tags/_search
{
  "query": {
    "terms_set": {
      "tags": {
        "terms": ["nodejs", "nginx", "pm2", "microservice"],
        "minimum_should_match_script": {
          "source": "params.num_terms"
        }
      }
    }
  }
}
Yields no results, as expected:
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}

$in requires an array as a second argument, found: missing

Can anybody please tell me what I am doing wrong?
db document structure:
{
  "_id" : "module_settings",
  "moduleChildren" : [
    {
      "_id" : "module_settings_general",
      "name" : "General"
    },
    {
      "_id" : "module_settings_users",
      "name" : "Users"
    },
    {
      "_id" : "module_settings_emails",
      "name" : "Emails"
    }
  ],
  "permissions" : [
    "module_settings_general",
    "module_settings_emails"
  ]
}
pipeline stage:
{ $project: {
    filteredChildren: {
      $filter: {
        input: "$moduleChildren",
        as: "moduleChild",
        cond: { $in: ["$$moduleChild._id", "$permissions"] }
      }
    }
}}
I need to filter the "moduleChildren" array to show only the modules whose ids are in the "permissions" array. I've tried "$$ROOT.permissions" and "$$CURRENT.permissions" but neither of them works; I always get an error that $in is missing an array as its second argument. It works when I hardcode the array, like cond: { $in: ["$$moduleChild._id", ["module_settings_general", "module_settings_emails"]] }, so the problem seems to be in how the array is passed.
Thanks for any advice!
First option: use aggregation.
Some of the documents in your collection may be missing the permissions field, or its type may not be an array; that is why you are getting this error.
You can check the $type of the field, and if it is not an array (or does not exist in the document), substitute an empty array using $addFields and $cond:
db.collection.aggregate([
  { "$addFields": {
      "permissions": {
        "$cond": {
          "if": { "$ne": [ { "$type": "$permissions" }, "array" ] },
          "then": [],
          "else": "$permissions"
        }
      }
  }},
  { "$project": {
      "filteredChildren": {
        "$filter": {
          "input": "$moduleChildren",
          "as": "moduleChild",
          "cond": { "$in": [ "$$moduleChild._id", "$permissions" ] }
        }
      }
  }}
])
Second option: fix the data in place.
Go to your mongo shell (or Robomongo, or whatever GUI you are using) and run this command:
db.collection.update(
  // match documents where "permissions" is missing or is not an array
  { "permissions": { "$not": { "$type": "array" } } },
  { "$set": { "permissions": [] } },
  { "multi": true }
)

Checking if field exists under an elasticsearch nested aggregation

Trying to perform an ES query, I ran into a problem while doing nested filtering of objects in an array. Our data structure has changed from:
"_index": "events_2015-07-08",
"_type": "my_type",
"_source":{
...
...
"custom_data":{
"className:"....."
}
}
to:
"_index": "events_2015-07-08",
"_type": "my_type",
"_source":{
...
...
"custom_data":[ //THIS CHANGED FROM AN OBJECT TO AN ARRAY OF OBJECTS
{
"key":".....",
"val":"....."
},
{
"key":".....",
"val":"....."
}
]
}
This nested filter works fine on indices that have the new data structure:
{
  "nested": {
    "path": "custom_data",
    "filter": {
      "bool": {
        "must": [
          {
            "term": {
              "custom_data.key": "className"
            }
          },
          {
            "term": {
              "custom_data.val": "SOME_VALUE"
            }
          }
        ]
      }
    },
    "_cache": true
  }
}
However, it fails when going over indices that have the older data structure, so the feature cannot be shipped as-is. Ideally I'd be able to query both data structures, but at this point I'd settle for a "graceful failure", i.e. just not returning results where the structure is old.
I have tried adding an "exists" filter on the field "custom_data.key", and an "exists" within a "not" on the field "custom_data.className", but I keep getting "SearchParseException[[events_2015-07-01][0]: from[-1],size[-1]: Parse Failure [Failed to parse source".
There is an indices filter (and query) that you can use to apply different filters (and queries) depending on the index being searched:
{
  "query" : {
    "filtered" : {
      "filter" : {
        "indices" : {
          "indices" : ["old-index-1", "old-index-2"],
          "filter" : {
            "term" : {
              "custom_data.className" : "SOME_VALUE"
            }
          },
          "no_match_filter" : {
            "nested" : { ... }
          }
        }
      }
    }
  }
}
Using this, you should be able to transition off of the old mapping and onto the new mapping.
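Putting it together with the nested filter from the question, the full request would look roughly like this (the index names are placeholders for whichever indices still hold the old structure):
{
  "query" : {
    "filtered" : {
      "filter" : {
        "indices" : {
          "indices" : ["events_2015-07-01", "events_2015-07-02"],
          "filter" : {
            "term" : { "custom_data.className" : "SOME_VALUE" }
          },
          "no_match_filter" : {
            "nested" : {
              "path" : "custom_data",
              "filter" : {
                "bool" : {
                  "must" : [
                    { "term" : { "custom_data.key" : "className" } },
                    { "term" : { "custom_data.val" : "SOME_VALUE" } }
                  ]
                }
              }
            }
          }
        }
      }
    }
  }
}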
