I have an Elasticsearch index whose records contain an array field.
Say the field name is "samples" and the array is:
["abc","xyz","mnp".....]
Is there any query that lets me specify the number of elements to retrieve from the array?
Say I want the retrieved record to contain only the first 2 elements of the samples array.
Assuming you have an array of strings as a document, I have a couple of ideas that might help you.
PUT /arrayindex/
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "spacelyzer": {
            "tokenizer": "whitespace"
          },
          "commalyzer": {
            "type": "custom",
            "tokenizer": "commatokenizer",
            "char_filter": "square_bracket"
          }
        },
        "tokenizer": {
          "commatokenizer": {
            "type": "pattern",
            "pattern": ","
          }
        },
        "char_filter": {
          "square_bracket": {
            "type": "mapping",
            "mappings": [
              "[=>",
              "]=>"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "array_set": {
      "properties": {
        "array_space": {
          "analyzer": "spacelyzer",
          "type": "string"
        },
        "array_comma": {
          "analyzer": "commalyzer",
          "type": "string"
        }
      }
    }
  }
}
POST /arrayindex/array_set/1
{
  "array_space": "qwer qweee trrww ooenriwu njj"
}
POST /arrayindex/array_set/2
{
  "array_comma": "[qwer,qweee,trrww,ooenriwu,njj]"
}
The above DSL accepts two kinds of arrays: one is a whitespace-separated string in which every word represents an element of the array, and the other is the bracketed, comma-separated form you specified. That form is what a Python list looks like, and if you index such a document it is converted to a string, i.e. ["abc","xyz","mnp".....] becomes "["abc","xyz","mnp".....]".
spacelyzer tokenizes on whitespace; commalyzer tokenizes on commas and strips [ and ] from the string.
Now if you use the Term Vectors API like this:
GET arrayindex/array_set/1/_termvector
{
  "fields" : ["array_space", "array_comma"],
  "term_statistics" : true,
  "field_statistics" : true
}
GET arrayindex/array_set/2/_termvector
{
  "fields" : ["array_space", "array_comma"],
  "term_statistics" : true,
  "field_statistics" : true
}
You can simply read an element's position from the responses, e.g. to find the position of "njj" use
termvector_response["term_vectors"]["array_comma"]["terms"]["njj"]["tokens"][0]["position"] or,
termvector_response["term_vectors"]["array_space"]["terms"]["njj"]["tokens"][0]["position"]
Both will give you 4, which is the element's actual index in the array you specified. I suggest you use the whitespace-based design.
The Python code for this can be:
from elasticsearch import Elasticsearch

ES_HOST = {"host": "localhost", "port": 9200}
ES_CLIENT = Elasticsearch(hosts=[ES_HOST], timeout=180)

def getTermVector(doc_id):
    # Fetch term vectors (with positions) for both array fields
    return ES_CLIENT.termvector(
        index="arrayindex",
        doc_type="array_set",
        id=doc_id,
        field_statistics=True,
        fields=['array_space', 'array_comma'],
        term_statistics=True)

def getElements(num, array_no):
    # Print the first `num` elements of the array in document `array_no`
    all_terms = getTermVector(array_no)['term_vectors']['array_space']['terms']
    for i in range(num):
        for term in all_terms:
            for jsons in all_terms[term]['tokens']:
                if jsons['position'] == i:
                    print(term, "# index", i)

getElements(3, 1)
# qwer # index 0
# qweee # index 1
# trrww # index 2
I have some id values (numeric and text combinations) in my Elasticsearch index, and in my program users might include special characters in the search keyword.
I want to know if there is any way to make Elasticsearch do an exact search while also stripping some special characters from the search keyword.
I already use a custom analyzer to split the search keyword on those special characters, and a match query to search the data, but I still get no results.
Data:
{
  "_index": "testdata",
  "_type": "_doc",
  "_id": "11112222",
  "_source": {
    "testid": "1MK444750"
  }
}
Custom analyzer:
"analysis" : {
"analyzer" : {
"testidanalyzer" : {
"pattern" : """([^\w\d]+|_)""",
"type" : "pattern"
}
}
}
Mapping:
{
  "testdata" : {
    "mappings" : {
      "_doc" : {
        "properties" : {
          "testid" : {
            "type" : "text",
            "analyzer" : "testidanalyzer"
          }
        }
      }
    }
  }
}
Here's my Elasticsearch query:
GET /testdata/_search
{
  "query": {
    "match": {
      // "testid": "1MK_444-750" // no result
      "testid": "1MK444750"
    }
  }
}
The analyzer successfully separates my keyword, but I just can't match anything in the results:
POST /testdata/_analyze
{
  "analyzer": "testidanalyzer",
  "text": "1MK_444-750"
}
{
  "tokens" : [
    {
      "token" : "1mk",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "444",
      "start_offset" : 4,
      "end_offset" : 7,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "750",
      "start_offset" : 8,
      "end_offset" : 11,
      "type" : "word",
      "position" : 2
    }
  ]
}
Please help, thanks in advance!
First off, you should probably model the testid field as keyword rather than text; it's a more appropriate data type.
You want to put in a feature whereby some characters (_, -) are effectively ignored at search time. You can achieve this by giving your field a normalizer, which tells Elasticsearch how to preprocess data for this field prior to indexing or searching. Specifically, you can declare a mapping char filter in your normalizer that replaces these characters with an empty string.
This is how all these changes would fit into your mapping:
PUT /testdata
{
  "settings": {
    "analysis": {
      "char_filter": {
        "mycharfilter": {
          "type": "mapping",
          "mappings": [
            "_ => ",
            "- => "
          ]
        }
      },
      "normalizer": {
        "mynormalizer": {
          "type": "custom",
          "char_filter": [
            "mycharfilter"
          ]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "testid" : {
          "type" : "keyword",
          "normalizer" : "mynormalizer"
        }
      }
    }
  }
}
The following searches would then produce the same results:
GET /testdata/_search
{
  "query": {
    "match": {
      "testid": "1MK444750"
    }
  }
}
GET /testdata/_search
{
  "query": {
    "match": {
      "testid": "1MK_444-750"
    }
  }
}
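To sanity-check the normalizer, you can run both raw values through the _analyze API. A minimal sketch with the Python client, assuming a local cluster on port 9200:

from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])  # assumed local cluster

# Both inputs should normalize to the same single token, which is why
# the two match queries above return the same results.
for text in ["1MK444750", "1MK_444-750"]:
    resp = es.indices.analyze(
        index="testdata",
        body={"normalizer": "mynormalizer", "text": text},
    )
    print(text, "->", resp["tokens"][0]["token"])
# 1MK444750 -> 1MK444750
# 1MK_444-750 -> 1MK444750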
I am working on an Express.js application where I need to update a nested array.
1) Schema:
// Creating a mongoose schema
var userSchema = mongoose.Schema({
    _id: {type: String, required: true},
    name: String,
    sensors: [{
        sensor_name: {type: String, required: true},
        measurements: [{time: String}]
    }]
});
2) Here is the code snippet; the explanation is below:
router.route('/sensors_update/:_id/:sensor_name/')
    .post(function (req, res) {
        User.findOneAndUpdate(
            {_id: req.body._id},
            {$push: {"sensors": {"sensor_name": req.body.sensor_name, "measurements.0.time": req.body.time}}},
            {new: true},
            function (err, newSensor) {
                if (err)
                    return res.send(err);
                res.send(newSensor);
            });
    });
I am able to successfully add a value to the measurements array using findOneAndUpdate with $push, but I'm failing when I try to add multiple measurements to one element of the sensors array.
Here is the current JSON I get when I post a second measurement to the sensors array:
{
    "_id": "Manasa",
    "name": "Manasa Sub",
    "__v": 0,
    "sensors": [
        {
            "sensor_name": "ras",
            "_id": "57da0a4bf3884d1fb2234c74",
            "measurements": [
                {
                    "time": "8:00"
                }
            ]
        },
        {
            "sensor_name": "ras",
            "_id": "57da0a68f3884d1fb2234c75",
            "measurements": [
                {
                    "time": "9:00"
                }
            ]
        }
    ]
}
But the format I actually want, with multiple measurements inside a single element of the sensors array, is this:
{
    "_id" : "Manasa",
    "name" : "Manasa Sub",
    "sensors" : [
        {
            "sensor_name" : "ras",
            "_id" : ObjectId("57da0a4bf3884d1fb2234c74"),
            "measurements" : [
                {
                    "time" : "8:00"
                }
            ],
            "measurements" : [
                {
                    "time" : "9:00"
                }
            ]
        }
    ],
    "__v" : 0
}
Please suggest some ideas regarding this. Thanks in advance.
You might want to rethink your data model. As it is currently, you cannot accomplish what you want. The sensors field refers to an array. In the ideal document format that you have provided, you have a single object inside that array. Then inside that object, you have two fields with the exact same key. In a JSON object, or mongo document in this context, you can't have duplicate keys within the same object.
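A quick Python illustration of that constraint, reusing the two "measurements" keys from the desired format above:

import json

# Two "measurements" keys inside one object: the parser silently
# keeps only the last one, so the 8:00 reading would be lost.
raw = '{"sensor_name": "ras", "measurements": [{"time": "8:00"}], "measurements": [{"time": "9:00"}]}'
print(json.loads(raw))
# {'sensor_name': 'ras', 'measurements': [{'time': '9:00'}]}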
It's not clear exactly what you're looking for here, but perhaps it would be best to go for something like this:
{
    "_id" : "Manasa",
    "name" : "Manasa Sub",
    "sensors" : [
        {
            "sensor_name" : "ras",
            "_id" : ObjectId("57da0a4bf3884d1fb2234c74"),
            "measurements" : [
                {
                    "time" : "8:00"
                },
                {
                    "time" : "9:00"
                }
            ]
        },
        {
            // next sensor in the sensors array with a similar format
            "_id": "",
            "sensor_name": "",
            "measurements": []
        }
    ]
}
If this is what you want, then you can try this:
User.findOneAndUpdate(
    { _id: req.body._id, "sensors.sensor_name": req.body.sensor_name },
    { $push: { "sensors.0.measurements": { "time": req.body.time } } }
);
And as a side note: if you're only ever going to store a single string in each object of the measurements array, you might want to store the actual values instead of whole objects like { time: "value" }. You might find the data easier to handle that way.
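For illustration, a minimal pymongo sketch of that leaner shape, assuming a local server and hypothetical database/collection names:

from pymongo import MongoClient

users = MongoClient("mongodb://localhost:27017")["mydb"]["users"]  # assumed names

# With measurements as plain strings, the push gets simpler:
users.update_one(
    {"_id": "Manasa", "sensors.sensor_name": "ras"},
    {"$push": {"sensors.$.measurements": "9:00"}},
)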
Instead of hardcoding the index of the array, it is possible to use an identifier together with the filtered positional operator $[<identifier>].
Example:
User.findOneAndUpdate(
    { _id: "Manasa" },
    { $push: { "sensors.$[outer].measurements": { "time": req.body.time } } },
    { arrayFilters: [ { "outer._id": ObjectId("57da0a4bf3884d1fb2234c74") } ] }
);
You may notice that instead of taking the first element of the array, I specify which element of the sensors array I would like to update by providing its ObjectId.
Note that arrayFilters are passed as the third argument to the update query as an option.
You could now make "outer._id" dynamic by passing the ObjectId of the sensor like so: {"outer._id": req.body.sensorId}
In general, with the use of identifiers, you can reach even deeper nested array elements by following the same procedure and adding more filters.
If there were a third level of nesting, you could do something like:
User.findOneAndUpdate(
    { _id: "Manasa" },
    { $push: { "sensors.$[outer].measurements.$[inner].example": { "time": req.body.time } } },
    { arrayFilters: [ { "outer._id": ObjectId("57da0a4bf3884d1fb2234c74") }, { "inner._id": ObjectId("57da0a4bf3884d1fb2234c74") } ] }
);
You can find more details here in the answer written by Neil Lunn.
Refer to the all-positional operator $[]:
conditions: { other_conditions, 'array1.array2.field_to_be_checked': 'value' }
updateData: { $push: { 'array1.$[].array2.$[].array3': 'value_to_be_pushed' } }
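As a concrete sketch of that pattern, here is a hedged pymongo version, assuming a local server, hypothetical collection and field names, and MongoDB 3.6+ for $[]:

from pymongo import MongoClient

coll = MongoClient("mongodb://localhost:27017")["mydb"]["test"]  # assumed names

# $[] (the all-positional operator) fans the push out to every element
# of array1 and, within each element, to every element of array2.
coll.update_many(
    {"array1.array2.field_to_be_checked": "value"},
    {"$push": {"array1.$[].array2.$[].array3": "value_to_be_pushed"}},
)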
Is there a way to match all values in a document's array? For example, if my search array is ["1","2","3","4","5"] and my documents have fields like
doc1: "arr":["1","3","5"]
doc2: "arr":["1","2","7","9"]
doc3: "arr":["1","8"]
Then only the first document should be a match, because all the values in the document are present in the search array. I tried using a script filter (to get the length of the array) together with the minimum_should_match parameter, but I can't get it to work. How do I use a variable created by a script as a parameter for minimum_should_match?
You can't directly search an array to check whether it contains all the values, because the analyzer will analyze the search keys and return a result as soon as any one key matches.
If you want to check whether an array contains the specified values, you need to split the searched array into multiple term clauses, like:
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            { "term": { "number": 1 } },
            { "term": { "number": 2 } },
            { "term": { "number": 7 } },
            { "term": { "number": 9 } }
          ]
        }
      }
    }
  }
}
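Since the term clauses simply mirror the search array, you can also build them programmatically. A minimal sketch with the Python client, assuming a local cluster and a hypothetical index name (note that filtered was removed in ES 5; use a plain bool filter there):

from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])  # assumed local cluster
search_array = [1, 2, 7, 9]

query = {
    "query": {
        "filtered": {
            "filter": {
                "bool": {
                    # one term clause per element of the search array
                    "must": [{"term": {"number": n}} for n in search_array]
                }
            }
        }
    }
}
results = es.search(index="myindex", body=query)  # hypothetical index name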
Is it possible to return records based on the size of the array after $elemMatch has filtered it down?
For example, if I have many records in a collection like the following:
[
  {
    contents: [
      { name: "yorkie" },
      { name: "dairy milk" },
      { name: "yorkie" }
    ]
  },
  // ...
]
And I wanted to find all records whose contents field contains 2 array items with their name field equal to "yorkie". How would I do this? To clarify, the array could contain other items, but the criteria is met so long as 2 of those array items have the matching field:value.
I'm aware I can use $elemMatch (or contents.name) to return records where the array contains at least one item matching that name, and I'm aware I can also use $size to filter based on the exact number of array items in the record's field. Is there a way that they can be both combined?
Not in a find query, but it can be done with an aggregation:
db.test.aggregate([
{ "$match" : { "contents.name" : "yorkie" } },
{ "$unwind" : "$contents" },
{ "$match" : { "contents.name" : "yorkie" } },
{ "$group" : { "_id" : "$_id", "sz" : { "$sum" : 1 } } }, // use $first to include other fields
{ "$match" : { "sz" : { "$gte" : 2 } } }
])
I interpreted
the criteria is met so long as 2 of those array items have the matching field:value
as meaning the criteria is met if at least 2 array items have the matching value in name.
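If you are driving this from Python rather than the mongo shell, the same pipeline runs unchanged through pymongo; a minimal sketch assuming a local server and a hypothetical database name:

from pymongo import MongoClient

test = MongoClient("mongodb://localhost:27017")["mydb"]["test"]  # assumed names

pipeline = [
    {"$match": {"contents.name": "yorkie"}},
    {"$unwind": "$contents"},
    {"$match": {"contents.name": "yorkie"}},
    {"$group": {"_id": "$_id", "sz": {"$sum": 1}}},
    {"$match": {"sz": {"$gte": 2}}},
]
for doc in test.aggregate(pipeline):
    print(doc["_id"])  # ids of documents with at least 2 "yorkie" items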
I know this thread is old, but today you can do it with a plain find:
db.test.find({
  "$expr": {
    "$gt": [
      {
        "$reduce": {
          "input": "$contents",
          "initialValue": 0,
          "in": {
            "$cond": {
              "if": { "$eq": ["$$this.name", "yorkie"] },
              "then": { "$add": ["$$value", 1] },
              "else": "$$value"
            }
          }
        }
      },
      1
    ]
  }
})
The $reduce does the trick here: it returns the number of objects that match the criteria, and the document matches when that count exceeds 1.
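If the $reduce is hard to read, the same fold written out in plain Python (with made-up sample data) may help:

# Plain-Python equivalent of the $reduce above: count the items whose
# name is "yorkie", then keep the document when the count exceeds 1.
contents = [{"name": "yorkie"}, {"name": "dairy milk"}, {"name": "yorkie"}]

count = 0                             # "initialValue": 0
for item in contents:                 # "input": "$contents"
    if item["name"] == "yorkie":      # "$eq": ["$$this.name", "yorkie"]
        count += 1                    # "then": {"$add": ["$$value", 1]}
print(count > 1)                      # the outer "$gt": [..., 1] -> True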
I have a document structure like
{
    "_id" : ObjectId("52263922f5ebf05115bf550e"),
    "Fields" : [
        {
            "Field" : "Lot No",
            "Rules" : [ ]
        },
        {
            "Field" : "RMA No",
            "Rules" : [ ]
        }
    ]
}
I have tried to update it with the following code, pushing an object into the Rules array:
db.test.update({
    "Fields.Field": {$in: ["Lot No"]}
}, {
    $addToSet: {
        "Fields.Field.$.Rules": {
            "item_name": "my_item_two",
            "price": 1
        }
    }
}, false, true);
But I get the following error:
can't append to array using string field name [Field]
How do I do the update?
You've gone one level too deep with the positional $ operator. You match an item in the Fields array, so you get access to that element with Fields.$. This expression refers to the first match in your Fields array, and you reach its fields with Fields.$.Field or Fields.$.Rules.
Now, let's fix the update:
db.test.update({
    "Fields.Field": "Lot No"
}, {
    $addToSet: {
        "Fields.$.Rules": {
            'item_name': "my_item_two",
            'price': 1
        }
    }
}, false, true);
Please note that I've shortened the query as it is equal to your expression.
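For completeness, a hedged pymongo version of the corrected update, assuming a local server and a hypothetical database name:

from pymongo import MongoClient

test = MongoClient("mongodb://localhost:27017")["mydb"]["test"]  # assumed names

# The positional $ resolves to the first Fields element whose Field matched.
test.update_many(
    {"Fields.Field": "Lot No"},
    {"$addToSet": {"Fields.$.Rules": {"item_name": "my_item_two", "price": 1}}},
)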