Logstash: split XML into an array of objects

Is it possible to convert xml into array of objects using logstash?
That'd be my sample document:
{
"Title" : "My blog title",
"Body" : "My first post ever",
"Metadata" : "<root><Tags><TagTypeID>1</TagTypeID><TagValue>twitter</TagValue></Tags><Tags><TagTypeID>1</TagTypeID><TagValue>facebook</TagValue></Tags><Tags><TagTypeID>2</TagTypeID><TagValue>usa</TagValue></Tags><Tags><TagTypeID>3</TagTypeID><TagValue>smartphones</TagValue></Tags></root>"
}
Ideally, I'd like to output this:
{
"Title" : "My blog title",
"Body" : "My first post ever",
"Metadata" : [
{
"TagTypeID" : "1",
"TagValue" : "twitter"
},
{
"TagTypeID" : "1",
"TagValue" : "facebook"
},
{
"TagTypeID" : "2",
"TagValue" : "usa"
},
{
"TagTypeID" : "3",
"TagValue" : "smartphones"
}
]
}
However, I haven't been able to achieve that. I tried using the xml filter like this:
xml
{
source => "Metadata"
target => "Parsed"
}
However, it outputs this:
{
"Title" : "My blog title",
"Body" : "My first post ever",
"@version" : "1",
"@timestamp" : "2015-10-27T17:21:31.961Z",
"Parsed" : {
"Tags" : [
{
"TagTypeID" : ["1"],
"TagValue" : ["twitter"]
},
{
"TagTypeID" : ["1"],
"TagValue" : ["facebook"]
},
{
"TagTypeID" : ["2"],
"TagValue" : ["usa"]
},
{
"TagTypeID" : ["3"],
"TagValue" : ["smartphones"]
}
]
}
}
I don't want my values to be stored as arrays (I know there's always going to be just one value there).
I know what fields are going to be brought back from my input, so I can map structure myself and this doesn't need to be dynamic (although that would be nice).
Allow splitting of lists / arrays into multiple events seemed to be useful, but it's poorly documented and I couldn't find information how to use this filter for my use-case.
Logstash, split event from an xml file in multiples documents keeping information from root tags is similar, but not exactly what I'd like to achieve.
Logstash: XML to JSON output from array to string seems useful; however, it hardcodes that only the first element of the array is output as a single item (not as part of an array). It gives me this:
{
"Title" : "My blog title",
"Body" : "My first post ever",
"@version" : "1",
"@timestamp" : "2015-10-27T17:21:31.961Z",
"Parsed" : {
"Tags" : [
{
"TagTypeID" : "1",
"TagValue" : "twitter"
},
{
"TagTypeID" : ["1"],
"TagValue" : ["facebook"]
},
{
"TagTypeID" : ["2"],
"TagValue" : ["usa"]
},
{
"TagTypeID" : ["3"],
"TagValue" : ["smartphones"]
}
]
}
}
Can this be done without having to create custom filters? (I've no experience in Ruby)
Or am I missing something basic here?

Here is one approach using logstash's builtin ruby filter.
Filter section:
filter {
xml {
source => "Metadata"
target => "Parsed"
}
ruby { code => "
event['Parsed']['Tags'].each do |x|
x.each do |key, value|
x[key] = value[0]
end
end"
}
}
Output:
"Parsed":{
"Tags":[
{
"TagTypeID":"1",
"TagValue":"twitter"
},
{
"TagTypeID":"1",
"TagValue":"facebook"
},
{
"TagTypeID":"2",
"TagValue":"usa"
},
{
"TagTypeID":"3",
"TagValue":"smartphones"
}
]
}
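If I understand you correctly, this is your desired result. You need to specify the XML field inside the ruby filter: event['Parsed']['Tags']. Note that direct event['field'] access only works on Logstash versions before 5.0; from 5.0 on, the event API uses event.get('[Parsed][Tags]') and event.set(...) instead. Does it need to be more dynamic? Let me know if you need anything else.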
Can this be done without having to create custom filters? (I've no experience in Ruby)
Well, yes and no. Yes, because this is not really a custom filter but a built-in solution. No, because I tend to say this can not be done without Ruby. I must admit that Ruby seems to be an unattractive solution. However, this is a flexible approach and 5 lines of code shouldn't hurt that much.
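For readers who do not know Ruby, the unwrapping step performed by the ruby filter above can be sketched in plain JavaScript. This is only an illustration of the logic, not something Logstash runs. The function name unwrapSingleValues is hypothetical, and the Array.isArray guard is an addition that the original five lines do not have.

```javascript
// Hypothetical helper mirroring the ruby filter's loop: replace each
// one-element array value with its first element.
function unwrapSingleValues(tags) {
  return tags.map((tag) => {
    const flat = {};
    for (const [key, value] of Object.entries(tag)) {
      // Assumption: only unwrap arrays; leave any other value untouched.
      flat[key] = Array.isArray(value) ? value[0] : value;
    }
    return flat;
  });
}

// Shape produced by the xml filter with target => "Parsed" (shortened sample).
const parsed = {
  Tags: [
    { TagTypeID: ["1"], TagValue: ["twitter"] },
    { TagTypeID: ["2"], TagValue: ["usa"] },
  ],
};

parsed.Tags = unwrapSingleValues(parsed.Tags);
console.log(JSON.stringify(parsed.Tags));
```

The helper returns new objects rather than mutating in place, which is a design choice of this sketch; the ruby filter modifies the event's hashes directly.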

The most recent Logstash version (5.1.1 at this point) has an updated XML filter with a force_array option. It is enabled by default. Setting it to false does the same thing as the ruby filter in the accepted answer.
Taken from the documentation:
force_array
Value type is boolean
Default value is true
By default the filter will force single elements to be arrays. Setting this to false will prevent storing single elements in arrays.
https://www.elastic.co/guide/en/logstash/current/plugins-filters-xml.html#plugins-filters-xml-force_array

Related

How to add a field into an Elasticsearch document with an array of objects value?

I am using Elasticsearch 5.6 and I'd like to add a new field to an existing document. This field's value should be an array. I was able to add a new field with an array of strings by sending a POST request with the following body
{"script" : "ctx._source.new_field = ['string1', 'string2']"}
to my endpoint
http://localhost:9200/my_index/my_mapping/item_173969/_update
and verified that the new field is there, but I need to set the value of this key to an array of objects instead. How can I do this? Specifically, I need my document to have:
"size": [
{
"item_size": "XL",
"country_size": "US"
}
]
You are almost there. I'm using ES 7.1, which ignores the document type, but the script part should work the same way.
Ingesting data
PUT test_david/_doc/1
{
"name": "doc1"
}
Updating document
POST test_david/_update/1
{
"script": {
"source": "if (!ctx._source.containsKey('sizes')) ctx._source.sizes= new ArrayList(); ctx._source.sizes.add(params.data);",
"params": {
"data": {
"item_size": "XL",
"country_size": "US"
}
}
}
}
Note that the script first checks whether the document already has a sizes property, creates it as an empty list if it does not, and then adds the new size to it.
Query
POST test_david/_search
Response
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "test_david",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"name" : "doc1",
"sizes" : [
{
"country_size" : "US",
"item_size" : "XL"
}
]
}
}
]
}
}
About the nested type
Use the nested type only if your queries need to preserve the relationship between properties of the same child object.
The object field type lets you run searches like:
"item_size and/or country_size on any of the size objects"
The nested field type lets you run searches like:
"item_size and/or country_size on the same size object"
Not every array of objects needs to be of nested type. Use it only when queries must keep the relation between properties of the same child, because the nested type is more expensive; if you are only storing and displaying data, it makes no sense to use it.
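The difference between the two field types can be sketched in plain JavaScript. This is a simplified model of the matching behaviour, not Elasticsearch code: the function names and the second sample size (M / UK) are made up for illustration.

```javascript
// Sample array of size objects; the second entry is invented for the demo.
const sizes = [
  { item_size: "XL", country_size: "US" },
  { item_size: "M", country_size: "UK" },
];

// Object type (simplified): values are flattened per field, so the pairing
// between item_size and country_size inside one child is lost.
function matchesAsObject(children, itemSize, countrySize) {
  const itemSizes = children.map((c) => c.item_size);
  const countrySizes = children.map((c) => c.country_size);
  return itemSizes.includes(itemSize) && countrySizes.includes(countrySize);
}

// Nested type (simplified): both conditions must hold on the same child.
function matchesAsNested(children, itemSize, countrySize) {
  return children.some(
    (c) => c.item_size === itemSize && c.country_size === countrySize
  );
}

console.log(matchesAsObject(sizes, "XL", "UK")); // true: XL and UK both occur somewhere
console.log(matchesAsNested(sizes, "XL", "UK")); // false: no single size is XL and UK
```

If a cross-object match like the first one would be a wrong result for your queries, that is the case where the nested type is worth its extra cost.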

Parsing JSON array in Swift 3

I'm trying to parse a JSON file and show it in a table view in Swift 3 without using external frameworks, but my knowledge of Swift is not enough. I have looked at other questions, but nothing worked for me. The problem is that my JSON contains a couple of nested arrays.
Here is an example of the JSON file:
"guides" : [
{
"name" : "First Guide",
"sections" : [
{
"title" : "Controls",
"data" : [
{
"value" : "controls.PNG",
"type" : "image"
},
{
"value" : "Sensitivity: Changes the sensitivity of the camera when turning.",
"type" : "text"
},
{
"value" : "Invert Y-Axis: Toggles inversion of camera when looking up/down.",
"type" : "text"
},
{
"value" : "crosshair.PNG",
"type" : "image"
},
{
"value" : "Lefty : Toggles the D-pad being on the left/right side of the screen.",
"type" : "text"
},
{
"value" : "Swap Jump and Sneak : Chooses whether to swap the position of jump and sneak buttons.",
"type" : "text"
},
{
"value" : "Button size - Changes the size of the buttons. Smaller buttons allow extra slots for the hotbar.",
"type" : "text"
}
]
},
{
"title" : "User Profile",
"data" : [
{
"value" : "profile.png",
"type" : "image"
},
{
"value" : "Use Cellular Data: Gives the Player the option to use cellular data.",
"type" : "text"
}
]
},
{
"title" : "Global Resources",
"data" : [
{
"value" : "resources.png",
"type" : "image"
},
..............
How can I parse the data into Swift arrays and display it in a UITableViewController? There are a couple of "guides" in this JSON, and I need to be able to show only one of them at a time.
Help will be much appreciated. Thank you in advance.
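Try the following code. For the nested structure in the question, you would then cast the result to [String: Any] and read "guides" as [[String: Any]].
import Foundation

// A small JSON array used as sample input
let jsonString = "[\"a\",\"b\"]"
let jsonData = jsonString.data(using: .utf8)!
do {
    // Parse the data into Foundation objects (arrays, dictionaries, strings, numbers)
    let dataJson = try JSONSerialization.jsonObject(with: jsonData, options: [])
    print(dataJson)
} catch {
    print("error parsing JSON: \(error)")
}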

Updating Nested Array Mongoose

I am working on an express js application where I need to update a nested array.
1) Schema :
//Creating a mongoose schema
var userSchema = mongoose.Schema({
_id: {type: String, required:true},
name: String,
sensors: [{
sensor_name: {type: String, required:true},
measurements: [{time: String}]
}] });
2)
Here is the code snippet and explanation is below:
router.route('/sensors_update/:_id/:sensor_name/')
.post(function (req, res) {
User.findOneAndUpdate({_id:req.body._id}, {$push: {"sensors" :
{"sensor_name" : req.body.sensor_name , "measurements.0.time": req.body.time } } },
{new:true},function(err, newSensor) {
if (err)
res.send(err);
res.send(newSensor)
}); });
I can successfully add a value to the measurements array using findOneAndUpdate with $push, but it fails when I try to add multiple measurements to the same sensor in the sensors array.
Here is the JSON I currently get when I post a second measurement to the sensors array:
{
"_id": "Manasa",
"name": "Manasa Sub",
"__v": 0,
"sensors": [
{
"sensor_name": "ras",
"_id": "57da0a4bf3884d1fb2234c74",
"measurements": [
{
"time": "8:00"
}
]
},
{
"sensor_name": "ras",
"_id": "57da0a68f3884d1fb2234c75",
"measurements": [
{
"time": "9:00"
}
]
}]}
But the format I want has multiple measurements under the same sensor, like this:
{
"_id" : "Manasa",
"name" : "Manasa Sub",
"sensors" : [
{
"sensor_name" : "ras",
"_id" : ObjectId("57da0a4bf3884d1fb2234c74"),
"measurements" : [
{
"time" : "8:00"
}
],
"measurements" : [
{
"time" : "9:00"
}
]
}],
"__v" : 0 }
Please suggest some ideas regarding this. Thanks in advance.
You might want to rethink your data model. As it is currently, you cannot accomplish what you want. The sensors field refers to an array. In the ideal document format that you have provided, you have a single object inside that array. Then inside that object, you have two fields with the exact same key. In a JSON object, or mongo document in this context, you can't have duplicate keys within the same object.
It's not clear exactly what you're looking for here, but perhaps it would be best to go for something like this:
{
"_id" : "Manasa",
"name" : "Manasa Sub",
"sensors" : [
{
"sensor_name" : "ras",
"_id" : ObjectId("57da0a4bf3884d1fb2234c74"),
"measurements" : [
{
"time" : "8:00"
},
{
"time" : "9:00"
}
]
},
{
// next sensor in the sensors array with similar format
"_id": "",
"name": "",
"measurements": []
}],
}
If this is what you want, then you can try this:
User.findOneAndUpdate(
{ _id: req.body._id, "sensors.sensor_name": req.body.sensor_name },
{ $push: { "sensors.0.measurements": { "time": req.body.time } } }
);
And as a side note, if you're only ever going to store a single string in each object in the measurements array, you might want to just store the actual values instead of the whole object { time: "value" }. You might find the data easier to handle this way.
Instead of hardcoding the index of the array, you can use an identifier with the filtered positional operator $[<identifier>] (available from MongoDB 3.6).
Example:
User.findOneAndUpdate(
{ _id: "Manasa" },
{ $push: { "sensors.$[outer].measurements": { "time": req.body.time } } },
{ arrayFilters: [{ "outer._id": ObjectId("57da0a4bf3884d1fb2234c74") }] }
);
You may notice that instead of taking the first element of the array, I specified which element of the sensors array to update by providing its ObjectId.
Note that arrayFilters are passed as the third argument to the update query as an option.
You could now make "outer._id" dynamic by passing the ObjectId of the sensor like so: {"outer._id": req.body.sensorId}
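A minimal sketch of making the filter dynamic, assuming the request carries the sensor's id. The helper name buildMeasurementPush is hypothetical; it only builds the three arguments and does not touch the database. Depending on the Mongoose version, the id may not be cast automatically inside arrayFilters, so you may have to convert it to an ObjectId yourself.

```javascript
// Hypothetical helper: builds the (filter, update, options) arguments
// for Model.findOneAndUpdate using a filtered positional operator.
function buildMeasurementPush(userId, sensorId, time) {
  const filter = { _id: userId };
  const update = {
    // $[outer] is bound by the matching entry in arrayFilters below.
    $push: { "sensors.$[outer].measurements": { time: time } },
  };
  const options = {
    arrayFilters: [{ "outer._id": sensorId }],
    new: true, // ask Mongoose to return the updated document
  };
  return [filter, update, options];
}

const args = buildMeasurementPush("Manasa", "57da0a4bf3884d1fb2234c74", "9:00");
console.log(JSON.stringify(args, null, 2));
```

With the question's model, the assumed usage would be User.findOneAndUpdate(...args) inside the route handler.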
In general, with the use of identifier, you can get to even deeper nested array elements by following the same procedure and adding more filters.
If there were a third level of nesting, you could then do something like:
User.findOneAndUpdate(
{ _id: "Manasa" },
{ $push: { "sensors.$[outer].measurements.$[inner].example": { "time": req.body.time } } },
{ arrayFilters: [{ "outer._id": ObjectId("57da0a4bf3884d1fb2234c74") }, { "inner._id": ObjectId("57da0a4bf3884d1fb2234c74") }] }
);
You can find more details here in the answer written by Neil Lunn.
See also the all-positional operator $[] (positional-all):
conditions: { other_conditions, 'array1.array2.field_to_be_checked': 'value' }
updateData: { $push: { 'array1.$[].array2.$[].array3': 'value_to_be_pushed' } }

Create a document with an array from filtered elements in an existing document array

I've asked this question before but not in the clearest way since I had no responses :( so I thought I would try again.
I have a document as shown below, I want to create a new document which only picks the names in the array where language = "English".
{
"_id" : ObjectId("564d35d5150699558156942b"),
"objectCategory" : "Food",
"objectType" : "Fruit",
"objectName" : [
{
"language" : "English",
"name" : "Apple"
},
{
"language" : "French",
"name" : "Pomme"
},
{
"language" : "English",
"name" : "Strawberry"
},
{
"language" : "French",
"name" : "Fraise"
}
]
}
I want the $out document to look like this below. I know I can filter a document by content but, I want to filter within a single document not across a collection. For getting the right document in the first place, I would have a query to $find objectCategory = "Food" and objectType = "Fruit"
{
"_id" : ObjectId("564d35d5150699558156942b"),
"objectCategory" : "Food",
"objectType" : "Fruit",
"objectName" : [
"name" : "Apple",
"name" : "Strawberry"
]
}
Thanks, Matt
wow, ah, I really thought I found it with:
db.serviceCatalogue.find({objectName: {"$elemMatch": {language: "English"}}}, {"objectName.name": 1})
(Thanks to: Retrieve only the queried element in an object array in MongoDB collection.)
However, it did nothing; I must have dreamt it worked. How do you get just the array elements where the field called language equals 'English'?
This is only an example of what I want to do. It seems like this is just painful, especially with no one answering other than me :)
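One possible approach, sketched under assumptions: an aggregation pipeline that uses $filter and $map, which requires MongoDB 3.2 or later. The output collection name englishNames is a placeholder. Because an array cannot hold repeated "name" keys as in the desired output above, this sketch produces a plain array of name strings. The plain JavaScript function below the pipeline reproduces the same filter-and-map step so the logic can be checked without a database.

```javascript
// Sketch of the pipeline; "englishNames" is a placeholder collection name.
const pipeline = [
  { $match: { objectCategory: "Food", objectType: "Fruit" } },
  {
    $project: {
      objectCategory: 1,
      objectType: 1,
      objectName: {
        // Keep only English entries, then reduce each entry to its name.
        $map: {
          input: {
            $filter: {
              input: "$objectName",
              as: "entry",
              cond: { $eq: ["$$entry.language", "English"] },
            },
          },
          as: "entry",
          in: "$$entry.name",
        },
      },
    },
  },
  { $out: "englishNames" },
];
// Assumed usage in the shell: db.serviceCatalogue.aggregate(pipeline)

// Plain JavaScript equivalent of the $filter + $map step.
function englishNames(doc) {
  return doc.objectName
    .filter((entry) => entry.language === "English")
    .map((entry) => entry.name);
}

const sampleDoc = {
  objectCategory: "Food",
  objectType: "Fruit",
  objectName: [
    { language: "English", name: "Apple" },
    { language: "French", name: "Pomme" },
    { language: "English", name: "Strawberry" },
    { language: "French", name: "Fraise" },
  ],
};
console.log(englishNames(sampleDoc)); // [ 'Apple', 'Strawberry' ]
```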

How to count occurrences of each value in an array?

I have a database of ISSUES in MongoDB. Some of the issues have comments, which is an array; each comment has a writer. How can I count the number of comments each writer has written?
I've tried
db.test.issues.group(
{
key = "comments.username":true;
initial: {sum:0},
reduce: function(doc, prev) {prev.sum +=1},
}
);
but no luck :(
A Sample:
{
"_id" : ObjectId("50f48c179b04562c3ce2ce73"),
"project" : "Ruby Driver",
"key" : "RUBY-505",
"title" : "GETMORE is sent to wrong server if an intervening query unpins the connection",
"description" : "I've opened a pull request with a failing test case demonstrating the bug here: https://github.com/mongodb/mongo-ruby-driver/pull/134\nExcerpting that commit message, the issue is: If we do a secondary read that is large enough to require sending a GETMORE, and then do another query before the GETMORE, the secondary connection gets unpinned, and the GETMORE gets sent to the wrong server, resulting in CURSOR_NOT_FOUND, even though the cursor still exis ts on the server that was initially queried.",
"status" : "Open",
"components" : [
"Replica Set"
],
"affected_versions" : [
"1.7.0"
],
"type" : "Bug",
"reporter" : "Nelson Elhage",
"priority" : "major",
"assignee" : "Tyler Brock",
"resolution" : "Unresolved",
"reported_on" : ISODate("2012-11-17T20:30:00Z"),
"votes" : 3,
"comments" : [
{
"username" : "Nelson Elhage",
"date" : ISODate("2012-11-17T20:30:00Z"),
"body" : "Thinking some more"
},
{
"username" : "Brandon Black",
"date" : ISODate("2012-11-18T20:30:00Z"),
"body" : "Adding some findings of mine to this ticket."
},
{
"username" : "Nelson Elhage",
"date" : ISODate("2012-11-18T20:30:00Z"),
"body" : "I think I tracked down the 1.9 dependency."
},
{
"username" : "Nelson Elhage",
"date" : ISODate("2012-11-18T20:30:00Z"),
"body" : "Forgot to include a link"
}
]
}
You forgot the curly braces on the key value and you need to terminate that line with a , instead of a ;.
db.issues.group({
key: {"comments.username":true},
initial: {sum:0},
reduce: function(doc, prev) {prev.sum +=1},
});
UPDATE
After realizing comments is an array...you'd need to use aggregate for that so that you can 'unwind' comments and then group on it:
db.issues.aggregate(
{$unwind: '$comments'},
{$group: {_id: '$comments.username', sum: {$sum: 1}}}
);
For the sample doc in the question, this outputs:
{
"result": [
{
"_id": "Brandon Black",
"sum": 1
},
{
"_id": "Nelson Elhage",
"sum": 3
}
],
"ok": 1
}
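To make the $unwind and $group steps concrete, here is a plain JavaScript sketch of the same counting logic. It runs without MongoDB; the function name countCommentsByWriter is hypothetical, and the sample data keeps only the usernames from the document in the question.

```javascript
// Hypothetical helper: the inner loop plays the role of $unwind (one step per
// comment) and the Map update plays the role of $group with $sum: 1.
function countCommentsByWriter(issues) {
  const counts = new Map();
  for (const issue of issues) {
    // Issues without comments contribute nothing, as with $unwind.
    for (const comment of issue.comments || []) {
      counts.set(comment.username, (counts.get(comment.username) || 0) + 1);
    }
  }
  return counts;
}

const issues = [
  {
    key: "RUBY-505",
    comments: [
      { username: "Nelson Elhage" },
      { username: "Brandon Black" },
      { username: "Nelson Elhage" },
      { username: "Nelson Elhage" },
    ],
  },
];

const counts = countCommentsByWriter(issues);
console.log(counts.get("Nelson Elhage"), counts.get("Brandon Black")); // 3 1
```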
Just a snide answer here to complement @JohnnyHK's answer: it sounds like you're new to MongoDB and so are possibly working on a recent version (if not, I would upgrade). Either way, the old group command is kind of bad; for one, it won't work with sharding.
Instead, in MongoDB 2.2 you can just do:
db.col.aggregate({$unwind: "$comments"}, {$group: {_id: "$comments.username", count: {$sum: 1}}})
Or something similar. You can read more about it here: http://docs.mongodb.org/manual/applications/aggregation/

Resources