Convert JSON Lines to JSON array using Apache Nifi

I have a file that includes (schemaless) JSON Lines encoded data.
For example:
{"foo" : "abc", "bar" : "def" }
{"foo" : "xyz" }
{"foo" : "ghi", "bar" : "jkl", "name" : "The Dude"}
I would like to use NiFi to convert this into a JSON array:
[{"foo" : "abc", "bar" : "def" },{"foo" : "xyz" },{"foo" : "ghi", "bar" : "jkl", "name" : "The Dude"}]

The easiest way to accomplish this in Apache NiFi is to use two ReplaceText processors. In the first, configure as:
Search Value: \}\s*\{
Replacement Value: \},\{
Replacement Strategy: Regex Replace
Evaluation Mode: Entire Text
This will remove the line breaks between the JSON objects and insert commas between them. In the second:
Search Value: (^.*$)
Replacement Value: [$1]
Replacement Strategy: Regex Replace
Evaluation Mode: Entire Text
This will add the enclosing brackets around the JSON array. There are other ways to accomplish this with ExecuteScript or JoltTransformJSON processors, but they are more complicated and brittle.
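To see what the two replacements do, here is a rough simulation outside NiFi in plain JavaScript. It is only a sketch: NiFi uses Java regular expressions, the input is assumed to have no trailing newline, and the first pattern would also fire on a `}` followed by `{` inside a string value.

```javascript
// Sample JSON Lines input from the question (no trailing newline assumed).
const input = [
  '{"foo" : "abc", "bar" : "def" }',
  '{"foo" : "xyz" }',
  '{"foo" : "ghi", "bar" : "jkl", "name" : "The Dude"}',
].join("\n");

// Step 1: replace "}<whitespace>{" with "},{" to join the records.
const joined = input.replace(/\}\s*\{/g, "},{");

// Step 2: wrap the whole (now single-line) text in brackets.
const wrapped = joined.replace(/(^.*$)/, "[$1]");

// The result should now parse as a JSON array with one entry per input line.
const parsed = JSON.parse(wrapped);
console.log(parsed.length);
```

Parsing the result is a cheap way to confirm that the flow file content ended up as valid JSON.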

Related

MongoDB: updating an array in array

I seem to be having an issue accessing the contents of an array nested within an array in a mongodb document. I have no problems accessing the first array "groups" with a query like the following...
db.orgs.update({_id: org_id, "groups._id": group_id} , {$set: {"groups.$.name": "new_name"}});
Where I run into trouble is when I try to modify properties of an element in the array "features" nested within the "groups" array.
Here is what an example document looks like
{
"_id" : "v5y8nggzpja5Pa7YS",
"name" : "Example",
"display_name" : "EX1",
"groups" : [
{
"_id" : "s86CbNBdqJnQ5NWaB",
"name" : "Group1",
"display_name" : "G1",
"features" : [
{
_id : "bNQ5Bs8BWqJn6CdNa",
type : "blog",
name : "[blog name]",
owner_id : "ga5YgvP5yza7pj8nS"
},
]
},
]
},
And this is the query I tried to use.
db.orgs.update({_id: "v5y8nggzpja5Pa7YS", "groups._id": "qBX3KDrtMeJGvZWXZ", "groups.features._id":"bNQ5Bs8BWqJn6CdNa" }, {$set: {"groups.$.features.$.name":"New Blog Name"}});
It returns with an error message:
WriteResult({
"nMatched" : 0,
"nUpserted" : 0,
"nModified" : 0,
"writeError" : {
"code" : 2,
"errmsg" : "Too many positional (i.e. '$') elements found in path 'groups.$.features.$.name'"
}
})
It seems that mongo doesn't support modifying arrays nested within arrays via the positional element?
Is there a way to modify this array without taking the entire thing out, modifying it, and then putting it back in? With multiple nesting like this is it standard practice to create a new collection? (Even though the data is only ever needed when the parent data is necessary) Should I change the document structure so that the second nested array is an object, and access it via key? (Where the key is an integer value that can act as an "_id")
groups.$.features.[KEY].name
What is considered the "correct" way to do this?

After some more research, it looks like the only way to modify the array within an array would be with some outside logic to find the index of the element I want to change. Doing this would require every change to have a find query to locate the index, and then an update query to modify the array. This doesn't seem like the best way.
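The find-then-update approach described above can be sketched roughly as follows. This is only an illustration in plain JavaScript: `buildFeaturePath` is a hypothetical helper, and the document is assumed to have already been fetched with a find query.

```javascript
// Hypothetical helper: locate the indexes of a group and a feature by _id
// and build the dotted path that would be used as a $set key.
function buildFeaturePath(doc, groupId, featureId, field) {
  const groupIndex = doc.groups.findIndex((g) => g._id === groupId);
  if (groupIndex === -1) return null;
  const featureIndex = doc.groups[groupIndex].features.findIndex(
    (f) => f._id === featureId
  );
  if (featureIndex === -1) return null;
  return `groups.${groupIndex}.features.${featureIndex}.${field}`;
}

// Document shaped like the example in the question.
const org = {
  _id: "v5y8nggzpja5Pa7YS",
  groups: [
    {
      _id: "s86CbNBdqJnQ5NWaB",
      name: "Group1",
      features: [{ _id: "bNQ5Bs8BWqJn6CdNa", type: "blog", name: "[blog name]" }],
    },
  ],
};

const path = buildFeaturePath(org, "s86CbNBdqJnQ5NWaB", "bNQ5Bs8BWqJn6CdNa", "name");
console.log(path);
// The update document would then look like: { $set: { [path]: "New Blog Name" } }
```

Note that the indexes can shift if another write modifies the arrays between the find and the update, which is part of why this approach is unattractive.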
Link to a 2010 JIRA case requesting multiple positional elements...
Since I will always know the ID of the feature, I have opted to revise my document structure.
{
"_id" : "v5y8nggzpja5Pa7YS",
"name" : "Example",
"display_name" : "EX1",
"groups" : [
{
"_id" : "s86CbNBdqJnQ5NWaB",
"name" : "Group1",
"display_name" : "G1",
"features" : {
"1" : {
type : "blog",
name : "[blog name]",
owner_id : "ga5YgvP5yza7pj8nS"
},
}
},
]
},
With the new structure, changes can be made in the following manner:
db.orgs.update({_id: "v5y8nggzpja5Pa7YS", "groups._id": "s86CbNBdqJnQ5NWaB"}, {$set: {"groups.$.features.1.name":"Blog Test 1"}});

mongoexport csv output last array values

Inspired by this question in Server Fault
https://serverfault.com/questions/459042/mongoexport-csv-output-array-values
I'm using mongoexport to export some collections into CSV files, however when I try to target fields which are the last members of an array I cannot get it to export correctly.
Command I'm using
mongoexport -d db -c collection --fieldFile fields.txt --csv > out.csv
One item of my collection:
{
"id": 1,
"name": "example",
"date": [
{"date": ""},
{"date": ""},
],
"status": [
"true",
"false",
],
}
I can access the first member of my array by writing the fields like the following:
name
id
date.0.date
status.0
Is there a way to access the last item of my array without knowing the length of the array?
Because the following doesn't work:
name
id
date.-1.date
status.-1
Any idea of the correct notation? Or if it's simply not possible?

It's not possible to reference the last element of the array without knowing the length of the array, since the notation is array_field.index where the index is in [0, length - 1]. You could use the aggregation framework to create the view of the data that you want to export, save it temporarily into a collection with $out, and then mongoexport that. For example, for your documents you could do
db.collection.aggregate([
{ "$unwind" : "$date" },
{ "$group" : { "_id" : "$_id", "date" : { "$last" : "$date" } } },
{ "$out" : "temp-for-csv" }
])
in order to get just the last date for each document and output it to the collection temp-for-csv.
You can return just the last elements of an array with the $slice projection operator, but mongoexport only takes a query specification, not a projection specification, since the --fields and --fieldFile options are supposed to suffice. When this answer was written, $slice was not available in aggregation either; MongoDB 3.2 and later provide $slice and $arrayElemAt as aggregation expressions, which can replace the $unwind/$group step. Using a query with a projection might be a good feature request for mongoexport.
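If post-processing outside the database is acceptable, picking the last array element is straightforward in application code. A minimal sketch in plain JavaScript, assuming the documents have already been loaded (for example from a JSON export); the `last` helper and the sample date values are illustrative and not part of mongoexport:

```javascript
// Hypothetical helper: last element of an array, or undefined if it is empty.
function last(arr) {
  return Array.isArray(arr) && arr.length > 0 ? arr[arr.length - 1] : undefined;
}

// Document shaped like the one in the question; the date values are made up.
const docs = [
  {
    id: 1,
    name: "example",
    date: [{ date: "2014-01-01" }, { date: "2014-02-01" }],
    status: ["true", "false"],
  },
];

// Keep only the last date and the last status of each document.
const rows = docs.map((doc) => {
  const lastDate = last(doc.date);
  return {
    name: doc.name,
    id: doc.id,
    lastDate: lastDate ? lastDate.date : "",
    lastStatus: last(doc.status) ?? "",
  };
});

// Naive CSV output: values are not quoted or escaped.
const csv = [
  "name,id,date,status",
  ...rows.map((r) => [r.name, r.id, r.lastDate, r.lastStatus].join(",")),
].join("\n");
console.log(csv);
```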

AngularJS extract dynamic matching data from JSON

I have a dynamic complex JSON, something like this
var source = [{
"ab" : 123,
"xfg" : {
"cdf" : "xyz",
"e" : [{"aaa" : "bbb"}, {"ccc" : "ccc"}]
},
"mno" : ["fff", "123"]
}];
How can I extract data from this JSON using some dynamic expressions in a given search object:
var search= {
"search1" : "ab",
"search2" : "xfg.cdf",
"search3" : "ccc value in xfg.e?",
}
Basically, I can analyze the type of each element in the search object, if it's a string split it by '.' separator and then access the elements in the source object...
But what about complex search expressions? How do I get the 'ccc' value for example?. Is there a way to implement complex search expressions? something like in mongodb find function?
Thanks

I haven't used this, but the demo looks really nice. It's basically a CSS-type selector for JSON: JSONSelect.
For a more XPath-like style, try JSONPath, which also looks very capable.
Both are JavaScript libraries and are easily included in your project.
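For the simple cases, the dot-splitting approach mentioned in the question can be sketched without any library. This is only an illustration: `getByPath` is a hypothetical helper, and the lookup of "ccc" inside xfg.e is done with a plain array search rather than a real expression language.

```javascript
// Hypothetical helper: walk an object along a dotted path such as "xfg.cdf".
// Returns undefined as soon as a segment is missing.
function getByPath(obj, path) {
  return path
    .split(".")
    .reduce((current, key) => (current == null ? undefined : current[key]), obj);
}

const source = [
  {
    ab: 123,
    xfg: {
      cdf: "xyz",
      e: [{ aaa: "bbb" }, { ccc: "ccc" }],
    },
    mno: ["fff", "123"],
  },
];

const ab = getByPath(source[0], "ab");
const cdf = getByPath(source[0], "xfg.cdf");

// "ccc value in xfg.e": find the first array element that has a "ccc" key.
const match = getByPath(source[0], "xfg.e").find((item) =>
  Object.prototype.hasOwnProperty.call(item, "ccc")
);
const ccc = match ? match.ccc : undefined;

console.log(ab, cdf, ccc);
```

Anything beyond this, such as filters or wildcards, is where a selector library becomes worth including.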

MongoDB - Pull from array of objects

I have a collection
{
"_id" : ObjectId("534bae30bf5049a522e502fe"),
"data" : [
{
"0" : {
"content" : "1",
"type" : "text",
"ident" : true
},
"1" : {
"content" : "",
"type" : "text",
"ident" : false
}
},
{
"0" : {
"content" : "2",
"type" : "text",
"ident" : true
},
"1" : {
"content" : "",
"type" : "text"
}
}
]
}
content is unique.
How would I remove the object that matches content: '2'?
I have tried this:
data:{$pull:{"content": deletions[i]}}
where deletions[i] is the content.
I have tried several variations, but I cannot get it to work. What am I missing?

As per your comment, you should be worried. I have seen this a few times, particularly with PHP applications (and PHP has this funny kind of notation when dumping arrays).
The problem is that keys such as "0" and "1" create sub-documents of their own, instead of the content simply being the array member itself. As a result, you run into a problem accessing individual members of the array, because the paths used need to be "absolute".
So this leaves no option other than accessing the elements through the equivalent "dot notation" form. Except in this case it is not just the "nth" element of the array, but the actual path you need to address.
But if this does actually work for you, and it does seem like "someone" was trying to avoid the problems with positional updates under "nested" arrays (see the positional $ operator documentation for details), then your update can be performed like this:
db.collection.update(
{
"data.0.content": "2"
},
{
"$pull": { "data.$.0.content": "2" }
}
)
That does "seem" to be a bit of a funny way to write this, but on investigating the actual structure you have then you should be able to see the reasons why this is needed. Essentially this meets the requirements of using the positional $ operator to indicate the index of the first matched element in the array ( aka "data" ) and then uses the standard sub-document notation in order to specify the path to the element to be updated.
So of course this poses a problem if the element is actually in an unknown position. But the thing to consider is which usage of the array is actually important to you, given the documented limitation. If you need to match the position of the "inner" element, then change the structure to put the arrays in there.
But always understand the effects of the limitation, and try to model according to what the engine can actually do.
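One way to read the advice about putting arrays into the structure is to convert each sub-document with numeric keys into a real array before storing it. A rough sketch in plain JavaScript; `numericKeysToArray` is a hypothetical helper, not a MongoDB API:

```javascript
// Hypothetical helper: turn { "0": {...}, "1": {...} } into [ {...}, {...} ],
// ordered by the numeric value of each key.
function numericKeysToArray(obj) {
  return Object.keys(obj)
    .sort((a, b) => Number(a) - Number(b))
    .map((key) => obj[key]);
}

// Document shaped like the one in the question (ObjectId omitted).
const doc = {
  data: [
    {
      0: { content: "1", type: "text", ident: true },
      1: { content: "", type: "text", ident: false },
    },
    {
      0: { content: "2", type: "text", ident: true },
      1: { content: "", type: "text" },
    },
  ],
};

// Each element of "data" becomes an array instead of a keyed sub-document.
const restructured = { ...doc, data: doc.data.map(numericKeysToArray) };
console.log(JSON.stringify(restructured, null, 2));
```

Whether nested arrays or a flatter layout fits better still depends on which elements need to be addressed in updates.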

Batch node relationship creation in cypher/neo4j

What is the most efficient way to break down this CREATE cypher query?
The end pattern is the following:
(newTerm:term)-[:HAS_META]->(metaNode:termMeta)
In this pattern there is a single newTerm node and about 25 termMeta nodes. The HAS_META relationship will have a single property (languageCode) that will be different for each termMeta node.
In the application, all of these nodes and relationships will be created at the same time. I'm trying to determine the best way to add them.
Is there any way to add these without having to perform an individual query for each termMeta node?
I know you can add multiple instances of a node using the following query format:
"metaProps" : [
{"languageCode" : "en", "name" : "1", "dateAdded": "someDate1"},
{"languageCode" : "de", "name" : "2", "dateAdded": "someDate2"},
{"languageCode" : "es", "name" : "3", "dateAdded": "someDate3"},
{"languageCode" : "fr", "name" : "3", "dateAdded": "someDate4"}
]
But you can only do that for one type of node at a time and there (as far as I can tell) is no way to dynamically add the relationship properties that are needed.
Any insight would be appreciated.

There's no really elegant way to do it, as far as I can tell—from your example, I'm assuming you're using parameters. You can use a foreach to loop through the params and do a create on each one, but it's pretty ugly, and requires you to explicitly specify literal maps of your properties. Here's what it would look like for your example:
// $metaProps is the current parameter syntax; older Neo4j versions used {metaProps}
CREATE (newTerm:term)
FOREACH ( props IN $metaProps |
  CREATE (newTerm)-[:HAS_META {languageCode: props.languageCode}]->
    (:termMeta {name: props.name, dateAdded: props.dateAdded})
)
WITH newTerm
MATCH (newTerm)-[rel:HAS_META]->(metaNode:termMeta)
RETURN newTerm, rel, metaNode
If you don't need to return the results, you can delete everything after the FOREACH.

Select and name each vertex differently and then create relations using it.
For example:
match (n:Tag), (m:Account), (l:FOO) CREATE (n)-[r:mn]->(m),(m)-[x:ml]->(l)
match (n:Tag{a:"a"}), (m:Account{b:"x"}), (l:FOO) CREATE (n)-[r:mn]->(m),(m)-[x:ml]->(l)
