json / jq : multi-level grouping of sub-elements in an array

i'm writing a script that needs to parse incoming json into line-by-line data, taking information from the json at multiple different levels. i'm using jq to parse the data.
the incoming json is an array of 'tasks'. each task [i.e. each element of the array] is an object that looks like this :
{
"inputData": {
"transfers": [
{
"source": {
"directory": "/path/to/source",
"filename": "somefile.mp3"
},
"target": {
"directory": "/path/to/target",
"filename": "somefile.mp3"
}
},
{
"source": {
"content": "<?xml version=\"1.0\" encoding=\"UTF-8\"?><delivery>content description</delivery>",
"encoding": "UTF-8"
},
"target": {
"directory": "/path/to/target",
"filename": "somefile.xml"
}
}
]
},
"outputData": {
"transferDuration": "00:00:37:10",
"transferLength": 187813298,
},
"updateDate": "2020-02-21T14:37:18.329Z",
"updateUser": "bob"
}
i want to read all of the tasks and, for each one, output a single line composed of the following fields :
task[n].inputData.transfers[].target.filename, task[n].outputData.transferLength, task[n].updateDate
i've got my filter chain to where it will choose the appropriate fields correctly, even to where it will pick the 'correct' single entry from amongst the multiple entries in the task[].inputData.transfers[] array, but when i try to get the output of more than a single element, the chain iterates over the array three times, and i get
task[0].inputData.transfers[].target.filename
task[1].inputData.transfers[].target.filename
task[2].inputData.transfers[].target.filename
...
task[n].inputData.transfers[].target.filename
then the results of the outputData.transferLength field for all elements,
then the results of the updateDate field for all elements.
here is my filter chain :
'(.tasks[].inputData.transfers[] | select(.target.filename | match("[Xx][Mm][Ll]$")).target.filename), .tasks[].outputData.transferLength, .tasks[].updateDate'
i'm thinking there must be some efficient way to group all of these multi-level elements together for each element of the array ; something like a 'with ...' clause, like with tasks[] : blablabla, but can't figure out how to do it. can anyone help ?

The JSON example contained a superfluous , that jq won't accept.
Your example filter chain operates on .tasks[], even though the example appears to be only a single task, so it is not possible to rewrite what you have into a functioning state. Rather than provide an exact answer to an inexact question, here is the first of the three parts of your filter chain, revised:
.inputData.transfers | map(select(.target.filename | match("xml$"; "i")))
See this jqplay snippet.
Rather than write [ .xs[] | select(p) ], just write .xs | map(select(p)).
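As a concrete illustration (a minimal sketch, assuming the sample task above, minus the superfluous comma, is saved on its own as task.json), both spellings print the same one-element array containing the XML transfer:
jq -c '[.inputData.transfers[] | select(.target.filename | match("xml$"; "i"))]' task.json
jq -c '.inputData.transfers | map(select(.target.filename | match("xml$"; "i")))' task.json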

i finally found the answer. the trick was to pipe the .tasks[] into an expression where the parens were placed around the field elements as a group, which applies whatever is inside the parens to each element of the array individually, in sequence. then, using @dmitry's example as a guide, i also placed the elements inside left and right brackets to recreate an array element that i could then select from, which could then be output onto 1 line each with | @csv. so the final chain that worked for me is :
.task[] | ([.inputData.transfers[].target.filename, .outputData.transferLength, .updateDate]) | [(.[0],.[2],.[3])] | @csv
unfortunately i couldn't get match() to work in this invocation, nor sub() ; each of these caused jq to offer a useless error message just before it dumped core.
many thanks to the people who replied.
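For reference, a hedged sketch of a chain that keeps the filename selection inside the per-task grouping and still emits one CSV line per task (this assumes the tasks arrive under a top-level "tasks" key, a hypothetical tasks.json file, and a jq build where match()/test() works):
jq -r '.tasks[]
  | [(.inputData.transfers[] | select(.target.filename | test("xml$"; "i")) | .target.filename),
     .outputData.transferLength,
     .updateDate]
  | @csv' tasks.json
For the sample task shown above this would print "somefile.xml",187813298,"2020-02-21T14:37:18.329Z".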

Related

A way to bring facet results together based on common id

I'm doing a mongodb aggregation with two facets. Each facet is a different operation performed on the same collection. Each facet's results have two fields per object: the id and the operation result. I want to combine each facet's results based on the common id.
The desired result is like this:
[
{
"id":"1",
"bind":"xxx",
"pres":"xxx"
},
{
"id":"2",
......
}
]
I would like missing values to be zero, or not included at all if that is supported.
I've started with
const combined_agg = [
{
"$facet":{
"bind":opp_bind,
"pres":opp_pres,
}
}
];
Where opp_bind and opp_pres are the variables for the two operations. The above gives me:
[
{
"bind":
[
{"binding":6,"id":"xxxx"},
....
],
"pres":
[
{"presenting":4,"id":"xxxx"},
....
]
}
]
From here, I am running into trouble.
I have tried to concatenate the arrays with
{
"$project":{"result":{"$concatArrays":["$bind","$pres"]}}
}
which gives me one object with one large array. I tried to $unwind that large array so the objects are at the root, but $unwind only gives me the first 20 items of the array.
I tried using $group within the result array, but that gives me an id field with an array of all the ids and two other fields with arrays of their values.
{
"$group":{
"_id":"$result.id",
"fields":{
"$push":{"bind":"$result.bind","pres":"$result.pres"}
}
}
}
I don't know how to separate them out so I can recombine them. I also saw some somewhat similar problems using map but I couldn't wrap my head around it.
I was able to figure out how to do it. I used $lookup with a pipeline to get the right format.
$lookup added the result to every object of the original query. Then I used $project and $filter to find the correct value from the second query. Then I used $addFields and $arrayElemAt to get the value I wanted, along with another $project to keep only the values I needed. It wasn't very pretty, though.
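For comparison, a hedged sketch of one way to merge the two facet arrays on the common id without $lookup (this assumes MongoDB 3.6+ for $mergeObjects, reuses the field names from the facet output shown above, and the stage variable name is illustrative):
const combine_stages = [
  // flatten both facet arrays into a single stream of { binding, id } / { presenting, id } documents
  { "$project": { "result": { "$concatArrays": ["$bind", "$pres"] } } },
  { "$unwind": "$result" },
  // fold documents that share an id back into one object; fields missing on one side are simply omitted
  { "$group": { "_id": "$result.id", "merged": { "$mergeObjects": "$result" } } },
  { "$replaceRoot": { "newRoot": "$merged" } }
];
These stages would be appended after the $facet stage of the pipeline; a final $project could rename binding/presenting to bind/pres if needed.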

How to get the first item using JSONPath resulting array?

I have a JSON similar to:
{
"orders":{
"678238": {
"orderId": 678238,
"itemName": "Keyboard"
},
"8723423": {
"orderId": 8723423,
"itemName": "Flash Drive"
}
}
}
I am trying JSONPath to get the first orderId. When I try $..orderId I get an array listing both orderIds; then I tried $..[0].orderId to get the first item from that array (following JsonPath - Filter Array and get only the first element), but it does not work. I am confused.
try this
console.log(jsonPath(json,"$['orders'].[orderId]")[0]); //678238
You're almost there. You need to combine the two things you've done.
$..orderId[0]
The ..orderId recursively searches for orderId properties, giving you all of their values, as you mentioned. Taking that result, you just need to apply the [0].
Be careful, though. Because your data is an object, keys are unordered, so the first one in the JSON text may not be the first one encountered in memory. You'll want to do some testing to confirm your results are consistent with your expectations.
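If your library does not support indexing the recursive result directly in the path, a hedged alternative is to take the first element of the returned array in the host language, reusing the jsonPath() call shown in the other answer:
console.log(jsonPath(json, "$..orderId")[0]); // 678238 for this sample, subject to the ordering caveat above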
Your JSON doesn't even have an array, yet you are expecting to get the first item from an array, which is why it's not working.
Suppose the structure of the JSON is modified like this:
{
"orders": [{
"orderId": 678238,
"itemName": "Keyboard"
},
{
"orderId": 8723423,
"itemName": "Flash Drive"
}
]
}
then you can use this query to get the first orderId:
$.orders[0].orderId

jq: delete element from array

I have this JSON file and want to delete an element from an array:
{
"address": "localhost",
"name": "local",
"vars": {
"instances": [
"one",
"two"
]
}
}
I am using this command:
jq 'del(.vars.instances[] | select(index("one")))' data.json
The output is:
{
"address": "localhost",
"name": "local",
"vars": {
"instances": [
"two"
]
}
}
So it works as expected, but only with jq v1.6. With jq v1.5 I get this error:
jq: error (at data.json:20): Invalid path expression near attempt to access element 0 of [0]
So what am I doing wrong? Is this a bug or a feature of v1.5? Is there any workaround to get the same result in v1.5?
Thanks in advance
Vince
One portable form that works on both versions would be:
.vars.instances |= map(select(index("one")|not))
Or, if you still want to use del(), feed the index of the string "one" to the function as below: index("one") returns the index 0, which is then passed to del as del(.[0]), meaning delete the element at the zeroth index.
.vars.instances |= del(.[index("one")])
The implementation of del/1 has proven to be quite difficult and indeed it changed between jq 1.5 and jq 1.6, so if portability across different versions of jq is important, then usage of del/1 should either be restricted to the least complicated cases (e.g., no pipelines) or undertaken with great care.
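For instance, the map/select form above runs unchanged on both versions when applied to the data.json from the question:
jq '.vars.instances |= map(select(index("one") | not))' data.json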

Turning Array Into String in SnapLogic

I have the output of a SalesForce SOQL snap that is a JSON in this format.
[
{
"QualifiedApiName": "Accelerator_Pack__c"
},
{
"QualifiedApiName": "Access_Certifications__c"
},
{
"QualifiedApiName": "Access_Requests__c"
},
{
"QualifiedApiName": "Account_Cleansed__c"
},
{
"QualifiedApiName": "Account_Contract_Status__c"
}
]
I am attempting to take those values and turn them into a string with the values separated by commas, like this, so that I can use that in the SELECT clause of another query.
Accelerator_Pack__c, Access_Certifications__c, Access_Requests__c, Account_Cleansed__c, Account_Contract_Status__c
From the documentation, my understanding was that .toString() would convert the array into a comma-separated string, but it doesn't appear to do anything. Does anyone have experience with this?
You need to aggregate the incoming documents.
Use the Aggregate snap with the CONCAT function. This will give you a |-delimited concatenated string as the output, like the following.
Accelerator_Pack__c|Access_Certifications__c|Access_Requests__c|Account_Cleansed__c|Account_Contract_Status__c
You can then replace the | with , like $concatenated_fields.split('|').join(',') or $concatenated_fields.replace(/\|/g, ',').
Following is a detailed explanation of the configuration (the pipeline and snap screenshots are omitted here). I set the sample JSON you provided in a JSON Generator for testing; the Aggregate snap then produces the |-delimited concatenated string, and the Mapper applies one of the expressions above. Both expressions give the same result.
You can also use the array functions directly to achieve this; the following pipeline setup can be used to concatenate the values.
I used the JSON Generator to take your sample data as input.
Then I used the GroupByN snap with '0' as the group size to formulate the array.
Finally, in the Mapper you can use the below expression to concatenate:
jsonPath($, "$arrayAccom[*].QualifiedApiName").join(",")

Update Item In Nested Array Using Mongoose [duplicate]

I am trying to update a value in the nested array but can't get it to work.
My object is like this
{
"_id": {
"$oid": "1"
},
"array1": [
{
"_id": "12",
"array2": [
{
"_id": "123",
"answeredBy": [], // need to push "success"
},
{
"_id": "124",
"answeredBy": [],
}
],
}
]
}
I need to push a value to "answeredBy" array.
In the example below, I tried pushing the string "success" to the "answeredBy" array of the object with _id "123", but it does not work.
callback = function(err,value){
if(err){
res.send(err);
}else{
res.send(value);
}
};
conditions = {
"_id": 1,
"array1._id": 12,
"array2._id": 123
};
updates = {
$push: {
"array2.$.answeredBy": "success"
}
};
options = {
upsert: true
};
Model.update(conditions, updates, options, callback);
I found this link, but its answer only says I should use an object-like structure instead of arrays. That cannot be applied in my situation; I really need my objects to be nested in arrays.
It would be great if you could help me out here. I've spent hours trying to figure this out.
Thank you in advance!
General Scope and Explanation
There are a few things wrong with what you are doing here. Firstly, your query conditions: you are referring to several _id values where you should not need to, at least one of which is not at the top level.
In order to get at a "nested" value, and presuming that the _id value is unique and would not appear in any other document, your query form should be like this:
Model.update(
{ "array1.array2._id": "123" },
{ "$push": { "array1.0.array2.$.answeredBy": "success" } },
function(err,numAffected) {
// something with the result in here
}
);
Now that would actually work, but really it is only a fluke that it does as there are very good reasons why it should not work for you.
The important reading is in the official documentation for the positional $ operator under the subject of "Nested Arrays". What this says is:
The positional $ operator cannot be used for queries which traverse more than one array, such as queries that traverse arrays nested within other arrays, because the replacement for the $ placeholder is a single value
Specifically what that means is the element that will be matched and returned in the positional placeholder is the value of the index from the first matching array. This means in your case the matching index on the "top" level array.
So if you look at the query notation as shown, we have "hardcoded" the first ( or 0 index ) position in the top level array, and it just so happens that the matching element within "array2" is also the zero index entry.
To demonstrate this, you can change the matching _id value to "124": the result will still $push a new entry onto the element with _id "123", since both are in the zero-index entry of "array1", and that is the value returned to the placeholder.
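As a hedged illustration of that point, using the same document as in the question:
// the query matches the "124" element, yet the push still lands on the "123" element,
// because $ receives the index from the first array in the query path ("array1", index 0)
Model.update(
  { "array1.array2._id": "124" },
  { "$push": { "array1.0.array2.$.answeredBy": "success" } },
  function (err, numAffected) { /* inspect the result here */ }
);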
So that is the general problem with nesting arrays. You could remove one of the levels and you would still be able to $push to the correct element in your "top" array, but there would still be multiple levels.
Try to avoid nesting arrays as you will run into update problems as is shown.
The general case is to "flatten" the things you "think" are "levels" and actually make these "attributes" on the final detail items. For example, the "flattened" form of the structure in the question should be something like:
{
"answers": [
{ "by": "success", "type2": "123", "type1": "12" }
]
}
Or, even when accepting that the inner array is $push-only and never updated:
{
"array": [
{ "type1": "12", "type2": "123", "answeredBy": ["success"] },
{ "type1": "12", "type2": "124", "answeredBy": [] }
]
}
Both of these lend themselves to atomic updates within the scope of the positional $ operator.
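For example, against the second flattened form above, an ordinary positional update reaches the right element directly (a minimal sketch):
Model.update(
  { "array.type2": "123" },                        // match on the attribute instead of a nested _id
  { "$push": { "array.$.answeredBy": "success" } } // $ now points at the single matching element
);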
MongoDB 3.6 and Above
From MongoDB 3.6 there are new features available to work with nested arrays. This uses the positional filtered $[<identifier>] syntax in order to match the specific elements and apply different conditions through arrayFilters in the update statement:
Model.update(
{
"_id": 1,
"array1": {
"$elemMatch": {
"_id": "12","array2._id": "123"
}
}
},
{
"$push": { "array1.$[outer].array2.$[inner].answeredBy": "success" }
},
{
"arrayFilters": [{ "outer._id": "12" },{ "inner._id": "123" }]
}
)
The "arrayFilters" as passed to the options for .update() or even
.updateOne(), .updateMany(), .findOneAndUpdate() or .bulkWrite() method specifies the conditions to match on the identifier given in the update statement. Any elements that match the condition given will be updated.
Because the structure is "nested", we actually use "multiple filters" as is specified with an "array" of filter definitions as shown. The marked "identifier" is used in matching against the positional filtered $[<identifier>] syntax actually used in the update block of the statement. In this case inner and outer are the identifiers used for each condition as specified with the nested chain.
This new expansion makes the update of nested array content possible, but it does not really help with the practicality of "querying" such data, so the same caveats apply as explained earlier.
What you typically really "mean" to express are "attributes"; even if your brain initially thinks "nesting", that is usually just a reaction to how you believe the "previous relational parts" come together. In reality, you need more denormalization.
Also see How to Update Multiple Array Elements in mongodb, since these new update operators actually match and update "multiple array elements" rather than just the first, which has been the previous action of positional updates.
NOTE Somewhat ironically, since this is specified in the "options" argument for .update() and like methods, the syntax is generally compatible with all recent release driver versions.
However, this is not true of the mongo shell: because of the way the method is implemented there ("ironically, for backward compatibility"), the arrayFilters argument is not recognized and is removed by an internal method that parses the options in order to deliver "backward compatibility" with prior MongoDB server versions and a "legacy" .update() API call syntax.
So if you want to use the command in the mongo shell or other "shell based" products (notably Robo 3T), you need a recent version from either the development branch or a production release of 3.6 or greater.
See also positional all $[] which also updates "multiple array elements" but without applying to specified conditions and applies to all elements in the array where that is the desired action.
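A hedged sketch of that variant, which pushes into every answeredBy array at both levels rather than only the matched one:
Model.update(
  { "_id": 1 },
  { "$push": { "array1.$[].array2.$[].answeredBy": "success" } }
);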
I know this is a very old question, but I just struggled with this problem myself, and found, what I believe to be, a better answer.
A way to solve this problem is to use Sub-Documents. This is done by nesting schemas within your schemas:
// define the innermost schema first so each outer schema can reference it
Array2Schema = new mongoose.Schema({
  answeredBy: [...]
})

Array1Schema = new mongoose.Schema({
  array2: [Array2Schema]
})

MainSchema = new mongoose.Schema({
  array1: [Array1Schema]
})
This way the object will look like the one you show, but now each array is filled with sub-documents. This makes it possible to dot your way into the sub-document you want. Instead of using .update(), you then use .find() or .findOne() to get the document you want to update.
Main.findOne({ _id: 1 })
  .exec(function (err, result) {
    // dot into the matching sub-documents by _id and push the new value
    result.array1.id(12).array2.id(123).answeredBy.push('success');
    result.save(function (err) {
      console.log(result);
    });
  });
Haven't used the .push() function this way myself, so the syntax might not be right, but I have used both .set() and .remove(), and both work perfectly fine.
