mongoexport csv output last array values - arrays

Inspired by this question in Server Fault
https://serverfault.com/questions/459042/mongoexport-csv-output-array-values
I'm using mongoexport to export some collections into CSV files. However, when I try to target fields that are the last members of an array, I cannot get them to export correctly.
The command I'm using:
mongoexport -d db -c collection --fieldFile fields.txt --csv > out.csv
One item of my collection:
{
"id": 1,
"name": "example",
"date": [
{"date": ""},
{"date": ""},
],
"status": [
"true",
"false",
],
}
I can access the first member of each array by writing the fields like this:
name
id
date.0.date
status.0
Is there a way to access the last item of an array without knowing the length of the array?
The following doesn't work:
name
id
date.-1.date
status.-1
Any idea of the correct notation? Or is it simply not possible?

It's not possible to reference the last element of the array without knowing the length of the array, since the notation is array_field.index where the index is in [0, length - 1]. You could use the aggregation framework to create the view of the data that you want to export, save it temporarily into a collection with $out, and then mongoexport that. For example, for your documents you could do
db.collection.aggregate([
{ "$unwind" : "$date" },
{ "$group" : { "_id" : "$_id", "date" : { "$last" : "$date" } } },
{ "$out" : "temp-for-csv" }
])
in order to get just the last date for each document and output it to the collection temp-for-csv.
You can return just the last elements of an array with the $slice projection operator, but this isn't available in aggregation (in versions before MongoDB 3.2), and mongoexport only takes a query specification, not a projection specification, since the --fields and --fieldFile options are supposed to suffice. Supporting a query with a projection in mongoexport might make a good feature request.
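On newer servers there may be a shorter route. A minimal sketch, assuming MongoDB 3.2 or later, where $arrayElemAt is available as an aggregation expression and accepts negative indexes. The output field names lastDate and lastStatus and the collection name temp_for_csv are placeholders chosen for illustration, not names from the question:

```javascript
// Assumes MongoDB 3.2+; `lastDate`, `lastStatus` and `temp_for_csv` are illustrative names.
const lastElementsPipeline = [
  {
    $project: {
      name: 1,
      id: 1,
      // "$date.date" resolves to the array of inner `date` values; index -1 picks the last one.
      lastDate: { $arrayElemAt: ["$date.date", -1] },
      lastStatus: { $arrayElemAt: ["$status", -1] },
    },
  },
  // Write the flattened documents to a temporary collection for mongoexport.
  { $out: "temp_for_csv" },
];

// In the mongo shell this would be run as:
// db.collection.aggregate(lastElementsPipeline)
```

The temporary collection can then be exported with a plain field list (name, id, lastDate, lastStatus), since every exported field is now a top-level scalar.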

Power automate: JSON formatting issues

I am using Excel to generate a JSON file via a Power Automate flow. I have most of the functionality working but am stuck on a formatting issue where values are being output as lists (arrays) instead of objects.
This is the format I need:
"_source" : { "title" : null, "first_name" : { "value" : "Tony", "source" : "48fa2a08-9137-49fa-8a7d-1d85570b7e5d" }, "last_name" : { "value" : "Stark", "source" : "48fa2a08-9137-49fa-8a7d-1d85570b7e5d" }, "full_name" : "Tony Stark",...
This is the format I am getting:
"_source": { "title": null, "first_name": [ { "value": "Tony", "source": "48fa2a08-9137-49fa-8a7d-1d85570b7e5d" } ], "last_name": [ { "value": "Stark", "source": "48fa2a08-9137-49fa-8a7d-1d85570b7e5d" } ], "full_name": ["Tony Stark"],...
I am composing this in the following way:
How do I get rid of the square brackets?
Any help would be greatly appreciated. TIA

The output of your data operations for (e.g.) first_name is an array.
In order to remove the square brackets, you need to drill down and get the item you want from the array.
Typically, an array is an array because you potentially expect more than one item to exist but in your case, it looks like that’s all you expect, therefore, you need to use an expression that will get the first item in the array.
To do that, you can wrap the “Output” from the Data Operations step (whatever that action actually is) in a first() expression.
That will return the first object in your array and remove the square brackets.
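The effect of unwrapping can be illustrated outside Power Automate. The sketch below is plain JavaScript, not Power Automate expression syntax; the helper first and the variable names are made up for the illustration, and the sample values come from the question. It only shows how taking the first item of each array turns the "format I am getting" into the "format I need":

```javascript
// Illustration only: plain JavaScript, not Power Automate expression syntax.
// `first` is a local helper mirroring the idea of the first() expression.
function first(items) {
  return items[0];
}

// The shape currently produced by the flow (values taken from the question).
const current = {
  title: null,
  first_name: [{ value: "Tony", source: "48fa2a08-9137-49fa-8a7d-1d85570b7e5d" }],
  last_name: [{ value: "Stark", source: "48fa2a08-9137-49fa-8a7d-1d85570b7e5d" }],
  full_name: ["Tony Stark"],
};

// Unwrap every array-valued field to its first item; leave other values untouched.
const wanted = Object.fromEntries(
  Object.entries(current).map(([key, value]) => [
    key,
    Array.isArray(value) ? first(value) : value,
  ])
);

console.log(JSON.stringify(wanted, null, 2));
```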

Replace field with MongoDB aggregate. Why don't $set, $addFields and $project always replace the field?

Passing an object to an existing field in $set or $addFields merges the objects rather than replacing the old one, e.g.
https://mongoplayground.net/p/aXe-rExjCXr
// Collection
[
{
"_id": "123",
"options": {
"size": "Large",
"color": "Red"
}
}
]
// Aggregate
db.collection.aggregate([
{
$set: {
options: {
size: "Small"
}
}
}
]);
// Expect
[
{
"_id": "123",
"options": {
"size": "Small"
}
}
]
// Actual
[
{
"_id": "123",
"options": {
"size": "Small",
"color": "Red" // <-- Not expected?
}
}
]
(It gets even weirder with arrays)
Is it possible to have it behave like non-object values and simply replace the field?
For context, I want to use this in an aggregate update pipeline.

This is the expected behaviour, and as far as I know there is no plan to change it. As far as I remember there was a Jira ticket about this, but it was closed, which I think means it will not change.
$set/$addFields always replace the field, except when
the field is an array and the new value is a document => the result is an array in which every member is that document
the field is a document and the new value is a document => the documents are merged (this is your case here)
$project always replaces the field, except when
the field is an array and the new value is a document => the result is an array in which every member is that document
Solutions
You can override this "weird" behaviour, especially in the case of arrays, by removing the old field first with $unset, for example, and then applying $set.
Based on the Jira ticket in the comment below, we can also use $literal to avoid this, but when we use $literal we have to be sure that the value contains no expressions (path references, variables, operators, etc.), because they will not be evaluated.
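Both workarounds can be sketched as pipelines. This assumes MongoDB 4.2 or later, where update commands accept an aggregation pipeline and the $unset stage exists; the variable names are illustrative only:

```javascript
// Option 1: remove the old field first, then set it, so there is nothing left to merge with.
const unsetThenSet = [
  { $unset: "options" },
  { $set: { options: { size: "Small" } } },
];

// Option 2: wrap the new value in $literal so it is treated as a plain value.
// Nothing inside $literal is evaluated, so it must not contain expressions.
const setLiteral = [
  { $set: { options: { $literal: { size: "Small" } } } },
];

// In the mongo shell either pipeline could be passed to an update, for example:
// db.collection.updateMany({}, unsetThenSet)
```

Option 1 still allows expressions in the new value, at the cost of an extra stage; option 2 is shorter but only suits constant values.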

BQ load JSON File with Array of Array

I'm trying to load a JSON file where some of the arrays are empty.
{"house_account_payable":"0.00","house_account_receivable":"0.00","gift_sales_payable":"0.00","gift_sales_receivable":"0.00","store_credit_sales_payable":"0.00","percentage_row":null,"sales_per_period":[["02:00AM - 02:59AM",{"amount":0,"qty":0}],["03:00AM - 03:59AM",{"amount":0,"qty":0}]],"revenue_centers":[],"tax_breakdowns":[]}
This is giving the error:
Error while reading table: test2, error message: Failed to parse JSON: No object found when new array is started.; BeginArray returned false; Parser terminated before end of string
Could somebody help me with this?

Are you trying to load data from your local machine or from GCS? Remember to export the data as JSONL (newline-delimited JSON):
{"open_orders_ids": []}
{"unpaid_orders_ids": []}
The output:
Take a look at the documentation on nested and repeated columns.
EDIT:
Your JSON schema should look like this:
{
"items": [
{
"house_account_payable": "0.00",
"house_account_receivable": "0.00",
"gift_sales_payable": "0.00",
"gift_sales_receivable": "0.00",
"store_credit_sales_payable": "0.00",
"percentage_row": "",
"sales_per_period": [
{
"AM02_00_AM02_59": {
"amount": "0",
"qty": "0"
}
},
{
"AM03_00_AM03_59": {
"amount": "0",
"qty": "0"
}
}
]
}
]
}
Following Felipe Hoffa's post, run the following commands:
jq -c '.items[]' <FILENAME>.json > <FILENAME>.jq.json
bq load --source_format NEWLINE_DELIMITED_JSON --autodetect <DATASET_ID>.<TABLENAME> <FILENAME>.jq.json
The schema:
Let me know if this is what you are looking for.

There's no problem with the null arrays.
The problem lies in this shorter json:
{"sales_per_period":[["02:00AM - 02:59AM",{"amount":0,"qty":0}],["03:00AM - 03:59AM",{"amount":0,"qty":0}]]}
The arrays there hold elements of different types, and to bring it into a structured table, a different schema is needed.
For example:
{"sales_per_period":[{"a":"02:00AM - 02:59AM","b":{"amount":0,"qty":0}},{"a":"03:00AM - 03:59AM","b":{"amount":0,"qty":0}}]}
Now this loads easily into BigQuery:
bq load --source_format=NEWLINE_DELIMITED_JSON --autodetect temp.short delete.short.json
Can you change this source JSON easily outside BigQuery? Otherwise load it raw into BigQuery, and parse it with a JS UDF inside BigQuery.
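If the source JSON can be changed outside BigQuery, the reshaping is small. A minimal sketch in plain JavaScript (Node.js), assuming one JSON object per line; the function name reshapeSalesPerPeriod is made up for the illustration, and the keys a and b are the ones used in the example above:

```javascript
// Turn each [label, totals] pair into an object so every array element has the same type.
function reshapeSalesPerPeriod(record) {
  return {
    ...record,
    sales_per_period: record.sales_per_period.map(([period, totals]) => ({
      a: period,
      b: totals,
    })),
  };
}

// One input line, copied from the shortened example above.
const line =
  '{"sales_per_period":[["02:00AM - 02:59AM",{"amount":0,"qty":0}],["03:00AM - 03:59AM",{"amount":0,"qty":0}]]}';

const reshapedLine = JSON.stringify(reshapeSalesPerPeriod(JSON.parse(line)));
console.log(reshapedLine);
```

Applied to every line of the file, this yields newline-delimited JSON in the shape that loaded successfully above.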

Map an 'array of objects' to a simple array of key values

I am new to the MongoDB aggregation pipeline and have a really basic question, but I could not find the answer anywhere. I would like to simply convert the following block:
"exclude" : [
{
"name" : "Accenture"
},
{
"name" : "Aon Consulting"
}
]
to:
"exclude" : [
"Accenture",
"Aon Consulting"
]
using the aggregation pipeline, but I cannot seem to find how to do it even after going through the documentation on https://docs.mongodb.com/manual/reference/operator/aggregation/. Thanks for your help.

While @chridam's answer is correct, there is no need to use $map.
Simple $addFields/$project would be sufficient:
db.collection.aggregate([
{
$addFields: {
exclude : '$exclude.name'
}
}
])

You certainly were in the right direction in using the aggregation framework to handle the transformation. The main operator that maps the object keys in the array to an array of the key values only would be $map in this case.
Use it within an $addFields pipeline stage to project the transformed field as follows:
db.collection.aggregate([
{
"$addFields": {
"exclude": {
"$map": {
"input": "$exclude",
"as": "el",
"in": "$$el.name"
}
}
}
}
])
In the above, the $addFields pipeline stage adds new fields to documents; if the name of the new field is the same as an existing field name (including _id), $addFields overwrites the existing value of that field with the value of the specified expression.
So essentially the above replaces the exclude array with the transformed array using $map. $map works by applying an expression to each element of the input array. The expression references each element individually with the variable name ($$el) specified in the as field, and $map returns an array of the results.
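For readers more familiar with JavaScript than with the aggregation operators, $map behaves much like Array.prototype.map. The following is only an analogy in plain JavaScript, using the sample data from the question; it does not run against MongoDB:

```javascript
// Sample document from the question.
const doc = {
  exclude: [{ name: "Accenture" }, { name: "Aon Consulting" }],
};

// Counterpart of { $map: { input: "$exclude", as: "el", in: "$$el.name" } }:
// apply an expression to each element and collect the results in a new array.
const transformed = {
  ...doc,
  exclude: doc.exclude.map((el) => el.name),
};

console.log(transformed.exclude);
```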

MongoDB - Using Aggregate to get more than one Matching Object in an Array

I'm trying to do exactly what the poster in this link was trying to accomplish. I have documents with the same structure as the poster; in my documents there is an array of objects, each with many keys. I want to bring back all objects (not just the first, as you can with an $elemMatch) in that array where a key's value matches my query. I want my query's result to simply be an array of objects, where there is a key in each object that matches my query. For example, in the case of the linked question, I would want to return an array of objects where "class":"s2". I would want returned:
"FilterMetric" : [
{
"min" : "0.00",
"max" : "16.83",
"avg" : "0.00",
"class" : "s2"
},
{
"min" : "0.00",
"max" : "16.83",
"avg" : "0.00",
"class" : "s2"
}
]
I tried all the queries in the answer. The first two queries bring back an empty array in Robomongo. In the shell, the command does nothing and returns me to the next line. Here's a screenshot of Robomongo:
On the third query in the answer, I get an unexpected token for the line where "input" is.
I'm using MongoDB version 3.0.2. It appears as if the OP was successful with the answer, so I'm wondering if there is a version issue, or if I'm approaching this the wrong way.

The only problem with the answers in that question seems to be that they're using the wrong casing for FilterMetric. For example, this works:
db.sample.aggregate([
{ "$match": { "FilterMetric.class": "s2" } },
{ "$unwind": "$FilterMetric" },
{ "$match": { "FilterMetric.class": "s2" } },
{ "$group": {
"_id": "$_id",
"FilterMetric": { "$push": "$FilterMetric" }
}}
])
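On newer servers the same result may be reachable without $unwind and $group. A hedged sketch, assuming MongoDB 3.2 or later, where the $filter array operator is available (it is not available on the 3.0.2 server mentioned in the question); the variable name metric is arbitrary:

```javascript
// Assumes MongoDB 3.2+ for $filter.
const filterPipeline = [
  // Keep only documents that contain at least one matching element.
  { $match: { "FilterMetric.class": "s2" } },
  {
    $project: {
      // Keep every array element whose `class` equals "s2", not just the first one.
      FilterMetric: {
        $filter: {
          input: "$FilterMetric",
          as: "metric",
          cond: { $eq: ["$$metric.class", "s2"] },
        },
      },
    },
  },
];

// In the mongo shell this would be run as:
// db.sample.aggregate(filterPipeline)
```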
