Is there a BigQuery function to extract a nested JSON? - arrays

[
  {
    "SnapshotDate": 20220224,
    "EquityUSD": 5530.22,
    "BalanceUSD": 25506.95,
    "jsonTransactions": "[{\"TransactionDate\":20220224,\"AccountTransactionID\":144155779,\"TransactionType\":\"Deposit\",\"AmountUSD\":2000},{\"TransactionDate\":20220224,\"AccountTransactionID\":144155791,\"TransactionType\":\"Deposit\",\"AmountUSD\":2000}]"
  }
]
Can someone help me extract this JSON string in BigQuery? I can't seem to get JSON_EXTRACT to work, as the document does not have a root element.

The double quotes around the jsonTransactions value make it a JSON string rather than a nested array. JSON_EXTRACT_SCALAR(json_data, "$[0].jsonTransactions") returns only [{ because the first unescaped pair of double quotes encloses just [{. To circumvent this, I used a regex to remove the double quotes around the jsonTransactions value, so that the inner JSON string is treated as an array.
After the regex replacement, which substitutes "[ with [ and ]" with ], the outermost quotes are gone, as shown below.
"jsonTransactions": [{"TransactionDate":20220224,"AccountTransactionID":144155779,"TransactionType":"Deposit","AmountUSD":2000},{"TransactionDate":20220224,"AccountTransactionID":144155791,"TransactionType":"Deposit","AmountUSD":2000}]
Consider the below query for your requirement. The JSON path for AmountUSD will be "$[0].jsonTransactions[0].AmountUSD".
WITH
  sample_table AS (
    SELECT
      '[{"SnapshotDate": 20220224,"EquityUSD": 5530.22,"BalanceUSD": 25506.95,"jsonTransactions": "[{\"TransactionDate\":20220224,\"AccountTransactionID\":144155779,\"TransactionType\":\"Deposit\",\"AmountUSD\":2000},{\"TransactionDate\":20220224,\"AccountTransactionID\":144155791,\"TransactionType\":\"Deposit\",\"AmountUSD\":2000}]"}]'
      AS json_data)
SELECT
  JSON_EXTRACT(
    REGEXP_REPLACE(REGEXP_REPLACE(json_data, r'"\[', '['), r'\]"', ']'),
    '$[0].jsonTransactions') AS jsonTransactions
FROM
  sample_table;
Output:
[{"TransactionDate":20220224,"AccountTransactionID":144155779,"TransactionType":"Deposit","AmountUSD":2000},{"TransactionDate":20220224,"AccountTransactionID":144155791,"TransactionType":"Deposit","AmountUSD":2000}]
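To pull a single scalar out of the nested array instead, JSON_EXTRACT_SCALAR can be pointed at the AmountUSD path mentioned above; a minimal sketch against the same sample_table, reusing the same regex cleanup:

SELECT
  JSON_EXTRACT_SCALAR(
    REGEXP_REPLACE(REGEXP_REPLACE(json_data, r'"\[', '['), r'\]"', ']'),
    '$[0].jsonTransactions[0].AmountUSD') AS AmountUSD
FROM
  sample_table;

This returns 2000, the amount of the first transaction.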
As you had mentioned in the comments section, it is better to store the JSON itself in a more accessible format (one valid JSON object) instead of nesting JSON strings.

You might have to build a temp table to do this.
The first CREATE statement takes a denormalized table and converts it to a table with an array of structs.
The second CREATE statement takes that temp table and embeds the array into an (array of) struct(s).
You could remove the internal struct from the first query, and the array wrapper from the second query, to build a struct of arrays instead. But this should be flexible enough that you can create an array of structs, a struct of arrays, or any combination of the two, as many times as you want, up to the 15 levels of nesting that BigQuery allows.
The final outcome of this code would be a table with one column (column1) of a standard datatype, as well as an array of structs called OutsideArrayOfStructs. That struct has two columns of "standard" datatypes, as well as an array of structs called InsideArrayOfStructs.
CREATE OR REPLACE TABLE dataset.tempTable AS (
  SELECT
    column1,
    column2,
    column3,
    ARRAY_AGG(
      STRUCT(
        ArrayObjectColumn1,
        ArrayObjectColumn2,
        ArrayObjectColumn3
      )
    ) AS InsideArrayOfStructs
  FROM
    sourceDataset.sourceTable
  GROUP BY
    column1,
    column2,
    column3
);
CREATE OR REPLACE TABLE dataset.finalTable AS (
  SELECT
    column1,
    ARRAY_AGG(
      STRUCT(
        column2,
        column3,
        InsideArrayOfStructs
      )
    ) AS OutsideArrayOfStructs
  FROM
    dataset.tempTable
  GROUP BY
    column1
);
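To read the nested result back out, each level of array can be flattened with UNNEST; a minimal sketch, assuming the column names used above:

SELECT
  t.column1,
  o.column2,
  o.column3,
  i.ArrayObjectColumn1
FROM
  dataset.finalTable t,
  UNNEST(t.OutsideArrayOfStructs) AS o,
  UNNEST(o.InsideArrayOfStructs) AS i;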

Related

Copy data from one table to another with an array of structs in BigQuery

We are trying to copy data from one table to another using an INSERT INTO ... SELECT statement.
Our original table schema is as follows, with several columns including a repeated record containing 5 structs of various data types:
[image: original table schema]
We want an exact copy of this table, plus 3 new regular columns, so we made an empty table with the new schema. However, when using the following code, the input table ends up with fewer rows overall than the original table.
insert into input_table
select
  column1,
  column2,
  null as newcolumn1,
  null as newcolumn2,
  null as newcolumn3,
  array_agg(struct(arr.struct1, arr.struct2, arr.struct3, arr.struct4, arr.struct5)) as arrayname,
  column3
from original_table, unnest(arrayname) as arr
group by column1, column2, column3;
We tried the solution from this page: How to copy data from one table into another table which has a record repeated column in GCP Bigquery
but the query would error, as it would treat the 5 structs within the array as arrays themselves (data type = e.g. string, mode = repeated, rather than nullable/required).
The error we see says that our repeated record column "has type ARRAY<STRUCT<struct1name ARRAY, struct2name ARRAY, struct3name ARRAY, ...>> which cannot be inserted into column summary, which has type ARRAY<STRUCT<struct1name STRING, struct2name STRING, struct3name STRING, ...>> at [4:1]"
Additionally, a query to find rows that exist in the original but not in the input table returns no results.
We also need the columns in this order (cannot do a simple copy of the table and add the 3 new columns at the end).
Why are we losing rows when using the above code to do an insert into... select?
Is there a way to copy over the data in this way and retain the exact number of rows?
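One likely cause of the dropped rows: the comma join in "from original_table, unnest(arrayname)" is an implicit CROSS JOIN, so source rows whose array is NULL or empty contribute nothing to the join and vanish from the result. A LEFT JOIN against the UNNEST preserves them; a sketch under the same hypothetical column names:

insert into input_table
select
  column1,
  column2,
  null as newcolumn1,
  null as newcolumn2,
  null as newcolumn3,
  -- the if(...) keeps rows with no array elements from producing a struct of NULLs
  array_agg(
    if(arr is null, null,
       struct(arr.struct1, arr.struct2, arr.struct3, arr.struct4, arr.struct5))
    ignore nulls) as arrayname,
  column3
from original_table
left join unnest(arrayname) as arr  -- preserves rows whose array is NULL or empty
group by column1, column2, column3;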

Snowflake Parsing Unnamed Json Array in Table

I am having great difficulty using Snowflake to parse some JSON data. I have an unnamed array in one of my tables and want to break it apart as part of a query:
[{"CodeName":"443","CodeQuantity":6}]
[{"CodeName":"550","CodeQuantity":4}]
[{"CodeName":"293","CodeQuantity":1},{"CodeName":"294","CodeQuantity":3}]
My query is this:
SELECT CODES
FROM CODETABLE
I am having problems parsing the JSON to split CodeName / CodeQuantity into individual elements and rows.
If those are each separate records, stored as varchar, then you simply need to use the parse_json() function to make them into json before flattening and parsing:
WITH x AS (
  SELECT *
  FROM ( VALUES
    ('[{"CodeName":"443","CodeQuantity":6}]'),
    ('[{"CodeName":"550","CodeQuantity":4}]'),
    ('[{"CodeName":"293","CodeQuantity":1},{"CodeName":"294","CodeQuantity":3}]')
  ) x (varchar_data)
)
SELECT y.value:CodeName::varchar AS code_name,
       y.value:CodeQuantity::number AS code_quantity
FROM x,
     LATERAL FLATTEN (input => parse_json(varchar_data)) y;
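Applied to the asker's table (assuming CODES is stored as varchar; if it were already a VARIANT column, the parse_json() call could be dropped):

SELECT y.value:CodeName::varchar AS code_name,
       y.value:CodeQuantity::number AS code_quantity
FROM CODETABLE,
     LATERAL FLATTEN (input => parse_json(CODES)) y;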

How to extract a particular element from a json array in postgres?

I have a table parameter with 2 columns: id (integer) and param_specs (text).
The actual param_specs column looks like the image above; a simplified version is shown below:
param_specs
[
{"paramName":"param1",
"type":"string",
"defaultValue":"tomcat7",
"optional":false,
"deploymentParam":false},
{"paramName":"param123PreStopAction",
"type":"path",
"defaultValue":"HELLO",
"optional":false,
"deploymentParam":false}
]
So it is a JSON array of objects, and I want to fetch the defaultValue field of the object whose paramName is param123PreStopAction, i.e. HELLO.
****EDIT****
As can be seen in the image, this is what my table called parameter looks like, with two columns. I want to get the defaultValue of each row in the parameter table where paramName is LIKE '%PostStopAction' or '%PreStopAction' (check the bold values in the image; i.e. the paramName should contain either PreStopAction or PostStopAction within the actual paramName value, e.g. 'mytomcat7PostStopAction', and fetch its defaultValue, i.e. 'post-stop').
Some rows in the table won't have any JSON with a PreStop or PostStop paramName, like row 3 in the image.
Can someone help me with the query?
As JGH suggested, something as follows:
SELECT "defaultValue"
FROM parameter a
CROSS JOIN LATERAL
json_to_recordset(a.param_specs::json) AS x("paramName" text, "defaultValue" text)
WHERE "paramName" LIKE '%PreStopAction' OR "paramName" LIKE '%PostStopAction'
One approach is to explode your array into fields and query them. The trick is to consider only the fields of interest.
Select myOutputField
from json_to_recordset('[the array]') as (myQueryField text, myOutputField text)
where myQueryField = myCondition;
Or, bound to your example:
select "defaultValue" from json_to_recordset('
[
{"paramName":"param1",
"type":"string",
"defaultValue":"tomcat7",
"optional":false,
"deploymentParam":false},
{"paramName":"param123PreStopAction",
"type":"path",
"defaultValue":"HELLO",
"optional":false,
"deploymentParam":false}
]') as x("paramName" text,"defaultValue" text)
where "paramName" = 'param123PreStopAction';
** EDIT **
Your data is not saved in a json column but in a text column. You would have to convert it to json (ideally, the column itself... or at least its content). Also, the json_to_recordset works on single items, not on sets, so you would need to use a LATERAL JOIN to overcome this limitation, as nicely explained here.
SELECT myOutputField
FROM mytable a
CROSS JOIN LATERAL
json_to_recordset(a.jsonintextcolumn::json) as (myQueryField text, myOutputField text)
WHERE myQueryField = myCondition;
Or, bound to your example:
SELECT "defaultValue"
FROM public.testjsontxt a
CROSS JOIN LATERAL
json_to_recordset(a.param_specs::json) as x("paramName" text,"defaultValue" text)
WHERE "paramName" = 'param123PreStopAction';

Postgresql jsonb set-union of lists

I am hoping it is straightforward to do the following:
Given rows containing jsonb of the form
{
  "a": "hello",
  "b": ["jim", "bob", "kate"]
}
I would like to be able to get all the 'b' fields from a table (as in select jsondata->'b' from mytable) and then form a list consisting of all strings which occur in at least one 'b' field. (Basically a set-union.)
How can I do this? Or am I better off using a python script to extract the 'b' entries, do the set-union there, and then store it back into the database somewhere else?
This gives you the union set of elements in list 'b' of the json.
SELECT array_agg(a ORDER BY a)
FROM (SELECT DISTINCT unnest(txt_arr) AS a
      FROM (SELECT ARRAY(SELECT trim(elem::text, '"')
                         FROM jsonb_array_elements(jsondata->'b') elem) AS txt_arr
            FROM jtest1) y) z;
Query Explanation:
Gets the list from b as jsondata->'b'.
Expands the JSON array to a set of JSON values using the jsonb_array_elements() function.
Trims the " characters from the elements using the trim() function.
Converts the result back to an array using the array() constructor after trimming.
Gets the distinct values by unnesting with the unnest() function.
Finally, array_agg() is used to form the expected result.
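The same union set can be had more directly with jsonb_array_elements_text(), which returns the elements as text and avoids the manual quote trimming; a sketch, assuming the same jtest1 table and jsondata column:

SELECT array_agg(DISTINCT b ORDER BY b)
FROM jtest1,
     jsonb_array_elements_text(jsondata->'b') AS b;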

How do I build XQuerys to include/exclude rows based on the presence of certain tags and attributes?

I have an auditing/logging system that uses raw XML to represent actions taken out by an application. I'd like to improve on this system greatly by using an XML column in a table in the application's SQL Server database.
Each row in the table would contain one log entry and each entry should contain one or more tags that are used to describe the action in a semantic fashion that allows me to search in ways that match the auditing needs of the application, example:
<updateInvoice id="5" userId="7" /><fieldUpdate name="InvoiceDate" /><invoice /><update />
<deleteInvoice id="5" userId="6" /><invoice /><delete />
My intention is to return rowsets from this table by specifying combinations of tags and attributes to include or exclude rows by (e.g. "Include all rows with the tag invoice but exclude rows with the attribute userId='7'", or "Include all rows with the tag invoice but exclude rows with the tag delete").
I wish to do so programmatically, using combinations of a simple filter structure to represent the tags and attributes that should cause rows to be included or excluded.
The structure I use looks like this:
enum FilterInclusion { Include, Exclude };
public struct Filter
{
FilterInclusion Inclusion;
string TagName;
string? AttributeName;
object? AttributeValue;
}
My goal is to accept a set of these and generate a query that returns any rows that match any single inclusion filter, without matching any single exclusion filter.
Should I and can I encode this boolean logic into the resulting XPath itself, or am I looking at having multiple SELECT statements in my outputted queries? I'm new to XQuery and any help is appreciated. Thanks!
I'm not sure if that's what you're looking for, but to filter nodes in XML methods you use the brackets [ and ]. For instance, to select the elements foo but filter to only those that have the attribute bar, you'd use an XPath like /foo[@bar]. If you want those that have the attribute @bar with value 5, you use /foo[@bar=5]. If you want to select the elements foo that have a child element bar, you use /foo[bar].
declare @t table (x xml);
insert into @t (x) values
(N'<foo bar="abc"/>');
insert into @t (x) values
(N'<foo bar="5"/>');
insert into @t (x) values
(N'<foo id="1"><bar id="2"/></foo>');
select * from @t;
select c.value(N'@bar', N'varchar(max)')
from @t cross apply x.nodes(N'/foo[@bar]') t(c)
select c.value(N'@bar', N'varchar(max)')
from @t cross apply x.nodes(N'/foo[@bar=5]') t(c)
select c.value(N'@id', N'int')
from @t cross apply x.nodes(N'/foo[bar]') t(c)
I tried to base the examples on the XML snippets in your post, but those are too unstructured to make useful examples.
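As for encoding the include/exclude boolean logic, one option (a sketch of my own, not from the answer above) is to translate each Filter into an exist() predicate on the XML column: include filters OR-ed together, exclude filters negated and AND-ed. Assuming a hypothetical log table with an xml column x and the asker's tag names, "include rows with the tag invoice but exclude rows with the tag delete" would be:

select *
from logTable
where x.exist(N'/invoice') = 1   -- include: has an invoice tag
  and x.exist(N'/delete') = 0;   -- exclude: has no delete tag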
