I am hoping it is straightforward to do the following:
Given rows containing jsonb of the form
{
  "a": "hello",
  "b": ["jim", "bob", "kate"]
}
I would like to be able to get all the 'b' fields from a table (as in select jsondata->'b' from mytable) and then form a list consisting of all strings which occur in at least one 'b' field. (Basically a set-union.)
How can I do this? Or am I better off using a python script to extract the 'b' entries, do the set-union there, and then store it back into the database somewhere else?
This gives you the union set of elements in list 'b' of the json.
SELECT array_agg(a ORDER BY a)
FROM (SELECT DISTINCT unnest(txt_arr) AS a
      FROM (SELECT ARRAY(SELECT trim(elem::text, '"')
                         FROM jsonb_array_elements(jsondata->'b') elem) AS txt_arr
            FROM jtest1) y) z;
Query explanation:
Gets the list from b as jsondata->'b'.
Expands the JSON array into a set of JSON values with the jsonb_array_elements() function.
Trims the enclosing " from each element with the trim() function.
Converts the result back into an array with the array() constructor after trimming.
Gets the distinct values by unnesting the array with unnest().
Finally, array_agg() assembles the expected result.
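If you would rather do the union client-side, as the question suggests, the logic is a one-liner in Python once the 'b' lists have been fetched (the rows below are stand-ins for the result of select jsondata from jtest1):

```python
# Set-union of the 'b' arrays in Python, assuming the rows have
# already been fetched from the database.
rows = [
    {"a": "hello", "b": ["jim", "bob", "kate"]},
    {"a": "bye", "b": ["bob", "sue"]},
]

# set().union(*iterables) merges every 'b' list into one set.
union = sorted(set().union(*(row["b"] for row in rows)))
print(union)  # ['bob', 'jim', 'kate', 'sue']
```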
[
{
"SnapshotDate": 20220224,
"EquityUSD": 5530.22,
"BalanceUSD": 25506.95,
"jsonTransactions": "[{\"TransactionDate\":20220224,\"AccountTransactionID\":144155779,\"TransactionType\":\"Deposit\",\"AmountUSD\":2000},{\"TransactionDate\":20220224,\"AccountTransactionID\":144155791,\"TransactionType\":\"Deposit\",\"AmountUSD\":2000}]"
}
]
Can someone help me extract this JSON string in BigQuery? I can't seem to get JSON_EXTRACT to work, as it does not have a root element.
The double quotes in jsonTransactions are making the JSON invalid. JSON_EXTRACT_SCALAR(json_data, "$[0].jsonTransactions") returns [{ because the first pair of double quotes enclose [{. To circumvent this, I used regex to remove the double quotes of the jsonTransactions value. Now, the inner JSON string is considered an array.
After regex replacement, the outermost quotes have been removed as shown below. I replaced "[ and ]" with [ and ] respectively in the JSON string.
"jsonTransactions": [{"TransactionDate":20220224,"AccountTransactionID":144155779,"TransactionType":"Deposit","AmountUSD":2000},{"TransactionDate":20220224,"AccountTransactionID":144155791,"TransactionType":"Deposit","AmountUSD":2000}]
Consider the below query for your requirement. The JSON path for AmountUSD will be "$[0].jsonTransactions[0].AmountUSD".
WITH
sample_table AS (
SELECT
'[{"SnapshotDate": 20220224,"EquityUSD": 5530.22,"BalanceUSD": 25506.95,"jsonTransactions": "[{\"TransactionDate\":20220224,\"AccountTransactionID\":144155779,\"TransactionType\":\"Deposit\",\"AmountUSD\":2000},{\"TransactionDate\":20220224,\"AccountTransactionID\":144155791,\"TransactionType\":\"Deposit\",\"AmountUSD\":2000}]"}]'
AS json_data)
SELECT
JSON_EXTRACT(REGEXP_REPLACE(REGEXP_REPLACE(json_data, r'"\[', '['), r'\]"', ']'),
'$[0].jsonTransactions')
FROM
sample_table;
Output:
As you had mentioned in the comments section, it is better to store the JSON itself in a more accessible format (one valid JSON object) instead of nesting JSON strings.
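For illustration, here is the same double-parsing issue in Python terms: because jsonTransactions is stored as a JSON string inside otherwise valid JSON, it takes two parses to reach the inner fields (the literal below is a trimmed-down stand-in for the original document):

```python
import json

# The outer document is valid JSON, but the value of "jsonTransactions"
# is itself a JSON *string*, so it must be parsed a second time.
raw = '[{"SnapshotDate": 20220224, "jsonTransactions": "[{\\"AmountUSD\\":2000}]"}]'

outer = json.loads(raw)                                  # first parse
transactions = json.loads(outer[0]["jsonTransactions"])  # second parse
print(transactions[0]["AmountUSD"])  # 2000
```

Storing the document as one valid JSON object would make the second parse (and the regex workaround above) unnecessary.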
You might have to build a temp table to do this.
The first create statement takes a denormalized table and converts it to a table with an array of structs.
The second create statement would take that temp table and embed the array into a (array of) struct(s).
You could remove the internal struct from the first query, and the array wrapper from the second query, to build a struct of arrays instead. But this should be flexible enough that you can create an array of structs, a struct of arrays, or any combination of the two, as many times as you want, up to the 15 levels of nesting that BigQuery allows.
The final outcome of this code would be a table with one column (column1) of a standard datatype, as well as an array of structs called OutsideArrayOfStructs. That struct has two columns of "standard" datatypes, as well as an array of structs called InsideArrayOfStructs.
CREATE OR REPLACE TABLE dataset.tempTable as (
select
column1,
column2,
column3,
ARRAY_AGG(
STRUCT(
ArrayObjectColumn1,
ArrayObjectColumn2,
ArrayObjectColumn3
)
) as InsideArrayOfStructs
FROM
sourceDataset.sourceTable
GROUP BY
column1,
column2,
column3 )
CREATE OR REPLACE TABLE dataset.finalTable as (
select
column1,
ARRAY_AGG(
STRUCT(
column2,
column3,
InsideArrayOfStructs
)
) as OutsideArrayOfStructs
FROM
dataset.tempTable
GROUP BY
column1 )
I've been trying for the past hours to look for a way to check in BigQuery if an array contains a certain value without using UNNEST. The reason why I don't want to use UNNEST is that I don't want an UNNEST result, I just want to check if the value is in it or not (and then do a condition CASE WHEN on it).
I've tried different ways like value = ANY(array), CONTAINS, CONTAINS_ARRAY but none of them work on BigQuery.
Thank you!
If the only reason not to use UNNEST is the unnested result, I would not rule it out. Instead, I suggest using UNNEST but not selecting the unnested columns: you keep your nested result, and you can still use the temporary columns to verify your conditions within your CASE WHEN statements.
I have used a public dataset in BigQuery to exemplify this. The syntax is:
WITH
temporary_table AS(
SELECT
*,
param
FROM
`firebase-public-project.analytics_153293282.events_20181003`,
UNNEST(event_params) AS param )
SELECT
*,
CASE
WHEN (param.key IN ('value', 'board')) THEN TRUE
END
AS check
FROM
temporary_table
LIMIT
100;
Notice that the unnested columns from event_params are not displayed in the final result. Also, the check column was created as a Boolean; it could be omitted, or used as a flag to make the desired modification to your columns.
I hope it helps.
Below example is for BigQuery Standard SQL
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, [1,2,3] arr UNION ALL
SELECT 2, [4,5]
)
SELECT id, arr,
CASE 1 IN UNNEST(arr)
WHEN TRUE THEN 'value is in array'
ELSE 'value is not in array'
END conclusion
FROM `project.dataset.table`
with result
As you can see, result is not unnested!
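For what it's worth, 1 IN UNNEST(arr) behaves like an ordinary membership test; in Python terms, the rows above reduce to:

```python
# Each row keeps its array intact; the membership test only yields a boolean.
rows = [(1, [1, 2, 3]), (2, [4, 5])]

conclusion = [
    (row_id, 'value is in array' if 1 in arr else 'value is not in array')
    for row_id, arr in rows
]
print(conclusion)
# [(1, 'value is in array'), (2, 'value is not in array')]
```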
I am trying to do something that seems simple but cannot find the right syntax for Denodo's VQL (Virtual Query Language). I have a string like this: XXXX-YYYY-ZZZZ-AAAA-BBBB in a column called "location" that varies in length, and I want to get the value of the fourth set (i.e. AAAA in this example). I am using the Denodo split function like this:
SELECT SPLIT("-",location)[3] AS my_variable FROM my_table
However, the [3] doesn't work. I've tried a bunch of variations:
SELECT SPLIT("-",location)[3].value AS my_variable FROM my_table
SELECT SPLIT("-",location).column3 AS my_variable FROM my_table
etc.
Can someone please help me figure out the right syntax to return a single parameter from an array? Thank you!
SELECT field_1[3].string
FROM (SELECT split('-', 'XXXX-YYYY-ZZZZ-AAAA-BBBB') as field_1)
You have to do it using a subquery because the syntax to access an element of an array (that is, [<number>]) can only be used with field names. You cannot use something like [4] next to the result of an expression.
This question helps: https://community.denodo.com/answers/question/details?questionId=90670000000CcQPAA0
I got it working by creating a view that saves the array as a field:
CREATE OR REPLACE VIEW p_sample_data FOLDER = '/stack_overflow'
AS SELECT bv_sample_data.location AS location
, bv_sample_data.id AS id
, split('-', location) AS location_array
FROM bv_sample_data;
Notice I created a column called location_array?
Now you can use a select statement on top of your view to extract the information you want:
SELECT location, id, location_array[2].string
FROM p_sample_data
location_array[2] is the 3rd element, and the .string tells Denodo you want the string value (I think that's what it does; you'd have to read more about Compound Values in the documentation: https://community.denodo.com/docs/html/browse/6.0/vdp/vql/advanced_characteristics/management_of_compound_values/management_of_compound_values )
Another way you could probably do it is by creating a view with the array, and then flattening the array, although I haven't tried that option.
Update: I tried creating a view that flattens the array, and then using an analytic (or "window") function to get row_number() OVER (PARTITION BY id ORDER BY id ASC), but analytic/window functions don't work against flat-file sources.
So if you go the "flatten" route and your source system doesn't support analytic functions, you could go with a straight rownum() function, but you'd have to offset the value by the column number you want, and then use remainder division to pull out the data you want.
Like this:
--My view with the array is called p_sample_data
CREATE OR REPLACE VIEW f_p_sample_data FOLDER = '/stack_overflow' AS
SELECT location AS location
, id AS id
, string AS string
, rownum(2) AS rownumber
FROM FLATTEN p_sample_data AS v ( v.location_array);
Now, with the rownum() function (and an offset of 2), I can use remainder division in my where clause to get the data I want:
SELECT location, id, string, rownumber
FROM f_p_sample_data
WHERE rownumber % 5 = 0
Frankly, I think the easier way is to just leave your location data in the array and extract out the nth column with the location_array[2].string syntax, where 2 is the nth column, zero based.
I have a temp table having two columns - key and value:
temp_tbl:
key | value
----|-------
k1  | a','b
Below is the insert script with which I am storing the value in temp_tbl:
insert into temp_tbl values ('k1', 'a'+char(39)+char(44)+char(39)+'b');
Now, I am trying to fetch records from another table (actual_tbl) like this:
select * from actual_tbl where field_value in
(select value from temp_tbl where key = 'k1');--query 1
But this is not returning anything.
I want the above query to behave like the following one:
select * from actual_tbl where field_value in
('a','b');--query 2
Where am I doing wrong in query 1?
I am using sql server.
Where am I doing wrong in query 1?
Where you are going wrong is in failing to understand the way the IN keyword works with a subquery vs a hard-coded list.
When an IN clause is followed by a list, each item in the list is a discrete value:
IN ('I am a value', 'I am another value', 'I am yet another value')
When it's followed by a sub-query, each row generates a single value. Your temp table only has one row, so the IN clause is only considering a single value. No matter how you try to "trick" the parser with commas and single-quotes, it won't work. The SQL Server parser is too smart to be tricked. It will know that a single value of 'a','b' is still just a single value, and it will look for that single value. It won't treat them as two separate values like you are trying to do.
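A quick sketch of the difference, in Python terms (the names here are illustrative):

```python
# What the IN clause receives from the subquery: one row, one value.
subquery_result = ["a','b"]   # the single stored string, quotes and comma included
hardcoded_list = ["a", "b"]   # two discrete values

print("a" in subquery_result)  # False: "a" is not equal to "a','b"
print("a" in hardcoded_list)   # True
```

The usual fixes are to store one value per row in the temp table, or (on SQL Server 2016+) to split the stored string into rows with STRING_SPLIT before feeding it to IN.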
I have a table parameter having 2 columns id(integer) and param_specs(text).
The actual param_specs column looks like the above picture (simplified below):
param_specs
[
{"paramName":"param1",
"type":"string",
"defaultValue":"tomcat7",
"optional":false,
"deploymentParam":false},
{"paramName":"param123PreStopAction",
"type":"path",
"defaultValue":"HELLO",
"optional":false,
"deploymentParam":false}
]
So it is an array of JSON objects, and I want to fetch the defaultValue field of the paramName param123PreStopAction, i.e. HELLO.
** EDIT **
As can be seen in the image, this is what my table called parameter looks like, with two columns. I want to get the defaultValue of each row in the parameter table where paramName is LIKE '%PostStopAction' or '%PreStopAction' (check the bold values in the image; i.e. the paramName should contain either PreStopAction or PostStopAction, e.g. 'mytomcat7PostStopAction', and fetch its defaultValue, i.e. 'post-stop').
Some rows in the table won't have any JSON with a PreStop or PostStop paramName, like row 3 in the image.
can someone help me with the query?
As JGH suggested, something as follows:
SELECT "defaultValue"
FROM parameter a
CROSS JOIN LATERAL
json_to_recordset(a.param_specs::json) AS x("paramName" text,"defaultValue" text)
WHERE "paramName" LIKE '%PreStopAction' OR "paramName" LIKE '%PostStopAction'
One approach is to explode your array in fields and to query them. The trick is to consider only the fields of interest.
Select myOutputField
from json_to_recordset('[the array]') as (myQueryField text, myOutputField text)
where myQueryField = myCondition;
Or, bound to your example:
select "defaultValue" from json_to_recordset('
[
{"paramName":"param1",
"type":"string",
"defaultValue":"tomcat7",
"optional":false,
"deploymentParam":false},
{"paramName":"param123PreStopAction",
"type":"path",
"defaultValue":"HELLO",
"optional":false,
"deploymentParam":false}
]') as x("paramName" text,"defaultValue" text)
where "paramName" = 'param123PreStopAction';
** EDIT **
Your data is not saved in a json column but in a text column. You would have to convert it to json (ideally the column itself, or at least its content). Also, json_to_recordset works on single items, not on sets, so you need a LATERAL JOIN to overcome this limitation, as nicely explained here.
SELECT myOutputField
FROM mytable a
CROSS JOIN LATERAL
json_to_recordset(a.jsonintextcolumn::json) as (myQueryField text, myOutputField text)
WHERE myQueryField = myCondition;
Or, bound to your example:
SELECT "defaultValue"
FROM public.testjsontxt a
CROSS JOIN LATERAL
json_to_recordset(a.param_specs::json) as x("paramName" text,"defaultValue" text)
WHERE "paramName" = 'param123PreStopAction';