Querying array structures in Presto, Hive

col-1 has dep_id (varchar):
112
col-2 has an array of structs:
[
  {
    "emp_id": 8291828,
    "name": "bruce"
  },
  {
    "emp_id": 8291823,
    "name": "Rolli"
  }
]
I have a use case where I need to flatten and display the results. For example, when querying the data for dep_id 112, I need to display each emp_id in a separate row.
For the above data, the query result should look like:
id   emp_id
112  8291828
112  8291823
What should my query format be to fetch this data?

There are several parts to making this work. First, the JSON data will appear as a VARCHAR, so you need to run json_parse on it to convert it to a JSON type in the engine. Then you can cast JSON types to normal SQL structural types, and in your case this is an array of rows (see the cast-from-JSON documentation). Finally, you cross join to the array of rows (which is effectively a nested table). This query will give you the results you want:
WITH your_table AS (
  SELECT
      112 AS dep_id
    , '[{"emp_id": 8291828, "name": "bruce"}, {"emp_id": 8291823, "name": "Rolli"}]' AS data
)
SELECT
    dep_id
  , r.emp_id
  , r.name
FROM your_table
CROSS JOIN
  UNNEST(cast(json_parse(data) AS array(row(emp_id bigint, name varchar)))) AS nested_data(r)
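Run against the inline example data, that should produce:
dep_id | emp_id  | name
-------+---------+------
   112 | 8291828 | bruce
   112 | 8291823 | Rolli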

Related

BigQuery ARRAY_TO_STRING based on condition in non-array field

I have a table that I query like this...
select *
from table
where productId = 'abc123'
Which returns 2 rows (even though the productId is unique) because one of the columns (orderName) is an Array...
productId  productName      created     featureCount  orderName
abc123     someProductName  2020-01-01  12            someOrderName
                                                      someOtherOrderName
I'm not sure whether the missing values in the 2nd row are empty strings or nulls, because of the way the orderName array expands my search results, but I want to now run a query like this...
select productName, ARRAY_TO_STRING(orderName,'-')
from table
where productId = 'abc123'
and ifnull(featureCount,0) > 0
But this query returns...
someProductName, someOrderName-someOtherOrderName
i.e. both array values came back even though I specified a condition of featureCount > 0.
I'm sure I'm missing something very basic about how arrays work in BigQuery, but from Google's ARRAY_TO_STRING documentation I don't see any way to add a condition to the extraction of ARRAY values. I'd appreciate any thoughts on the best way to go about this.
From what I understand, this is because you are querying a single row of data that has a column of type ARRAY<STRING>. Since ARRAY_TO_STRING accepts an ARRAY<STRING> value, all of the array's elements are joined into just one cell.
So when you run your script, the output fits your criteria, and the array column is returned with additional rows for visibility.
The visualization in the UI should look like you mention in your question:
Row  productId  productName      created     featureCount  orderName
1    abc123     someProductName  2020-01-01  12            someOrderName
                                                           someOtherOrderName
Note: In BigQuery this additional row is grayed out; it is part of row 1 but shows as an additional row for visibility. So this output has only 1 row in the table.
And the visualization as JSON will be:
[
  {
    "productId": "abc123",
    "productName": "someProductName",
    "created": "2020-01-01",
    "featureCount": "12",
    "orderName": [
      "someOrderName",
      "someOtherOrderName"
    ]
  }
]
I don't think there is specific documentation about how arrays are visualized in the UI, but I can share the docs that talk about flattening your row outputs into a single row. Check:
Working with Arrays
Flattening Arrays
I used the following to replicate your issue:
CREATE OR REPLACE TABLE `project-id.dataset.working_table` (
  productId STRING,
  productName STRING,
  created STRING,
  featureCount STRING,
  orderName ARRAY<STRING>
);

INSERT INTO `project-id.dataset.working_table` (productId, productName, created, featureCount, orderName)
VALUES ('abc123', 'someProductName', '2020-01-01', '12', ['someOrderName', 'someOtherOrderName']);

INSERT INTO `project-id.dataset.working_table` (productId, productName, created, featureCount, orderName)
VALUES ('abc123X', 'someProductNameX', '2020-01-02', '15', ['someOrderName', 'someOtherOrderName', 'someData']);
Output:
Row  productId  productName       created     featureCount  orderName
1    abc123     someProductName   2020-01-01  12            someOrderName
                                                            someOtherOrderName
2    abc123X    someProductNameX  2020-01-02  15            someOrderName
                                                            someOtherOrderName
                                                            someData
Note: Table contains 2 rows.
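If you instead want the featureCount condition to apply per array element (one output row per orderName value), a minimal sketch using UNNEST against the working_table above could look like this; note that featureCount was created as STRING here, so it is cast before comparing:
SELECT
  productId,
  productName,
  orderNameElement
FROM `project-id.dataset.working_table`
CROSS JOIN UNNEST(orderName) AS orderNameElement
WHERE productId = 'abc123'
  AND CAST(featureCount AS INT64) > 0;
-- one row per array element:
-- abc123, someProductName, someOrderName
-- abc123, someProductName, someOtherOrderName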

Postgres search jsonb with indexes

I'm new to Postgres jsonb operations.
I'm storing some data in Postgres with a jsonb column, which holds flexible metadata as below.
I want to search the different unique metadata (key:value pairs):
id, type, metadata
1, player, {"name": "john", "height": 180, "team": "xyz"}
2, game, {"name": "afl", "members": 10, "team": "xyz"}
The results should be something like below: distinct, ordered ascending. I want this to be efficient, using some indexes.
key     | value
--------+-------
height  | 180
members | 10
name    | afl
name    | john
team    | xyz
My solution below hits the index for the search, but the sort and DISTINCT won't hit any indexes, since they operate on values computed from the jsonb.
CREATE INDEX metadata_jsonb_each_text_idx ON table
  USING GIN (jsonb_pretty(metadata) gin_trgm_ops);

SELECT DISTINCT t, t.*
FROM table u, jsonb_each_text(u.metadata) t
WHERE jsonb_pretty(u.metadata) LIKE '%key%'
ORDER BY t.key, t.value;
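(For context, jsonb_each_text() is what expands each metadata object into key/value text rows; a self-contained illustration of just that step:)
-- illustration only: expand a single jsonb object into key/value text rows
SELECT t.key, t.value
FROM (SELECT '{"name": "john", "height": 180, "team": "xyz"}'::jsonb AS metadata) u,
     jsonb_each_text(u.metadata) AS t(key, value)
ORDER BY t.key;
-- key    | value
-- height | 180
-- name   | john
-- team   | xyz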
Appreciate any thoughts on this issue.
Thanks!

SSRS multiple single cells

I am trying to figure out the best way to add multiple fields to an SSRS report.
The report has some plots and a tablix which are populated from queries, but now I have been asked to add a table with ~20 values. The problem is that I need them in a specific order/layout (which I cannot obtain by sorting), and they may need a description added above them, which will be static text (not from the DB).
I would like to avoid a situation where I keep 20 copies of the same query, each returning a single cell, where the only difference is in:
WHERE myTable.partID = xxxx
Any chance I could keep a single query that takes that string as a parameter, which I could specify somehow via an expression or by any other means?
Not a classical SSRS parameter, as I need a different one for each cell...
Or will I need to create 20 queries to fetch all those single values and then put them into separate text boxes on the report?
When I've done this in the past, I built a single query that gets all the data I need, with some kind of key per row.
For example, I might have a list of captions and values, one per row, that I need to display as part of a report page. The dataset query might look something like...
DECLARE @t TABLE([Key] varchar(20), Amount float, Caption varchar(100))

INSERT INTO @t
SELECT 'TotalSales', SUM(Amount), NULL FROM myTable WHERE CountryID = @CountryID
UNION
SELECT 'Currency', NULL, CurrencyCode FROM myCurrencyTable WHERE CountryID = @CountryID
UNION
SELECT 'Population', Population, NULL FROM myPopulationTable WHERE CountryID = @CountryID

SELECT * FROM @t
The resulting dataset would look like this:
Key           Amount  Caption
'TotalSales'  12345   NULL
'Currency'    NULL    'GBP'
'Population'  62.3    NULL
Let's say we call this dataset dsStuff; then in each cell/textbox the expression would simply be something like:
=LOOKUP("Population", Fields!Key.Value, Fields!Amount.Value, "dsStuff")
or
=LOOKUP("Currency", Fields!Key.Value, Fields!Caption.Value, "dsStuff")

Apply OPENJSON to a single column

I have a products table with two attribute columns and a JSON column. I'd like to be able to split the JSON column out and insert extra rows while retaining the attributes. Sample data looks like:
ID  Name     Attributes
1   Nikon    {"4e7a":["jpg","bmp","nef"],"604e":["en"]}
2   Canon    {"4e7a":["jpg","bmp"],"604e":["en","jp","de"]}
3   Olympus  {"902c":["yes"], "4e7a":["jpg","bmp"]}
I understand OPENJSON can convert JSON objects into rows, and key values into cells, but how do I apply it to a single column that contains JSON data?
My goal is to have an output like:
ID  Name     key   value
1   Nikon    902c  NULL
1   Nikon    4e7a  ["jpg","bmp","nef"]
1   Nikon    604e  ["en"]
2   Canon    902c  NULL
2   Canon    4e7a  ["jpg","bmp"]
2   Canon    604e  ["en","jp","de"]
3   Olympus  902c  ["yes"]
3   Olympus  4e7a  ["jpg","bmp"]
3   Olympus  604e  NULL
Is there a way I can query this products table like the query below? Or is there another way to reproduce my goal data set?
SELECT
    ID,
    Name,
    OPENJSON(Attributes)
FROM products
Thanks!
Here is something that should at least start you in the right direction.
SELECT P.ID, P.[Name], AttsData.[key], AttsData.[value]
FROM products P
CROSS APPLY OPENJSON(P.Attributes) AS AttsData
The one thing that has me stuck a bit right now is the missing values (where value is NULL in your expected result)...
I was thinking of maybe doing some sort of outer/full join back to this, but even that is giving me headaches. Are you certain you need that? Or could you do an existence check with the output from the SQL above?
I am going to keep at this. If I find a solution that matches your output exactly, I will add to this answer.
Until then... good luck!
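For instance, a hypothetical existence check along those lines (does a product contain a given attribute key at all?) might look like:
-- hypothetical sketch: list products that contain a given attribute key
SELECT P.ID, P.[Name]
FROM products P
WHERE EXISTS (
    SELECT 1
    FROM OPENJSON(P.Attributes) AS A
    WHERE A.[key] = '902c'
);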
You can get the rows with NULL value fields by creating a list of possible keys, using CROSS APPLY to associate each key with each row from the original dataset, and then left-joining in the parsed JSON.
Here's a working example you should be able to execute as-is:
-- Throw together a quick and dirty CTE containing your example data
WITH OriginalValues AS (
    SELECT *
    FROM (
        VALUES ( 1, 'Nikon',   '{"4e7a":["jpg","bmp","nef"],"604e":["en"]}' ),
               ( 2, 'Canon',   '{"4e7a":["jpg","bmp"],"604e":["en","jp","de"]}' ),
               ( 3, 'Olympus', '{"902c":["yes"], "4e7a":["jpg","bmp"]}' )
    ) AS T ( ID, Name, Attributes )
),
-- Build a separate dataset that includes all possible 'key' values from the JSON.
PossibleKeys AS (
    SELECT DISTINCT A.[key]
    FROM OriginalValues CROSS APPLY OPENJSON( OriginalValues.Attributes ) AS A
),
-- Get the existing keys and values from the JSON, associated with the record ID
ValuesWithKeys AS (
    SELECT OriginalValues.ID, Atts.[key], Atts.Value
    FROM OriginalValues CROSS APPLY OPENJSON( OriginalValues.Attributes ) AS Atts
)
-- Join each possible 'key' value with every record in the original dataset, and
-- then left join the parsed JSON values for each ID and key
SELECT OriginalValues.ID, OriginalValues.Name, KeyList.[key], ValuesWithKeys.Value
FROM OriginalValues
CROSS APPLY PossibleKeys AS KeyList
LEFT JOIN ValuesWithKeys
    ON OriginalValues.ID = ValuesWithKeys.ID
    AND KeyList.[key] = ValuesWithKeys.[key]
ORDER BY ID, [key];
If you need to include some pre-determined key values, where some might not exist in ANY of the JSON stored in Attributes, you could construct a CTE (like I did to emulate your original dataset) or a temp table to provide those values, instead of doing the DISTINCT selection in the PossibleKeys CTE above. If you already know what your possible key values are, without having to query them out of the JSON, that would most likely be a less costly approach.
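A hypothetical version of such a fixed key list, written as a drop-in replacement for the PossibleKeys CTE in the query above:
-- hypothetical: supply the key list directly instead of deriving it with DISTINCT
PossibleKeys AS (
    SELECT [key]
    FROM ( VALUES ('902c'), ('4e7a'), ('604e') ) AS K ( [key] )
),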

Postgresql 9.5 JSONB nested arrays LIKE statement

I have a jsonb column, called "product", that contains a jsonb object similar to the one below. I'm trying to figure out how to do a LIKE statement against this data in PostgreSQL 9.5.
{
  "name": "Some Product",
  "variants": [
    {
      "color": "blue",
      "skus": [
        {
          "uom": "each",
          "code": "ZZWG002NCHZ-65"
        },
        {
          "uom": "case",
          "code": "ZZWG002NCHZ-65-CASE"
        }
      ]
    }
  ]
}
The following query works for an exact match:
SELECT * FROM products WHERE product #> '{variants}' @> '[{"skus":[{"code":"ZZWG002NCHZ-65"}]}]';
But I need to support LIKE statements like "begins with", "ends with", and "contains". How would this be done?
Example: let's say I want all products returned that have a sku code beginning with "ZZWG00".
You should unnest variants and skus (using jsonb_array_elements()), so you can examine sku->>'code':
SELECT DISTINCT p.*
FROM products p,
     jsonb_array_elements(product->'variants') AS variants(variant),
     jsonb_array_elements(variant->'skus') AS skus(sku)
WHERE sku->>'code' LIKE 'ZZW%';
Use DISTINCT, as you'll get multiple result rows for a product when more than one of its skus matches.
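The other patterns from the question work the same way; only the LIKE pattern changes (the example values here are taken from the sample sku codes):
-- begins with: WHERE sku->>'code' LIKE 'ZZWG00%'
-- ends with:   WHERE sku->>'code' LIKE '%-CASE'
-- contains:    WHERE sku->>'code' LIKE '%NCHZ%'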
