How to convert JSON Array of Arrays to columns and rows - arrays

I'm pulling data from an API in JSON with a format like the example data below. Where essentially every "row" is an array of values. The API doc defines the columns and their types in advance. So I know the col1 is, for example, a varchar, and that col2 is an int.
CREATE TEMP TABLE dat (data json);
INSERT INTO dat
VALUES ('{"COLUMNS":["col1","col2"],"DATA":[["a","1"],["b","2"]]}');
I want to transform this within PostgreSQL 9.3 such that I end up with:
col1 | col2
------------
a | 1
b | 2
Using json_array_elements I can get to:
SELECT json_array_elements(data->'DATA')
FROM dat
json_array_elements
json
---------
["a","1"]
["b","2"]
but then I can't figure out how to do either convert the JSON array to a PostgreSQL array so I can perform something like unnest(ARRAY['a','1'])

General case for unknown columns
To get a result like
col1 | col2
------------
a | 1
b | 2
will require a bunch of dynamic SQL, because you don't know the types of the columns in advance, nor the column names.
You can unpack the json with something like:
SELECT
json_array_element_text(colnames, colno) AS colname,
json_array_element_text(colvalues, colno) AS colvalue,
rn,
idx,
colno
FROM (
SELECT
data -> 'COLUMNS' AS colnames,
d AS colvalues,
rn,
row_number() OVER () AS idx
FROM (
SELECT data, row_number() OVER () AS rn FROM dat
) numbered
cross join json_array_elements(numbered.data -> 'DATA') d
) elements
cross join generate_series(0, json_array_length(colnames) - 1) colno;
producing a result set like:
colname | colvalue | rn | idx | colno
---------+----------+----+-----+-------
col1 | a | 1 | 1 | 0
col2 | 1 | 1 | 1 | 1
col1 | b | 1 | 2 | 0
col2 | 2 | 1 | 2 | 1
(4 rows)
You can then use this as input to the crosstab function from the tablefunc module with something like:
SELECT * FROM crosstab('
SELECT
to_char(rn,''00000000'')||''_''||to_char(idx,''00000000'') AS rowid,
json_array_element_text(colnames, colno) AS colname,
json_array_element_text(colvalues, colno) AS colvalue
FROM (
SELECT
data -> ''COLUMNS'' AS colnames,
d AS colvalues,
rn,
row_number() OVER () AS idx
FROM (
SELECT data, row_number() OVER () AS rn FROM dat
) numbered
cross join json_array_elements(numbered.data -> ''DATA'') d
) elements
cross join generate_series(0, json_array_length(colnames) - 1) colno;
') results(rowid text, col1 text, col2 text);
producing:
rowid | col1 | col2
---------------------+------+------
00000001_ 00000001 | a | 1
00000001_ 00000002 | b | 2
(2 rows)
The column names are not retained here.
If you were on 9.4 you could avoid the row_number() calls and use WITH ORDINALITY, making it much cleaner.
Simplified with fixed, known columns
Since you apparently know the number of columns and their types in advance the query can be considerably simplified.
SELECT
col1, col2
FROM (
SELECT
rn,
row_number() OVER () AS idx,
elem ->> 0 AS col1,
elem ->> 1 :: integer AS col2
FROM (
SELECT data, row_number() OVER () AS rn FROM dat
) numbered
cross join json_array_elements(numbered.data -> 'DATA') elem
ORDER BY 1, 2
) x;
result:
col1 | col2
------+------
a | 1
b | 2
(2 rows)
Using 9.4 WITH ORDINALITY
If you were using 9.4 you could keep it cleaner using WITH ORDINALITY:
SELECT
col1, col2
FROM (
SELECT
elem ->> 0 AS col1,
elem ->> 1 :: integer AS col2
FROM
dat
CROSS JOIN
json_array_elements(dat.data -> 'DATA') WITH ORDINALITY AS elements(elem, idx)
ORDER BY idx
) x;

this code worked fine for me, maybe it be useful for someone.
select to_json(array_agg(t))
from (
select text, pronunciation,
(
select array_to_json(array_agg(row_to_json(d)))
from (
select part_of_speech, body
from definitions
where word_id=words.id
order by position asc
) d
) as definitions
from words
where text = 'autumn'
) t
Credits:
https://hashrocket.com/blog/posts/faster-json-generation-with-postgresql

Related

Recursively replace string using a CTE

I am having lookup_table as following:-
| id | Val |
+------+-----+
| 1 | A |
| 11 | B |
| 111 | C |
| 1111 | D |
I am creating words using the values from lookup_table like $id! and saving it into another table.
example : bad - $11!$1!$1111!
So my data_table will be something like,
| expression |
+-----------------+
| $111!$1!$11! | -- cab
| $1111!$1!$1111! | -- dad
| $11!$1!$1111! | -- bad
I want to reverse-build the word from the data_table.
What I've tried: used CHARINDEX on $ and ! to get the first id from expression and tried to replace it with matching val from look_up table recursively using CTE. I was not getting the exact result I was getting, but with some filetring, I got something close.
The query I've tried :
;WITH cte AS
(SELECT replace(expression,'$' + CONVERT(varchar(10),id) + '!' ,val) AS 'expression',
cnt =1
FROM data_table
JOIN lookup_table ON id =
SUBSTRING(
SUBSTRING(expression, CHARINDEX('$', expression) + 1, LEN(expression) - CHARINDEX('$', expression)), 1, CHARINDEX('!', expression) - 2)
UNION ALL
SELECT replace(expression,'$' + CONVERT(varchar(10),id) + '!' ,val) AS 'expression' ,
cnt = cnt +1
FROM cte
JOIN lookup_table ON id =
SUBSTRING(
SUBSTRING(expression, CHARINDEX('$', expression) + 1, LEN(expression) - CHARINDEX('$', expression)), 1, CHARINDEX('!', expression) - (cnt +2))
WHERE CHARINDEX('$', expression) > 0 )
SELECT expression
FROM cte
WHERE CHARINDEX('$', expression) = 0
Current output :
| expression |
+------------+
| DAD |
| CAB |
Expected output:
| expression |
+------------+
| DAD |
| CAB |
| BAD |
fiddle
What am I doing wrong?
Edit: There was a typo in the data setup in fiddler. d in bad was having five 1s instead of four. Thanks, DarkRob for pointing it out.
You may try this. Instead of using recursive cte you may use multiple cte to create your expression. Since sql already introduced this string_split function to convert your row cell into rows on particular delimeter, this will make our work lot easier.
First we convert each cell value into individual rows. Further we can easily get our expression word by using inner join with lookup table. At the last using stuff we'll again get our string as we need.
;WITH CTE AS (
SELECT ROW_NUMBER() OVER (ORDER BY EXPRESSION) AS SLNO, * FROM data_table
),
CT AS (
SELECT *, REPLACE(VALUE,'$','') AS NEWVAL
FROM CTE CROSS APPLY string_split(EXPRESSION,'!') WHERE VALUE <> ''
),
CTFINAL AS (
SELECT * FROM CT INNER JOIN lookup_table AS LT ON CT.NEWVAL=LT.id
)
--SELECT * FROM CTFINAL
SELECT DISTINCT SLNO,
STUFF( (SELECT '' + VAL + '' FROM CTFINAL AS CFF WHERE CFF.SLNO=CF.SLNO FOR XML PATH('')), 1,0,'') AS MYVAL
FROM CTFINAL AS CF

Field equal 1 display

I am using SQL Server 2008 and I would like to only get the activityCode for the orderno when it equals 1 if there are duplicate orderno with the activityCode equals 0.
Also, if the record for orderno activityCode equals 0 then display those records also. But I would only like to display the orderno when the activityCode equals 0 if the same orderno activityCode does not equal 1 or the activityCode only equals 0. I hope this is clear and makes sense but let me know if I need to provide more details. Thanks
--create table
create table po_v
(
orderno int,
amount number,
activityCode number
)
--insert values
insert into po_v values
(170268, 2774.31, 0),
(17001988, 288.82, 0),
(17001988, 433.23, 1),
(170271, 3786, 1),
(170271, 8476, 0),
(170055, 34567, 0)
--Results
170268 | 2774.31 | 0
17001988 | 433.23 | 1
170271 | 3786 | 1
170055 | 34567 | 0
*****Updated*****
I have inserted two new records and the results have been updated. The data in the actual table has other numbers besides 0 and 1. The select statement displays the correct orderno's but I would like the other records for the orderno to display also. The partition only populates one record per orderno. If possible I would like to see the records with the same activityCode.
--insert values
insert into po_v values
(170271, 3799, 1),
(172525, 44445, 2)
--select statement
SELECT Orderno,
Amount,
Activitycode
FROM (SELECT orderno,
amount,
activitycode,
ROW_NUMBER()
OVER(
PARTITION BY orderno
ORDER BY activitycode DESC) AS dup
FROM Po_v)dt
WHERE dt.dup = 1
ORDER BY 1
--select statement results
170055 | 34567 | 0
170268 | 2774.31 | 0
170271 | 3786 | 1
172525 | 44445 | 2
17001988 | 433.23 | 1
--expected results
170055 | 34567 | 0
170268 | 2774.31 | 0
170271 | 3786 | 1
170271 | 3799 | 1
172525 | 44445 | 2
17001988 | 433.23 | 1
Not totally clear what you are trying to do here but this returns the output you are expecting.
select orderno
, amount
, activityCode
from
(
select *
, RowNum = ROW_NUMBER() over(partition by orderno order by activityCode desc)
from po_v
) x
where x.RowNum = 1
---EDIT---
With the new details this is a very different question. As I understand it now you want all row for that share the max activity code for each orderno. You can do this pretty easily with a cte.
with MyGroups as
(
select orderno
, Activitycode = max(activitycode)
from po_v
group by orderno
)
select *
from po_v p
join MyGroups g on g.orderno = p.orderno
and g.Activitycode = p.Activitycode
Try this
SELECT Orderno,
Amount,
Activitycode
FROM (SELECT orderno,
amount,
activitycode,
ROW_NUMBER()
OVER(
PARTITION BY orderno
ORDER BY activitycode DESC) AS dup
FROM Po_v)dt
WHERE dt.dup = 1
ORDER BY 1
Result
Orderno Amount Activitycode
------------------------------------
170055 34567.00 0
170268 2774.31 0
170271 3786.00 1
17001988 433.23 1

SQL EXCEPT Not Working

; WITH cte AS
(SELECT p.BudgetNumber, t.MilestoneNumber FROM
(SELECT DISTINCT BudgetNumber FROM tblMilestones) p
CROSS JOIN
(SELECT DISTINCT MilestoneNumber FROM tblMilestoneTemplate) t)
SELECT BudgetNumber, MilestoneNumber FROM cte
EXCEPT (SELECT BudgetNumber, MilestoneNumber FROM tblMilestones)
ORDER BY BudgetNumber, MilestoneNumber
The query above creates all possible BudgetNumber and MilestoneNumber combinations using a cross join, and then attempts to locate combinations that are not in the tblMilestones table (I didn't create this database, I know the table prefixes are weird and this db isn't normalized).
There are no NULL entries in any of these fields. When I use this query with the EXCEPT clause above, I get some missing values (but not all), but I also get some non-missing values. When I change the EXCEPT to a LEFT JOIN, I get the same results. When I change the EXCEPT to a WHERE NOT EXISTS, I get no results at all. Can anyone please help?
SQLFiddle Output:
| BUDGETNUMBER | MILESTONENUMBER |
|--------------|-----------------|
| BA04001 | 0 |
| BA04001 | 99 |
| BA04005 | 0 |
| BA04005 | 99 |
| BA05001 | 0 |
| BA05001 | 99 |
| BA05002 | 0 |
| BA05002 | 99 |
Here is the way you would need to use NOT EXISTS correctly. You need specify where clause inside subquery to get correct result.
;
WITH cte
AS (
SELECT p.BudgetNumber
,t.MilestoneNumber
FROM (
SELECT DISTINCT BudgetNumber
FROM tblMilestones
) p
CROSS JOIN (
SELECT DISTINCT MilestoneNumber
FROM tblMilestoneTemplate
) t
)
SELECT BudgetNumber
,MilestoneNumber
FROM cte t
WHERE NOT EXISTS ( SELECT 1
FROM tblMilestones s
WHERE t.BudgetNumber = s.BudgetNumber
AND t.MilestoneNumber = s.MilestoneNumber )
ORDER BY BudgetNumber
,MilestoneNumber
Look at following two examples
DECLARE #NoPrecision AS TABLE ( MyNumber DECIMAL )
INSERT INTO #NoPrecision ( MyNumber ) VALUES ( 12345.123456789 )
SELECT * FROM #NoPrecision
output: 12345
DECLARE #Precision AS TABLE ( MyNumber DECIMAL(10,4) )
INSERT INTO #Precision ( MyNumber ) VALUES ( 12345.123456789 )
SELECT * FROM #Precision
output: 12345.1235

Grouping results by test

I have a table with this structure:
Test Value Shape
1 1,89 20
1 2,08 27
1 2,05 12
2 2,01 12
2 2,05 35
2 2,03 24
I need a column for each Test value, in this case, something like this:
Test 1 | Test 2
Value | Shape | Value | Shape
I tried to do this with pivot, but the results wasn't good. Can someone help me?
[]'s
There are a few different ways that you can get the result since you are using SQL Server. In order to get the result, you will first need to create a unique value that will allow you return multiple rows for each Test. I would apply a windowing function like row_number():
select test, value, shape,
row_number() over(partition by test
order by value) seq
from yourtable
This query will be used as the base for the rest of your process. This creates a unique sequence for each test and then when you apply the aggregate function you are able to return multiple rows.
You can get your final result using an aggregate function with a CASE expression:
select
max(case when test = 1 then value end) test1Value,
max(case when test = 1 then shape end) test1Shape,
max(case when test = 2 then value end) test2Value,
max(case when test = 2 then shape end) test2Shape
from
(
select test, value, shape,
row_number() over(partition by test
order by value) seq
from yourtable
) d
group by seq;
See SQL Fiddle with Demo.
If you want to implement the PIVOT function, then I would first need to unpivot your multiple columns of Value and Shape and then apply the PIVOT. You will still use row_number() to generate a unique sequence that will be needed to return multiple rows. The basic syntax will be:
;with cte as
(
-- get unique sequence
select test, value, shape,
row_number() over(partition by test
order by value) seq
from yourtable
)
select test1Value, test1Shape,
test2Value, test2Shape
from
(
-- unpivot the multiple columns
select t.seq,
col = 'test'+cast(test as varchar(10))
+ col,
val
from cte t
cross apply
(
select 'value', value union all
select 'shape', cast(shape as varchar(10))
) c (col, val)
) d
pivot
(
max(val)
for col in (test1Value, test1Shape,
test2Value, test2Shape)
) piv;
See SQL Fiddle with Demo. Both versions give a result:
| TEST1VALUE | TEST1SHAPE | TEST2VALUE | TEST2SHAPE |
|------------|------------|------------|------------|
| 1,89 | 20 | 2,01 | 12 |
| 2,05 | 12 | 2,03 | 24 |
| 2,08 | 27 | 2,05 | 35 |

How to replace a functional (many) OUTER APPLY (SELECT * FROM)

Applies to Microsoft SQL Server 2008 R2.
The problem is
If we have a few dozen Outer Apply (30) then they begin to work pretty slowly. In the middle of the Outer Apply I have something more complicated than a simple select, a view.
Details
I'm writing a sort of attributes assigned to tables (in the database). Generally, a few tables, holds a reference to a table of attributes (key, value).
Pseudo structure looks like this:
DECLARE #Lot TABLE (
LotId INT PRIMARY KEY IDENTITY,
SomeText VARCHAR(8))
INSERT INTO #Lot
OUTPUT INSERTED.*
VALUES ('Hello'), ('World')
DECLARE #Attribute TABLE(
AttributeId INT PRIMARY KEY IDENTITY,
LotId INT,
Val VARCHAR(8),
Kind VARCHAR(8))
INSERT INTO #Attribute
OUTPUT INSERTED.* VALUES
(1, 'Foo1', 'Kind1'), (1, 'Foo2', 'Kind2'),
(2, 'Bar1', 'Kind1'), (2, 'Bar2', 'Kind2'), (2, 'Bar3', 'Kind3')
LotId SomeText
----------- --------
1 Hello
2 World
AttributeId LotId Val Kind
----------- ----------- -------- --------
1 1 Foo1 Kind1
2 1 Foo2 Kind2
3 2 Bar1 Kind1
4 2 Bar2 Kind2
5 2 Bar3 Kind3
I can now run a query such as:
SELECT
[l].[LotId]
, [SomeText]
, [Oa1].[AttributeId]
, [Oa1].[LotId]
, 'Kind1Val' = [Oa1].[Val]
, [Oa1].[Kind]
, [Oa2].[AttributeId]
, [Oa2].[LotId]
, 'Kind2Val' = [Oa2].[Val]
, [Oa2].[Kind]
, [Oa3].[AttributeId]
, [Oa3].[LotId]
, 'Kind3Val' = [Oa3].[Val]
, [Oa3].[Kind]
FROM #Lot AS l
OUTER APPLY(SELECT * FROM #Attribute AS la WHERE la.[LotId] = l.[LotId] AND la.[Kind] = 'Kind1') AS Oa1
OUTER APPLY(SELECT * FROM #Attribute AS la WHERE la.[LotId] = l.[LotId] AND la.[Kind] = 'Kind2') AS Oa2
OUTER APPLY(SELECT * FROM #Attribute AS la WHERE la.[LotId] = l.[LotId] AND la.[Kind] = 'Kind3') AS Oa3
LotId SomeText AttributeId LotId Kind1Val Kind AttributeId LotId Kind2Val Kind AttributeId LotId Kind3Val Kind
----------- -------- ----------- ----------- -------- -------- ----------- ----------- -------- -------- ----------- ----------- -------- --------
1 Hello 1 1 Foo1 Kind1 2 1 Foo2 Kind2 NULL NULL NULL NULL
2 World 3 2 Bar1 Kind1 4 2 Bar2 Kind2 5 2 Bar3 Kind3
The simple way to get the pivot table of attribute values ​​and results for Lot rows that do not have attribute such a Kind3.
I know Microsoft PIVOT and it is not simple and do not fits here.
Finally, what will be faster and will give the same results?
In order to get the result you can unpivot and then pivot the data.
There are two ways that you can perform this. First, you can use the UNPIVOT and the PIVOT function:
select *
from
(
select LotId,
SomeText,
col+'_'+CAST(rn as varchar(10)) col,
value
from
(
select l.LotId,
l.SomeText,
cast(a.AttributeId as varchar(8)) attributeid,
cast(a.LotId as varchar(8)) a_LotId,
a.Val,
a.Kind,
ROW_NUMBER() over(partition by l.lotid order by a.attributeid) rn
from #Lot l
left join #Attribute a
on l.LotId = a.LotId
) src
unpivot
(
value
for col in (attributeid, a_Lotid, val, kind)
) unpiv
) d
pivot
(
max(value)
for col in (attributeid_1, a_LotId_1, Val_1, Kind_1,
attributeid_2, a_LotId_2, Val_2, Kind_2,
attributeid_3, a_LotId_3, Val_3, Kind_3)
) piv
See SQL Fiddle with Demo.
Or starting in SQL Server 2008+, you can use CROSS APPLY with a VALUES clause to unpivot the data:
select *
from
(
select LotId,
SomeText,
col+'_'+CAST(rn as varchar(10)) col,
value
from
(
select l.LotId,
l.SomeText,
cast(a.AttributeId as varchar(8)) attributeid,
cast(a.LotId as varchar(8)) a_LotId,
a.Val,
a.Kind,
ROW_NUMBER() over(partition by l.lotid order by a.attributeid) rn
from #Lot l
left join #Attribute a
on l.LotId = a.LotId
) src
cross apply
(
values ('attributeid', attributeid),('LotId', a_LotId), ('Value', Val), ('Kind', Kind)
) c (col, value)
) d
pivot
(
max(value)
for col in (attributeid_1, LotId_1, Value_1, Kind_1,
attributeid_2, LotId_2, Value_2, Kind_2,
attributeid_3, LotId_3, Value_3, Kind_3)
) piv
See SQL Fiddle with Demo.
The unpivot process takes the multiple columns for each LotID and SomeText and converts it into rows giving the result:
| LOTID | SOMETEXT | COL | VALUE |
--------------------------------------------
| 1 | Hello | attributeid_1 | 1 |
| 1 | Hello | LotId_1 | 1 |
| 1 | Hello | Value_1 | Foo1 |
| 1 | Hello | Kind_1 | Kind1 |
| 1 | Hello | attributeid_2 | 2 |
I added a row_number() to the inner subquery to be used to create the new column names to pivot. Once the names are created the pivot can be applied to the new columns giving the final result
This could also be done using dynamic SQL:
DECLARE #cols AS NVARCHAR(MAX),
#query AS NVARCHAR(MAX)
select #cols = STUFF((SELECT ',' + QUOTENAME(col+'_'+rn)
from
(
select
cast(ROW_NUMBER() over(partition by l.lotid order by a.attributeid) as varchar(10)) rn
from Lot l
left join Attribute a
on l.LotId = a.LotId
) t
cross apply (values ('attributeid', 1),
('LotId', 2),
('Value', 3),
('Kind', 4)) c (col, so)
group by col, rn, so
order by rn, so
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
,1,1,'')
set #query = 'SELECT LotId,
SomeText,' + #cols + '
from
(
select LotId,
SomeText,
col+''_''+CAST(rn as varchar(10)) col,
value
from
(
select l.LotId,
l.SomeText,
cast(a.AttributeId as varchar(8)) attributeid,
cast(a.LotId as varchar(8)) a_LotId,
a.Val,
a.Kind,
ROW_NUMBER() over(partition by l.lotid order by a.attributeid) rn
from Lot l
left join Attribute a
on l.LotId = a.LotId
) src
cross apply
(
values (''attributeid'', attributeid),(''LotId'', a_LotId), (''Value'', Val), (''Kind'', Kind)
) c (col, value)
) x
pivot
(
max(value)
for col in (' + #cols + ')
) p '
execute(#query)
See SQL Fiddle with Demo
All three versions will give the same result:
| LOTID | SOMETEXT | ATTRIBUTEID_1 | LOTID_1 | VALUE_1 | KIND_1 | ATTRIBUTEID_2 | LOTID_2 | VALUE_2 | KIND_2 | ATTRIBUTEID_3 | LOTID_3 | VALUE_3 | KIND_3 |
-----------------------------------------------------------------------------------------------------------------------------------------------------------
| 1 | Hello | 1 | 1 | Foo1 | Kind1 | 2 | 1 | Foo2 | Kind2 | (null) | (null) | (null) | (null) |
| 2 | World | 3 | 2 | Bar1 | Kind1 | 4 | 2 | Bar2 | Kind2 | 5 | 2 | Bar3 | Kind3 |

Resources