How to pivot columns from EAV by group (SQL Server)

I have an EAV table in SQL Server and would like to pivot some of the table into columns.
EXAMPLE DATA
| id | key | value |
|-----|---------------|-----------------|
| 001 | Name | Hand Protection |
| 001 | Family | Gloves |
| 001 | Brand | Memphis |
| 001 | Style | 6030 |
| 001 | Material | Nitrile |
| 002 | Name | Hand Protection |
| 002 | Family | Gloves |
| 002 | Brand | Gladiator |
| 002 | Color | Black |
| 002 | Size | Large |
| 003 | Name | Head Protection |
| 003 | Family | Hats |
| 003 | Brand | Gladiator |
| 003 | Color | Black |
| 003 | Size | Large |
For each name, I would like to aggregate the associated attributes and their values.
DESIRED OUTPUT
| name | list |
|-----------------|---------------------|
| Hand Protection | [{"attribute": "Family", "values": ["Gloves"], "count": 2},{"attribute": "Brand", "values": ["Memphis", "Gladiator"], "count": 2},{"attribute": "Style", "values": ["6030"], "count": 1},{"attribute": "Material", "values": ["Nitrile"], "count": 1},{"attribute": "Color", "values": ["Black"], "count": 1},{"attribute": "Size", "values": ["Large"], "count": 1}] |
| Head Protection | [{"attribute": "Family", "values": ["Hats"], "count": 1},{"attribute": "Brand", "values": ["Gladiator"], "count": 1},{"attribute": "Color", "values": ["Black"], "count": 1},{"attribute": "Size", "values": ["Large"], "count": 1}] |
SQL
-- Get reference item for attributes
select distinct
    [id]
into #tmp
from data..item_attributes
where [key] = 'name'
    and [value] = @category

-- Get attributes for a given name
select distinct
    'attribute' = [key],
    'values' = (
        select distinct
            'value' = [value]
        from data..item_attributes
        where [key] = @attribute
        for json path
    ),
    'count' = count(*)
from data..item_attributes
where id in (select * from #tmp)

-- Make new table
update [table] set [list] = (
    ???
)
where [name] = [key]
How can I pivot out associated keys and append them into a new column with a count?

db<>fiddle
SELECT MAX(CASE YT.[key] WHEN 'Name' THEN YT.[Value] END) AS [name]
, ( SELECT [key] AS attribute
, JSON_QUERY(REPLACE(REPLACE(
( SELECT value
FROM dbo.YourTable AT
WHERE AT.id = YT.id AND AT.[key] = ZT.[key]
FOR JSON PATH), '{"value":', ''), '"}', '"')) AS [values]
, COUNT(*) AS [count]
FROM dbo.YourTable ZT
WHERE ZT.id = YT.id AND ZT.[key] <> 'Name'
GROUP BY id, [key]
FOR JSON PATH) AS list
FROM dbo.YourTable YT
GROUP BY YT.id;
Returns something like this
name list
HandProtection [{"attribute":"Brand","values":["Memphis"],"count":1},{"attribute":"Family","values":["Gloves"],"count":1},{"attribute":"Material","values":["Nitrile"],"count":1},{"attribute":"Style","values":["6030"],"count":1}]
HandProtection [{"attribute":"Brand","values":["Gladiator"],"count":1},{"attribute":"Color","values":["Black","White"],"count":2},{"attribute":"Family","values":["Gloves"],"count":1},{"attribute":"Size","values":["Large"],"count":1}]
HeadProtection [{"attribute":"Brand","values":["Gladiator"],"count":1},{"attribute":"Color","values":["Black"],"count":1},{"attribute":"Family","values":["Hats"],"count":1},{"attribute":"Size","values":["Large"],"count":1}]
This one turned out harder than it should have been. For instance, I had to use the JSON_QUERY(REPLACE(REPLACE(...))) kludge to produce a JSON array of strings (from the JSON array of objects that FOR JSON PATH insists on producing).
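Outside SQL, the grouping the question asks for can be sketched in a few lines. This is a minimal Python illustration using the sample rows from the question (note that "count" here is the number of rows carrying the attribute, which matches the desired output where Family has one distinct value but a count of 2):

```python
from collections import defaultdict

# Sample (id, key, value) rows mirroring the EAV table in the question.
rows = [
    ("001", "Name", "Hand Protection"), ("001", "Family", "Gloves"),
    ("001", "Brand", "Memphis"), ("001", "Style", "6030"),
    ("001", "Material", "Nitrile"),
    ("002", "Name", "Hand Protection"), ("002", "Family", "Gloves"),
    ("002", "Brand", "Gladiator"), ("002", "Color", "Black"),
    ("002", "Size", "Large"),
    ("003", "Name", "Head Protection"), ("003", "Family", "Hats"),
    ("003", "Brand", "Gladiator"), ("003", "Color", "Black"),
    ("003", "Size", "Large"),
]

# Map each id to its Name, then collect the remaining attributes per name.
name_of = {i: v for i, k, v in rows if k == "Name"}
attrs = defaultdict(lambda: defaultdict(list))  # name -> attribute -> values
for i, k, v in rows:
    if k != "Name":
        attrs[name_of[i]][k].append(v)

# Build the desired list shape: distinct values plus a row count per attribute.
result = {
    name: [
        {"attribute": a, "values": sorted(set(vals)), "count": len(vals)}
        for a, vals in per_attr.items()
    ]
    for name, per_attr in attrs.items()
}
```

This is only a sketch of the target shape, not the T-SQL solution; the SQL answer above does the same grouping with a correlated FOR JSON PATH subquery.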

Related

How to identify valid records based on column values in snowflake

I have a table as below
I want output like below
This means I have a few predefined pairs, for example:
if one employee is coming from both HR_INTERNAL and HR_EXTERNAL, take only that record which is from HR_INTERNAL
if one employee is coming from both SALES_INTERNAL and SALES_EXTERNAL, take only that record which is from SALES_INTERNAL
etc.
Is there a way to achieve this?
I used ROW_NUMBER to rank
ROW_NUMBER() OVER(PARTITION BY "EMPID" ORDER BY SOURCESYSTEM ASC) AS RANK_GID
I just put them in a table like this:
create or replace table predefined_pairs ( pairs ARRAY );
insert into predefined_pairs select [ 'HR_INTERNAL', 'HR_EXTERNAL' ] ;
insert into predefined_pairs select [ 'SALES_INTERNAL', 'SALES_EXTERNAL' ] ;
Then I use the following query to produce the output you wanted:
select s.sourcesystem, s.empid,
CASE WHEN COUNT(1) OVER(PARTITION BY EMPID) = 1 THEN 'ValidRecord'
WHEN p.pairs[0] IS NULL THEN 'ValidRecord'
WHEN p.pairs[0] = s.sourcesystem THEN 'ValidRecord'
ELSE 'InvalidRecord'
END RecordValidity
from source s
left join predefined_pairs p on array_contains( s.sourcesystem::VARIANT, p.pairs ) ;
+-------------------+--------+----------------+
| SOURCESYSTEM | EMPID | RECORDVALIDITY |
+-------------------+--------+----------------+
| HR_INTERNAL | EMP001 | ValidRecord |
| HR_EXTERNAL | EMP001 | InvalidRecord |
| SALES_INTERNAL | EMP002 | ValidRecord |
| SALES_EXTERNAL | EMP002 | InvalidRecord |
| HR_EXTERNAL | EMP004 | ValidRecord |
| SALES_INTERNAL | EMP005 | ValidRecord |
| PURCHASE_INTERNAL | EMP003 | ValidRecord |
+-------------------+--------+----------------+
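The pair-preference rule the query encodes can be sketched in Python. The pair list and records below are illustrative, mirroring the sample output; the logic is the same three-way check as the CASE expression:

```python
# Each pair lists the preferred source first, as in predefined_pairs above.
predefined_pairs = [
    ("HR_INTERNAL", "HR_EXTERNAL"),
    ("SALES_INTERNAL", "SALES_EXTERNAL"),
]

records = [
    ("HR_INTERNAL", "EMP001"), ("HR_EXTERNAL", "EMP001"),
    ("SALES_INTERNAL", "EMP002"), ("SALES_EXTERNAL", "EMP002"),
    ("HR_EXTERNAL", "EMP004"),
]

def validity(source, empid):
    # Valid if: the employee appears only once, the source belongs to no
    # predefined pair, or the source is the preferred (first) pair member.
    count = sum(1 for s, e in records if e == empid)
    pair = next((p for p in predefined_pairs if source in p), None)
    if count == 1 or pair is None or pair[0] == source:
        return "ValidRecord"
    return "InvalidRecord"
```

Note that the count-of-one check is what lets a lone HR_EXTERNAL row (EMP004) stay valid even though it is the non-preferred pair member.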

Snowflake - Infer Schema from JSON data in Variant Column Dynamically

Experts,
We have a scenario to infer schema from JSON data loaded in the table. It has to be done dynamically, and the JSON data in the table can have different schemas.
Example:
row 1-> address <array>[ id string ,name string ]
row 2-> address<array> [addr<object> {id:"1",name:"abc"}]
row 3-> address<array> [addr<object> {id:"2",name:"dfg",Zips<array>[zip1:6009,zip2:789]}]
I am aware we can use LATERAL FLATTEN (and its recursive option) to infer the schema. However, when we need to shred the Zips in the above data, we need a flatten query like the one below.
LATERAL FLATTEN (jsondata:address ,recursive =>true) a
LATERAL FLATTEN (a.value:addr, recursive => true) b -> this is causing the issue
LATERAL FLATTEN (b.value:Zips, recursive => true) c
When we flatten an object data type, it is flattened down to the element level. Is there a way to check the type and dynamically avoid flattening an object?
Regards,
Gopi
Snowflake's semi-structured data query features provide data-type inspection functions that can be used to conditionally handle such varied input within a single table column.
In particular, after breaking down the outer arrays into whole rows, you can use the IS_ARRAY and IS_OBJECT functions and the : operator (with NULL result checks) to separate the record-producing logic, and then combine the rows into a single output with a UNION ALL.
The question lacks a clear/usable sample or schema of the data (and an expected output), so I've made four assumptions below about what the data looks like and added a filter for each type from the root. The general idea remains the same for each (check the type, divide the dataset, process each type); you should be able to infer and adjust.
WITH tbl AS (
-- Sample table data
select parse_json('{"address": [["sa_id1", "sa_name1"], ["sa_id2", "sa_name2"]], "other_outer_field": 1}') jsondata
union all
select parse_json('{"address": [{"id": "sr_id1", "name": "sr_name1"}, {"id": "sr_id2", "name": "sr_name2"}], "other_outer_field": 2}') jsondata
union all
select parse_json('{"address": [{"id": "zr_id1", "name": "zr_name1", "zips": ["10001", "10002", "10003"]}, {"id": "zr_id2", "name": "zr_name2", "zips": ["20001", "20002"]}], "other_outer_field": 3}') jsondata
union all
select parse_json('{"address": {"id": "zr_id1", "name": "zr_name1", "zips": ["10001", "10002", "10003"]}, "other_outer_field": 4}') jsondata
), all_address_array_formats AS (
-- Table's actual row: { …, "address": [ … ], … } when the address field is an array
SELECT
jsondata:other_outer_field AS o_f,
each_address.value AS address_container
FROM tbl, LATERAL FLATTEN(jsondata:address) each_address
WHERE IS_ARRAY(jsondata:address)
), all_address_object_formats AS (
-- Table's actual row: { …, "address": { … }, … } when the address field is an object (we do not need to flatten here)
SELECT
jsondata:other_outer_field AS o_f,
jsondata:address AS address_container
FROM tbl
WHERE IS_OBJECT(jsondata:address)
), just_array_members AS (
-- For address array with nested arrays: [ [id1, name1], [id2, name2], … ]
SELECT
o_f,
address_container[0]::varchar AS id,
address_container[1]::varchar AS name,
NULL AS zipcode
FROM all_address_array_formats
WHERE
IS_ARRAY(address_container)
), simple_record_members AS (
-- For address array with objects, but no zipcode fields: [ { id: id1, name: name1 }, { id: id2, name: name2 }, … ]
SELECT
o_f,
address_container:id::varchar AS id,
address_container:name::varchar AS name,
NULL AS zipcode
FROM all_address_array_formats
WHERE
IS_OBJECT(address_container)
AND address_container:zips IS NULL
), zipcode_record_members AS (
-- For address array with objects, each with multiple zipcodes: [ { id: id1, name: name1, zips: [ zip1_1, zip1_2, … ] }, { id: id2, name: name2, zips: [zip2_1, zip2_2, …] }, … ]
SELECT
o_f,
address_container:id::varchar AS id,
address_container:name::varchar AS name,
per_zip.value::varchar AS zipcode
FROM all_address_array_formats, LATERAL FLATTEN(address_container:zips) per_zip
WHERE
IS_OBJECT(address_container)
AND address_container:zips IS NOT NULL
), zipcodes_within_object AS (
-- For address of object type, a single one with multiple zipcodes: { id: id1, name: name1, zips: [ zip1_1, zip1_2, … ] }
SELECT
o_f,
address_container:id::varchar AS id,
address_container:name::varchar AS name,
per_zip.value::varchar AS zipcode
FROM all_address_object_formats, LATERAL FLATTEN(address_container:zips) per_zip
WHERE
IS_OBJECT(address_container)
AND address_container:zips IS NOT NULL
)
SELECT o_f, id, name, zipcode FROM just_array_members UNION ALL
SELECT o_f, id, name, zipcode FROM simple_record_members UNION ALL
SELECT o_f, id, name, zipcode FROM zipcode_record_members UNION ALL
SELECT o_f, id, name, zipcode FROM zipcodes_within_object;
Note: The example also shows how to carry along any other fields from the original object besides address (column: o_f) that are not broken down via the flatten function.
For the input:
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| JSONDATA |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| {"address": [["sa_id1", "sa_name1"], ["sa_id2", "sa_name2"]], "other_outer_field": 1} |
| {"address": [{"id": "sr_id1", "name": "sr_name1"}, {"id": "sr_id2", "name": "sr_name2"}], "other_outer_field": 2} |
| {"address": [{"id": "zr_id1", "name": "zr_name1", "zips": ["10001", "10002", "10003"]}, {"id": "zr_id2", "name": "zr_name2", "zips": ["20001", "20002"]}], "other_outer_field": 3} |
| {"address": {"id": "zr_id1", "name": "zr_name1", "zips": ["10001", "10002", "10003"]}, "other_outer_field": 4} |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
This produces:
+-----+--------+----------+---------+
| O_F | ID | NAME | ZIPCODE |
|-----+--------+----------+---------|
| 1 | sa_id1 | sa_name1 | NULL |
| 1 | sa_id2 | sa_name2 | NULL |
| 2 | sr_id1 | sr_name1 | NULL |
| 2 | sr_id2 | sr_name2 | NULL |
| 3 | zr_id1 | zr_name1 | 10001 |
| 3 | zr_id1 | zr_name1 | 10002 |
| 3 | zr_id1 | zr_name1 | 10003 |
| 3 | zr_id2 | zr_name2 | 20001 |
| 3 | zr_id2 | zr_name2 | 20002 |
| 4 | zr_id1 | zr_name1 | 10001 |
| 4 | zr_id1 | zr_name1 | 10002 |
| 4 | zr_id1 | zr_name1 | 10003 |
+-----+--------+----------+---------+
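The same check-type-then-dispatch idea can be sketched in Python, where the branches of the SQL UNION ALL become branches of an if/elif. The record shapes below are the same four assumptions as the sample data:

```python
def shred(record):
    """Route each address container to the right branch by inspecting its type,
    mirroring the IS_ARRAY / IS_OBJECT / NULL-check dispatch in the SQL."""
    rows = []
    o_f = record["other_outer_field"]
    address = record["address"]
    # Normalize: an object-typed address becomes a one-element list.
    containers = address if isinstance(address, list) else [address]
    for c in containers:
        if isinstance(c, list):        # nested array form: [id, name]
            rows.append((o_f, c[0], c[1], None))
        elif "zips" not in c:          # object without a zips field
            rows.append((o_f, c["id"], c["name"], None))
        else:                          # object with a zips array: one row per zip
            for z in c["zips"]:
                rows.append((o_f, c["id"], c["name"], z))
    return rows
```

This is only an illustration of the dispatch logic; in Snowflake the branching must be expressed as separate filtered CTEs because LATERAL FLATTEN cannot be applied conditionally per row.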

Extract into multiple columns from JSON with PostgreSQL

I have a column item_id that contains data in JSON (like?) structure.
+----------+---------------------------------------------------------------------------------------------------------------------------------------+
| id | item_id |
+----------+---------------------------------------------------------------------------------------------------------------------------------------+
| 56711 | {itemID":["0530#2#1974","0538\/2#2#1974","0538\/3#2#1974","0538\/18#2#1974","0539#2#1974"]}" |
| 56712 | {itemID":["0138528#2#4221","0138529#2#4221","0138530#2#4221","0138539#2#4221","0118623\/2#2#4220"]}" |
| 56721 | {itemID":["2704\/1#1#1356"]}" |
| 56722 | {itemID":["0825\/2#2#3349","0840#2#3349","0844\/10#2#3349","0844\/11#2#3349","0844\/13#2#3349","0844\/14#2#3349","0844\/15#2#3349"]}" |
| 57638 | {itemID":["0161\/1#2#3364","0162\/1#2#3364","0163\/2#2#3364"]}" |
| 57638 | {itemID":["109#1#3364","110\/1#1#3364"]}" |
+----------+---------------------------------------------------------------------------------------------------------------------------------------+
I need the last four digits before every comma (if there is one), with the distinct last-four-digit values separated into individual columns.
The distinct should apply across id as well, so only one result row with id 57638 is permitted.
Here is a fiddle with a code draft that is not giving the right answer.
The desired result should look like this:
+----------+-----------+-----------+
| id | item_id_1 | item_id_2 |
+----------+-----------+-----------+
| 56711 | 1974 | |
| 56712 | 4220 | 4221 |
| 56721 | 1356 | |
| 56722 | 3349 | |
| 57638 | 3364 | 3365 |
+----------+-----------+-----------+
There can be quite a lot of 'item_id_%' columns in the results.
with the_table (id, item_id) as (
values
(56711, '{"itemID":["0530#2#1974","0538\/2#2#1974","0538\/3#2#1974","0538\/18#2#1974","0539#2#1974"]}'),
(56712, '{"itemID":["0138528#2#4221","0138529#2#4221","0138530#2#4221","0138539#2#4221","0118623\/2#2#4220"]}'),
(56721, '{"itemID":["2704\/1#1#1356"]}'),
(56722, '{"itemID":["0825\/2#2#3349","0840#2#3349","0844\/10#2#3349","0844\/11#2#3349","0844\/13#2#3349","0844\/14#2#3349","0844\/15#2#3349"]}'),
(57638, '{"itemID":["0161\/1#2#3364","0162\/1#2#3364","0163\/2#2#3364"]}'),
(57638, '{"itemID":["109#1#3365","110\/1#1#3365"]}')
)
select id
,(array_agg(itemid))[1] itemid_1
,(array_agg(itemid))[2] itemid_2
from (
select distinct id
,split_part(replace(json_array_elements(item_id::json -> 'itemID')::text, '"', ''), '#', 3)::int itemid
from the_table
order by 1
,2
) t
group by id
DEMO
You can unnest the json array, get the last 4 characters of each element as a number, then do conditional aggregation:
select
id,
max(val) filter(where rn = 1) item_id_1,
max(val) filter(where rn = 2) item_id_2
from (
select
id,
right(val, 4)::int val,
dense_rank() over(partition by id order by right(val, 4)::int) rn
from mytable t
cross join lateral jsonb_array_elements_text(t.item_id -> 'itemID') as x(val)
) t
group by id
You can add more conditional max()s to the outer query to handle more possible values.
Demo on DB Fiddle:
id | item_id_1 | item_id_2
----: | --------: | --------:
56711 | 1974 | null
56712 | 4220 | 4221
56721 | 1356 | null
56722 | 3349 | null
57638 | 3364 | 3365
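The extract-dedupe-rank pipeline inside that query can be sketched in Python. The sample rows below are abbreviated versions of the question's data (the dense_rank step becomes a per-id sort of the distinct codes):

```python
import json

def pivot_codes(rows):
    """rows: (id, json_text) pairs; a code is the trailing 4 digits of each
    itemID element. Returns each id's distinct codes in ranked order."""
    codes = {}
    for rid, payload in rows:
        for item in json.loads(payload)["itemID"]:
            codes.setdefault(rid, set()).add(int(item[-4:]))
    # dense_rank equivalent: position after sorting the distinct codes per id.
    return {rid: sorted(vals) for rid, vals in codes.items()}

sample = [
    (56712, '{"itemID":["0138528#2#4221","0118623/2#2#4220"]}'),
    (56721, '{"itemID":["2704/1#1#1356"]}'),
]
```

The ranked list maps directly onto the filtered max() columns: position 1 becomes item_id_1, position 2 becomes item_id_2, and so on.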

How to get counts of CSV values in SQL Server column

I have a table that contains CSV strings for some of the values.
I'd like to get a count of each time an entry in the CSV exists.
However, the count is comparing strings instead of substrings.
Sample Data
| Category | Items |
|----------|---------------------------------|
| Basket 1 | Apples, Bananas, Oranges, Plums |
| Basket 2 | Oranges |
| Basket 3 | Oranges, Plums |
| Basket 4 | Apples, Bananas, Oranges, Plums |
Sample Select
select distinct
[key] = 'Items',
[value] = [items],
[count] = count([items])
from someTable
group by [items]
Current Output
| key | value | count |
|----------|---------------------------------|-------|
| Items | Apples, Bananas, Oranges, Plums | 2 |
| Items | Oranges | 1 |
| Items | Oranges, Plums | 1 |
Expected Output
| key | value | count |
|-------|---------|-------|
| Items | Apples | 2 |
| Items | Bananas | 2 |
| Items | Oranges | 4 |
| Items | Plums | 3 |
How can I get the count for each CSV entry in a column?
You want to use the STRING_SPLIT table-valued function to turn the comma-separated values into rows, and then count them. You have to remove the spaces first, because STRING_SPLIT accepts only a single separator character.
create table data
(
Category varchar(25)
, Items varchar(100)
)
insert into data
values
('Basket 1' ,'Apples, Bananas, Oranges, Plums')
, ('Basket 2', 'Oranges')
, ('Basket 3', 'Oranges, Plums')
, ('Basket 4', 'Apples, Bananas, Oranges, Plums')
select
'Items' as [key]
, value
, count(*) as [count]
from data
cross apply string_split(replace(Items, ' ', ''), ',')
group by value
Here is the demo.
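The split-trim-count step the query performs can be sketched in Python over the same sample baskets (here strip() handles the spaces that the SQL removes with REPLACE):

```python
from collections import Counter

baskets = [
    ("Basket 1", "Apples, Bananas, Oranges, Plums"),
    ("Basket 2", "Oranges"),
    ("Basket 3", "Oranges, Plums"),
    ("Basket 4", "Apples, Bananas, Oranges, Plums"),
]

# Split each CSV on commas, trim whitespace, and count occurrences
# of each item across all rows.
counts = Counter(
    item.strip() for _, items in baskets for item in items.split(",")
)
```

The counts match the expected output: Oranges 4, Plums 3, Apples and Bananas 2 each.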

Shred JSON array into child tables using SQL Server functions

Here is JSON I would like to shred into three tables using SQL Server JSON functions:
{
"school" : "Ecole",
"classes": [
{
"className": "Math",
"Students": ["LaPlace", "Fourier","Euler","Pascal"]
},
{
"className": "Science",
"Students": ["Newton", "Einstein","Al-Biruni", "Cai"]
}
]
}
Table 1
+-------+--------+
| ID | school |
+-------+--------+
Table 2
+-------+---------------+-----------+
| ID | schoolID (FK) | className |
+-------+---------------+-----------+
Table 3
+-------+---------------+-----------+
| ID | classID (FK) | student |
+-------+---------------+-----------+
My queries so far:
SELECT * FROM OPENJSON(#json, '$.school') --Returns the name of the school
SELECT
ClassName = JSON_VALUE(c.value, '$.className'),
Students = JSON_QUERY(c.value, '$.Students')
FROM
OPENJSON(#json, '$.classes') c
-- Returns the name of the class and a JSON array of students.
I am wondering how use SQL to shred the JSON array to extract the data for the third table so that it looks like this:
Math class Id = 1
Science class Id =2
Id ClassId Student
+-------+--------+-----------+
| 1 | 1 | LaPlace |
+-------+--------+-----------+
| 2 | 1 | Fourier |
+-------+--------+-----------+
| 3 | 1 | Euler |
+-------+--------+-----------+
| 4 | 1 | Pascal |
+-------+--------+-----------+
| 5 | 2 | Newton |
+-------+--------+-----------+
| 6 | 2 | Einstein |
+-------+--------+-----------+
| 7 | 2 | Al-Biruni |
+-------+--------+-----------+
| 8 | 2 | Cai |
+-------+--------+-----------+
I can get the Ids from the other tables, but I don't know how to write a query to extract the students from the JSON arrays.
I do have the ability to restructure the JSON schema so that instead of arrays of strings, I could make arrays of objects:
"Students": [{"StudentName":"Newton"}, {"StudentName":"Einstein"}, {"StudentName":"Al-Biruni"}, {"StudentName":"Cai"}]
But I am not certain that makes it any easier. Either way, I would still like to know how to write a query to accomplish the first case.
JSON is supported starting with SQL Server 2016.
As your JSON is nested more deeply (the array of classes contains an array of students), I'd solve this with a combination of OPENJSON and a WITH clause. Look a bit closer at the AS JSON in the WITH clause: it allows for another CROSS APPLY OPENJSON(), hence moving deeper and deeper into your JSON structure.
DECLARE #json NVARCHAR(MAX) =
N'{
"school" : "Ecole",
"classes": [
{
"className": "Math",
"Students": ["LaPlace", "Fourier","Euler","Pascal"]
},
{
"className": "Science",
"Students": ["Newton", "Einstein","Al-Biruni", "Cai"]
}
]
}';
--The query
SELECT ROW_NUMBER() OVER(ORDER BY B.className,C.[key]) AS RowId
,A.school
,B.className
,CASE B.className WHEN 'Math' THEN 1 WHEN 'Science' THEN 2 ELSE 0 END AS ClassId
,C.[key] AS StudentIndex
,C.[value] AS Student
FROM OPENJSON(#json)
WITH(school NVARCHAR(MAX)
,classes NVARCHAR(MAX) AS JSON) A
CROSS APPLY OPENJSON(A.classes)
WITH(className NVARCHAR(MAX)
,Students NVARCHAR(MAX) AS JSON) B
CROSS APPLY OPENJSON(B.Students) C
The result
+-------+--------+-----------+---------+--------------+-----------+
| RowId | school | className | ClassId | StudentIndex | Student |
+-------+--------+-----------+---------+--------------+-----------+
| 1 | Ecole | Math | 1 | 0 | LaPlace |
+-------+--------+-----------+---------+--------------+-----------+
| 2 | Ecole | Math | 1 | 1 | Fourier |
+-------+--------+-----------+---------+--------------+-----------+
| 3 | Ecole | Math | 1 | 2 | Euler |
+-------+--------+-----------+---------+--------------+-----------+
| 4 | Ecole | Math | 1 | 3 | Pascal |
+-------+--------+-----------+---------+--------------+-----------+
| 5 | Ecole | Science | 2 | 0 | Newton |
+-------+--------+-----------+---------+--------------+-----------+
| 6 | Ecole | Science | 2 | 1 | Einstein |
+-------+--------+-----------+---------+--------------+-----------+
| 7 | Ecole | Science | 2 | 2 | Al-Biruni |
+-------+--------+-----------+---------+--------------+-----------+
| 8 | Ecole | Science | 2 | 3 | Cai |
+-------+--------+-----------+---------+--------------+-----------+
Something like this:
declare #json nvarchar(max) = N'
{
"school" : "Ecole",
"classes": [
{
"className": "Math",
"Students": ["LaPlace", "Fourier","Euler","Pascal"]
},
{
"className": "Science",
"Students": ["Newton", "Einstein","Al-Biruni", "Cai"]
}
]
}
';
with q as
(
SELECT
ClassID = c.[key]+1,
ClassName = JSON_VALUE(c.value, '$.className'),
Id = row_number() over (order by c.[Key], students.[key] ),
Student = students.value
FROM
OPENJSON(#json, '$.classes') c
cross apply openjson(c.value,'$.Students') students
)
select Id, ClassId, Student
from q
/*
Id ClassId Student
----------- ----------- -----------
1 1 LaPlace
2 1 Fourier
3 1 Euler
4 1 Pascal
5 2 Newton
6 2 Einstein
7 2 Al-Biruni
8 2 Cai
*/
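For comparison, the nested shredding both answers perform (outer array of classes, inner array of students, with a running row number) can be sketched in plain Python:

```python
import json

doc = json.loads('''{
  "school": "Ecole",
  "classes": [
    {"className": "Math", "Students": ["LaPlace", "Fourier", "Euler", "Pascal"]},
    {"className": "Science", "Students": ["Newton", "Einstein", "Al-Biruni", "Cai"]}
  ]
}''')

# Outer loop ~ OPENJSON over classes; inner loop ~ CROSS APPLY over Students.
# ClassId comes from the array position, as in the second answer's c.[key]+1.
rows, row_id = [], 0
for class_id, cls in enumerate(doc["classes"], start=1):
    for student in cls["Students"]:
        row_id += 1
        rows.append((row_id, class_id, student))
```

The resulting (Id, ClassId, Student) tuples correspond one-to-one to the table shown in the commented output above.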