Shred JSON array into child tables using SQL Server functions - arrays

Here is JSON I would like to shred into three tables using SQL Server JSON functions:
{
"school" : "Ecole",
"classes": [
{
"className": "Math",
"Students": ["LaPlace", "Fourier","Euler","Pascal"]
},
{
"className": "Science",
"Students": ["Newton", "Einstein","Al-Biruni", "Cai"]
}
]
}
Table 1
+-------+--------+
| ID | school |
+-------+--------+
Table 2
+-------+---------------+-----------+
| ID | schoolID (FK) | className |
+-------+---------------+-----------+
Table 3
+-------+---------------+-----------+
| ID | classID (FK) | student |
+-------+---------------+-----------+
My queries so far:
SELECT * FROM OPENJSON(@json, '$.school') --Returns the name of the school
SELECT
ClassName = JSON_VALUE(c.value, '$.className'),
Students = JSON_QUERY(c.value, '$.Students')
FROM
OPENJSON(@json, '$.classes') c
-- Returns the name of the class and a JSON array of students.
I am wondering how to use SQL to shred the JSON array and extract the data for the third table so that it looks like this:
Math class Id = 1
Science class Id =2
Id ClassId Student
+-------+--------+-----------+
| 1 | 1 | LaPlace |
+-------+--------+-----------+
| 2 | 1 | Fourier |
+-------+--------+-----------+
| 3 | 1 | Euler |
+-------+--------+-----------+
| 4 | 1 | Pascal |
+-------+--------+-----------+
| 5 | 2 | Newton |
+-------+--------+-----------+
| 6 | 2 | Einstein |
+-------+--------+-----------+
| 7 | 2 | Al-Biruni |
+-------+--------+-----------+
| 8 | 2 | Cai |
+-------+--------+-----------+
I can get the Ids from the other tables, but I don't know how to write a query to extract the students from the JSON arrays.
I do have the ability to restructure the JSON schema so that instead of arrays of strings, I could make arrays of objects:
"Students": [{"StudentName":"Newton"}, {"StudentName":"Einstein"},{"StudentName":"Al-Biruni"}, {"StudentName":"Cai"}]
But I am not certain that makes it any easier. Either way, I would still like to know how to write a query to accomplish the first case.

JSON is supported starting with SQL-Server 2016.
As your JSON is nested more deeply (the array of classes contains an array of students), I'd solve this with a combination of OPENJSON and a WITH clause. Take a closer look at the AS JSON in the WITH clause: it returns the nested fragment as JSON, which allows another CROSS APPLY OPENJSON() and lets you move deeper and deeper into your JSON structure.
DECLARE @json NVARCHAR(MAX) =
N'{
"school" : "Ecole",
"classes": [
{
"className": "Math",
"Students": ["LaPlace", "Fourier","Euler","Pascal"]
},
{
"className": "Science",
"Students": ["Newton", "Einstein","Al-Biruni", "Cai"]
}
]
}';
--The query
SELECT ROW_NUMBER() OVER(ORDER BY B.className,C.[key]) AS RowId
,A.school
,B.className
,CASE B.className WHEN 'Math' THEN 1 WHEN 'Science' THEN 2 ELSE 0 END AS ClassId
,C.[key] AS StudentIndex
,C.[value] AS Student
FROM OPENJSON(@json)
WITH(school NVARCHAR(MAX)
,classes NVARCHAR(MAX) AS JSON) A
CROSS APPLY OPENJSON(A.classes)
WITH(className NVARCHAR(MAX)
,Students NVARCHAR(MAX) AS JSON) B
CROSS APPLY OPENJSON(B.Students) C
The result
+-------+--------+-----------+---------+--------------+-----------+
| RowId | school | className | ClassId | StudentIndex | Student |
+-------+--------+-----------+---------+--------------+-----------+
| 1 | Ecole | Math | 1 | 0 | LaPlace |
+-------+--------+-----------+---------+--------------+-----------+
| 2 | Ecole | Math | 1 | 1 | Fourier |
+-------+--------+-----------+---------+--------------+-----------+
| 3 | Ecole | Math | 1 | 2 | Euler |
+-------+--------+-----------+---------+--------------+-----------+
| 4 | Ecole | Math | 1 | 3 | Pascal |
+-------+--------+-----------+---------+--------------+-----------+
| 5 | Ecole | Science | 2 | 0 | Newton |
+-------+--------+-----------+---------+--------------+-----------+
| 6 | Ecole | Science | 2 | 1 | Einstein |
+-------+--------+-----------+---------+--------------+-----------+
| 7 | Ecole | Science | 2 | 2 | Al-Biruni |
+-------+--------+-----------+---------+--------------+-----------+
| 8 | Ecole | Science | 2 | 3 | Cai |
+-------+--------+-----------+---------+--------------+-----------+
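If you want to sanity-check the expected rows outside SQL Server, the nested CROSS APPLY can be mirrored in a few lines of plain Python. This is only a sketch of the same traversal, not the T-SQL itself: the function name shred_students is made up for illustration, and the class id is assumed to be the 1-based position of the class in the array.

```python
import json

doc = json.loads("""
{
  "school": "Ecole",
  "classes": [
    {"className": "Math", "Students": ["LaPlace", "Fourier", "Euler", "Pascal"]},
    {"className": "Science", "Students": ["Newton", "Einstein", "Al-Biruni", "Cai"]}
  ]
}
""")

def shred_students(document):
    """Return (id, class_id, student) rows, one per element of every Students array."""
    rows = []
    next_id = 1
    # Outer loop: plays the role of OPENJSON over the classes array.
    for class_id, cls in enumerate(document["classes"], start=1):
        # Inner loop: plays the role of CROSS APPLY OPENJSON over Students.
        for student in cls["Students"]:
            rows.append((next_id, class_id, student))
            next_id += 1
    return rows

rows = shred_students(doc)
for row in rows:
    print(row)
```

The outer and inner loops correspond one-to-one to the two OPENJSON levels, which is why each extra nesting level in the JSON needs one more CROSS APPLY.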

Something like this:
declare @json nvarchar(max) = N'
{
"school" : "Ecole",
"classes": [
{
"className": "Math",
"Students": ["LaPlace", "Fourier","Euler","Pascal"]
},
{
"className": "Science",
"Students": ["Newton", "Einstein","Al-Biruni", "Cai"]
}
]
}
';
with q as
(
SELECT
ClassID = c.[key]+1,
ClassName = JSON_VALUE(c.value, '$.className'),
Id = row_number() over (order by c.[Key], students.[key] ),
Student = students.value
FROM
OPENJSON(@json, '$.classes') c
cross apply openjson(c.value,'$.Students') students
)
select Id, ClassId, Student
from q
/*
Id ClassId Student
----------- ----------- -----------
1 1 LaPlace
2 1 Fourier
3 1 Euler
4 1 Pascal
5 2 Newton
6 2 Einstein
7 2 Al-Biruni
8 2 Cai
*/

Related

MSSQL recursive query to find furthest parent

We have a db structure where each company has a parent company defined. What we want to do is walk up the structure from a given start point until the next 'most-parent' company is found, and pull the user's assignment to that 'most-parent' company. Below is a mock example.
+===================+ +=======+ +=============+
| Company | | User | | UserAccess |
+===================+ +=======+ +=============+
| id | | id | | id |
| Name | | Name | | fkUserId |
| fkParentCompanyId | +=======+ | fkCompanyId |
+===================+ | AccessLevel |
+=============+
+=======+
|Company|
+=============================================+
| id | Name | fkParentCompanyId |
+=============================================+
| 1 | ABC Corp | 1 |
| 2 | Outside Company | 1 |
| 3 | Inside Company | 1 |
| 4 | My Company | 3 |
| 5 | Other LLC | 4 |
| 6 | Yet Another Comp | 5 |
+=============================================+
+====+
|User|
+======================+
| id | Name |
+======================+
| 1 | Mike |
| 2 | Jackie |
| 3 | Sam |
+======================+
+==========+
|UserAccess|
+=================================================+
| id | fkUserId | fkCompanyId | AccessLevel |
+=================================================+
| 1 | 1 | 1 | Administrator |
| 2 | 2 | 1 | User |
| 3 | 3 | 1 | Administrator |
| 4 | 3 | 3 | Parent |
| 5 | 3 | 4 | Parent |
| 6 | 3 | 5 | Parent |
| 7 | 3 | 6 | Parent |
+=================================================+
So take 'Sam', User.id = 3. I want to do a query that finds the nearest "not-Parent" relationship when given a starting Company.id along with Sam's User.id. Below is what I'm looking to get from the given inputs.
Inputs: User.id = 3, Company.id = 6
Output: Administrator
Inputs: User.id = 3, Company.id = 5
Output: Administrator
Inputs: User.id = 3, Company.id = 4
Output: Administrator
Inputs: User.id = 2, Company.id = 1
Output: User
I've been looking at recursive queries using the CTE model, but I'm having some difficulty understanding what it's really doing, and so I haven't yet been able to translate that model into something that works for the above example.
Any guidance would be greatly appreciated.
Update
Been working on CTE in SSMS and feel like I'm getting close... Here is an example query that I'm trying, but not getting the result I expect...
Here I'm trying to do a recursive CTE on Sam @ "Yet Another Comp". What I want is a list of Sam's access up the chain.
with cte(UserId, CompanyId, UserAccess, ParentCompanyId)
as
(
select
[UserId],
[CompanyId],
[UserAccess],
[ParentCompanyId]
from [UserAccess]
where [UserId] = 3 and [CompanyId] = 6
union all
select
[UserAccess].[UserId],
[UserAccess].[CompanyId],
[UserAccess].[AccessLevel],
[cte].[ParentCompanyId]
from [UserAccess]
join [cte] on [UserAccess].[ParentCompanyId] = [cte].[CompanyId]
where [UserAccess].[UserId] = 3 and [UserAccess].[CompanyId] != 6
)
select * from cte
I'm expecting this:
+===+
|cte|
+==========================================================+
| UserId | CompanyId | UserAccess | ParentCompanyId |
+==========================================================+
| 3 | 6 | Parent | 5 |
| 3 | 5 | Parent | 4 |
| 3 | 4 | Parent | 3 |
| 3 | 3 | Parent | 1 |
| 3 | 1 | Administrator | 1 |
+==========================================================+
But what I'm actually getting is this:
+===+
|cte|
+==========================================================+
| UserId | CompanyId | UserAccess | ParentCompanyId |
+==========================================================+
| 3 | 6 | Parent | 5 |
+==========================================================+
The recursive query isn't pulling additional rows into the table. So I commented out the where statement on the recursive query, expecting to get a max recursion error and see a blip of all of the results. But no, still just get the single row back.
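One thing that stands out in the schema above is that the parent link lives in Company, not in UserAccess, so a recursion that joins UserAccess to itself has nothing to follow. A possible shape for the query is to walk Company upwards first and join UserAccess afterwards. The sketch below is an assumption-laden illustration rather than a tested SQL Server solution: it runs on SQLite through Python so it is self-contained, and the names chain, Depth and nearest_access are invented for the example. On SQL Server the CTE would be written with plain WITH (no RECURSIVE keyword) and TOP (1) instead of LIMIT 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Company (id INTEGER PRIMARY KEY, Name TEXT, fkParentCompanyId INTEGER);
CREATE TABLE UserAccess (id INTEGER PRIMARY KEY, fkUserId INTEGER,
                         fkCompanyId INTEGER, AccessLevel TEXT);
INSERT INTO Company VALUES
  (1,'ABC Corp',1),(2,'Outside Company',1),(3,'Inside Company',1),
  (4,'My Company',3),(5,'Other LLC',4),(6,'Yet Another Comp',5);
INSERT INTO UserAccess VALUES
  (1,1,1,'Administrator'),(2,2,1,'User'),(3,3,1,'Administrator'),
  (4,3,3,'Parent'),(5,3,4,'Parent'),(6,3,5,'Parent'),(7,3,6,'Parent');
""")

QUERY = """
WITH RECURSIVE chain(CompanyId, ParentCompanyId, Depth) AS (
    -- Anchor: the starting company.
    SELECT id, fkParentCompanyId, 0 FROM Company WHERE id = :company
    UNION ALL
    -- Step: move to the parent. The root points at itself, so stop there.
    SELECT c.id, c.fkParentCompanyId, chain.Depth + 1
    FROM Company c
    JOIN chain ON c.id = chain.ParentCompanyId
    WHERE chain.CompanyId <> chain.ParentCompanyId
)
SELECT ua.AccessLevel
FROM chain
JOIN UserAccess ua ON ua.fkCompanyId = chain.CompanyId
WHERE ua.fkUserId = :user AND ua.AccessLevel <> 'Parent'
ORDER BY chain.Depth   -- nearest company first
LIMIT 1
"""

def nearest_access(user_id, company_id):
    """Return the closest non-'Parent' access level up the chain, or None."""
    row = conn.execute(QUERY, {"user": user_id, "company": company_id}).fetchone()
    return row[0] if row else None

print(nearest_access(3, 6))
print(nearest_access(2, 1))
```

The self-referencing root (company 1 has itself as parent) is why the recursive step needs an explicit stop condition; without it the walk would loop on the root until the recursion limit is hit.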

How to convert multiple columns value as header for set of values in SQL?

I have the following table (SQL Server):
tableA
+-----------+--------+--------------+----------+
| tableA_id | code | Department | Column1 |
+-----------+--------+--------------+----------+
| 1 | code A | Science 1 | NULL |
| 2 | code B | Science 1 | Test |
| 3 | code A | Science 2 | NULL |
| 4 | code C | Science 1 | Test1 |
| 5 | code B | Science 2 | Test |
| 6 | code A | Science 3 | NULL |
| 7 | code C | Science 2 | Test1 |
| 8 | code B | Science 3 | Test |
| 9 | code A | Science 4 | NULL |
| 10 | code C | Science 3 | Test1 |
| 11 | code B | Science 4 | Test |
+-----------+--------+--------------+----------+
I want to convert it to the format below:
+--------------+
| Department |
+--------------+
| Code A NULL |
+--------------+
| Science 1 |
| Science 2 |
| Science 3 |
| Science 4 |
+--------------+
| Code B Test |
+--------------+
| Science 1 |
| Science 2 |
| Science 3 |
| Science 4 |
+--------------+
| Code C Test1|
+--------------+
| Science 1 |
| Science 2 |
| Science 3 |
+--------------+
Basically, I want to group by Code and Column1, but display Code and Column1 at the top of each group. The values in the columns are not fixed; they come in dynamically.
I used the query below for the Code column:
select coalesce(A.department, A.code) as 'Department'
from TableA A
group by A.code, A.department with rollup
having grouping(A.code) = 0
order by A.code, A.department;
But when I try the same query for Code and Column1, it does not give the expected result:
select coalesce(A.department, A.code,A.Column1) as 'Department'
from TableA A
group by A.code, A.department,A.Column1 with rollup
having grouping(A.code) = 0 and grouping(A.Column1)=0
order by A.code, A.department,A.Column1;
Here is one approach which brings in the sub headers via a union query:
WITH cte AS (
SELECT DISTINCT
code, Column1, code + ' ' + COALESCE(Column1, 'NULL') AS Department, 1 AS priority
FROM tableA
UNION ALL
SELECT code, Column1, Department, 2 FROM tableA
)
SELECT Department
FROM cte
ORDER BY code, Column1, priority, Department;
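To see the union approach produce the requested layout end to end, here is a small self-contained sketch. It runs on SQLite through Python purely as a stand-in for SQL Server, so string concatenation uses || rather than +. SQLite compares strings case-sensitively by default, so 'Science 1' would sort before 'code A NULL'; for that reason priority is placed ahead of Department in the ORDER BY here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tableA (tableA_id INTEGER, code TEXT, Department TEXT, Column1 TEXT)"
)
conn.executemany(
    "INSERT INTO tableA VALUES (?, ?, ?, ?)",
    [
        (1, "code A", "Science 1", None), (2, "code B", "Science 1", "Test"),
        (3, "code A", "Science 2", None), (4, "code C", "Science 1", "Test1"),
        (5, "code B", "Science 2", "Test"), (6, "code A", "Science 3", None),
        (7, "code C", "Science 2", "Test1"), (8, "code B", "Science 3", "Test"),
        (9, "code A", "Science 4", None), (10, "code C", "Science 3", "Test1"),
        (11, "code B", "Science 4", "Test"),
    ],
)

QUERY = """
WITH cte AS (
    -- One header row per (code, Column1) group, flagged with priority 1.
    SELECT DISTINCT code, Column1,
           code || ' ' || COALESCE(Column1, 'NULL') AS Department,
           1 AS priority
    FROM tableA
    UNION ALL
    -- The detail rows, flagged with priority 2 so they follow their header.
    SELECT code, Column1, Department, 2 FROM tableA
)
SELECT Department
FROM cte
ORDER BY code, Column1, priority, Department
"""

result = [row[0] for row in conn.execute(QUERY)]
for line in result:
    print(line)
```

Each group comes out as its header line followed by its departments, without relying on how the header text happens to compare with the department names.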

SQL-Server Closure table query

I need a hierarchy for my database and decided to use the closure table model. The hierarchy tables have the usual structure, like this:
locations table
+----+---------+
| id | name |
+----+---------+
| 1 | Europe |
| 2 | France |
| 3 | Germany |
| 4 | Spain |
| 5 | Paris |
| 6 | Nizza |
| 7 | Berlin |
| 8 | Munich |
| 9 | Madrid |
+----+---------+
CREATE TABLE locations (
id int IDENTITY(1,1) PRIMARY KEY,
name varchar(30)
)
locations_relation table
+----+--------+--------+-------+
| id | src_id | dst_id | depth |
+----+--------+--------+-------+
| 1 | 1 | 1 | 0 |
| 2 | 2 | 2 | 0 |
| 3 | 1 | 2 | 1 |
| 4 | 3 | 3 | 0 |
| 5 | 1 | 3 | 1 |
| 6 | 4 | 4 | 0 |
| 7 | 1 | 4 | 1 |
| 8 | 5 | 5 | 0 |
| 9 | 2 | 5 | 1 |
| 10 | 1 | 5 | 2 |
| 11 | 6 | 6 | 0 |
| 12 | 2 | 6 | 1 |
| 13 | 1 | 6 | 2 |
| 14 | 7 | 7 | 0 |
| 15 | 3 | 7 | 1 |
| 16 | 1 | 7 | 2 |
| 17 | 8 | 8 | 0 |
| 18 | 3 | 8 | 1 |
| 19 | 1 | 8 | 2 |
| 20 | 9 | 9 | 0 |
| 21 | 4 | 9 | 1 |
| 22 | 1 | 9 | 2 |
+----+--------+--------+-------+
CREATE TABLE locations_relation (
id int IDENTITY(1,1) PRIMARY KEY,
src_id int,
dst_id int,
depth int,
CONSTRAINT FK_src FOREIGN KEY (src_id)
REFERENCES locations (id),
CONSTRAINT FK_dst FOREIGN KEY (dst_id)
REFERENCES locations (id)
)
Now there is a third table, which holds information about documents and is referencing the locations table, which looks like this:
closure_junction
+----+------------+-------------+
| id | country_id | document_id |
+----+------------+-------------+
| 1 | 2 | 1 |
| 2 | 2 | 2 |
| 3 | 6 | 2 |
| 4 | 6 | 3 |
| 5 | 5 | 2 |
| 6 | 5 | 4 |
+----+------------+-------------+
CREATE TABLE closure_junction (
id int IDENTITY(1,1) PRIMARY KEY,
country_id int NOT NULL,
document_id int,
CONSTRAINT FK_countries FOREIGN KEY (country_id)
REFERENCES locations(id)
)
What I'd like to have is a single SQL query which counts the documents per location; if there are documents in a child, they should also be counted in the parent. For example, if Paris holds 2 documents, then France should automatically also hold 2 documents. The query should also output the path of each node to the root as well as the depth of the node. I know there is a way to do this recursively, but I'd like to avoid that.
I have a query which gives me the correct result, but I'm not satisfied with how it works. Is there a way to circumvent storing the children in a column?
This is my query with the correct output:
;WITH cte (name, path, depth, children) AS
(
SELECT
node.name,
STRING_AGG(locations.name, ' / ' ) WITHIN GROUP (ORDER BY relation.depth DESC) as path,
MAX(relation.depth) as depth,
STRING_AGG(locations.id, ' ') as children
FROM locations node
INNER JOIN locations_relation relation
ON node.id = relation.dst_id
INNER JOIN locations
ON relation.src_id = locations.id
GROUP BY node.name
)
SELECT
name,
path,
depth,
COUNT(DISTINCT document_id) as count_docs
FROM cte
CROSS APPLY string_split(children, ' ')
LEFT JOIN closure_junction ON
closure_junction.country_id = value
GROUP BY name, path, depth
ORDER BY depth ASC
+---------+---------------------------+-------+------------+
| name | path | depth | count_docs |
+---------+---------------------------+-------+------------+
| Europe | Europe | 0 | 0 |
| France | Europe / France | 1 | 2 |
| Germany | Europe / Germany | 1 | 0 |
| Spain | Europe / Spain | 1 | 0 |
| Berlin | Europe / Germany / Berlin | 2 | 0 |
| Madrid | Europe / Spain / Madrid | 2 | 0 |
| Munich | Europe / Germany / Munich | 2 | 0 |
| Nizza | Europe / France / Nizza | 2 | 3 |
| Paris | Europe / France / Paris | 2 | 3 |
+---------+---------------------------+-------+------------+
Would be great if someone could give me a clue on how to accomplish this.
You can easily replace the count with a simple LEFT JOIN, but you will still need to concatenate the path somehow.
Something like this:
WITH CTE_path
AS
( SELECT node.id,
STRING_AGG(locations.name, ' / ' ) WITHIN GROUP (ORDER BY relation.depth DESC) as path
FROM locations node
INNER JOIN locations_relation relation
ON node.id = relation.dst_id
INNER JOIN locations
ON relation.src_id = locations.id
GROUP BY node.id)
SELECT l.name,count(DISTINCT cj.document_id) AS count_docs,pa.path
FROM locations l
JOIN CTE_path pa
ON pa.id = l.id
LEFT JOIN locations_relation lr
ON l.id = lr.dst_id
LEFT JOIN closure_junction cj
ON cj.country_id = lr.src_id
GROUP BY l.name,pa.path
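To check the counting part in isolation, the two LEFT JOINs can be run against the sample data from the question. The sketch below uses SQLite through Python only because it is self-contained; it is a stand-in, not SQL Server, and it leaves out the path column (ordered string aggregation differs between the two engines). The dictionary name counts is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE locations (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE locations_relation (src_id INTEGER, dst_id INTEGER, depth INTEGER);
CREATE TABLE closure_junction (country_id INTEGER, document_id INTEGER);
INSERT INTO locations VALUES
  (1,'Europe'),(2,'France'),(3,'Germany'),(4,'Spain'),(5,'Paris'),
  (6,'Nizza'),(7,'Berlin'),(8,'Munich'),(9,'Madrid');
INSERT INTO locations_relation VALUES
  (1,1,0),(2,2,0),(1,2,1),(3,3,0),(1,3,1),(4,4,0),(1,4,1),
  (5,5,0),(2,5,1),(1,5,2),(6,6,0),(2,6,1),(1,6,2),
  (7,7,0),(3,7,1),(1,7,2),(8,8,0),(3,8,1),(1,8,2),
  (9,9,0),(4,9,1),(1,9,2);
INSERT INTO closure_junction VALUES (2,1),(2,2),(6,2),(6,3),(5,2),(5,4);
""")

QUERY = """
SELECT l.name, COUNT(DISTINCT cj.document_id) AS count_docs
FROM locations l
-- Every closure row that ends at this location (itself plus its ancestors).
LEFT JOIN locations_relation lr ON l.id = lr.dst_id
-- Documents attached to any of those locations.
LEFT JOIN closure_junction cj ON cj.country_id = lr.src_id
GROUP BY l.id, l.name
"""

counts = dict(conn.execute(QUERY).fetchall())
for name, count_docs in counts.items():
    print(name, count_docs)
```

Joined this way, each location picks up the documents attached to itself and to its ancestors, which is what the output posted in the question shows (France 2, Nizza 3, Paris 3, everything else 0).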

Sorting Table in hierarchical order

Is it possible to sort a query result in hierarchical order like this:
Expected
+----+--------+-----------+-------+--------+-----------+-----------+---------+
| ID | Code | Name | Qty | Amount | is_parent | parent_id | remarks |
+----+--------+-----------+-------+--------+-----------+-----------+---------+
| 1 | ABC | Parent1 | 2 | 1,000 | 1 | 0 | xxx |
+----+--------+-----------+-------+--------+-----------+-----------+---------+
| 4 | FFLK | Product Z | 10 | 2,500 | 0 | 1 | xxx |
+----+--------+-----------+-------+--------+-----------+-----------+---------+
| 5 | P6DT | Product 5 | 7 | 1,700 | 0 | 1 | xxx |
+----+--------+-----------+-------+--------+-----------+-----------+---------+
| 6 | P2GL | Product T | 5 | 1,100 | 0 | 1 | xxx |
+----+--------+-----------+-------+--------+-----------+-----------+---------+
| 2 | DHG | Parent2 | 5 | 1,500 | 1 | 0 | xxx |
+----+--------+-----------+-------+--------+-----------+-----------+---------+
| 3 | LMSJ | Product U | 4 | 600 | 0 | 2 | xxx |
+----+--------+-----------+-------+--------+-----------+-----------+---------+
This is the original data table:
+----+--------+-----------+-------+--------+-----------+-----------+---------+
| ID | Code | Name | Qty | Amount | is_parent | parent_id | remarks |
+----+--------+-----------+-------+--------+-----------+-----------+---------+
| 1 | ABC | Parent1 | 2 | 1,000 | 1 | 0 | xxx |
+----+--------+-----------+-------+--------+-----------+-----------+---------+
| 2 | DHG | Parent2 | 5 | 1,500 | 1 | 0 | xxx |
+----+--------+-----------+-------+--------+-----------+-----------+---------+
| 3 | LMSJ | Product U | 4 | 600 | 0 | 2 | xxx |
+----+--------+-----------+-------+--------+-----------+-----------+---------+
| 4 | FFLK | Product Z | 10 | 2,500 | 0 | 1 | xxx |
+----+--------+-----------+-------+--------+-----------+-----------+---------+
| 5 | P6DT | Product 5 | 7 | 1,700 | 0 | 1 | xxx |
+----+--------+-----------+-------+--------+-----------+-----------+---------+
| 6 | P2GL | Product T | 5 | 1,100 | 0 | 1 | xxx |
+----+--------+-----------+-------+--------+-----------+-----------+---------+
is_parent column = 1 if the row is a parent, 0 if the row is a child
parent_id column = 0 if the row is a parent; otherwise it holds the ID of the parent row
I'm using SQL Server to generate the data.
It looks like the actual question is how to query the data in hierarchical order. This is possible using recursive queries but a faster alternative is to use SQL Server's support for hierarchical data.
A recursive query that returns the data in hierarchical order would look like this:
WITH h AS
(
SELECT
ID,Code,Name,Qty,Amount,is_parent,parent_id,remarks
FROM
dbo.ThatTable
WHERE
parent_id=0
UNION ALL
SELECT
c.ID,c.Code,c.Name,c.Qty,c.Amount,c.is_parent,c.parent_id,c.remarks
FROM
dbo.ThatTable c
INNER JOIN h ON
c.parent_id= h.Id
)
SELECT * FROM h
This query's performance will be acceptable if the ID and Parent_ID fields are indexed, but not great.
Adding a hierarchyid field to the table would make the query simpler and far faster. Assuming there's a hierarchy field, the query would be just:
SELECT *
FROM ThatTable
ORDER BY hierarchy
Adding an index on hierarchy will make this query, and any query that looks e.g. for the children of a specific node, very fast. Instead of querying recursively, the server only needs to look into that single index.
The article Lesson 1: Converting a Table to a Hierarchical Structure shows how to create a new table with a hierarchyid and populate it from parent/child data.
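If adding a hierarchyid column is not an option, one common workaround is to let the recursive query carry a sortable path and order by it, since a query without an ORDER BY has no guaranteed row order. The sketch below is a hand-rolled illustration under several assumptions: it runs on SQLite through Python as a stand-in for SQL Server, the column name sort_path is invented, only a few of the table's columns are included, and the zero-padding width of 5 assumes IDs below 100000.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ThatTable (ID INTEGER PRIMARY KEY, Code TEXT, Name TEXT, parent_id INTEGER)"
)
conn.executemany(
    "INSERT INTO ThatTable VALUES (?, ?, ?, ?)",
    [
        (1, "ABC", "Parent1", 0), (2, "DHG", "Parent2", 0),
        (3, "LMSJ", "Product U", 2), (4, "FFLK", "Product Z", 1),
        (5, "P6DT", "Product 5", 1), (6, "P2GL", "Product T", 1),
    ],
)

QUERY = """
WITH RECURSIVE h(ID, Code, Name, parent_id, sort_path) AS (
    -- Anchor: top-level rows start a path with their own zero-padded ID.
    SELECT ID, Code, Name, parent_id, printf('%05d', ID)
    FROM ThatTable
    WHERE parent_id = 0
    UNION ALL
    -- Step: children extend the path of their parent.
    SELECT c.ID, c.Code, c.Name, c.parent_id,
           h.sort_path || '/' || printf('%05d', c.ID)
    FROM ThatTable c
    JOIN h ON c.parent_id = h.ID
)
SELECT ID, Name FROM h ORDER BY sort_path
"""

ordered = conn.execute(QUERY).fetchall()
for row in ordered:
    print(row)
```

Because every child's path starts with its parent's path, sorting on that one string places each parent directly above its children, which is the expected order from the question (1, 4, 5, 6, 2, 3).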

How to merge two row and sum some columns in an UPDATE?

I work with SQL Server and I need to merge multiple rows, but only if they have the same values in two specific columns (Col_01 and Col_02 in my example).
When I merge them I have to sum some columns (Col_03 and Col_04 in my example).
I think an example will make this clearer. Here is my simplified table:
| ID | Col_01 | Col_02 | Col_03 | Col_04 |
| 1 | ABC | DEF | 2 | 2 |
| 2 | ABC | DEF | 1 | 0 |
| 3 | DEF | GHI | 0 | 2 |
| 4 | ABC | GHI | 1 | 0 |
| 5 | JKL | GHI | 0 | 2 |
And here is what I want after my update:
| ID | Col_01 | Col_02 | Col_03 | Col_04 |
| 2 | ABC | DEF | 3 | 2 |
| 3 | DEF | GHI | 0 | 2 |
| 4 | ABC | GHI | 1 | 0 |
| 5 | JKL | GHI | 0 | 2 |
I merged ID 1 and ID 2 because they had the same Col_01 and the same Col_02.
I tried a query like this:
SELECT MAX(ID), Col_01, Col_02, SUM(Col_03), SUM(Col_04)
FROM Table
GROUP BY Col_01, Col_02
I get the merged rows, but I lose the ones that were not merged.
I don't know how to properly use it in an UPDATE query in order to merge the rows with the same (Col_01, Col_02) and keep the others. Can you help me do this?
So you actually need to delete some rows from original table. Use MERGE statement for that:
MERGE Table tgt
USING (SELECT MAX(ID) as ID, Col_01, Col_02, SUM(Col_03) AS Col_03, SUM(Col_04) AS Col_04
FROM Table
GROUP BY Col_01, Col_02) src ON tgt.ID = src.ID
WHEN MATCHED THEN
UPDATE
SET Col_03 = src.Col_03, Col_04 = src.Col_04
WHEN NOT MATCHED BY SOURCE THEN
DELETE;
Try this:
UPDATE T SET
T.Col_03 = G.Col_03,
T.Col_04 = G.Col_04
FROM Table AS T INNER JOIN
(SELECT MAX(ID) AS ID, Col_01, Col_02, SUM(Col_03) AS Col_03, SUM(Col_04) AS Col_04
FROM Table
GROUP BY Col_01, Col_02) AS G
ON G.ID = T.ID
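Note that an UPDATE on its own only rewrites the surviving row of each group; the other rows of the group (ID 1 in the example) still have to be removed in a second step. A minimal sketch of that two-step variant follows. It runs on SQLite through Python as a self-contained stand-in for SQL Server, uses correlated subqueries instead of UPDATE ... FROM, and calls the table T because the question's table name is a placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE T (ID INTEGER PRIMARY KEY, Col_01 TEXT, Col_02 TEXT, "
    "Col_03 INTEGER, Col_04 INTEGER)"
)
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?, ?, ?)",
    [
        (1, "ABC", "DEF", 2, 2), (2, "ABC", "DEF", 1, 0),
        (3, "DEF", "GHI", 0, 2), (4, "ABC", "GHI", 1, 0),
        (5, "JKL", "GHI", 0, 2),
    ],
)

with conn:  # run both statements in one transaction
    # Step 1: the row with the highest ID in each group receives the group sums.
    conn.execute("""
        UPDATE T SET
          Col_03 = (SELECT SUM(g.Col_03) FROM T AS g
                    WHERE g.Col_01 = T.Col_01 AND g.Col_02 = T.Col_02),
          Col_04 = (SELECT SUM(g.Col_04) FROM T AS g
                    WHERE g.Col_01 = T.Col_01 AND g.Col_02 = T.Col_02)
        WHERE ID IN (SELECT MAX(ID) FROM T GROUP BY Col_01, Col_02)
    """)
    # Step 2: every other row of the group is now redundant and is deleted.
    conn.execute("""
        DELETE FROM T
        WHERE ID NOT IN (SELECT MAX(ID) FROM T GROUP BY Col_01, Col_02)
    """)

merged = conn.execute("SELECT * FROM T ORDER BY ID").fetchall()
for row in merged:
    print(row)
```

Keeping both statements in one transaction matters: if the DELETE failed after the UPDATE had committed, the sums would already include rows that are still present.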
