SQL Server reduce recurring XML nodes to JSON array - sql-server

I have some XML in which every entry can contain some recurring elements. I'm trying to query it with the OPENXML function, and I want to reduce those elements to JSON arrays.
My SQL looks like this:
declare @idoc int,
@xml xml = '
<?xml version="1.0" encoding="UTF-8"?>
<collection>
<individual>
<id>1</id>
<address>
<coutry>Country1</coutry>
<zip>ZIP1</zip>
<city>City1</city>
</address>
<address>
<coutry>Country2</coutry>
<zip>ZIP2</zip>
<city>City2</city>
</address>
<document>
<num>101</num>
<issued>2020-01-01</issued>
<description>desc1</description>
</document>
<document>
<num>102</num>
<issued>2020-01-01</issued>
<description>desc2</description>
</document>
</individual>
<individual>
<id>2</id>
<address>
<coutry>Country3</coutry>
<zip>ZIP3</zip>
<city>City3</city>
</address>
<address>
<coutry>Country4</coutry>
<zip>ZIP4</zip>
<city>City4</city>
</address>
<document>
<num>103</num>
<issued>2020-01-03</issued>
<description>desc3</description>
</document>
<document>
<num>104</num>
<issued>2020-01-04</issued>
<description>desc4</description>
</document>
</individual>
</collection>';
exec sp_xml_preparedocument @idoc out, @xml;
select
id as ID
, address as AddressesJson
, document as DocumentsJson
from openxml(@idoc, '//individual', 2) with (
id int
, address nvarchar(max)
, document nvarchar(max)
);
exec sp_xml_removedocument @idoc;
The result I'm getting is
|ID |AddressesJson |DocumentsJson |
|---|-------------------|-------------------|
|1 |Country1ZIP1City1 |1012020-01-01desc1 |
|2 |Country3ZIP3City3 |1032020-01-03desc3 |
What I would like to get is
|ID |AddressesJson |DocumentsJson |
|---|-------------------|-------------------|
|1 |[{"coutry":"Country1","zip":"ZIP1","city":"City1"},{"coutry":"Country2","zip":"ZIP2","city":"City2"}] |[{"num":"101","issued":"2020-01-01","description":"desc1"},{"num":"102","issued":"2020-01-02","description":"desc2"}] |
|2 |[{"coutry":"Country3","zip":"ZIP3","city":"City3"},{"coutry":"Country4","zip":"ZIP4","city":"City4"}] |[{"num":"103","issued":"2020-01-03","description":"desc3"},{"num":"104","issued":"2020-01-04","description":"desc4"}] |
How can I achieve this?
P.S. I'm using OPENXML because it seems to be faster. I would also appreciate a solution using xml.nodes()/XQuery.

It seems a couple of correlated subqueries with FOR JSON PATH are what you want here. Note, as well, that I had to amend your XML to remove the leading line break: the XML declaration must be the very first thing in the document, so the leading line break makes the value invalid XML:
DECLARE @idoc int,
@xml xml = '<?xml version="1.0" encoding="UTF-8"?>
<collection>
<individual>
<id>1</id>
<address>
<coutry>Country1</coutry>
<zip>ZIP1</zip>
<city>City1</city>
</address>
<address>
<coutry>Country2</coutry>
<zip>ZIP2</zip>
<city>City2</city>
</address>
<document>
<num>101</num>
<issued>2020-01-01</issued>
<description>desc1</description>
</document>
<document>
<num>102</num>
<issued>2020-01-01</issued>
<description>desc2</description>
</document>
</individual>
<individual>
<id>2</id>
<address>
<coutry>Country3</coutry>
<zip>ZIP3</zip>
<city>City3</city>
</address>
<address>
<coutry>Country4</coutry>
<zip>ZIP4</zip>
<city>City4</city>
</address>
<document>
<num>103</num>
<issued>2020-01-03</issued>
<description>desc3</description>
</document>
<document>
<num>104</num>
<issued>2020-01-04</issued>
<description>desc4</description>
</document>
</individual>
</collection>';
SELECT c.i.value('(id/text())[1]','int') AS id,
       (SELECT i.a.value('(coutry/text())[1]','varchar(30)') AS country, --It's spelt "country"; I suggest fixing this at your source, as fundamental typographical errors like this can be a real problem later down the line
               i.a.value('(zip/text())[1]','varchar(30)') AS zip,
               i.a.value('(city/text())[1]','varchar(30)') AS city
        FROM c.i.nodes('address')i(a)
        FOR JSON PATH) AS AddressJson,
       (SELECT i.d.value('(num/text())[1]','int') AS num,
               i.d.value('(issued/text())[1]','date') AS issued,
               i.d.value('(description/text())[1]','varchar(30)') AS description
        FROM c.i.nodes('document')i(d)
        FOR JSON PATH) AS DocumentJson
FROM @xml.nodes('collection/individual') c(i);
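One behaviour of this pattern is worth knowing: a correlated FOR JSON PATH subquery that finds no matching nodes yields NULL rather than an empty array. The following is a minimal sketch on made-up XML (the element names r, i and v and the column name vJson are placeholders, not taken from the question), assuming SQL Server 2016 or later, where FOR JSON is available. Wrapping the subquery in ISNULL is one way to get '[]' instead.

```sql
-- Hypothetical sample: the second <i> has no <v> children.
DECLARE @x xml = N'<r><i><id>1</id><v>a</v><v>b</v></i><i><id>2</id></i></r>';

SELECT r.i.value('(id/text())[1]', 'int') AS id,
       -- Without ISNULL this column would be NULL for the entry with no <v> nodes.
       ISNULL((SELECT n.v.value('text()[1]', 'nvarchar(10)') AS v
               FROM r.i.nodes('v') AS n(v)
               FOR JSON PATH), N'[]') AS vJson
FROM @x.nodes('/r/i') AS r(i);
```

Whether NULL or '[]' is preferable depends on what consumes the column, so treat the ISNULL wrapper as optional.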

Related

Best ETL tool for converting XML to a table

I need to convert more than 500 XML files to tables that I can query. I have the XSD that I use to verify the structure. I was considering using Notepad++ to structure the files. Is that a good idea, and if not, what would be better? The end goal is either flat files with the same columns or loading directly into SQL.
Example #1 XML
(...)
<Customer>
<CustomerID>1</CustomerID>
<Address>
<Street>John Street</Street>
<Number>6</Number>
<Apartment>68</Apartment>
<City>New York</City>
<Zip>10068</Zip>
</Address>
<Firstname>John</Firstname>
<LastName>Doe</LastName>
</Customer>
(...)
Example #2 XML
(...)
<Customer>
<CustomerID>2</CustomerID>
<Address>
<Street>Wall Street</Street>
<City>New York</City>
</Address>
<Firstname>James Smith</Firstname>
</Customer>
(...)
Example #3 XML
(...)
<n1:Customer>
<n1:CustomerID>3</n1:CustomerID>
<n1:Address>
<n1:Apartment>32</n1:Apartment>
<n1:City>Chicago</n1:City>
</n1:Address>
</n1:Customer>
(...)

Insert XML child node to SQL table

I've got an XML file like this, and I'm working with SQL Server 2014 SP2:
<?xml version='1.0' encoding='UTF-8'?>
<gwl>
<version>123456789</version>
<entities>
<entity id="1" version="123456789">
<name>xxxxx</name>
<listId>0</listId>
<listCode>Oxxx</listCode>
<entityType>08</entityType>
<createdDate>03/03/1993</createdDate>
<lastUpdateDate>05/06/2011</lastUpdateDate>
<source>src</source>
<OriginalSource>o_src</OriginalSource>
<aliases>
<alias category="STRONG" type="Alias">USCJSC</alias>
<alias category="WEAK" type="Alias">'OSKOAO'</alias>
</aliases>
<programs>
<program type="21">prog</program>
</programs>
<sdfs>
<sdf name="OriginalID">9876</sdf>
</sdfs>
<addresses>
<address>
<address1>1141, SYA-KAYA STR.</address1>
<country>RU</country>
<postalCode>1234</postalCode>
</address>
<address>
<address1>90, MARATA UL.</address1>
<country>RU</country>
<postalCode>1919</postalCode>
</address>
</addresses>
<otherIds>
<childId>737606</childId>
<childId>737607</childId>
</otherIds>
</entity>
</entities>
</gwl>
I made a script to insert data from the XML into a SQL table. How can I insert the child nodes into a table? I think I should replicate the row for each new child node, but I don't know the best way to proceed.
Here is my SQL code:
DECLARE @InputXML XML
SELECT @InputXML = CAST(x AS XML)
FROM OPENROWSET(BULK 'C:\MyFiles\sample.XML', SINGLE_BLOB) AS T(x)
SELECT
product.value('(@id)[1]', 'NVARCHAR(10)') id,
product.value('(@version)[1]', 'NVARCHAR(14)') [version],
product.value('(name[1])', 'NVARCHAR(255)') name,
product.value('(listId[1])', 'NVARCHAR(9)') listId,
product.value('(listCode[1])', 'NVARCHAR(10)') listCode,
product.value('(entityType[1])', 'NVARCHAR(2)') entityType,
product.value('(createdDate[1])', 'NVARCHAR(10)') createdDate,
product.value('(lastUpdateDate[1])', 'NVARCHAR(10)') lastUpdateDate,
product.value('(source[1])', 'NVARCHAR(15)') source,
product.value('(OriginalSource[1])', 'NVARCHAR(50)') OriginalSource,
product.value('(aliases[1])', 'NVARCHAR(50)') aliases,
product.value('(programs[1])', 'NVARCHAR(50)') programs,
product.value('(sdfs[1])', 'NVARCHAR(500)') sdfs,
product.value('(addresses[1])', 'NVARCHAR(50)') addresses,
product.value('(otherIds[1])', 'NVARCHAR(50)') otherIds --XPath is case-sensitive: the element is otherIds
FROM @InputXML.nodes('gwl/entities/entity') AS X(product)
You have a lot of different children here...
Just to show the principles:
DECLARE @xml XML=
N'<gwl>
<version>123456789</version>
<entities>
<entity id="1" version="123456789">
<name>xxxxx</name>
<listId>0</listId>
<listCode>Oxxx</listCode>
<entityType>08</entityType>
<createdDate>03/03/1993</createdDate>
<lastUpdateDate>05/06/2011</lastUpdateDate>
<source>src</source>
<OriginalSource>o_src</OriginalSource>
<aliases>
<alias category="STRONG" type="Alias">USCJSC</alias>
<alias category="WEAK" type="Alias">''OSKOAO''</alias>
</aliases>
<programs>
<program type="21">prog</program>
</programs>
<sdfs>
<sdf name="OriginalID">9876</sdf>
</sdfs>
<addresses>
<address>
<address1>1141, SYA-KAYA STR.</address1>
<country>RU</country>
<postalCode>1234</postalCode>
</address>
<address>
<address1>90, MARATA UL.</address1>
<country>RU</country>
<postalCode>1919</postalCode>
</address>
</addresses>
<otherIds>
<childId>737606</childId>
<childId>737607</childId>
</otherIds>
</entity>
</entities>
</gwl>';
--The query will fetch some values from several places.
--It should be easy to get the rest yourself...
SELECT @xml.value('(/gwl/version/text())[1]','bigint') AS [version]
,A.ent.value('(name/text())[1]','nvarchar(max)') AS [Entity_Name]
,A.ent.value('(listId/text())[1]','int') AS Entity_ListId
--more columns taken from A.ent
,B.als.value('@category','nvarchar(max)') AS Alias_Category
,B.als.value('text()[1]','nvarchar(max)') AS Alias_Content
--similar for programs and sdfs
,E.addr.value('(address1/text())[1]','nvarchar(max)') AS Address_Address1
,E.addr.value('(country/text())[1]','nvarchar(max)') AS Address_Country
--and so on
FROM @xml.nodes('/gwl/entities/entity') A(ent)
OUTER APPLY A.ent.nodes('aliases/alias') B(als)
OUTER APPLY A.ent.nodes('programs/program') C(prg)
OUTER APPLY A.ent.nodes('sdfs/sdf') D(sdfs)
OUTER APPLY A.ent.nodes('addresses/address') E(addr)
OUTER APPLY A.ent.nodes('otherIds/childId') F(ids);
The idea in short:
We read non-repeating values (e.g. version) from the XML variable directly.
We use .nodes() to return repeating elements as derived sets.
We can use a cascade of .nodes() to dive deeper into repeating child elements by using a relative XPath (no / at the beginning).
You have two approaches:
Read the XML like above into a staging table (simply by adding INTO #tmpTable before FROM) and proceed from there (this will need one SELECT ... GROUP BY for each type of child).
Create one SELECT per type of child, using only one of the APPLY lines, and shift the data into specific child tables.
I would tend towards the first one.
This allows you to do some cleaning, generate IDs, and check business rules before you shift the data into the target tables.
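The second approach (one SELECT per type of child) could look roughly like the sketch below for the addresses. The table variable @EntityAddress and its column names and sizes are hypothetical stand-ins for a real child table, and the XML is a trimmed-down copy of the sample document.

```sql
-- Trimmed-down copy of the sample document (one entity, two addresses).
DECLARE @xml xml =
N'<gwl>
  <entities>
    <entity id="1" version="123456789">
      <addresses>
        <address><address1>1141, SYA-KAYA STR.</address1><country>RU</country><postalCode>1234</postalCode></address>
        <address><address1>90, MARATA UL.</address1><country>RU</country><postalCode>1919</postalCode></address>
      </addresses>
    </entity>
  </entities>
</gwl>';

-- Hypothetical child table; a table variable keeps the sketch self-contained.
DECLARE @EntityAddress TABLE
(
    EntityId   int           NOT NULL,
    Address1   nvarchar(255) NULL,
    Country    nvarchar(2)   NULL,
    PostalCode nvarchar(20)  NULL
);

-- Only one APPLY line is used, so each row is exactly one address of one entity.
INSERT INTO @EntityAddress (EntityId, Address1, Country, PostalCode)
SELECT A.ent.value('@id', 'int'),
       E.addr.value('(address1/text())[1]', 'nvarchar(255)'),
       E.addr.value('(country/text())[1]', 'nvarchar(2)'),
       E.addr.value('(postalCode/text())[1]', 'nvarchar(20)')
FROM @xml.nodes('/gwl/entities/entity') AS A(ent)
CROSS APPLY A.ent.nodes('addresses/address') AS E(addr);

SELECT * FROM @EntityAddress;
```

CROSS APPLY is used here instead of OUTER APPLY on the assumption that entities without addresses should produce no rows in the child table.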

Convert XML from one format to another

I have the XML data below, which is stored in a table.
The XML Structure I have
<Response>
<Question ID="1">
<Value ID="1">I want a completely natural childbirth - no medical interventions for me</Value>
<Value ID="2">no medical interventions for me</Value>
</Question>
</Response>
I need to convert this XML to a slightly different format, like the below one.
The XML Structure I need
<Response>
<Question ID="1">
<SelectedChoices>
<Choice>
<ID>1</ID>
</Choice>
<Choice>
<ID>2</ID>
</Choice>
</SelectedChoices>
</Question>
</Response>
Here the "Value" element is changed to "Choice", and the "ID" attribute of the "Value" element is changed to an element.
I know this can be done in other ways, such as with XSLT, but it would be much more helpful if I could accomplish it in SQL itself.
Can someone help me to convert this using SQL?
Use this variable to test the statements
DECLARE @xml XML=
N'<Response>
<Question ID="1">
<Value ID="1">I want a completely natural childbirth - no medical interventions for me</Value>
<Value ID="2">no medical interventions for me</Value>
</Question>
</Response>';
This can be done with FLWOR-XQuery:
The query will re-build the XML out of itself... Very similar to XSLT...
SELECT @xml.query(
N'
<Response>
{
  for $q in /Response/Question
  return
  <Question ID="{$q/@ID}">
    <SelectedChoices>
    {
      for $v in $q/Value
      return <Choice><ID>{string($v/@ID)}</ID></Choice>
    }
    </SelectedChoices>
  </Question>
}
</Response>
'
);
Another approach: Shredding and re-build
You'd reach the same result with this, but I'd prefer the first...
WITH Shredded AS
(
SELECT q.value('@ID','int') AS qID
,v.value('@ID','int') AS vID
FROM @xml.nodes('/Response/Question') AS A(q)
OUTER APPLY q.nodes('Value') AS B(v)
)
SELECT t1.qID AS [@ID]
,(
SELECT t2.vID AS ID
FROM Shredded AS t2
WHERE t1.qID=t2.qID
FOR XML PATH('Choice'),ROOT('SelectedChoices'),TYPE
) AS [node()]
FROM Shredded AS t1
GROUP BY t1.qID
FOR XML PATH('Question'),ROOT('Response')

How to add condition to COALESCE while reading xml in Sql

I am trying to read some XML and store it in SQL Server.
DECLARE @xml XML
SET @xml =
'<report>
<personal>
<search>
<subject>
<name>SearchName</name>
</subject>
</search>
</personal>
<personal>
<search>
<subject>
<name>SearchName</name>
</subject>
</search>
<result>
<history>
<name>HistoryName</name>
</history>
</result>
</personal>
<personal>
<search>
<subject>
<name>SearchName</name>
</subject>
</search>
<result>
<history>
<dob>HistoryDOB</dob>
</history>
</result>
</personal>
</report>
'
What I am trying to do is select the name, subject to these conditions:
if <personal> contains <result>, select the name under history/name
if <personal> doesn't contain <result>, select the name under subject/name
if <personal> contains <result> but the name is not there, return NULL
I am using the query below:
SELECT
COALESCE(
A.Search.value('(result/history/name)[1]','varchar(max)'),
A.Search.value('(search/subject/name)[1]','varchar(max)')
) AS Name
FROM @xml.nodes('/report/personal') as A(Search)
It is returning
SearchName
HistoryName
SearchName
But it fails on the 3rd condition.
Just tweak the second value() call to request specifically what you've described, i.e. only take search/subject for nodes with no result:
SELECT
COALESCE(
A.Search.value('(result/history/name)[1]','varchar(max)'),
A.Search.value('(search[not(../result)]/subject/name)[1]','varchar(max)')
) AS Name
FROM @xml.nodes('/report/personal') as A(Search)
Result:
Name
------------------
HistoryName
SearchName
NULL
You can use the exist() method to check whether <personal> contains <result> or not.
With it, your algorithm translates directly into a query:
select
case
when A.Search.exist('result') = 1
then A.Search.value('(result/history/name)[1]','varchar(max)')
else A.Search.value('(search/subject/name)[1]','varchar(max)')
end as Name
FROM @xml.nodes('/report/personal') as A(Search)

How to get multiple nodes under 1 single node with T-SQL

My xml file looks something like this:
<PackageRuntimeContext xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<UserToken>
<Id>449694</Id>
</UserToken>
<Addresses>
<Address>
<LastSeen xsi:nil="true" />
<UniqueID>9afd29f6-f4fe-4a91-aade-da8a3fcdc358</UniqueID>
<IsPrimary>true</IsPrimary>
<Id>0</Id>
<OrderID>0</OrderID>
<SubjectId>0</SubjectId>
<AddressLine1>123 Main St.</AddressLine1>
<City>louisville</City>
<State>KY</State>
<ZipCode>40206</ZipCode>
</Address>
<Address>
<LastSeen xsi:nil="true" />
<UniqueID>0ae8014e-a950-48f3-8ee6-3526a7f3a50d</UniqueID>
<IsPrimary>true</IsPrimary>
<Id>0</Id>
<OrderID>0</OrderID>
<SubjectId>0</SubjectId>
<AddressLine1>789 Elm St.</AddressLine1>
<City>louisville</City>
<State>KY</State>
<ZipCode>40206</ZipCode>
</Address>
<Address>
<LastSeen xsi:nil="true" />
<UniqueID>b1bcc271-bec8-432f-b968-25430ba63b95</UniqueID>
<IsPrimary>false</IsPrimary>
<Id>0</Id>
<OrderID>0</OrderID>
<SubjectId>0</SubjectId>
<AddressLine1>456 Oak St.</AddressLine1>
<City>louisville</City>
<State>KY</State>
<ZipCode>40206</ZipCode>
</Address>
</Addresses>
I want to get the <Id> number 449694, and with it, the 3 (or whatever) subsequent <UniqueID> numbers under Addresses/Address so it looks something like this:
IDNumber UniqueID
======== ========
449694 9afd29f6-f4fe-4a91-aade-da8a3fcdc358
449694 0ae8014e-a950-48f3-8ee6-3526a7f3a50d
449694 b1bcc271-bec8-432f-b968-25430ba63b95
The code I found here (How to query values from xml nodes?) directed me to write something like this:
SELECT
t.p.value('(./UserToken/Id)[1]', 'int') [IdNumber],
t.p.value('(./Addresses/Address/UniqueID)[1]', 'varchar(max)') [Context]
FROM product.PackageRuntimeState prs WITH(NOLOCK)
CROSS APPLY prs.Context.nodes('/PackageRuntimeContext') t(p)
My results were:
IDNumber UniqueID
======== ========
449694 9afd29f6-f4fe-4a91-aade-da8a3fcdc358
449694 b8439471-d4b9-46db-9321-b6175e1b8fb4 (this is from ANOTHER record)
449694 b8439471-d4b9-46db-9321-b6175e1b8fb4 (this too is from another record)
What do I need to do to my code to get the subsequent UniqueID nodes from my xml file?
Thanks!
Drop down one more level. You need to list the direct descendants of <Addresses>, not <PackageRuntimeContext>:
SELECT
t.p.value('(../../UserToken/Id)[1]', 'int') [IdNumber],
t.p.value('(./UniqueID)[1]', 'varchar(max)') [Context]
FROM product.PackageRuntimeState prs WITH(NOLOCK)
CROSS APPLY prs.Context.nodes('/PackageRuntimeContext/Addresses/Address') t(p)
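An alternative that avoids walking back up with ../.. is to keep the .nodes() call on the parent and add a second .nodes() for the repeating children. Parent-axis navigation is often reported to be slow on larger documents, so this may be worth trying. The sketch below is self-contained: the variable @ctx stands in for the prs.Context column and holds a trimmed-down copy of the sample XML.

```sql
-- Trimmed-down copy of the sample document held in a variable (stand-in for prs.Context).
DECLARE @ctx xml =
N'<PackageRuntimeContext>
  <UserToken><Id>449694</Id></UserToken>
  <Addresses>
    <Address><UniqueID>9afd29f6-f4fe-4a91-aade-da8a3fcdc358</UniqueID></Address>
    <Address><UniqueID>0ae8014e-a950-48f3-8ee6-3526a7f3a50d</UniqueID></Address>
    <Address><UniqueID>b1bcc271-bec8-432f-b968-25430ba63b95</UniqueID></Address>
  </Addresses>
</PackageRuntimeContext>';

-- The outer nodes() yields one row per context; the inner one yields one row per address.
SELECT c.p.value('(UserToken/Id/text())[1]', 'int') AS IdNumber,
       a.n.value('(UniqueID/text())[1]', 'varchar(36)') AS UniqueID
FROM @ctx.nodes('/PackageRuntimeContext') AS c(p)
CROSS APPLY c.p.nodes('Addresses/Address') AS a(n);
```

Against the real table, the FROM clause would start from product.PackageRuntimeState prs with CROSS APPLY prs.Context.nodes('/PackageRuntimeContext'), followed by the second CROSS APPLY shown above.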
