How to get multiple nodes under a single node with T-SQL - sql-server

My xml file looks something like this:
<PackageRuntimeContext xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<UserToken>
<Id>449694</Id>
</UserToken>
<Addresses>
<Address>
<LastSeen xsi:nil="true" />
<UniqueID>9afd29f6-f4fe-4a91-aade-da8a3fcdc358</UniqueID>
<IsPrimary>true</IsPrimary>
<Id>0</Id>
<OrderID>0</OrderID>
<SubjectId>0</SubjectId>
<AddressLine1>123 Main St.</AddressLine1>
<City>louisville</City>
<State>KY</State>
<ZipCode>40206</ZipCode>
</Address>
<Address>
<LastSeen xsi:nil="true" />
<UniqueID>0ae8014e-a950-48f3-8ee6-3526a7f3a50d</UniqueID>
<IsPrimary>true</IsPrimary>
<Id>0</Id>
<OrderID>0</OrderID>
<SubjectId>0</SubjectId>
<AddressLine1>789 Elm St.</AddressLine1>
<City>louisville</City>
<State>KY</State>
<ZipCode>40206</ZipCode>
</Address>
<Address>
<LastSeen xsi:nil="true" />
<UniqueID>b1bcc271-bec8-432f-b968-25430ba63b95</UniqueID>
<IsPrimary>false</IsPrimary>
<Id>0</Id>
<OrderID>0</OrderID>
<SubjectId>0</SubjectId>
<AddressLine1>456 Oak St.</AddressLine1>
<City>louisville</City>
<State>KY</State>
<ZipCode>40206</ZipCode>
</Address>
</Addresses>
</PackageRuntimeContext>
I want to get the <Id> number 449694, and with it, the 3 (or whatever) subsequent <UniqueID> numbers under Addresses/Address so it looks something like this:
IDNumber UniqueID
======== ========
449694 9afd29f6-f4fe-4a91-aade-da8a3fcdc358
449694 0ae8014e-a950-48f3-8ee6-3526a7f3a50d
449694 b1bcc271-bec8-432f-b968-25430ba63b95
The code I found here (How to query values from xml nodes?) directed me to write something like this:
SELECT
t.p.value('(./UserToken/Id)[1]', 'int') [IdNumber],
t.p.value('(./Addresses/Address/UniqueID)[1]', 'varchar(max)') [Context]
FROM product.PackageRuntimeState prs WITH(NOLOCK)
CROSS APPLY prs.Context.nodes('/PackageRuntimeContext') t(p)
My results were:
IDNumber UniqueID
======== ========
449694 9afd29f6-f4fe-4a91-aade-da8a3fcdc358
449694 b8439471-d4b9-46db-9321-b6175e1b8fb4 (this is from ANOTHER record)
449694 b8439471-d4b9-46db-9321-b6175e1b8fb4 (this too is from another record)
What do I need to do to my code to get the subsequent UniqueID nodes from my xml file?
Thanks!

Drop down one more level. You need to list the direct descendants of <Addresses>, not <PackageRuntimeContext>.
SELECT
t.p.value('(../../UserToken/Id)[1]', 'int') [IdNumber],
t.p.value('(./UniqueID)[1]', 'varchar(max)') [Context]
FROM product.PackageRuntimeState prs WITH(NOLOCK)
CROSS APPLY prs.Context.nodes('/PackageRuntimeContext/Addresses/Address') t(p)
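As a variation on the query above, the parent-axis step (../..) can be avoided by shredding in two stages: first the root element, then its Address children. This is only a sketch against an inline variable; the @xml variable and the trimmed sample document are assumptions made for illustration, not the asker's actual table.

```sql
-- Trimmed sample document (assumption: same shape as the XML in the question).
DECLARE @xml xml = N'<PackageRuntimeContext>
  <UserToken><Id>449694</Id></UserToken>
  <Addresses>
    <Address><UniqueID>9afd29f6-f4fe-4a91-aade-da8a3fcdc358</UniqueID></Address>
    <Address><UniqueID>0ae8014e-a950-48f3-8ee6-3526a7f3a50d</UniqueID></Address>
    <Address><UniqueID>b1bcc271-bec8-432f-b968-25430ba63b95</UniqueID></Address>
  </Addresses>
</PackageRuntimeContext>';

SELECT
    ctx.n.value('(UserToken/Id/text())[1]', 'int')               AS IdNumber, -- read once per root node
    addr.n.value('(UniqueID/text())[1]', 'uniqueidentifier')     AS UniqueID  -- one row per <Address>
FROM @xml.nodes('/PackageRuntimeContext') AS ctx(n)
CROSS APPLY ctx.n.nodes('Addresses/Address') AS addr(n);
```

Against a table, the first .nodes() call would hang off the XML column via CROSS APPLY, as in the answer. Walking back up with ../.. can be costly on larger documents, which is the main reason to consider this form.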

Related

SQL Server reduce recurring XML nodes to JSON array

I have some XML in which every entry can contain some recurring elements. I'm trying to query it with OpenXML function and I want to reduce those elements to JSON arrays.
My SQL looks like this:
declare @idoc int,
@xml xml = '
<?xml version="1.0" encoding="UTF-8"?>
<collection>
<individual>
<id>1</id>
<address>
<coutry>Country1</coutry>
<zip>ZIP1</zip>
<city>City1</city>
</address>
<address>
<coutry>Country2</coutry>
<zip>ZIP2</zip>
<city>City2</city>
</address>
<document>
<num>101</num>
<issued>2020-01-01</issued>
<description>desc1</description>
</document>
<document>
<num>102</num>
<issued>2020-01-01</issued>
<description>desc2</description>
</document>
</individual>
<individual>
<id>2</id>
<address>
<coutry>Country3</coutry>
<zip>ZIP3</zip>
<city>City3</city>
</address>
<address>
<coutry>Country4</coutry>
<zip>ZIP4</zip>
<city>City4</city>
</address>
<document>
<num>103</num>
<issued>2020-01-03</issued>
<description>desc3</description>
</document>
<document>
<num>104</num>
<issued>2020-01-04</issued>
<description>desc4</description>
</document>
</individual>
</collection>';
exec sp_xml_preparedocument @idoc out, @xml;
select
id as ID
, address as AddressesJson
, document as DocumentsJson
from openxml(@idoc, '//individual', 2) with (
id int
, address nvarchar(max)
, document nvarchar(max)
);
exec sp_xml_removedocument @idoc;
The result I'm getting is
|ID |AddressesJson |DocumentsJson |
|---|-------------------|-------------------|
|1 |Country1ZIP1City1 |1012020-01-01desc1 |
|2 |Country3ZIP3City3 |1032020-01-03desc3 |
What I would like to get is
|ID |AddressesJson |DocumentsJson |
|---|-------------------|-------------------|
|1 |[{"coutry":"Country1","zip":"ZIP1","city":"City1"},{"coutry":"Country2","zip":"ZIP2","city":"City2"}] |[{"num":"101","issued":"2020-01-01","description":"desc1"},{"num":"102","issued":"2020-01-02","description":"desc2"}] |
|2 |[{"coutry":"Country3","zip":"ZIP3","city":"City3"},{"coutry":"Country4","zip":"ZIP4","city":"City4"}] |[{"num":"103","issued":"2020-01-03","description":"desc3"},{"num":"104","issued":"2020-01-04","description":"desc4"}] |
How can I achieve this?
P.S. I'm using OpenXML because it seems to work faster. I would also appreciate a solution with xml.nodes()/xquery
It seems a couple of subqueries with FOR JSON PATH are what you want here. Note as well that I had to amend your XML to remove the leading line break, because the XML declaration must be the very first thing in the document; with the line break in front, the value is not valid XML:
DECLARE @idoc int,
@xml xml = '<?xml version="1.0" encoding="UTF-8"?>
<collection>
<individual>
<id>1</id>
<address>
<coutry>Country1</coutry>
<zip>ZIP1</zip>
<city>City1</city>
</address>
<address>
<coutry>Country2</coutry>
<zip>ZIP2</zip>
<city>City2</city>
</address>
<document>
<num>101</num>
<issued>2020-01-01</issued>
<description>desc1</description>
</document>
<document>
<num>102</num>
<issued>2020-01-01</issued>
<description>desc2</description>
</document>
</individual>
<individual>
<id>2</id>
<address>
<coutry>Country3</coutry>
<zip>ZIP3</zip>
<city>City3</city>
</address>
<address>
<coutry>Country4</coutry>
<zip>ZIP4</zip>
<city>City4</city>
</address>
<document>
<num>103</num>
<issued>2020-01-03</issued>
<description>desc3</description>
</document>
<document>
<num>104</num>
<issued>2020-01-04</issued>
<description>desc4</description>
</document>
</individual>
</collection>';
SELECT c.i.value('(id/text())[1]','int') AS id,
(SELECT i.a.value('(coutry/text())[1]','varchar(30)') AS country, --It's spelt country. I suggest fixing this at your source, as fundamental typographical errors like this can be a real problem later down the line
i.a.value('(zip/text())[1]','varchar(30)') AS zip,
i.a.value('(city/text())[1]','varchar(30)') AS city
FROM c.i.nodes('address')i(a)
FOR JSON PATH) AS AddressJson,
(SELECT i.d.value('(num/text())[1]','int') AS num,
i.d.value('(issued/text())[1]','date') AS issued,
i.d.value('(description/text())[1]','varchar(30)') AS description
FROM c.i.nodes('document')i(d)
FOR JSON PATH) AS DocumentJson
FROM @xml.nodes('collection/individual') c(i);
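If the JSON keys and value types have to match the requested output exactly, the column aliases and target types matter: FOR JSON PATH uses each alias as the key, and values read as int are emitted as JSON numbers rather than strings. The sketch below keeps the source spelling coutry and reads everything as varchar. The trimmed sample document and the alias names are assumptions for illustration; FOR JSON requires SQL Server 2016 or later.

```sql
-- Trimmed sample (assumption: same element names as the question's XML).
DECLARE @xml xml = N'<collection>
  <individual>
    <id>1</id>
    <address><coutry>Country1</coutry><zip>ZIP1</zip><city>City1</city></address>
    <address><coutry>Country2</coutry><zip>ZIP2</zip><city>City2</city></address>
    <document><num>101</num><issued>2020-01-01</issued><description>desc1</description></document>
  </individual>
</collection>';

SELECT c.i.value('(id/text())[1]', 'int') AS ID,
       -- One JSON array per individual; the alias becomes the JSON key.
       (SELECT a.n.value('(coutry/text())[1]', 'varchar(30)') AS coutry,
               a.n.value('(zip/text())[1]',    'varchar(30)') AS zip,
               a.n.value('(city/text())[1]',   'varchar(30)') AS city
        FROM c.i.nodes('address') AS a(n)
        FOR JSON PATH) AS AddressesJson,
       -- Reading num and issued as varchar keeps them as JSON strings.
       (SELECT d.n.value('(num/text())[1]',         'varchar(30)') AS num,
               d.n.value('(issued/text())[1]',      'varchar(30)') AS issued,
               d.n.value('(description/text())[1]', 'varchar(30)') AS description
        FROM c.i.nodes('document') AS d(n)
        FOR JSON PATH) AS DocumentsJson
FROM @xml.nodes('/collection/individual') AS c(i);
```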

XML File into SQL, duplicate tags, and filtering? How to

So I have an XML file as follows (see the last code block). You'll notice there are "Item" tags with a Name and a Value tag nested inside. Sometimes there are 6 items per XML file, sometimes 2, and they are NOT always in the same order. Initially, when I was querying the data, I assumed from looking at the first 20 files that there would always be 6 items; however, sometimes there aren't. How can I filter on the Name of the Item being something specific?
For example, take the file count:
CAST(Body AS XML).value('(/JobCompleteStatistics/JobSpecificInformation/Item/Name)[1]', 'varchar(100)')
Sometimes, this is a file count, other times if a file count wasn't in the file, it is actually a database size....so my calculations and info in the columns is wrong.
Since there are a random set of tags with the same name, how can I filter so I know what I am dealing with?
Case didn't seem to be the way to go...I am using something like this but, obviously, sometimes the Values aren't actually what they are expected to be.. help!
CAST(Body AS XML).value('(/JobCompleteStatistics/WorkspaceAncestry/WorkspaceAncestry/WorkspaceName)[1]', 'varchar(100)') AS [WorkspaceName],
CAST(Body AS XML).value('(/JobCompleteStatistics/JobSpecificInformation/Item/Value)[0]', 'varchar(100)') AS [FileCount],
CAST(Body AS XML).value('(/JobCompleteStatistics/JobSpecificInformation/Item/Value)[2]', 'varchar(100)') AS [FileSize],
CAST(Body AS XML).value('(/JobCompleteStatistics/JobSpecificInformation/Item/Value)[3]', 'varchar(250)') AS [DatabaseSize],
CAST(Body AS XML).value('(/JobCompleteStatistics/JobSpecificInformation/Item/Value)[4]', 'varchar(100)') AS [DTIndexSize],
CAST(Body AS XML).value('(/JobCompleteStatistics/WorkspaceAncestry/WorkspaceAncestry/EventType)[2]', 'varchar(100)') AS [EventType],
CAST(Body AS XML).value('(/JobCompleteStatistics/WorkspaceAncestry/WorkspaceAncestry/EventType)[1]', 'varchar(100)') AS [EventType2]
<?xml version="1.0"?>
<JobCompleteStatistics xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<JobType>Archive</JobType>
<OriginalWorkspaceGuid>12964d66-3ee9-4072-8771-3cc339d1fe6d</OriginalWorkspaceGuid>
<CurrentWorkspaceGuid>12964d66-3ee9-4072-8771-3cc339d1fe6d</CurrentWorkspaceGuid>
<ARMVersion>9.6.1.14</ARMVersion>
<ExecutionType>Manual</ExecutionType>
<WorkspaceAncestry>
<WorkspaceAncestry>
<EventTime>2018-04-23T19:10:35Z</EventTime>
<EventType>Archive</EventType>
<Id>1</Id>
<InstanceName>Instance1</InstanceName>
<WorkspaceGuid>12964d66-3ee9-4072-8771-3cc339d1fe6d</WorkspaceGuid>
<WorkspaceName>Workspace 1</WorkspaceName>
</WorkspaceAncestry>
</WorkspaceAncestry>
<CompletionTime>2018-04-23T23:42:34Z</CompletionTime>
<ElapsedTime>00:04:32:22</ElapsedTime>
<ErrorCount>0</ErrorCount>
<JobSpecificInformation>
<Item>
<Name>File Count</Name>
<Value>1847139</Value>
</Item>
<Item>
<Name>File Size</Name>
<Value>385894107665</Value>
</Item>
<Item>
<Name>Database Size</Name>
<Value>6930992</Value>
</Item>
<Item>
<Name>DtIndex Size</Name>
<Value>5859343220</Value>
</Item>
<Item>
<Name>Content Analytics Size</Name>
<Value>0</Value>
</Item>
<Item>
<Name>Archive Compression Level</Name>
<Value>0</Value>
</Item>
</JobSpecificInformation>
<RetryCount>0</RetryCount>
</JobCompleteStatistics>
If you cannot assume that all Item nodes will always be present and that they'll always be in a given order, but there is a known set of values that appears in Name nodes, then you can use XPath predicates to pick each Item by its Name instead of by its position, e.g.:
select
JobSpecificInformation.value('(Item[Name/text()="Archive Compression Level"]/Value/text())[1]', 'varchar(100)') as [Archive Compression Level],
JobSpecificInformation.value('(Item[Name/text()="Content Analytics Size"]/Value/text())[1]', 'varchar(100)') as [Content Analytics Size],
JobSpecificInformation.value('(Item[Name/text()="Database Size"]/Value/text())[1]', 'varchar(100)') as [Database Size],
JobSpecificInformation.value('(Item[Name/text()="DtIndex Size"]/Value/text())[1]', 'varchar(100)') as [DtIndex Size],
JobSpecificInformation.value('(Item[Name/text()="File Count"]/Value/text())[1]', 'varchar(100)') as [File Count],
JobSpecificInformation.value('(Item[Name/text()="File Size"]/Value/text())[1]', 'varchar(100)') as [File Size]
from dbo.Example
cross apply (select cast(Body as XML)) foo(BodyXML)
cross apply BodyXML.nodes('/JobCompleteStatistics/JobSpecificInformation') stats(JobSpecificInformation);
Which returns the results:
|Archive Compression Level |Content Analytics Size |Database Size |DtIndex Size |File Count |File Size |
|---|---|---|---|---|---|
|0 |0 |6930992 |5859343220 |1847139 |385894107665 |
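When the set of Name values is not known in advance, another option is to return each Item as its own name/value row and filter or pivot afterwards. This is a sketch only; the @body variable and the trimmed sample document are assumptions for illustration, and the bigint type assumes every Value is a whole number, as in the sample file.

```sql
-- Trimmed sample (assumption: same shape as the JobCompleteStatistics file above).
DECLARE @body xml = N'<JobCompleteStatistics>
  <JobSpecificInformation>
    <Item><Name>Database Size</Name><Value>6930992</Value></Item>
    <Item><Name>File Count</Name><Value>1847139</Value></Item>
  </JobSpecificInformation>
</JobCompleteStatistics>';

-- One row per <Item>, whatever order or number of items the file contains.
SELECT
    i.n.value('(Name/text())[1]',  'varchar(100)') AS ItemName,
    i.n.value('(Value/text())[1]', 'bigint')       AS ItemValue
FROM @body.nodes('/JobCompleteStatistics/JobSpecificInformation/Item') AS i(n);
```

A WHERE clause on ItemName (through a derived table or CTE) then selects the items of interest without relying on their position.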

Insert XML child node to SQL table

I've got an XML file like this and I'm working with SQL 2014 SP2
<?xml version='1.0' encoding='UTF-8'?>
<gwl>
<version>123456789</version>
<entities>
<entity id="1" version="123456789">
<name>xxxxx</name>
<listId>0</listId>
<listCode>Oxxx</listCode>
<entityType>08</entityType>
<createdDate>03/03/1993</createdDate>
<lastUpdateDate>05/06/2011</lastUpdateDate>
<source>src</source>
<OriginalSource>o_src</OriginalSource>
<aliases>
<alias category="STRONG" type="Alias">USCJSC</alias>
<alias category="WEAK" type="Alias">'OSKOAO'</alias>
</aliases>
<programs>
<program type="21">prog</program>
</programs>
<sdfs>
<sdf name="OriginalID">9876</sdf>
</sdfs>
<addresses>
<address>
<address1>1141, SYA-KAYA STR.</address1>
<country>RU</country>
<postalCode>1234</postalCode>
</address>
<address>
<address1>90, MARATA UL.</address1>
<country>RU</country>
<postalCode>1919</postalCode>
</address>
</addresses>
<otherIds>
<childId>737606</childId>
<childId>737607</childId>
</otherIds>
</entity>
</entities>
</gwl>
I made a script to insert data from the XML into a SQL table. How can I insert the child nodes into a table? I think I should replicate the row for each new child node, but I don't know the best way to proceed.
Here is my SQL code
DECLARE @InputXML XML
SELECT @InputXML = CAST(x AS XML)
FROM OPENROWSET(BULK 'C:\MyFiles\sample.XML', SINGLE_BLOB) AS T(x)
SELECT
product.value('(@id)[1]', 'NVARCHAR(10)') id,
product.value('(@version)[1]', 'NVARCHAR(14)') version,
product.value('(name[1])', 'NVARCHAR(255)') name,
product.value('(listId[1])', 'NVARCHAR(9)') listId,
product.value('(listCode[1])', 'NVARCHAR(10)') listCode,
product.value('(entityType[1])', 'NVARCHAR(2)') entityType,
product.value('(createdDate[1])', 'NVARCHAR(10)') createdDate,
product.value('(lastUpdateDate[1])', 'NVARCHAR(10)') lastUpdateDate,
product.value('(source[1])', 'NVARCHAR(15)') source,
product.value('(OriginalSource[1])', 'NVARCHAR(50)') OriginalSource,
product.value('(aliases[1])', 'NVARCHAR(50)') aliases,
product.value('(programs[1])', 'NVARCHAR(50)') programs,
product.value('(sdfs[1])', 'NVARCHAR(500)') sdfs,
product.value('(addresses[1])', 'NVARCHAR(50)') addresses,
product.value('(otherIds[1])', 'NVARCHAR(50)') otherIds
FROM @InputXML.nodes('gwl/entities/entity') AS X(product)
You have a lot of different children here...
Just to show the principles:
DECLARE @xml XML=
N'<gwl>
<version>123456789</version>
<entities>
<entity id="1" version="123456789">
<name>xxxxx</name>
<listId>0</listId>
<listCode>Oxxx</listCode>
<entityType>08</entityType>
<createdDate>03/03/1993</createdDate>
<lastUpdateDate>05/06/2011</lastUpdateDate>
<source>src</source>
<OriginalSource>o_src</OriginalSource>
<aliases>
<alias category="STRONG" type="Alias">USCJSC</alias>
<alias category="WEAK" type="Alias">''OSKOAO''</alias>
</aliases>
<programs>
<program type="21">prog</program>
</programs>
<sdfs>
<sdf name="OriginalID">9876</sdf>
</sdfs>
<addresses>
<address>
<address1>1141, SYA-KAYA STR.</address1>
<country>RU</country>
<postalCode>1234</postalCode>
</address>
<address>
<address1>90, MARATA UL.</address1>
<country>RU</country>
<postalCode>1919</postalCode>
</address>
</addresses>
<otherIds>
<childId>737606</childId>
<childId>737607</childId>
</otherIds>
</entity>
</entities>
</gwl>';
--The query will fetch some values from several places.
--It should be easy to get the rest yourself...
SELECT @xml.value('(/gwl/version/text())[1]','bigint') AS [version]
,A.ent.value('(name/text())[1]','nvarchar(max)') AS [Entity_Name]
,A.ent.value('(listId/text())[1]','int') AS Entity_ListId
--more columns taken from A.ent
,B.als.value('@category','nvarchar(max)') AS Alias_Category
,B.als.value('text()[1]','nvarchar(max)') AS Alias_Content
--similar for programs and sdfs
,E.addr.value('(address1/text())[1]','nvarchar(max)') AS Address_Address1
,E.addr.value('(country/text())[1]','nvarchar(max)') AS Address_Country
--and so on
FROM @xml.nodes('/gwl/entities/entity') A(ent)
OUTER APPLY A.ent.nodes('aliases/alias') B(als)
OUTER APPLY A.ent.nodes('programs/program') C(prg)
OUTER APPLY A.ent.nodes('sdfs/sdf') D(sdfs)
OUTER APPLY A.ent.nodes('addresses/address') E(addr)
OUTER APPLY A.ent.nodes('otherIds/childId') F(ids);
The idea in short:
We read non-repeating values (e.g. version) from the xml variable directly
We use .nodes() to return repeating elements as derived sets.
We can use a cascade of .nodes() to dive deeper into repeating child elements by using a relative XPath (no / at the beginning).
You have two approaches:
Read the XML like above into a staging table (simply by adding INTO #tmpTable before FROM) and proceed from there (will need one SELECT ... GROUP BY for each type of child).
Create one SELECT per type of child, using only one of the APPLY lines and shift the data into specific child tables.
I would tend to the first one.
This allows you to do some cleaning, generate IDs, and check business rules before you shift the data into the target tables.
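The second approach (one SELECT per type of child) might look like the following for the address children. It is a sketch under assumptions: the trimmed sample document, the column names and the chosen data types are illustrative and not taken from the asker's schema. Using a single APPLY per query also avoids the row multiplication that several independent APPLY lines produce in one SELECT.

```sql
-- Trimmed sample (assumption: same shape as the gwl document above).
DECLARE @xml xml = N'<gwl>
  <entities>
    <entity id="1" version="123456789">
      <name>xxxxx</name>
      <addresses>
        <address><address1>1141, SYA-KAYA STR.</address1><country>RU</country><postalCode>1234</postalCode></address>
        <address><address1>90, MARATA UL.</address1><country>RU</country><postalCode>1919</postalCode></address>
      </addresses>
    </entity>
  </entities>
</gwl>';

-- One row per <address>, carrying the parent entity id as the key
-- that a child table would reference.
SELECT
    A.ent.value('@id', 'int')                                AS Entity_Id,
    E.addr.value('(address1/text())[1]',   'nvarchar(255)')  AS Address1,
    E.addr.value('(country/text())[1]',    'nvarchar(2)')    AS Country,
    E.addr.value('(postalCode/text())[1]', 'nvarchar(20)')   AS PostalCode
FROM @xml.nodes('/gwl/entities/entity') AS A(ent)
CROSS APPLY A.ent.nodes('addresses/address') AS E(addr);
```

The same pattern, with the path and columns swapped, would cover aliases, programs, sdfs and otherIds.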

If condition on Selecting XML node in SQL

I am trying to read XML and store it in SQL Server.
DECLARE @xml XML
SET @xml =
'<report>
<personal>
<search>
<subject>
<name>SearchName</name>
</subject>
</search>
</personal>
<personal>
<search>
<subject>
<name>SearchName</name>
</subject>
</search>
<result>
<history>
<name>HistoryName</name>
</history>
</result>
</personal>
</report>
'
What I am trying to do here is select the name, with these conditions:
if <personal> contains <result>, select the name under history/name
if <personal> doesn't contain <result>, select the name under subject/name
Currently I am selecting names from personal/search/subject as below:
Select
A.Search.value('(subject/name)[1]','varchar(max)')
FROM @xml.nodes('/report/personal/search') as A(Search)
Expecting result:
SearchName
HistoryName
How can I add a condition in between?
Is there any way to use an exist() condition here, such as:
SELECT @xml.exist('//report//personal//search//subject//name')
Select coalesce(A.Search.value('(result/history/name)[1]', 'varchar(max)'), A.Search.value('(search/subject/name)[1]','varchar(max)'))
FROM @xml.nodes('/report/personal') as A(Search)
This:
SELECT
COALESCE(
A.Search.value('(result/history/name)[1]','varchar(max)'),
A.Search.value('(search/subject/name)[1]','varchar(max)')
) AS Name
FROM @xml.nodes('/report/personal') as A(Search)
will return:
Name
------------
SearchName
HistoryName

How to add condition to COALESCE while reading xml in Sql

I am trying to read XML and store it in SQL Server.
DECLARE @xml XML
SET @xml =
'<report>
<personal>
<search>
<subject>
<name>SearchName</name>
</subject>
</search>
</personal>
<personal>
<search>
<subject>
<name>SearchName</name>
</subject>
</search>
<result>
<history>
<name>HistoryName</name>
</history>
</result>
</personal>
<personal>
<search>
<subject>
<name>SearchName</name>
</subject>
</search>
<result>
<history>
<dob>HistoryDOB</dob>
</history>
</result>
</personal>
</report>
'
What I am trying to do here is select the name, with these conditions:
if <personal> contains <result>, select the name under history/name
if <personal> doesn't contain <result>, select the name under subject/name
if <personal> contains <result> BUT the name is not there, return null
I am using the query below:
SELECT
COALESCE(
A.Search.value('(result/history/name)[1]','varchar(max)'),
A.Search.value('(search/subject/name)[1]','varchar(max)')
) AS Name
FROM @xml.nodes('/report/personal') as A(Search)
It is returning
SearchName
HistoryName
SearchName
But it fails on the third condition.
Just tweak the second value() call to request exactly what you've specified, namely that you'll only take search/subject for <personal> nodes that have no result:
SELECT
COALESCE(
A.Search.value('(result/history/name)[1]','varchar(max)'),
A.Search.value('(search[not(../result)]/subject/name)[1]','varchar(max)')
) AS Name
FROM @xml.nodes('/report/personal') as A(Search)
Result:
Name
------------------
HistoryName
SearchName
NULL
You can use the exist() method to check whether <personal> contains <result> or not.
With it, your algorithm translates directly into a query:
select
case
when A.Search.exist('result') = 1
then A.Search.value('(result/history/name)[1]','varchar(max)')
else A.Search.value('(search/subject/name)[1]','varchar(max)')
end as Name
FROM @xml.nodes('/report/personal') as A(Search)
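To compare the two answers, both expressions can be run side by side against the three-case sample. This is a self-contained sketch; the column aliases NameViaCoalesce and NameViaCase are made up for illustration.

```sql
DECLARE @xml xml = N'<report>
  <personal>
    <search><subject><name>SearchName</name></subject></search>
  </personal>
  <personal>
    <search><subject><name>SearchName</name></subject></search>
    <result><history><name>HistoryName</name></history></result>
  </personal>
  <personal>
    <search><subject><name>SearchName</name></subject></search>
    <result><history><dob>HistoryDOB</dob></history></result>
  </personal>
</report>';

SELECT
    -- COALESCE variant: the fallback only matches <search> when no sibling <result> exists.
    COALESCE(
        p.n.value('(result/history/name/text())[1]', 'varchar(100)'),
        p.n.value('(search[not(../result)]/subject/name/text())[1]', 'varchar(100)')
    ) AS NameViaCoalesce,
    -- CASE variant: branch explicitly on the presence of <result>.
    CASE WHEN p.n.exist('result') = 1
         THEN p.n.value('(result/history/name/text())[1]', 'varchar(100)')
         ELSE p.n.value('(search/subject/name/text())[1]', 'varchar(100)')
    END AS NameViaCase
FROM @xml.nodes('/report/personal') AS p(n);
```

Both columns should agree for every <personal> element: SearchName where there is no <result>, HistoryName where <result> has a name, and NULL where <result> exists without a name. Row order is not guaranteed without an ORDER BY.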

Resources